Search for: data-mining
Total 208 records

    The utilization of patients’ information to improve the performance of radiotherapy centers: A data-driven approach

    , Article Computers and Industrial Engineering ; Volume 172 , 2022 ; 03608352 (ISSN) Moradi, S ; Najafi, M ; Mesgari, S ; Zolfagharinia, H ; Sharif University of Technology
    Elsevier Ltd  2022
    Abstract
    The high demand for radiotherapy services, combined with the limited capacity of available resources, patient unpunctuality, and series of appointments, makes Patient Appointment Scheduling (PAS) in radiotherapy centers very challenging. Although most centers use a First-Come-First-Serve (FCFS) policy for appointment scheduling, this approach does not consider patients’ behaviors, and consequently, it performs poorly. This type of inappropriate scheduling usually leads to inefficiency at the center and/or patient dissatisfaction. This study provides a data-driven approach to patient appointment scheduling to deal with the challenges mentioned above, and it considers patients’ histories of... 

    Big (Bio)chemical data mining using chemometric methods: a need for chemists

    , Article Angewandte Chemie (International ed. in English) ; Volume 61, Issue 44 , 2022 , Pages e201801134- ; 15213773 (ISSN) Parastar, H ; Tauler, R ; Sharif University of Technology
    NLM (Medline)  2022
    Abstract
    This Review summarizes how big (bio)chemical data (BBCD) can be analyzed with multivariate chemometric methods and highlights some of the important challenges faced by modern analytical research. Here, the potential of chemometric methods to solve BBCD problems that are being encountered in chromatographic, spectroscopic and hyperspectral imaging measurements will be discussed, with an emphasis on their applications to omics sciences. In addition, insights and perspectives on how to address the analysis of BBCD are provided along with a discussion of the procedures necessary to obtain more reliable qualitative and quantitative results. In this Review, the importance of "big data" and of... 

    Detecting diseases in medical prescriptions using data mining methods

    , Article BioData Mining ; Volume 15, Issue 1 , 2022 ; 17560381 (ISSN) Nazari Nezhad, S ; Zahedi, M. H ; Farahani, E ; Sharif University of Technology
    BioMed Central Ltd  2022
    Abstract
    Every year, the health of millions of people around the world is compromised by misdiagnosis, which can sometimes even lead to death. In addition, it entails huge financial costs for patients, insurance companies, and governments. Furthermore, many physicians’ professional lives are adversely affected by unintended errors in prescribing medication or misdiagnosing a disease. Our aim in this paper is to use data mining methods to find knowledge in a dataset of medical prescriptions that can be effective in improving the diagnostic process. In this study, using four single classification algorithms, namely decision tree, random forest, simple Bayes, and K-nearest neighbors, the disease and its... 

    Graph centrality algorithms for hardware trojan detection at gate-level netlists

    , Article International Journal of Engineering, Transactions A: Basics ; Volume 35, Issue 7 , 2022 , Pages 1375-1387 ; 17281431 (ISSN) Hashemi, M ; Momeni, A ; Pashrashid, A ; Mohammadi, S ; Sharif University of Technology
    Materials and Energy Research Center  2022
    Abstract
    The rapid growth in the supply chain of electronic devices has led companies to purchase Intellectual Property or Integrated Circuits from unreliable sources. This dispersion in the design-to-fabrication stages of IP/IC has led to new attacks called hardware Trojans. Hardware Trojans can leak information, reduce performance, or cause failure. Various methods have been introduced to detect or prevent hardware Trojans, and machine learning is one of them. Selecting the type and number of input variables in the learning algorithm plays an important role in the performance of the learning model. Some previous hardware Trojan detection studies have used structural gate-level features to... 

    Diagnosis of brucellosis disease using data mining: A case study on patients of a hospital in Tehran

    , Article Journal of Microbiological Methods ; Volume 199 , 2022 ; 01677012 (ISSN) Sebt, M. V ; Jafari, S ; Khavaninzadeh, M ; Shavandi, A ; Sharif University of Technology
    Elsevier B.V  2022
    Abstract
    Background: Brucellosis is a common zoonotic infection transmitted to humans from livestock. This bacterial infection is acquired from infected animals and their products. The pathogen of this disease is a genus of bacilli called Brucella, and no effective vaccine has been discovered yet for the prevention of human brucellosis. Objectives: The present study is mainly conducted to diagnose brucellosis accurately and in a timely manner, using data mining techniques. Based on the knowledge discovered with data mining and the opinions of specialist physicians, this study aims to propose instructions for diagnosing brucellosis. Materials and methods: The dataset used in this study contains 340 samples and is extracted from... 

    Intelligent flight-data-recorders; a step toward a new generation of learning aircraft

    , Article 8th International Conference on Control, Decision and Information Technologies, CoDIT 2022, 17 May 2022 through 20 May 2022 ; 2022 , Pages 1545-1549 ; 9781665496070 (ISBN) Malaek, S. M ; Alipour, E ; Sharif University of Technology
    Institute of Electrical and Electronics Engineers Inc  2022
    Abstract
    To understand how aerial accidents occur, the installation of flight data recorders (FDR) and cockpit voice recorders (CVR) has become mandatory by law. However, such devices play a passive role and are used only once an accident has occurred. Recent advances in machine learning techniques and their application in solving engineering problems are the keys to creating a more active role for both the FDR and the CVR. Here, we investigate a new approach to bringing intelligence to an FDR (called I-FDR). Through a continuous data-mining process, an I-FDR could bring better situational awareness to the flight crew. An I-FDR, similar to the FDR, records all pertinent flying parameters. In addition,... 

    A k-NN method for lung cancer prognosis with the use of a genetic algorithm for feature selection

    , Article Expert Systems with Applications ; Volume 164 , 2021 ; 09574174 (ISSN) Maleki, N ; Zeinali, Y ; Akhavan Niaki, S. T ; Sharif University of Technology
    Elsevier Ltd  2021
    Abstract
    Lung cancer is one of the most common diseases worldwide. Early identification of this disease is the main way to improve patients’ chances of survival. In this paper, a k-nearest-neighbors technique is employed to diagnose the stage of patients’ disease, with a genetic algorithm applied for efficient feature selection to reduce the dataset dimensions and increase the classifier’s speed. To improve the accuracy of the proposed algorithm, the best value for k is determined using an experimental procedure. The implementation of the proposed approach on a lung cancer database reveals 100% accuracy. This implies that one... 

    How realistic is static traffic assignment? Analyzing automatic number-plate recognition data and image processing of real-time traffic maps for investigation

    , Article Transportation Research Interdisciplinary Perspectives ; Volume 9 , 2021 ; 25901982 (ISSN) Mirzahossein, H ; Gholampour, I ; Sedghi, M ; Zhu, L ; Sharif University of Technology
    Elsevier Ltd  2021
    Abstract
    Travel demand information in the form of the origin–destination (OD) matrix plays an essential role in studying urban traffic management and network design. The present study takes a novel step toward urban traffic analysis using data mining of processed images of real-time traffic maps as a location-based data model, in which the data were analyzed with software such as KNIME and Python workspaces and the results were compared with conventional traffic assignment results. Thus, we investigated a real-time OD matrix based on the trip-per-vehicle by automatic number-plate recognition (ANPR) cameras for the congestion charge zone (CCZ) of Tehran, Iran. The obtained matrix was assigned... 

    Evaluation and improvement of energy consumption prediction models using principal component analysis based feature reduction

    , Article Journal of Cleaner Production ; Volume 279 , 2021 ; 09596526 (ISSN) Parhizkar, T ; Rafieipour, E ; Parhizkar, A ; Sharif University of Technology
    Elsevier Ltd  2021
    Abstract
    The building sector is a major source of energy consumption and greenhouse gas emissions in urban regions. Several studies have explored energy consumption prediction, and the value of the knowledge extracted is directly related to the quality of the data used. The massive growth in the scale of data affects data quality and poses a challenge to traditional data mining methods, as these methods have difficulties coping with such large amounts of data. Expanded algorithms need to be utilized to improve prediction performance considering the ever-increasing large data sets. In this paper, a preprocessing method to remove noisy features is coupled with prediction methods to improve the... 

    Persian sentiment analysis of an online store independent of pre-processing using convolutional neural network with fastText embeddings

    , Article PeerJ Computer Science ; Volume 7 , 2021 , Pages 1-22 ; 23765992 (ISSN) Shumaly, S ; Yazdinejad, M ; Guo, Y ; Sharif University of Technology
    PeerJ Inc  2021
    Abstract
    Sentiment analysis plays a key role in companies, especially stores, and increasing the accuracy in determining customers’ opinions about products helps them remain competitive. We intend to analyze the users’ opinions on the website of the largest online store in Iran, Digikala. However, the Persian language is unstructured, which makes the pre-processing stage very difficult, and this is the main problem of sentiment analysis in Persian. What exacerbates this problem is the lack of available libraries for Persian pre-processing, as most libraries focus on English. To tackle this, approximately 3 million reviews were gathered in Persian from the Digikala website using... 

    Event classification from the Urdu language text on social media

    , Article PeerJ Computer Science ; Volume 7 , 2021 ; 23765992 (ISSN) Awan, M. D. A ; Kajla, N. I ; Firdous, A ; Husnain, M ; Missen, M. M. S ; Sharif University of Technology
    PeerJ Inc  2021
    Abstract
    The real-time availability of the Internet has engaged millions of users around the world. Regional languages are preferred for effective and easy communication, which produces multilingual data on social networks and news channels. People share ideas, opinions, and events that are happening globally (e.g., sports, inflation, protests, explosions, and sexual assault) in regional (local) languages on social media. Extraction and classification of events from multilingual data have become bottlenecks because of a lack of resources. In this research paper, we present the event classification task for Urdu-language text on social media and news channels by... 

    A new and optimal architecture for applying association rule mining algorithms on grid

    , Article 2007 IADIS European Conference on Data Mining, DM 2007, part of the 1st IADIS Multi Conference on Computer Science and Information Systems, MCCSIS 2007, 3 July 2007 through 8 July 2007 ; 2020 , Pages 230-232 Bouyer, A ; Arasteh, B ; Karimi, M. B ; Hoseyni, M. J ; Bouyer, A ; Movaghar, A ; Sharif University of Technology
    IADIS Press  2020
    Abstract
    Grid computing is the on-demand sharing of computing resources within a tightly-coupled network to solve certain problems. One of the main topics is knowledge discovery and data mining. By using grid computing, we can solve this problem. To achieve these very ambitious goals, we present an architecture for applying association rule mining algorithms in a grid environment. Association rule mining seeks to discover associations among transactions encoded in a database on each machine and then sends the results to the related coordinator. This new architecture is powerful and rapid. We compared our method with serial processing. Our experimental results show that by using the new architecture on... 

    PyGTED: Python application for computing graph traversal edit distance

    , Article Journal of Computational Biology ; Volume 27, Issue 3 , 2020 , Pages 436-439 Ebrahimpour Boroojeny, A ; Shrestha, A ; Sharifi Zarchi, A ; Gallagher, S. R ; Sahinalp, S. C ; Chitsaz, H ; Sharif University of Technology
    Mary Ann Liebert Inc  2020
    Abstract
    Graph Traversal Edit Distance (GTED) is a measure of distance (or dissimilarity) between two graphs introduced. This measure is based on the minimum edit distance between two strings formed by the edge labels of respective Eulerian traversals of the two graphs. GTED was motivated by and provides the first mathematical formalism for sequence coassembly and de novo variation detection in bioinformatics. Many problems in applied machine learning deal with graphs (also called networks), including social networks, security, web data mining, protein function prediction, and genome informatics. The kernel paradigm beautifully decouples the learning algorithm from the underlying geometric space,... 

    An online portfolio selection algorithm using clustering approaches and considering transaction costs

    , Article Expert Systems with Applications ; Volume 159 , November , 2020 Khedmati, M ; Azin, P ; Sharif University of Technology
    Elsevier Ltd  2020
    Abstract
    This paper presents an online portfolio selection algorithm based on the pattern-matching principle, which decides on the optimal portfolio in each period and updates it at the beginning of each period. The proposed method consists of two steps: i) sample selection, ii) portfolio optimization. First, in the sample selection, clustering algorithms including k-means, k-medoids, spectral and hierarchical clustering are applied to discover time windows (TW) similar to the recent time window. Then, after finding the similar time windows and predicting the market behavior of the next day, the optimum function along with the transaction cost is used in the portfolio... 

    Insights into TripAdvisor's online reviews: The case of Tehran's hotels

    , Article Tourism Management Perspectives ; Volume 34 , April , 2020 Khorsand, R ; Rafiee, M ; Kayvanfar, V ; Sharif University of Technology
    Elsevier B. V  2020
    Abstract
    User-generated data on TripAdvisor.com contains a considerable amount of useful information that can help managers provide better services to their customers. This study aims to forecast a new user's rating of a hotel based on information about the hotel and the user. To do so, all reviews of all Tehran hotels on TripAdvisor.com are selected as real data, and 8 different supervised machine learning models are applied to the data to select the best method: K-nearest neighbors (KNN), Naïve Bayes, decision tree, logistic regression, support vector machine, neural network, random forest, and gradient boosting. KNN algorithm which uses similarity and distance measures for classification is... 

    Mass transfer studies in RDC column by the coupling effects of perforated structure and reactive extraction of Mo(VI) and W(VI) from sulfate solution

    , Article International Communications in Heat and Mass Transfer ; Volume 118 , 2020 Shakib, B ; Torkaman, R ; Torab Mostaedi, M ; Asadollahzadeh, M ; Sharif University of Technology
    Elsevier Ltd  2020
    Abstract
    Solvent extraction of molybdenum (VI) and tungsten (VI) by synergism of the mixture of D2EHPA and TBP was conducted in the RDC extractor with perforated structure. The initial aqueous pH, extractant concentration, ammonium hydroxide concentration as a stripping agent, and synergistic enhancement factor were optimized in the initial experiments before collecting column data. In the RDC column, the influence of agitation rate and inlet phase velocities on the hydrodynamic velocities, mass transfer data, and distribution coefficients was examined. The findings indicated that the impact of agitation speed on the distribution coefficients of molybdenum and tungsten is considerable in... 

    Sentiment analysis on stock social media for stock price movement prediction

    , Article Engineering Applications of Artificial Intelligence ; Volume 85 , 2019 , Pages 569-578 ; 09521976 (ISSN) Derakhshan, A ; Beigy, H ; Sharif University of Technology
    Elsevier Ltd  2019
    Abstract
    The opinions of other people are an essential piece of information for making informed decisions. With increasing use of the Internet, the web has become an excellent source of users' viewpoints in different domains. However, the growing volume of opinionated text, on the one hand, and the complexity caused by contrasting user opinions, on the other, make it almost impossible to read all of these reviews and make an informed decision. These requirements have encouraged a new line of research on mining user reviews, which is called opinion mining. Users' viewpoints can change over time, and this is an important issue for companies. One of the most challenging problems of opinion... 

    Extraction of surface curvatures from tool path data and prediction of cutting forces in the finish milling of sculptured surfaces

    , Article Journal of Manufacturing Processes ; Volume 45 , 2019 , Pages 273-289 ; 15266125 (ISSN) Ghorbani, M ; Movahhedy, M. R ; Sharif University of Technology
    Elsevier Ltd  2019
    Abstract
    To achieve higher accuracy in the prediction of cutting forces in the finish milling process of sculptured parts, the curvatures of the in-process workpiece surface must be taken into consideration in the calculation of cutter-workpiece engagement boundaries. These curvatures, however, are not readily available in tool path data, the main existing source of geometrical information about the process. In this paper, first, a new straightforward algorithm is proposed to extract the principal curvatures and principal directions of the in-process workpiece surface from ball-end finish milling tool path data. These quantities, then, are employed to calculate the boundaries of engagement... 

    Binary classification of imbalanced datasets: The case of CoIL challenge 2000

    , Article Expert Systems with Applications ; Volume 128 , 2019 , Pages 169-186 ; 09574174 (ISSN) Khalilpour Darzi, M. R ; Akhavan Niaki, S. T ; Khedmati, M ; Sharif University of Technology
    Elsevier Ltd  2019
    Abstract
    This paper presents some approaches based on data mining techniques to solve the prediction task of Computational Intelligence and Learning (CoIL) Challenge 2000. The prediction task of the contest is a direct mailing problem and the goal is to improve its response rate. The main issue in this competition is the incompatibility of the dataset in which the distribution of the classes of the target attribute is highly unbalanced. This in turn causes a high error rate in identifying the minority class samples. Three types of methods, namely data-level, algorithm-level, and hybrid methods, are used to overcome this issue. The specificity, sensitivity, precision-recall, and ROC criteria are... 

    A bi-objective hybrid optimization algorithm to reduce noise and data dimension in diabetes diagnosis using support vector machines

    , Article Expert Systems with Applications ; Volume 127 , 2019 , Pages 47-57 ; 09574174 (ISSN) Alirezaei, M ; Akhavan Niaki, S. T ; Akhavan Niaki, S. A ; Sharif University of Technology
    Elsevier Ltd  2019
    Abstract
    Diabetes mellitus is a medical condition examined by data miners for reasons such as significant health complications in affected people, the economic impact on healthcare networks, and so on. In order to find the main causes of this disease, researchers look into the patient's lifestyle, hereditary information, etc. The goal of data mining in this context is to find patterns that make early detection of the disease and proper treatment easier. Due to the high volume of data involved in therapeutic contexts and disease diagnosis, provision of the intended treatment method becomes almost impossible over a short period of time. This justifies the use of pre-processing techniques and data...