Search for: data-mining
Total 208 records

    Customer Churn Prediction in Telecom Industry

    , M.Sc. Thesis Sharif University of Technology Talebinejad Maimandi, Mojtaba (Author) ; Bagheri, Saeed (Supervisor) ; Namvar, Mehrzad (Co-Supervisor)
    Abstract
    One of the important tasks of customer relationship management in any company is to control the customer churn rate, because, according to research, retaining existing customers costs a company less than attracting new ones. One of the most important steps in controlling customer churn is to anticipate which customers intend to leave the company. This thesis proposes a new method for customer churn prediction in the mobile telecommunication industry. Mobile operators are structured in such a way that they can record and store a lot of data. In this dissertation, the most appropriate operator data was selected using data mining methods and various features related to customer churn... 

    A Novel Density-Based Cluster Validity Index in Data Mining

    , M.Sc. Thesis Sharif University of Technology Rahmani, Sajjad (Author) ; Akhavan Niaki, Taghi (Supervisor)
    Abstract
    Due to the absence of the labels or so-called target variable, clustering validation, unlike classification, is not that straightforward. So, the cluster evaluation is a challenging task both in research projects and applications. While many clustering validity indices are addressed in the literature, most of them, even those widely used in the application, cannot handle arbitrary shapes. In this paper, a novel clustering validity index is proposed, which is much more powerful in capturing the data’s real structure and dealing with arbitrary shapes. An almost novel separation measure is proposed to represent the significant or insignificant separation regarding the cluster’s structure... 

    Customer Churn Prediction in the Iran Insurance Industry

    , M.Sc. Thesis Sharif University of Technology Etemad Hosseini, Amir Hossein (Author) ; Aslani, Shirin (Supervisor) ; Arian, Hamid Reza (Supervisor)
    Abstract
    Insurance companies in Iran operate in a completely competitive and dynamic environment. Because customer acquisition in these companies is significantly more expensive than customer retention, with timely forecasting of churning customers, they can manage their customers more effectively. In this study, in order to predict customer churn in the insurance industry, the data of one of the Iranian insurance companies that has more than two million insurers were used. In order to identify important data and variables, previous studies were reviewed, and on the other hand, the Central Insurance Regulations of the Islamic Republic of Iran, as well as the information of the insurance contracts of... 

    Detection and Estimation of Key Parameters in Traffic Models Using Data Mining Tools

    , M.Sc. Thesis Sharif University of Technology Moadab, Amir Hossein (Author) ; Khedmati, Majid (Supervisor)
    Abstract
    Nowadays, investigating the factors affecting traffic models from different aspects such as metropolitan planning according to the present conditions can help high-level decision-makers and also, at the micro-level, help the travelers to make appropriate decisions for scheduling affairs, route selection, and vehicle type selection. Given the importance of this topic, a framework will be presented in this study that will evaluate the impact of some identified factors such as travel distance, climate, and urban events, and then all these factors will be presented in mathematical formulas. In the end, based on the model, the travel time will be predicted. In this framework, gene expression... 

    Customer Segmentation and Evaluation Based on Loyalty

    , M.Sc. Thesis Sharif University of Technology Khatti, Zahra (Author) ; Kianfar, Farhad (Supervisor)
    Abstract
    Businesses need to narrow down their segmentation and divide customers into smaller groups because some low-loyalty customers have the potential to become loyal customers. Therefore, we have focused on non-loyal customers. This study proposes a framework in which customers with low loyalty value are examined in four six-month time periods and an overall two-year time period and divided into smaller periods. First, a clustering was performed with K-Means algorithm and RFM model, then the migration pattern of customers from one cluster to another was identified using the sequential pattern technique. The transition of customers and their potentiality was determined and according to these... 
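    The RFM (recency, frequency, monetary) model named in this abstract can be illustrated with a minimal sketch. The thesis does not publish its data layout, so the `(customer_id, purchase_date, amount)` tuples, the function name `rfm_features`, and the `as_of` reference date below are assumptions made for illustration only. The resulting feature vectors are the kind of input a K-Means step would then cluster.

```python
from collections import defaultdict
from datetime import date


def rfm_features(transactions, as_of):
    """Compute (recency_days, frequency, monetary) per customer.

    transactions: iterable of (customer_id, purchase_date, amount) tuples
                  (a hypothetical layout, not taken from the thesis).
    as_of:        reference date from which recency is measured.
    """
    last_purchase = {}
    frequency = defaultdict(int)
    monetary = defaultdict(float)
    for customer, day, amount in transactions:
        # Track the most recent purchase date for each customer.
        if customer not in last_purchase or day > last_purchase[customer]:
            last_purchase[customer] = day
        frequency[customer] += 1
        monetary[customer] += amount
    return {
        c: ((as_of - last_purchase[c]).days, frequency[c], monetary[c])
        for c in last_purchase
    }
```

    In practice the three features are usually scaled before clustering, since monetary values would otherwise dominate the distance computation.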

    Evaluation of Data Mining in Salinity Prediction and Evaporation Estimation

    , M.Sc. Thesis Sharif University of Technology Mahjoobi, Emad (Author) ; Agha Mohammad Hossein Tajrishi, Massoud (Supervisor)
    Abstract
    Since physical variables are almost dependent, uncertain and have significant time and spatial changes, the nature of hydrological and environmental systems is complicated, nonlinear and dynamic. Many models have been developed for studying and analyzing various phenomena in these systems. Recently, data mining approaches have been used as new methods for modeling complicated engineering systems. In this study, the performance of several data mining tools in analyzing two phenomena in water quality management has been evaluated. These algorithms are Multilayer Perceptron Neural Network (MLP), Radial Basis Function Neural Network (RBFN), Support Vector Machines (SVMs) in the field of Artificial... 

    Stock Price Prediction with Machine Learning Methods by Market and Fundamental Data

    , M.Sc. Thesis Sharif University of Technology Moosaabadi, Hassan (Author) ; Habibi, Jafar (Supervisor)
    Abstract
    With the rapid development of the economy, more people have started investing in the stock market. Predicting price changes can reduce the risk of investing in stocks. Technical data such as price and volume in the stock market is usually used to predict stock prices, and less often other types of data such as market data or fundamental data are used. In this study, we want to determine what impact each of the available data types has on stock prices. For example, data of buy and sell for per capita, capital inflows and outflows for small and large natural and legal investors, information related to the stocks themselves, indicators, fundamental data such as earnings per share (EPS) and... 

    Clustering and outlier detection using isoperimetric number of trees

    , Article Pattern Recognition ; Volume 46, Issue 12 , December , 2013 , Pages 3371-3382 ; 00313203 (ISSN) Daneshgar, A ; Javadi, R ; Shariat Razavi, S. B ; Sharif University of Technology
    2013
    Abstract
    We propose a graph-based data clustering algorithm which is based on exact clustering of a minimum spanning tree in terms of a minimum isoperimetry criteria. We show that our basic clustering algorithm runs in O(n log n) and with post-processing in almost O(n log n) (average case) and O(n^2) (worst case) time where n is the size of the data-set. It is also shown that our generalized graph model, which also allows the use of potentials at vertices, can be used to extract an extra piece of information related to anomalous data patterns and outliers. In this regard, we propose an algorithm that extracts outliers in parallel to data clustering. We also provide a comparative performance analysis of... 

    Erratum to A linear genetic programming approach for the prediction of solar global radiation

    , Article Neural Computing and Applications ; Volume 23, Issue 3-4 , 2013 , Pages 1205- ; 09410643 (ISSN) Shavandi, H ; Saeidi Ramiyani, S ; Sharif University of Technology
    2013

    New ensemble method for classification of data streams

    , Article 2011 1st International eConference on Computer and Knowledge Engineering, ICCKE 2011, Mashhad, 13 October 2011 through 14 October 2011 ; 2011 , Pages 264-269 ; 9781467357135 (ISBN) Sobhani, P ; Beigy, H ; Sharif University of Technology
    Abstract
    Classification of data streams has become an important area of data mining, as the number of applications facing these challenges increases. In this paper, we propose a new ensemble learning method for data stream classification in the presence of concept drift. Our method is capable of detecting changes and adapting to new concepts which appear in the stream.

    A novel method to find appropriate ε for DBSCAN

    , Article Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 24 March 2010 through 26 March 2010 ; Volume 5990 LNAI, Issue PART 1 , 2010 , Pages 93-102 ; 03029743 (ISSN) ; 3642121446 (ISBN) Esmaelnejad, J ; Habibi, J ; Hassas Yeganeh, S ; Sharif University of Technology
    2010
    Abstract
    Clustering is one of the most useful methods of data mining, in which a set of real or abstract objects is categorized into clusters. The DBSCAN clustering method, one of the most famous density-based clustering methods, categorizes points in dense areas into the same clusters. In DBSCAN, a point is said to be dense if the ε-radius circular area around it contains at least MinPts points. To find such dense areas, region queries are fired. Two points are defined as density connected if the distance between them is less than ε and at least one of them is dense. Finally, density-connected parts of the data set are extracted as clusters. The significant issue of such a method is that its parameters (ε... 
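    The DBSCAN definitions quoted in this abstract (dense point, density-connected pair, cluster extraction) can be sketched in a few lines of plain Python. This is an illustrative brute-force version and not the paper's method for choosing ε. The function names are hypothetical, and counting a point as its own neighbour is an assumption the abstract does not settle.

```python
from math import dist


def dense_points(points, eps, min_pts):
    """Indices of points whose eps-radius area holds at least min_pts points."""
    dense = set()
    for i, p in enumerate(points):
        # Brute-force region query; the point itself is counted (an assumption).
        if sum(1 for q in points if dist(p, q) <= eps) >= min_pts:
            dense.add(i)
    return dense


def density_connected(points, i, j, eps, dense):
    """Pairwise rule from the abstract: closer than eps, at least one dense."""
    return dist(points[i], points[j]) < eps and (i in dense or j in dense)


def extract_clusters(points, eps, min_pts):
    """Group points into clusters by following density-connected pairs."""
    dense = dense_points(points, eps, min_pts)
    assigned, clusters = set(), []
    for start in sorted(dense):
        if start in assigned:
            continue
        cluster, stack = set(), [start]
        while stack:
            i = stack.pop()
            if i in cluster:
                continue
            cluster.add(i)
            if i not in dense:
                continue  # non-dense points join a cluster but do not expand it
            for j in range(len(points)):
                if j not in cluster and density_connected(points, i, j, eps, dense):
                    stack.append(j)
        assigned |= cluster
        clusters.append(sorted(cluster))
    return clusters
```

    Points that are neither dense nor within ε of a dense point end up in no cluster, which corresponds to DBSCAN's notion of noise. The quadratic region queries here are only for clarity; real implementations use a spatial index.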

    HNP3: A hierarchical nonparametric point process for modeling content diffusion over social media

    , Article 16th IEEE International Conference on Data Mining, ICDM 2016, 12 December 2016 through 15 December 2016 ; 2017 , Pages 943-948 ; 15504786 (ISSN); 9781509054725 (ISBN) Hosseini, S. A ; Khodadadi, A ; Arabzadeh, A ; Rabiee, H. R ; Sharif University of Technology
    Institute of Electrical and Electronics Engineers Inc  2017
    Abstract
    This paper introduces a novel framework for modeling temporal events with complex longitudinal dependency that are generated by dependent sources. This framework takes advantage of multidimensional point processes for modeling time of events. The intensity function of the proposed process is a mixture of intensities, and its complexity grows with the complexity of temporal patterns of data. Moreover, it utilizes a hierarchical dependent nonparametric approach to model marks of events. These capabilities allow the proposed model to adapt its temporal and topical complexity according to the complexity of data, which makes it a suitable candidate for real world scenarios. An online inference... 

    A streaming algorithm for 2-center with outliers in high dimensions

    , Article Computational Geometry: Theory and Applications ; Volume 60 , 2017 , Pages 26-36 ; 09257721 (ISSN) Hatami, B ; Zarrabi Zadeh, H ; Sharif University of Technology
    Abstract
    We study the 2-center problem with outliers in high-dimensional data streams. Given a stream of points in arbitrary d dimensions, the goal is to find two congruent balls of minimum radius covering all but at most z points. We present a (1.8+ε)-approximation streaming algorithm, improving over the previous (4+ε)-approximation algorithm available for the problem. The space complexity and update time of our algorithm are poly(d,z,1/ε), independent of the size of the stream. © 2016 Elsevier B.V  

    Application of association rule mining in supplier selection criteria

    , Article World Academy of Science, Engineering and Technology ; Volume 40 , 2009 , Pages 358-362 ; 2010376X (ISSN) Haery, A ; Salmasi, N ; Modarres Yazdi, M ; Iranmanesh, H ; Sharif University of Technology
    2009
    Abstract
    In this paper, the application of rule mining to review the factors affecting supplier selection is examined in three sections: 1) criteria selection and information gathering, 2) performing association rule mining, and 3) validation and constitution of the rule base. Afterwards, a few applications of the rule base are explained. Then, a numerical example is presented and analyzed with Clementine software. Some of the extracted rules, as well as the results, are presented at the end.

    Speech accent profiles: Modeling and synthesis

    , Article IEEE Signal Processing Magazine ; Volume 26, Issue 3 , 2009 , Pages 69-74 ; 10535888 (ISSN) Vaseghi, S ; Yan, Q ; Ghorshi, A ; Sharif University of Technology
    2009
    Abstract
    A discussion regarding speech accents will be given while describing a set of statistical signal processing methods for the modeling, analysis, synthesis, and morphing of English language accents. Accent morphing deals with changing the accent of speech to a different accent. Accent itself is a distinctive pattern of pronunciation of speech within a community of people who belong to a national, geographic, or socioeconomic grouping. Then, the signal processing methodology for speech accent processing will be reviewed, and the concept of an accent profile will be presented.

    Using web mining in the analysis of housing prices: a case study of Tehran

    , Article 5th International Conference on Web Research, ICWR 2019, 24 April 2019 through 25 April 2019 ; 2019 , Pages 55-60 ; 9781728114316 (ISBN) Annamoradnejad, R ; Annamoradnejad, I ; Safarrad, T ; Habibi, J ; Sharif University of Technology
    Institute of Electrical and Electronics Engineers Inc  2019
    Abstract
    There have been many previous works to determine the determinants of housing prices. All of these works relied on a relatively small set of data, mostly collected with the help of real estate agencies. In this work, we used web mining methods to generate a big, organized dataset from a popular national brokerage website. The dataset contains structural characteristics of more than 139,000 apartments, alongside their location and price. We provided our full dataset for the article, so that other researchers can reproduce our results or conduct further analyses. Using this dataset, we analyzed housing prices of Tehran in order to identify its major determinants. To this aim, we examine the... 

    PyGTED: Python application for computing graph traversal edit distance

    , Article Journal of Computational Biology ; Volume 27, Issue 3 , 2020 , Pages 436-439 Ebrahimpour Boroojeny, A ; Shrestha, A ; Sharifi Zarchi, A ; Gallagher, S. R ; Sahinalp, S. C ; Chitsaz, H ; Sharif University of Technology
    Mary Ann Liebert Inc  2020
    Abstract
    Graph Traversal Edit Distance (GTED) is a measure of distance (or dissimilarity) between two graphs. This measure is based on the minimum edit distance between two strings formed by the edge labels of respective Eulerian traversals of the two graphs. GTED was motivated by and provides the first mathematical formalism for sequence coassembly and de novo variation detection in bioinformatics. Many problems in applied machine learning deal with graphs (also called networks), including social networks, security, web data mining, protein function prediction, and genome informatics. The kernel paradigm beautifully decouples the learning algorithm from the underlying geometric space,... 

    Circluster: storing cluster shapes for clustering

    , Article 2008 4th International IEEE Conference Intelligent Systems, IS 2008, Varna, 6 September 2008 through 8 September 2008 ; Volume 3 , 2008 , Pages 1114-1119 ; 9781424417391 (ISBN) Shirali Shahreza, S ; Hassas Yeganeh, S ; Abolhassani, H ; Habibi, J ; Sharif University of Technology
    2008
    Abstract
    One of the important problems in knowledge discovery from data is clustering. Clustering is the problem of partitioning a set of data using unsupervised techniques. An important characteristic of a clustering technique is the shape of the cluster it can find. Clustering methods which are capable of finding simple cluster shapes are usually fast but inaccurate for complex data sets. Those capable of finding complex cluster shapes are usually accurate but not fast. In this paper, we propose a simple clustering technique named circlusters. Circlusters are circles partitioned into different radius sectors. Circlusters can be used to create hybrid approaches with density based or partitioning based... 

    A novel algorithm for using GA in concept weighting for text mining

    , Article WSEAS Transactions on Computers ; Volume 5, Issue 12 , 2006 , Pages 2992-2999 ; 11092750 (ISSN) Zaefarian, R ; Akhgar, B ; Siddiqi, J. I ; Zaefarian, G ; Gruzdz, A ; Ihnatowicz, A ; Sharif University of Technology
    2006
    Abstract
    The importance of good weighting methodology in information retrieval methods - the method that affects the most useful features of a document or query representative - is examined. Many information retrieval systems regard feature weighting as being of minor importance compared to finding the features, and the experiments confirm this. There are different methods for term weighting, such as TF*IDF and Information Gain Ratio, which have been used in information retrieval systems; the paper provides a brief review of the related literature. This paper explores using GA for concept weighting, which is a novel application to the field of text mining. It proposes... 
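    For readers unfamiliar with the TF*IDF baseline mentioned in this abstract, a minimal sketch follows. It uses one common variant (length-normalised term frequency times the natural log of inverse document frequency), which is an assumption, since the abstract does not state which variant the paper reviews. The paper's GA-based concept weighting is not shown, and the function name `tf_idf` is hypothetical.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per document.

    docs: list of token lists, e.g. [["text", "mining"], ["text"]].
    Weight = (term count / document length) * ln(N / document frequency).
    """
    n_docs = len(docs)
    doc_freq = Counter()
    for doc in docs:
        doc_freq.update(set(doc))  # count each term once per document
    weights = []
    for doc in docs:
        counts = Counter(doc)
        weights.append({
            term: (count / len(doc)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights
```

    A term that occurs in every document receives weight zero under this variant, which is why TF*IDF favours terms that discriminate between documents.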

    Enhanced cross-diamond-hexagonal search algorithms for fast block motion estimation

    , Article IEEE Conference on Advanced Video and Signal Based Surveillance, AVSS 2005, Como, 15 September 2005 through 16 September 2005 ; Volume 2005 , 2005 , Pages 558-563 ; 0780393856 (ISBN); 9780780393851 (ISBN) Moradi, A ; Dianat, R ; Kasaei, S ; Manzuri Shalmani, M. T ; Sharif University of Technology
    2005
    Abstract
    This paper proposes two enhanced cross-diamond-hexagonal search algorithms to solve the motion-estimation problem in video coding. These algorithms differ from each other only in their second search step, and both of them employ a cross-shaped pattern in the first step. The proposed method is an improvement over CDHS, eliminating some checking points of the CDHS algorithm. Experimental results show that the proposed methods perform faster than the diamond search (DS) and CDHS, whereas similar quality is preserved. © 2005 IEEE