
Clustering Data Streams using Core-Sets

Abdi, Masoud | 2015

  1. Type of Document: M.Sc. Thesis
  2. Language: Farsi
  3. Document No: 47452 (02)
  4. University: Sharif University of Technology
  5. Department: Mathematical Sciences
  6. Advisor(s): Zarei, Alireza
  7. Abstract:
  8. We design a new algorithm for clustering data streams in any fixed dimension. The algorithm uses the core-set framework to summarize the data and thereby reduce the cost of computation. Clustering partitions data into distinct sets, called clusters, so that objects in the same cluster are as similar as possible and objects in different clusters are as dissimilar as possible. The problem has many applications in areas such as machine learning, image processing, and financial and stock transactions. Data streams have recently emerged as an important concept because, in many applications, data inherently streams over a network, or the database is so large that sequential access is much faster than random access; the streaming model is therefore widely used. Many existing algorithms have been extended to handle data streams. In this thesis we combine the notion of a core-set with a clustering algorithm to obtain an algorithm that maintains a clustering of a data stream using limited memory, with constant amortized time per inserted data item. The construction also relies on two further concepts: the minimum volume enclosing ellipsoid (MVEE) and the maximum volume inscribed ellipsoid (MVIE). Theoretical and practical analysis shows that the algorithm can be efficient in both time and memory, and that it can represent the structure of the data set, which is very important in many applications.
  9. Keywords:
  10. Clustering ; Computational Geometry ; Data Stream Clustering
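The abstract does not spell out the construction, so the following is only an illustrative sketch of the general merge-and-reduce idea that is commonly used to maintain a small weighted summary of a stream with a bounded number of reduce steps per insertion on average. It is not the thesis's algorithm. The names `reduce_points` and `StreamSummary`, the parameters `buffer_size` and `cell`, and the grid-snapping reduce step are all assumptions made for illustration; a real core-set construction (for example one based on MVEE or MVIE) would take the place of `reduce_points`.

```python
import math


def reduce_points(weighted_points, cell):
    """Placeholder reduce step (an assumption, not a true core-set construction):
    collapse all weighted points that fall in the same grid cell into one
    representative located at their weighted mean."""
    acc = {}  # grid cell -> (total weight, weighted coordinate sums)
    for point, w in weighted_points:
        key = tuple(math.floor(c / cell) for c in point)
        total, sums = acc.get(key, (0.0, [0.0] * len(point)))
        acc[key] = (total + w, [s + w * c for s, c in zip(sums, point)])
    return [(tuple(s / total for s in sums), total) for total, sums in acc.values()]


class StreamSummary:
    """Hypothetical merge-and-reduce container for a point stream."""

    def __init__(self, buffer_size=4, cell=1.0):
        self.buffer_size = buffer_size
        self.cell = cell
        self.buffer = []   # raw points not yet summarized
        self.levels = []   # levels[i] is None or a summary of buffer_size * 2**i points

    def insert(self, point):
        self.buffer.append((tuple(point), 1.0))
        if len(self.buffer) < self.buffer_size:
            return
        # Buffer is full: summarize it, then carry upward like a binary counter.
        carry = reduce_points(self.buffer, self.cell)
        self.buffer = []
        level = 0
        while level < len(self.levels) and self.levels[level] is not None:
            carry = reduce_points(self.levels[level] + carry, self.cell)
            self.levels[level] = None
            level += 1
        if level == len(self.levels):
            self.levels.append(None)
        self.levels[level] = carry

    def summary(self):
        """Return one weighted summary of everything inserted so far."""
        parts = list(self.buffer)
        for s in self.levels:
            if s is not None:
                parts.extend(s)
        return reduce_points(parts, self.cell)


stream = [(0.1,), (0.2,), (5.5,), (5.6,), (0.3,), (5.7,), (0.4,), (5.8,), (9.0,), (9.1,)]
sketch = StreamSummary(buffer_size=4, cell=1.0)
for p in stream:
    sketch.insert(p)
result = sketch.summary()
```

Because levels fill and empty like the bits of a binary counter, each full buffer triggers a constant number of reduce calls on average. The weighted summary in `result` could then be handed to any weighted clustering routine. With a fixed `cell`, the summary size is bounded only when the data lie in a bounded region, which is one reason this grid step is a stand-in and not a real core-set.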
