Performance Evaluation and Improvement of Duplicate Question Detection in Developers’ Online Q&A Community

Daliri, Majid | 2021

  1. Type of Document: M.Sc. Thesis
  2. Language: Farsi
  3. Document No: 54011 (19)
  4. University: Sharif University of Technology
  5. Department: Computer Engineering
  6. Advisor(s): Habibi, Jafar
  7. Abstract: In this research, we study one of the challenges in software engineering: detecting duplicate questions on Stack Overflow, the programmers' Q&A community. Existing work in this area suffers from problems such as high complexity and performance that degrades over time. The proposed solution is based on machine learning and modern representation learning methods. Representations are learned with two approaches: domain-specific learning and transfer learning. Domain-specific learning uses two word embeddings, fastText and GloVe, while transfer learning uses the embeddings of the Universal Sentence Encoder. Support vector machines and multilayer perceptrons are used as classifiers. Classification is based on the degree of semantic similarity, with four classes: Duplicate, Direct link, Indirect link, and Isolated. The problem is also studied in a reduced form, as a binary classification task with two classes, duplicate and non-duplicate. The results show acceptable performance for both proposed methods, with a relative advantage for the transfer learning approach.
  8. Keywords: Transfer Learning; Deep Learning; Word Embedding; Duplicate Question Detection; Deep Multilayer Perceptron; Text Processing; Representation Learning
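The abstract does not spell out how a question pair is turned into classifier input, so the following is only a minimal sketch under stated assumptions. It assumes each question is represented by the average of its word vectors (as one might do with fastText or GloVe embeddings in the domain-specific setting) and that a pair is described by a common symmetric feature scheme: element-wise absolute difference, element-wise product, and cosine similarity. The function names (`embed_question`, `pair_features`, `to_binary`) and the toy vectors are hypothetical and do not come from the thesis. In the transfer-learning setting, the Universal Sentence Encoder would yield a sentence vector directly, replacing the averaging step.

```python
import math
from typing import Dict, List, Sequence

# The four relatedness classes named in the abstract.
CLASSES = ["Duplicate", "Direct link", "Indirect link", "Isolated"]


def to_binary(label: str) -> str:
    """Collapse a four-class label into the reduced duplicate/non-duplicate task."""
    return "Duplicate" if label == "Duplicate" else "Non-duplicate"


def embed_question(tokens: Sequence[str], vectors: Dict[str, List[float]], dim: int) -> List[float]:
    """Average the vectors of in-vocabulary tokens (hypothetical helper).

    Returns a zero vector when no token is found in the vocabulary.
    """
    known = [vectors[t] for t in tokens if t in vectors]
    if not known:
        return [0.0] * dim
    return [sum(v[i] for v in known) / len(known) for i in range(dim)]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; defined as 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def pair_features(a: Sequence[float], b: Sequence[float]) -> List[float]:
    """Symmetric pair representation: |a - b|, a * b, then cosine similarity.

    The output length is 2 * dim + 1.
    """
    diff = [abs(x - y) for x, y in zip(a, b)]
    prod = [x * y for x, y in zip(a, b)]
    return diff + prod + [cosine(a, b)]


# Usage with made-up 2-dimensional vectors (illustrative only).
toy = {"python": [1.0, 0.0], "list": [0.0, 1.0], "sort": [1.0, 1.0]}
q1 = embed_question(["sort", "python", "list"], toy, dim=2)
q2 = embed_question(["python", "list", "sort"], toy, dim=2)
feats = pair_features(q1, q2)
```

Because the features are symmetric in the two questions, swapping the pair does not change the input. Feature vectors built this way could then be passed to an off-the-shelf classifier such as scikit-learn's `SVC` or `MLPClassifier`, trained on either the four-class labels or their binary reduction.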