Persian word embedding evaluation benchmarks

Zahedi, M. S.; Bokaei, M. H.; Shoeleh, F.; Yadollahi, M. M.; Doostmohammadi, E.; Farhoodi, M.
Sharif University of Technology

Zahedi, M. S ; Sharif University of Technology | 2018

Type of Document: Article
DOI: 10.1109/ICEE.2018.8472549
Publisher: Institute of Electrical and Electronics Engineers Inc., 2018
Abstract:
Recently, there has been renewed interest in semantic word representations, also called word embeddings, for a wide variety of natural language processing tasks that require rich semantic and syntactic information. The quality of word embedding methods is usually evaluated on English-language benchmarks, and only a few studies analyze word embeddings for low-resource languages such as Persian. In this paper, we perform an extensive evaluation of word embeddings for Persian based on three lexical semantics tasks: analogy, concept categorization, and word semantic relatedness. For these tasks, we provide three benchmark datasets and use them to show the strengths and weaknesses of five well-known embedding models trained on a Wikipedia corpus. The experimental results indicate that FastText (skip-gram) and Word2Vec (CBOW) outperform the other models. © 2018 IEEE
Keywords:
Evaluation Benchmark ; FastText ; Word2Vec ; Benchmarking ; Natural language processing systems ; Petroleum reservoir evaluation ; Semantics ; GloVe ; Low resource languages ; Semantic relatedness ; Syntactic information ; Word embedding ; Word representations ; Quality control
Source: 26th Iranian Conference on Electrical Engineering (ICEE 2018), 8-10 May 2018; 2018, pages 1583-1588; ISBN 9781538649169
URL: https://ieeexplore.ieee.org/document/8472549
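The abstract names three evaluation tasks but not how each is scored. The sketch below shows common conventions for such benchmarks and is an assumption, not the paper's confirmed procedure: analogies answered with the vector-offset rule (3CosAdd), relatedness scored by Spearman's rank correlation between model cosine similarities and human ratings, and concept categorization scored by cluster purity. All function names are hypothetical, the toy vectors are invented for illustration (not taken from the paper's datasets), and the clustering step itself (for example k-means) is assumed to run elsewhere.

```python
import math
from collections import Counter, defaultdict

Embeddings = dict[str, list[float]]


def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity; assumes neither vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def solve_analogy(emb: Embeddings, a: str, b: str, c: str) -> str:
    """Answer 'a is to b as c is to ?' with 3CosAdd: argmax_x cos(x, b - a + c)."""
    target = [vb - va + vc for va, vb, vc in zip(emb[a], emb[b], emb[c])]
    # The three query words are excluded from the candidates, as is conventional.
    candidates = [w for w in emb if w not in (a, b, c)]
    return max(candidates, key=lambda w: cosine(emb[w], target))


def ranks(xs: list[float]) -> list[float]:
    """1-based ranks; tied values receive the average of their positions."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    result = [0.0] * len(xs)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and xs[order[j + 1]] == xs[order[i]]:
            j += 1
        for k in range(i, j + 1):
            result[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return result


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation; assumes neither input is constant."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def relatedness_spearman(emb: Embeddings, pairs: list[tuple[str, str, float]]) -> float:
    """Spearman's rho between model cosines and human scores; out-of-vocabulary pairs are skipped."""
    model, human = [], []
    for w1, w2, gold in pairs:
        if w1 in emb and w2 in emb:
            model.append(cosine(emb[w1], emb[w2]))
            human.append(gold)
    # Spearman's rho is the Pearson correlation of the two rank vectors.
    return pearson(ranks(model), ranks(human))


def purity(assignment: dict[str, int], gold: dict[str, str]) -> float:
    """Sum over clusters of the majority gold-category count, divided by the number of words."""
    per_cluster: dict[int, Counter] = defaultdict(Counter)
    for word, cluster in assignment.items():
        per_cluster[cluster][gold[word]] += 1
    return sum(max(c.values()) for c in per_cluster.values()) / len(assignment)


if __name__ == "__main__":
    # Hypothetical 2-D toy vectors; real models use hundreds of dimensions.
    emb = {"man": [1.0, 0.0], "woman": [1.0, 1.0], "king": [3.0, 0.0],
           "queen": [3.0, 1.0], "apple": [-1.0, -2.0]}
    print(solve_analogy(emb, "man", "woman", "king"))  # queen
    pairs = [("man", "king", 8.0), ("woman", "queen", 7.0), ("man", "apple", 1.0)]
    print(round(relatedness_spearman(emb, pairs), 3))  # 1.0
    clusters = {"man": 0, "woman": 0, "king": 0, "queen": 1, "apple": 1}
    gold = {w: "person" for w in ("man", "woman", "king", "queen")} | {"apple": "fruit"}
    print(purity(clusters, gold))  # 0.8
```

Spearman's rho is shown rather than Pearson on raw scores because relatedness benchmarks typically care only about whether the model orders word pairs the same way human annotators do; whether the paper uses exactly these metrics is not stated in the abstract.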