A computational grammar for Persian based on GPSG

Article, Language Resources and Evaluation; Volume 45, Issue 4, May 2011, Pages 387-408; 1574020X (ISSN); Bahrani, M; Sameti, H; Manshadi, M. H; Sharif University of Technology

Abstract

In this paper, we present our attempts to design and implement a large-coverage computational grammar for the Persian language based on the Generalized Phrase Structure Grammar (GPSG) model. This grammatical model was developed for continuous speech recognition (CSR) applications, but is suitable for other applications that need syntactic analysis of Persian. In this work, we investigate various syntactic structures relevant to the modern Persian language, and then describe these structures according to a phrase structure model. Noun (N), Verb (V), Adjective (ADJ), Adverb (ADV), and Preposition (P) are considered basic syntactic categories, and X-bar theory is used to define Noun...

Implementation and evaluation of statistical parametric speech synthesis methods for the Persian language

Article, IEEE International Workshop on Machine Learning for Signal Processing, 18 September 2011 through 21 September 2011; September 2011, Pages 1-6; 9781457716232 (ISBN); Bahaadini, S; Sameti, H; Khorram, S; Sharif University of Technology

Abstract

Research on Persian speech synthesis systems over the last ten years has been sparse and scattered. A comprehensive framework that properly implements and adapts statistical speech synthesis methods for Persian has not yet been developed. In this paper, recent statistical parametric speech synthesis methods, including CLUSTERGEN, traditional HMM-based speech synthesis, and its STRAIGHT version, are implemented and adapted for the Persian language. A CCR test is carried out to compare these methods with each other and with the unit selection method. Listeners score samples based on CMOS. The methods were ranked by averaging the CCR scores. The results show that STRAIGHT-based...
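
The ranking step mentioned in this abstract (averaging comparison scores per method) can be illustrated with a minimal sketch. It assumes each judgement is a pairwise comparison scored on a -3 to +3 scale where a positive value favours the first system; the function name, the system names, and the toy scores are hypothetical and are not taken from the paper.

```python
from collections import defaultdict


def rank_by_mean_score(judgements):
    """Rank systems by their mean comparison score.

    judgements: iterable of (system_a, system_b, score), where score is
    assumed to lie in -3..3 and a positive score means system_a was preferred.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for a, b, score in judgements:
        # A preference for A counts equally against B.
        totals[a] += score
        counts[a] += 1
        totals[b] -= score
        counts[b] += 1
    means = {name: totals[name] / counts[name] for name in totals}
    # Highest mean first.
    return sorted(means.items(), key=lambda item: item[1], reverse=True)


# Made-up judgements for illustration only; these are not results from the paper.
toy_judgements = [
    ("sysA", "sysB", 2),
    ("sysA", "sysC", 1),
    ("sysB", "sysC", -1),
    ("sysB", "sysA", -1),
]
ranking = rank_by_mean_score(toy_judgements)
```

With these toy scores, `ranking` lists `sysA` first and `sysB` last. How the paper itself aggregates listener scores may differ from this simple mean.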

Lessons from building a Persian written corpus: Peykare

Article, Language Resources and Evaluation; Volume 45, Issue 2, 2011, Pages 143-164; 1574020X (ISSN); Bijankhan, M; Sheykhzadegan, J; Bahrani, M; Ghayoomi, M; Sharif University of Technology

Abstract

This paper addresses some of the lessons learned during the course of building a written language resource, called 'Peykare', for contemporary Persian. After defining five linguistic varieties and 24 different registers based on these linguistic varieties, we collected texts for Peykare to do a linguistic analysis, including cross-register differences. For tokenization of Persian, we propose a descriptive generalization to normalize orthographic variations existing in texts. To annotate Peykare, we use the EAGLES guidelines, which result in a hierarchy of part-of-speech tags. To this end, we apply a semi-automatic approach to annotation. In the paper, we also give a...

Arabic/Persian text steganography utilizing similar letters with different codes

Article, Arabian Journal for Science and Engineering; Volume 35, Issue 1, 2010, Pages 213-222; 13198025 (ISSN); Shirali Shahreza, M. H; Shirali Shahreza, M; Sharif University of Technology

Abstract

Sending information secretly and communicating covertly have been of great interest for ages. For this purpose, text documents have been widely used and, consequently, various methods for hiding information in texts (text steganography) have been developed so far. This paper deals with a new method to hide information in Persian (Farsi) and Arabic texts. In the Unicode Standard, there are two different characters for each of the letters "Ya" and "Kaf". The two characters for each letter have the same shape but different codes when they are used at the beginning or in the middle of words. As a result, we can hide information in texts, considering the specified data to be hidden and using one of these two characters (for...
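
The idea described in this abstract can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: it takes the two "Ya" code points to be U+06CC (Farsi Yeh) and U+064A (Arabic Yeh), and the two "Kaf" code points to be U+06A9 (Keheh) and U+0643 (Arabic Kaf). The bit convention (0 = Persian code point, 1 = Arabic code point), the function names, and the test for "beginning or middle of a word" (the next character is an Arabic-script letter) are all choices made for this sketch.

```python
# Look-alike pairs: Persian code point -> Arabic code point (assumed mapping).
PERSIAN_TO_ARABIC = {"\u06CC": "\u064A", "\u06A9": "\u0643"}
ARABIC_TO_PERSIAN = {arabic: persian for persian, arabic in PERSIAN_TO_ARABIC.items()}


def _is_carrier(text, i):
    """True if text[i] is a Ya/Kaf variant in initial or medial position.

    Simplification: the position counts as initial/medial when the next
    character is a letter in the Arabic block (U+0600..U+06FF).
    """
    ch = text[i]
    if ch not in PERSIAN_TO_ARABIC and ch not in ARABIC_TO_PERSIAN:
        return False
    nxt = text[i + 1] if i + 1 < len(text) else ""
    return "\u0600" <= nxt <= "\u06FF" and nxt.isalpha()


def embed(cover, bits):
    """Hide bits in cover by choosing the code point of each carrier letter."""
    out = list(cover)
    remaining = list(bits)
    for i in range(len(out)):
        if not remaining:
            break
        if _is_carrier(cover, i):
            bit = remaining.pop(0)
            persian = ARABIC_TO_PERSIAN.get(out[i], out[i])
            out[i] = PERSIAN_TO_ARABIC[persian] if bit else persian
    if remaining:
        raise ValueError("cover text has too few carrier letters")
    return "".join(out)


def extract(stego, n_bits):
    """Read back the first n_bits hidden by embed()."""
    bits = []
    for i in range(len(stego)):
        if len(bits) == n_bits:
            break
        if _is_carrier(stego, i):
            bits.append(1 if stego[i] in ARABIC_TO_PERSIAN else 0)
    return bits


# Cover text "yek ketab" written with Persian code points; it has two carriers.
cover = "\u06CC\u06A9 \u06A9\u062A\u0627\u0628"
stego = embed(cover, [1, 0])
recovered = extract(stego, 2)
```

Word-final letters are skipped because the two variants can render differently there, which matches the restriction to initial and medial positions stated in the abstract.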

A Pool-based active learning method for improving Farsi-English machine translation system

Article, 2012 6th International Symposium on Telecommunications, IST 2012; 978-146732073-3; Bakhshaei, Somayeh; Sharif University of Technology

Abstract

In this paper, we try to alleviate the problem of scarce resources for developing a Farsi-English Statistical Machine Translation (SMT) system. This is done by applying the Active Learning (AL) idea to choose more informative sentences to be translated by a human and then added to the baseline corpus. Although human translation is more valuable than other approaches to corpus gathering (such as automatic approaches), it is also more costly. In this way, we can improve the translation system at lower cost. This is done in interaction with a human translator. Applying the Active Learning idea to an SMT system turns it into a system that can improve its baseline corpus by asking for the...
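
A pool-based selection loop of the kind this abstract describes might look like the sketch below. The abstract does not state how informativeness is measured, so the score used here (the number of word types in a sentence that the baseline corpus has not seen) is a stand-in chosen for illustration; the function name and the toy data are hypothetical.

```python
def select_for_translation(pool, baseline_vocab, budget):
    """Pick the `budget` pool sentences with the most unseen word types.

    pool: list of untranslated source sentences (whitespace-tokenised).
    baseline_vocab: set of word types already covered by the baseline corpus.
    """
    def novelty(sentence):
        # Stand-in informativeness score: count of unseen word types.
        return len(set(sentence.split()) - baseline_vocab)

    # sorted() is stable, so ties keep their original pool order.
    ranked = sorted(pool, key=novelty, reverse=True)
    return ranked[:budget]


# Toy data with placeholder tokens, for illustration only.
toy_pool = ["a b c", "a x y", "x y z w"]
toy_vocab = {"a", "b", "c"}
chosen = select_for_translation(toy_pool, toy_vocab, budget=2)
```

In a full loop, the chosen sentences would be sent to the human translator, added to the baseline corpus, and the vocabulary updated before the next round.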

ShEMO: a large-scale validated database for Persian speech emotion detection

Article, Language Resources and Evaluation; 2018; 1574020X (ISSN); Nezami, O. M; Jamshid Lou, P; Karami, M; Sharif University of Technology

Springer Netherlands 2018

Abstract

This paper introduces a large-scale, validated database for Persian called Sharif Emotional Speech Database (ShEMO). The database includes 3000 semi-natural utterances, equivalent to 3 h and 25 min of speech data extracted from online radio plays. The ShEMO covers speech samples of 87 native-Persian speakers for five basic emotions including anger, fear, happiness, sadness and surprise, as well as neutral state. Twelve annotators label the underlying emotional state of utterances and majority voting is used to decide on the final labels. According to the kappa measure, the inter-annotator agreement is 64% which is interpreted as “substantial agreement”. We also present benchmark results...
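
The majority-voting step mentioned in this abstract can be sketched as follows. The abstract does not say how ties are handled, so returning `None` for a tie is an assumption made here; the function name and the example labels are invented for illustration.

```python
from collections import Counter


def majority_label(labels):
    """Return the most frequent label, or None when the top two counts tie."""
    counts = Counter(labels).most_common()
    top_label, top_count = counts[0]
    if len(counts) > 1 and counts[1][1] == top_count:
        return None  # Tie handling is an assumption, not taken from the paper.
    return top_label


# Twelve made-up annotator labels for a single utterance.
votes = ["anger"] * 7 + ["neutral"] * 3 + ["sadness"] * 2
final = majority_label(votes)
```

Here `final` is `"anger"`, since seven of the twelve hypothetical annotators chose it.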

ShEMO: a large-scale validated database for Persian speech emotion detection

Article, Language Resources and Evaluation; Volume 53, Issue 1, 2019; 1574020X (ISSN); Mohamad Nezami, O; Jamshid Lou, P; Karami, M; Sharif University of Technology

Springer Netherlands 2019

Abstract

This paper introduces a large-scale, validated database for Persian called Sharif Emotional Speech Database (ShEMO). The database includes 3000 semi-natural utterances, equivalent to 3 h and 25 min of speech data extracted from online radio plays. The ShEMO covers speech samples of 87 native-Persian speakers for five basic emotions including anger, fear, happiness, sadness and surprise, as well as neutral state. Twelve annotators label the underlying emotional state of utterances and majority voting is used to decide on the final labels. According to the kappa measure, the inter-annotator agreement is 64% which is interpreted as “substantial agreement”. We also present benchmark results...

Building and incorporating language models for Persian continuous speech recognition systems

Article, 5th International Conference on Language Resources and Evaluation, LREC 2006, 22 May 2006 through 28 May 2006; 2006, Pages 2590-2593; Bahrani, M; Sameti, H; Hafezi, N; Movasagh, H; Sharif University of Technology

European Language Resources Association (ELRA) 2006

Abstract

In this paper, building statistical language models for the Persian language using a corpus and incorporating them in a Persian continuous speech recognition (CSR) system are described. We used the Persian Text Corpus for building the language models. First, we preprocessed the texts of the corpus by correcting the different orthography of words. Also, the number of POS tags was decreased by clustering POS tags manually. Then we extracted word-based monogram and POS-based bigram and trigram language models from the corpus. We also present the procedure of incorporating language models in a Persian CSR system. By using the language models, 274% reduction in word error rate was achieved in the best case.
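
As an illustration of what a POS-based bigram model involves, the sketch below estimates bigram probabilities over tag sequences by plain relative frequency. It makes no claim about the smoothing, tag set, or tooling used in the paper; the function name, the sentence-boundary markers `<s>` and `</s>`, and the toy tag sequences are assumptions.

```python
from collections import Counter


def bigram_mle(tag_sequences):
    """Maximum-likelihood bigram probabilities P(next_tag | tag), unsmoothed."""
    history_counts = Counter()
    bigram_counts = Counter()
    for tags in tag_sequences:
        padded = ["<s>"] + list(tags) + ["</s>"]
        # Every position except the last acts as a history.
        history_counts.update(padded[:-1])
        bigram_counts.update(zip(padded, padded[1:]))
    return {
        (prev, nxt): count / history_counts[prev]
        for (prev, nxt), count in bigram_counts.items()
    }


# Two made-up tag sequences, for illustration only.
toy_corpus = [["N", "V"], ["N", "ADJ", "V"]]
probs = bigram_mle(toy_corpus)
```

For this toy corpus, `probs[("N", "V")]` is 0.5 because `N` is followed by `V` in one of its two occurrences. A practical model would add smoothing for unseen tag pairs.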

An Investigation into Reduplication and Lexicalization in Persian

M.Sc. Thesis, Sharif University of Technology; Gili, Maryam (Author); Eslami, Moharram (Supervisor); Khosravizadeh, Parvaneh (Supervisor)

Abstract

This study aims to classify the outputs of Reduplication (Total and Partial Reduplication) in Persian, in order to demonstrate the types and processes that lead to the lexicalization of some reduplicated forms and to assign them a lexical position in lexicography as individual lexical entries. There is distinct phonological, morphological and semantic evidence behind the lexicalization of some types of reduplicated items in Persian. About 269 totally reduplicated words and 293 partially reduplicated words have been collected and analyzed according to these morphological, phonological and semantic considerations. The Partial and Total Reduplication process is one of the productive...

Utilizing intelligent segmentation in isolated word recognition using a hybrid HTD-HMM

Article, International Conference on Circuits, Systems, Signal and Telecommunications - Proceedings, 21 October 2010 through 23 October 2010; October 2011, Pages 42-49; 9789604742714 (ISBN); Kazemi, R; Sereshkeh, A. R; Ehsandoust, B; Sharif University of Technology

Abstract

Isolated Word Recognition (IWR) is becoming increasingly attractive due to the improvement of speech recognition techniques. However, the accuracy of IWR suffers when large databases or words with similar pronunciation are used. The criterion for accurate speech recognition is suitable segmentation. However, the traditional method of segmentation, equal segmentation, does not produce the most accurate result. Furthermore, utilizing manual segmentation based on events is not possible in large databases. In this paper, we introduce an intelligent segmentation based on Hierarchical Temporal Decomposition (HTD). Based on this method, a temporal decomposition (TD) algorithm can be used to...

Niusha, the first Persian speech-enabled IVR platform

Article, 2010 5th International Symposium on Telecommunications, IST 2010, 4 December 2010 through 6 December 2010, Tehran; 2010, Pages 591-595; 9781424481835 (ISBN); Bokaei, M. H; Sameti, H; Eghbal-Zadeh, H; BabaAli, B; Hosseinzadeh, K. H; Bahrani, M; Veisi, H; Sanian, A; Sharif University of Technology

Abstract

This paper introduces Niusha, the first Persian speech-enabled IVR platform. This platform uses a Persian recognizer and a Persian text-to-speech synthesizer engine to interact with users. The platform is designed so that it can easily be customized for various domains, and its components can be adjusted to new words.

The Corrosion Behavior of Sulfur Concrete in Persian Gulf Sea Water

M.Sc. Thesis, Sharif University of Technology; Haghipour, Amir (Author); Dolati, Abolghasem (Supervisor)

Abstract

In this project, sulfur concretes were prepared. The optimum casting temperature, weight percent of sulfur cement, and weight percent of filler were analyzed in order to find better corrosion resistance. It was observed that the optimum sulfur is 15% for sulfur concrete samples with 21% sulfur cement, the optimum filler is 10-15% with 18% sulfur cement, and the optimum filler is 10% for concrete samples with 15% sulfur cement. The casting temperature for concrete with 15-21% sulfur cement lies in the interval 127-150°C. By applying the relation between the optimum filler percent and the content of sulfur cement and choosing a suitable temperature for casting...

An Abstract Model for Perso-Arabic Typography

M.Sc. Thesis, Sharif University of Technology; Esfahbod, Behnam (Author); Tabesh, Yahya (Supervisor)

Abstract

In this research, we provide an abstract model for the typography of the Persian language (Perso-Arabic script) and study two text corpora based on this model and the Unicode properties of characters. We then search the corpora for phrases with special properties, based on the abstract model.

Persian Causative/Inchoative Alternation from an LFG Perspective

M.Sc. Thesis, Sharif University of Technology; Jamshidi, Naser (Author); Khosravizadeh, Parvaneh (Supervisor); Shojaee, Razieh (Supervisor)

Abstract

This study attempts to describe and analyze Persian speakers’ linguistic knowledge of different types of causative and inchoative constructions within the framework of Lexical Functional Grammar (LFG). To do so, studies of Persian causative alternations within theories other than LFG, as well as studies of causative alternations in other languages within LFG, are reviewed. Next, different types of causative constructions are extracted from the Bijankhan corpus (2005) and categorized conceptually and structurally. Then they are analyzed syntactically and semantically. In this regard, we represent syntactic and semantic structures based on LFG as a monostratal theory. After that,...

An automatic prosodic event detector using MSD HMMs for Persian language

Article, Communications in Computer and Information Science; Vol. 427, 2014, p. 234-240; Saleh, F. S; Shams, B; Sameti, H; Khorram, S; Sharif University of Technology

Abstract

Automatic detection of prosodic events in speech, such as detecting the boundaries of Accentual Phrases (APs) and Intonational Phrases (IPs), has been an attractive subject in recent years for speech technologists and linguists. Prosodic events are important for spoken language applications such as speech recognition and translation. Also, in order to generate natural speech in text-to-speech synthesizers, the corpus should be tagged with prosodic events. In this paper, we introduce and implement a prosody recognition system that can automatically label prosodic events and their boundaries at the syllable level in the Persian language using a Multi-Space Probability Distribution Hidden Markov...

High capacity Persian/Arabic text steganography

Article, Journal of Applied Sciences; Volume 8, Issue 22, 2008, Pages 4173-4179; 18125654 (ISSN); Shirali Shahreza, M; Shirali Shahreza, S; Sharif University of Technology

Abstract

One of the methods introduced for establishing hidden communication is steganography. Steganography is the art of hiding information in a cover medium without attracting attention. Text documents are one of the common cover media used for steganography in the past. Steganography in text is more difficult than in other media because there is little redundant information in text documents. In this study, we propose a high capacity text steganography method. Our method can hide about 400 bits in each kilobyte of text. In Persian and Arabic, each letter can have four different shapes depending on its position in the word. In this method this feature of Persian and Arabic languages is used for...

Persian/Arabic Unicode text steganography

Article, 4th International Symposium on Information Assurance and Security, IAS 2008, Napoli, 8 September 2008 through 10 September 2008; 2008, Pages 62-66; 9780769533247 (ISBN); Shirali Shahreza, M; Shirali Shahreza, S; Sharif University of Technology

Abstract

Sending information secretly and communicating covertly have been of great interest for ages. Text documents have been widely used for this purpose, and consequently various methods for hiding information in texts (Text Steganography) have been developed so far. In this paper, a new method is proposed to hide information in digital Persian and Arabic Unicode texts. In Persian and Arabic, each letter can have four different shapes depending on its position in the word. By using this feature of the Persian and Arabic languages and the way in which documents are saved in the Unicode Standard, the method hides information in Unicode text documents. This method has a good hiding capacity...

Semantic Clustering of Persian Verbs

M.Sc. Thesis, Sharif University of Technology; Aminian, Maryam (Author); Sameti, Hossein (Supervisor)

Abstract

Semantic classification of words based on unsupervised learning methods is a challenging issue in computational lexical semantics. The goal of this field of study is to recognize the words that are in the same semantic classes; i.e., can have the same set of arguments. Among all word categories, the verb is known as one of the most important and is assumed to be the central part of the sentence in certain linguistic theories such as case grammar and dependency grammar. Based on Levin’s idea, diathesis alternations and the similarity between these alternations are the clues for the semantic classification of verbs. This idea is verified in languages such as English and German with promising results....

Normalization of Non-standard Texts for Persian language Using Neural Networks

M.Sc. Thesis, Sharif University of Technology; Seyyedi, Javad (Author); Sameti, Hossein (Supervisor)

Abstract

The purpose of this research is to normalize non-standard Persian texts. We propose a method to transform texts with any non-standard structure into a formal and standard form. One of the major complications of text normalization is the large variety of non-standard structures, and the fact that this diversity cannot be captured in one structural pattern. Furthermore, the concept of text normalization has different definitions in different situations, and each of these settings needs a distinct normalization method. Supervised learning methods are not suitable for normalization due to the variety of both standard and non-standard texts as well as the absence of...

Numerical Simulation and Investigation of Impact of Climate Change on Probability Occurrence of Tropical Cyclone in Sea of Oman and Persian Gulf

M.Sc. Thesis, Sharif University of Technology; Ilami, Dariush (Author); Abbaspour Tehrani Fard, Madjid (Supervisor)

Abstract

Super cyclone GONU, which formed in June 2007, has been recognized as the strongest tropical cyclone in the North Indian Ocean (NIO). The aim of this study is to numerically simulate the cyclone GONU and its resulting waves in the southern seas of Iran, leading to the best parametric model to simulate future occurrence of tropical cyclones (TCs) in the NIO. In this regard, the best track data of the cyclone are extracted from the India Meteorological Department (IMD). The cyclone GONU wind field is simulated by the Young and Sobey, Holland, and Rankine models in order to determine the most appropriate model, which is then considered for regenerating cyclonic wind fields over the Makran coastline and the sea of...