Using Partially-Observable Markov Decision Process for Dialogue Management in Spoken Dialogue Systems

M.Sc. Thesis, Sharif University of Technology. Rahbar Noudehi, Siavash (Author); Sameti, Hossein (Supervisor)

Abstract

The use of spoken dialogue systems is growing every day, and these systems will replace current Interactive Voice Response systems in the near future. A spoken dialogue system consists of Speech Recognition, Language Understanding, Dialogue Management, Speech Generation and Text-to-Speech modules. Among these modules, the only one that is specific to dialogue systems is Dialogue Management. Its responsibility is to determine system behavior so as to maximize specific variables such as the accuracy of identifying the user's goal and the speed of finding it. There have been different approaches to dialogue management; in recent years the use of Partially-Observable Markov Decision Processes has been very popular...

Automatic Language Model Adaptation

M.Sc. Thesis, Sharif University of Technology. Sharif Razavian, Ali (Author); Sameti, Hossein (Supervisor)

Abstract

The language model plays a vital role in speech recognition systems. By restricting the search space, language models combine the result of the word recognition phase with assumptions drawn from the rules and structure of word syntax (inflection, etc.), resulting in a more appropriate sequence of words as output. Since the language model for a specific language is not static and varies under the influence of different conditions, the problem of adapting the language model arises. In general, the aim of an adaptive language model is to offer a model capable of capturing all possible changes in word syntax, inflection and semantic structures. If the adaptation process...

Spoken Language Understanding in Dialogue System

M.Sc. Thesis, Sharif University of Technology. Bokaei, Mohammad Hadi (Author); Sameti, Hossein (Supervisor)

Abstract

In contrast to automatic speech recognition (ASR), which converts a speaker’s spoken utterance into a text string, spoken language understanding (SLU) aims to interpret users’ intentions from their spoken utterances. Traditionally, this has been accomplished by writing context-free grammars (CFGs) or unification grammars (UGs) manually. The manual grammar authoring process is laborious and expensive, requiring much expertise. In addition, robustness is a vital requirement of these modules, because their input comes from a speech recognition unit and always contains errors. In recent years, many data-driven models have been proposed for spoken language understanding, but...

Learning Dialogue Management in Spoken Dialogue Systems

M.Sc. Thesis, Sharif University of Technology. Habibi, Maryam (Author); Sameti, Hossein (Supervisor)

Abstract

The use of spoken dialogue systems (SDSs) in real life is growing rapidly because of advances in the design and management of these systems. Traditional touch-tone computer telephony systems are being replaced by SDSs. In a typical SDS, the user speaks naturally to the system through a phone line and the system provides the required information or performs the required action. Banking and ticket reservation are typical examples of prevalent SDSs. A spoken dialogue system has four units: automatic speech recognition (ASR), natural language understanding (NLU), dialogue management (DM), and spoken language generation (SLG). In this work, the first spoken dialogue...

Detecting Speakers in a Telephone Conversation

M.Sc. Thesis, Sharif University of Technology. Soltani Farani, Ali (Author); Sameti, Hossein (Supervisor)

Abstract

The human speech signal conveys many levels of information ranging from phonetic content to speaker identity and even emotional status. This thesis deals with the task of open-set speaker identification (SI) from an unconstrained telephone conversation between two speakers. The goal is to find at most two speakers among a known set of target speakers that best match the voice samples of the input speech; the input voice samples are not constrained to the target speaker set. The uni-speaker problem is investigated first. The classic GMM-UBM system for text-independent SI and its adapted form are explored. The use of score-space information is advocated as a complementary source to the...

Persian Statistical Natural Language Understanding Based on Partially Annotated Corpus

M.Sc. Thesis, Sharif University of Technology. Jabbari, Fattaneh (Author); Sameti, Hossein (Supervisor)

Abstract

The spoken language understanding unit is one of the most important parts of a spoken dialogue system. The input of this unit is the output of the speech recognition unit. Its main function is to extract the semantic information from the input utterances. There are two main types of approaches to this task: rule-based and data-driven. Today, data-driven approaches are of more interest because they are more flexible and robust than rule-based approaches. The main drawback of these methods is that they need a large amount of fully annotated or, in some cases, treebank data. Preparing such data is time-consuming and expensive. The goal of this thesis is...

Semantic Clustering of Persian Verbs

M.Sc. Thesis, Sharif University of Technology. Aminian, Maryam (Author); Sameti, Hossein (Supervisor)

Abstract

Semantic classification of words based on unsupervised learning methods is a challenging issue in computational lexical semantics. The goal of this field of study is to recognize the words that belong to the same semantic class; i.e., words that can take the same set of arguments. Among all word categories, the verb is known as one of the most important and is regarded as the central part of the sentence in certain linguistic theories such as case grammar and dependency grammar. Based on Levin’s idea, diathesis alternations and the similarity between these alternations are the clues for the semantic classification of verbs. This idea has been verified in languages such as English and German with promising results....

Speaker Adaptation in HMM-Based Persian Speech Synthesis

M.Sc. Thesis, Sharif University of Technology. Bahmaninezhad, Fahimeh (Author); Sameti, Hossein (Supervisor)

Abstract

Text-to-speech synthesis, one of the key technologies in speech processing, is a technique for generating a speech signal from arbitrary given text with a target speaker’s voice characteristics and various speaking styles and emotional expressions. Statistical parametric speech synthesis has recently been shown to be very effective in generating acceptable synthesized speech. Therefore, in this study, the main focus is on one instance of these techniques, called hidden Markov model-based speech synthesis. In text-to-speech systems, it is desirable to synthesize high-quality speech using a small amount of speech data; this goal would be achieved by employing a speaker adaptation framework and...

High-Performance Keyword Spotting System for Persian Language

M.Sc. Thesis, Sharif University of Technology. Ghorbani, Shahram (Author); Sameti, Hossein (Supervisor)

Abstract

Keyword spotting with high speed and accuracy is an important subject within the speech processing domain, especially when dealing with various transmission channels. In this research, discriminative keyword spotting methods are compared with HMM-based approaches. We have employed the discriminative approaches as our baseline methods due to their higher accuracy. The drawback of the conventional discriminative methods is their high computational cost and long execution time. The discriminative approach consists of two steps: feature extraction and classification. We have proposed four ideas to improve the performance of the baseline method. To improve the speed of the process, in feature...

Music Track Detection Using Audio Fingerprinting

M.Sc. Thesis, Sharif University of Technology. Yazdanian, Saeed (Author); Sameti, Hossein (Supervisor)

Abstract

Music information retrieval systems have many applications in music filtering and broadcast monitoring, owing to the huge amount of multimedia data available these days. In these systems, the feature extraction method is called audio fingerprinting. The small size of fingerprints allows the systems to search efficiently among thousands or millions of songs. The input signal is usually just a couple of seconds long and degraded in several ways. The goal is to design a system that is robust to signal degradations and efficient to search. In this thesis, one of the basic systems is reviewed and improved in several ways. This system uses the spectrogram of signals to extract features and build an...

Thesis Submitted in Partial Fulfillment of the Requirements for the Degree of Master of Science (M.Sc.) in Computer Engineering, Artificial Intelligence

M.Sc. Thesis, Sharif University of Technology. Hosseini, Mohammad Saleh (Author); Sameti, Hossein (Supervisor)

Abstract

Punctuation marks constitute an important part of a text in every language. Omitting them makes the text ambiguous. The output of an automatic speech recognition (ASR) system is typically a raw sequence of words containing no punctuation marks. This makes the text difficult or even impossible for humans to make sense of, and it also hinders any further text processing tasks. The goal of this thesis is to perform automatic punctuation insertion in Persian texts lacking punctuation marks. To the best of our knowledge, this is the first work done in this context for the Persian language. For this purpose, firstly, we assembled a state-of-the-art corpus to train and...

Feature Extraction for Spoofing Detection in Automatic Speaker Verification Systems

M.Sc. Thesis, Sharif University of Technology. Adiban, Mohammad (Author); Sameti, Hossein (Supervisor)

Abstract

Automatic speaker verification systems are becoming increasingly widespread with the development of technology. The use of these systems as a user password in applications such as smartphones is becoming increasingly popular, and the security of these systems against spoofing attacks has always been a concern. In this research, the purpose is to comprehensively investigate all types of possible spoofing attacks on speaker verification systems and then provide a countermeasure system against such attacks. The countermeasure system presented in this report extracts features from speech signals that can make the difference between the genuine and spoofed speech spectrum more prominent by...

Grapheme to Phoneme Conversion using Deep Neural Networks

M.Sc. Thesis, Sharif University of Technology. Safari, Arash (Author); Sameti, Hossein (Supervisor)

Abstract

The goal of this research is to convert letters to phonemes using deep neural networks. As deep neural networks are among the best methods for speech and text processing (the highest accuracy in converting text to phonemes in English is also obtained using deep neural networks), multilayer deep neural networks are used in this research to increase the accuracy. It should be noted that deep neural networks have not been used before for converting text to phonemes in Persian. In this research, a rule-based alignment method based on our proposed rule is presented, achieving an accuracy of more than 98%. Several approaches for converting word to grapheme with emphasis on the...

Music Emotion Recognition

M.Sc. Thesis, Sharif University of Technology. Pouyanfar, Samira (Author); Sameti, Hossein (Supervisor)

Abstract

Measuring the emotion of music is one of the methods for determining music content. Music emotion detection is applicable in music retrieval, music genre recognition and music data management software. Music emotion is studied in different sciences such as physiology, psychology, musicology and engineering. First, we collected a database of different types of music with various emotions. These data have been labeled according to their emotions. In this project, four emotions (angry, happy, relaxed and sad) have been used as labels, based on Thayer’s two-dimensional emotion model. As in other recognition systems, there are two basic steps for music emotion recognition: Feature extraction...

A Persian Dialog System with Sequence to Sequence Learning

M.Sc. Thesis, Sharif University of Technology. Ghafourian, Mohammad (Author); Sameti, Hossein (Supervisor)

Abstract

Conversation modeling is one of the most important goals in the field of natural language understanding and machine intelligence. Recently, with the enormous growth of the Internet and social networks, the amount of available data on the Web has increased significantly. This makes it possible to use data-driven approaches to solve the conversation modeling problem. One of the most recent data-driven methods is sequence-to-sequence modeling. In this document, after providing the necessary prerequisites, we examined the various models that have used the sequence-to-sequence approach for conversation modeling. We further examined the ways of improving the efficiency of this modeling...

Design and Improvement of Sequence-level Objective Functions for DNN-based Large Vocabulary Continuous Speech Recognition

Ph.D. Dissertation, Sharif University of Technology. Hadian, Hossein (Author); Sameti, Hossein (Supervisor)

Abstract

This thesis focuses on the problem of large vocabulary continuous speech recognition (LVCSR). Numerous research results in recent years have demonstrated the effectiveness of deep neural networks (DNNs) for LVCSR. As a result, many methods were proposed to incorporate DNNs in LVCSR. These methods can be viewed in terms of the objective functions used for training DNNs. A frame-level objective function is one that is defined on frames locally, whereas a sequence-level objective function is defined on whole sequences. Since speech recognition is essentially a sequential problem, here we focus on designing and improving sequence-level objective functions for DNNs. The main proposed...

Introducing a Hybrid Language Model for Improving Performance of Continuous Speech Recognition Systems

Ph.D. Dissertation, Sharif University of Technology. Bahrani, Mohammad (Author); Sameti, Hossein (Supervisor)

Abstract

Utilizing a language model is one of the most effective methods for improving speech recognition performance. Several types of language models that try to model some part of language information have been proposed for speech recognition applications, such as n-gram models, syntactic models, and semantic models. Although n-gram, syntactic and semantic models are able to model different structures that exist in natural language, they each capture only specific linguistic phenomena. None of them can simultaneously take into account all language phenomena in a unified probabilistic framework. Recently, a number of semantic models called "latent topic...

Cross-Lingual Speaker Adaptation for Statistical Parametric Speech Synthesis

M.Sc. Thesis, Sharif University of Technology. Saleh, Fatemeh Sadat (Author); Sameti, Hossein (Supervisor)

Abstract

Speech synthesis and its applications have attracted considerable interest recently. The main purpose of this technique is to produce a speech signal with the natural characteristics of human speech, such as prosody and emotion. Among all existing methods for speech synthesis, statistical parametric methods are more promising because of their higher flexibility in comparison to other methods. One of the applications of speech synthesis is speech-to-speech translation. In these systems, the generated voice in the target language should have the same characteristics as the input voice in the source language. The main purpose of this research is to review and evaluate the cross-lingual speaker adaptation...

Speaker Adaptation in Eigen Voice Space for Statistical Parametric Speech Synthesis

M.Sc. Thesis, Sharif University of Technology. Shams, Boshra (Author); Sameti, Hossein (Supervisor)

Abstract

Recently, various speaker adaptation methods have been proposed for HMM-based speech synthesis. The importance of adaptation techniques is that they allow us to design a system that generates high-quality speech with the target speaker's characteristics from limited adaptation data.
In this research, we focus on adaptation based on clustering and develop a novel method that uses eigenvoices to adapt to a new speaker. We employ this approach for the first time in HSMM-based speech synthesis systems; its goal is to reduce the parameters and adaptation data of the system. In our proposed method, first some speaker-dependent models are trained. For each model we combine the...

Extractive Meeting Summarization through Discourse Analysis

Ph.D. Dissertation, Sharif University of Technology. Bokaei, Mohammad Hadi (Author); Sameti, Hossein (Supervisor)

Abstract

Improvements in automatic speech recognition systems and the growth of audio data (such as broadcast news, voice mail, telephone conversations and meetings) have attracted plenty of research interest in the field of speech summarization. The goal of this dissertation is to improve the performance of speech summarization in the domain of multi-party conversations, specifically meetings. Most previous work in this field is inherited from its text summarization counterpart, without paying much attention to the discourse-specific information of multi-party conversations. The main idea of this work is to use discourse information to improve the accuracy of extracted summaries in...
