Search for: reinforcement-learning
Total 151 records

    Modular framework kinematic and fuzzy reward reinforcement learning analysis of a radially symmetric six-legged robot

    , Article Life Science Journal ; Volume 10, Issue SUPPL 8 , 2013 , Pages 120-129 ; 10978135 (ISSN) Shahriari, M ; Osguie, K. G ; Khayyat, A. A. A ; Sharif University of Technology
    2013
    Abstract
    Hexapod robots give us the ability to study walking robots without facing problems such as stability in many aspects. They have a great deal of flexibility in movement even if a leg malfunctions. Radially symmetric (hexagonal) hexapods have more flexibility in movement than rectangular leg alignments. Because of their symmetry they can move in any direction time-efficiently. The inverse kinematic problem of this kind of hexapod is solved through a modular mobile view considering six degrees of freedom for the trunk. Then typical tripod and wave gaits are analyzed and simulated through the presented formulation. In Reinforcement Learning algorithm for walking it is important how to make...

    Meta Reinforcement Learning for Domain Generalization

    , M.Sc. Thesis Sharif University of Technology Riyahi Madvar, Maryam (Author) ; Rohban, Mohammad Hossein (Supervisor)
    Abstract
    Deep reinforcement learning has achieved better cumulative rewards than humans in many environments like Atari. One drawback of these methods is their data inefficiency, which makes training time-consuming, and in some cases obtaining this amount of data is infeasible. Meta reinforcement learning can use past experiences to enable agents to adapt to new tasks faster and allows neural networks to be trained in a short amount of time. One of the methods in meta reinforcement learning is task inference, which helps the exploitation policy perform well in new tasks. There is a need to improve the exploration policy as well as the exploitation policy by gaining informative transitions about the new task....

    Optimising operational cost of a smart energy hub, the reinforcement learning approach

    , Article International Journal of Parallel, Emergent and Distributed Systems ; 2014 ; ISSN: 17445760 Rayati, M ; Sheikhi, A ; Ranjbar, A. M ; Sharif University of Technology
    Abstract
    The concept of smart grid (SG) has been introduced to improve the operation of the power systems. In modern structures of power systems, different reasons prompt researchers to suggest integrated analysis of multi-carrier energy systems. Considering synergy effects of the couplings between different energy carriers and utilising intelligent technologies for monitoring and controlling of energy flow may change energy system management in the future. In this paper, we propose a new solution which is entitled ‘smart energy hub’ (SEH) that models a multi-carrier energy system in a SG. SEH solutions allow homeowners to manage their energy consumption to reduce their electricity and gas bill. We... 

    On the convergence of heterogeneous reinforcement learning private agents to nash equilibrium in a macroeconomic policy game

    , Article Australian Journal of Basic and Applied Sciences ; Volume 5, Issue 7 , 2011 , Pages 491-499 ; 19918178 (ISSN) Hemmati, M ; Nili, M ; Sadati, N ; Sharif University of Technology
    2011
    Abstract
    A repeated inflation-unemployment game within the linear-quadratic framework of Barro and Gordon is studied assuming that the government would like to cheat optimally and the finite heterogeneous population of private agents attempts to learn the government's targets using a reinforcement learning algorithm. Private agents are heterogeneous in their initial expectations of the inflation rate but are assumed to utilize an identical anticipatory reinforcement learning process, namely Q-learning. In our heterogeneous setting, the only way for the private agents to achieve a zero value for their loss function is for all of them to correctly anticipate the Nash equilibrium. It is of particular...

    Application of actor-critic reinforcement learning method for control of a sagittal arm during oscillatory movement

    , Article Biomedical Engineering - Applications, Basis and Communications ; Volume 16, Issue 6 , 2004 , Pages 305-312 ; 10162372 (ISSN) Golkhou, V ; Lucas, C ; Parnianpour, M ; Sharif University of Technology
    Institute of Biomedical Engineering  2004
    Abstract
    Numerous disciplines are engaged in studies involving motor control. In this study, we have used a single link system with a pair of muscles that are excited with alpha and gamma signals to achieve an oscillatory movement with variable amplitude and frequency. The system is highly nonlinear in all its physical and physiological attributes. The major physiological characteristics of this system are simultaneous activation of a pair of nonlinear muscle-like-actuators for control purposes, existence of nonlinear spindle-like sensors and Golgi tendon organ-like sensor, actions of gravity and external loading. Transmission delays are included in the afferent and efferent neural paths to account... 

    Neuromuscular control of sagittal ARM during repetitive movement by actor-critic reinforcement learning method

    , Article Intelligent Automation and Control Trends, Principles, and Applications - International Symposium on Intelligent Automation and Control, ISIAC - Sixth Biannual World Automation Congress, WAC 2004, Seville, 28 June 2004 through 1 July 2004 ; 2004 , Pages 371-376 ; 1889335223 (ISBN) Golkhou, V ; Lucas, C ; Parnianpour, M ; Sharif University of Technology
    2004
    Abstract
    In this study, we have used a single link system with a pair of muscles that are excited with alpha and gamma signals to achieve an oscillatory movement with variable amplitude and frequency. This paper proposes a reinforcement learning method with an Actor-Critic architecture instead of middle and low level of central nervous system (CNS). The Actor in this structure is a two layer feedforward neural network and the Critic is a model of the cerebellum. The Critic is trained by State-Action-Reward-State-Action (SARSA) method. The system showed excellent tracking capability and after 280 epochs the RMS error for position and velocity profiles were 0.02, 0.04 radian and radian/sec,... 

    Active learning method to solve bin packing problems

    , Article Proceedings of the IASTED International Conference on Neural Networks and Computational Intelligence, Grindelwald, 23 February 2004 through 25 February 2004 ; 2004 , Pages 263-268 Lotfi, T ; Shouraki, S. B ; Sharif University of Technology
    2004
    Abstract
    Previous research has shown the success of using reinforcement learning in solving combinatorial optimization problems. The main idea of these methods is to learn a (near) optimal evaluation function to improve local searches and find (near) optimal solutions. The STAGE algorithm, introduced by Boyan & Moore, is one of the most important algorithms in this area. On the other hand, fuzzy methods have been used in all fields of science to solve problems but have still never been used in combinatorial optimization problems. In this paper we focus on the Bin Packing Problem. We introduce two basic fuzzy algorithms (ALM and IDS) and then solve our problem with these fuzzy algorithms. We run ALM and IDS algorithms on...

    Computation offloading strategy for autonomous vehicles

    , Article 27th International Computer Conference, Computer Society of Iran, CSICC 2022, 23 February 2022 through 24 February 2022 ; 2022 ; 9781665480277 (ISBN) Farimani, M. K ; Karimian Aliabadi, S ; Entezari Maleki, R ; Sharif University of Technology
    Institute of Electrical and Electronics Engineers Inc  2022
    Abstract
    Vehicular edge computing is a progressing technology which provides processing resources to the internet of vehicles using the edge servers deployed at roadside units. Vehicles take advantage of it by offloading their computation-intensive tasks to this infrastructure. However, given time-sensitive applications and the high mobility of vehicles, cost-efficient task offloading is still a challenge. This paper establishes a computation offloading strategy based on the deep Q-learning algorithm for vehicular edge computing networks. To jointly minimize the system cost including offloading failure rate and the total energy consumption of the offloading process, the vehicle tasks offloading problem is...

    Reinforcement Learning Approach in Self-Assembly Systems to Acquire Desired Structures

    , M.Sc. Thesis Sharif University of Technology Ravari, Amir Hossein (Author) ; Bagheri Shouraki, Saeed (Supervisor)
    Abstract
    Self-Assembly (SA) plays a critical role in the formation of different phenomena in nature. This phenomenon can be defined as the arrangement of meaningful patterns through the aggregate behavior of simpler structures. One example of Self-Assembly is the formation of ice crystals from ice molecules. Previous works mainly focus on graph grammar and self-assembly in fully observable environments. These algorithms consist of two main stages: first, constructing simpler structures and then joining these simpler structures to form a complex structure. The challenges of the previous work can be considered as the necessity of a central controller in the formation of...

    Some Model-free Discrete Reinforcement Learning Algorithms

    , M.Sc. Thesis Sharif University of Technology Yousefizadeh, Hossein (Author) ; Daneshgar, Amir (Supervisor)
    Abstract
    In this thesis, we review some methods related to model-free discrete reinforcement learning and their corresponding algorithms. Our main goal is to present existing methods in an integrated and formal setup, without compromising their mathematical accuracy or comprehensibility. We have done our best to fix the inconsistencies existing in notations and definitions appearing in different areas of the vast literature. We discuss dynamic programming methods, including policy iteration and value iteration and temporal difference methods as well as policy-based methods such as policy gradient, advantage actor-critic, TRPO, and PPO. Among value-based methods, we discuss Q-learning and C51 where we... 

    Automatic abstraction in reinforcement learning using ant system algorithm

    , Article AAAI Spring Symposium - Technical Report ; Volume SS-13-05 , 2013 , Pages 9-14 ; 9781577356028 (ISBN) Ghafoorian, M ; Taghizadeh, N ; Beigy, H ; Sharif University of Technology
    2013
    Abstract
    Nowadays, developing autonomous systems that can act in various environments and interactively perform their assigned tasks is highly desirable. These systems would be ready to be applied in different fields such as medicine, controller robots and social life. Reinforcement learning is an attractive area of machine learning which addresses these concerns. At large scales, the learning performance of an agent can be improved by using hierarchical Reinforcement Learning techniques and temporally extended actions. The higher level of abstraction helps the learning agent approach lifelong learning goals. In this paper a new method is presented for discovering subgoal states and constructing...

    A new method for discovering subgoals and constructing options in reinforcement learning

    , Article Proceedings of the 5th Indian International Conference on Artificial Intelligence, IICAI 2011 ; 2011 , Pages 441-450 ; 9780972741286 (ISBN) Davoodabadi, M ; Beigy, H ; SIT; Saint Mary's University; EKLaT Research; Infobright ; Sharif University of Technology
    Abstract
    In this paper the problem of automatically discovering subtasks and hierarchies in reinforcement learning is considered. We present a novel method that allows an agent to autonomously discover subgoals and create a hierarchy from actions. Our method identifies subgoals by partitioning local state transition graphs. Options constructed for reaching these subgoals are added to the action choices and used for accelerating the Q-Learning algorithm. Experimental results show significant performance improvements, especially in the initial learning phase.

    Real-Time IDS using reinforcement learning

    , Article 2008 2nd International Symposium on Intelligent Information Technology Application, IITA 2008, Shanghai, 21 December 2008 through 22 December 2008 ; Volume 2 , January , 2008 , Pages 593-597 ; 9780769534978 (ISBN) Sagha, H ; Bagheri Shouraki, S ; Hosein, K ; Mahdi, D ; Sharif University of Technology
    2008
    Abstract
    In this paper we propose a new real-time learning method. The engine of this method is a fuzzy-modeling technique called ink drop spread (IDS). The IDS method has good convergence, is very simple, and avoids complex formulas. The proposed method uses a reinforcement learning approach with an actor-critic system, similar to the Generalized Approximate Reasoning based Intelligent Control (GARIC) structure, to adapt the IDS by delayed reinforcement signals. Our system uses Temporal Difference (TD) learning to model the behavior of useful actions of a control system. It is shown that the system can adapt itself, commencing with random actions. © 2008 IEEE

    A Study on Credit Assignment among Reinforcement Learning Agents

    , M.Sc. Thesis Sharif University of Technology Rahaie, Zahra (Author) ; Beigy, Hamid (Supervisor)
    Abstract
    Nowadays, multi-agent systems, as part of distributed artificial intelligence, play an important role in modeling and solving complex industrial and commercial problems. They have distinguishing characteristics such as distributiveness (spatial, temporal, semantic, or functional distribution), robustness, parallel processing, etc. One of the capabilities that can be added to such a system is the learning capability. It can help the system to adapt itself to a new environment. This paper proposes a method for the problem of credit assignment in the multi-agent domain. Solving the multi-agent credit assignment problem, one can expect individual learning for a single agent in systems of...

    Learning Dialogue Management in Spoken Dialogue Systems

    , M.Sc. Thesis Sharif University of Technology Habibi, Maryam (Author) ; Sameti, Hossein (Supervisor)
    Abstract
    The use of spoken dialogue systems (SDSs) in real life is growing rapidly because of the advances in the design and management of these systems. The traditional touch-tone computer telephony systems are being replaced by SDSs. In a typical SDS, the user speaks naturally to the system through a phone line and the system provides the required information or performs the required action. Banking and ticket reservation are typical examples of the prevalent SDSs. A spoken dialogue system has four units: automatic speech recognition (ASR), natural language understanding (NLU), dialogue management (DM), and spoken language generation (SLG). In this work, the first spoken dialogue...

    Evaluation of NeuroEvolution of Augmenting Topologies in Cooperative Multi-Agent Learning

    , M.Sc. Thesis Sharif University of Technology Iravanian, Sina (Author) ; Mahdavi Amiri, Nezameddin (Supervisor) ; Beigy, Hamid (Supervisor)
    Abstract
    In multi-agent systems (MAS), the collective behavior of autonomous agents and the complexities arising from their interactions are studied, while they are exploited to solve real-world complex problems. Machine learning methods are frequently used for problem solving in MAS, because complexities in these systems prevent a programmer from thoroughly describing the agents’ behaviors and the rules governing them. Reinforcement learning (RL) is one of the most commonly used learning methods for intelligent agents, because it does not need a model of the environment and learns agents’ policies through trial and error. Conventional RL algorithms store and update utilities for every possible state in a table. One...
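As an illustration of the table-based approach mentioned in the abstract above, here is a minimal sketch of tabular Q-learning. It is not taken from the thesis: the chain environment, the function name `q_learning_chain`, and all hyperparameter values are assumptions chosen only to keep the example small.

```python
import random


def q_learning_chain(n_states=5, episodes=500, alpha=0.1, gamma=0.9,
                     epsilon=0.1, seed=0):
    """Tabular Q-learning on a hypothetical 1-D chain (illustrative only).

    States are 0..n_states-1. Action 0 moves left (clamped at state 0),
    action 1 moves right. Reaching the last state gives reward 1 and ends
    the episode; every other transition gives reward 0.
    """
    rng = random.Random(seed)
    goal = n_states - 1
    # The "table": one utility estimate per (state, action) pair.
    q = [[0.0, 0.0] for _ in range(n_states)]

    for _ in range(episodes):
        s = 0
        while s != goal:
            # Epsilon-greedy action selection with random tie-breaking.
            if rng.random() < epsilon:
                a = rng.randrange(2)
            else:
                best = max(q[s])
                a = rng.choice([i for i in range(2) if q[s][i] == best])

            s_next = max(s - 1, 0) if a == 0 else s + 1
            r = 1.0 if s_next == goal else 0.0

            # Q-learning update; the goal row is never updated and stays 0.
            target = r + gamma * max(q[s_next])
            q[s][a] += alpha * (target - q[s][a])
            s = s_next
    return q


if __name__ == "__main__":
    for state, row in enumerate(q_learning_chain()):
        print(state, [round(v, 3) for v in row])
```

The table needs one row per state, so its size grows with the state space; this is the limitation of the conventional tabular representation that the abstract points to.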

    A Study on Action Selection Deficits in Schizophrenic Patients Using Computational Modeling in Reinforcement Learning Framework

    , M.Sc. Thesis Sharif University of Technology Aghajari, Sara (Author) ; Vosoughi Vahdat, Bijan (Supervisor) ; Bahrami, Fariba (Supervisor)
    Abstract
    Schizophrenia is a devastating disorder that steals the opportunity of having a normal life from the sufferers. Drugs used in treating this disorder usually target psychotic symptoms and do not affect negative symptoms effectively. One of the deficits falling into this category is impaired reversal learning. These patients usually learn the rules but do not renounce them after the contingencies of reward have been reversed. To investigate the reasons for this impairment, first a hypothesis of how different parts of the brain communicate during reversal learning is proposed using the physiological evidence. Then considering the known relation between the actor-critic model of the...

    Thompson Algorithm and Multi-armed Bandit Problem

    , M.Sc. Thesis Sharif University of Technology Farazmand, Siavash (Author) ; Haji Mirsadeghi, Miromid (Supervisor) ; Alishahi, Kasra (Co-Supervisor) ; Zamani, Sadegh (Co-Supervisor)
    Abstract
    The multi-armed bandit problem is a popular model for studying the exploration/exploitation trade-off in sequential decision problems. Many algorithms are now available for this well-studied problem. One of the earliest algorithms, given by W. R. Thompson, dates back to 1933. This algorithm, referred to as Thompson Sampling, is a natural Bayesian algorithm. The basic idea is to choose an arm to play according to its probability of being the best arm. The Thompson Sampling algorithm has experimentally been shown to be close to optimal. In this dissertation several papers are reviewed. In these papers it has been shown that the Thompson Sampling algorithm achieves logarithmic expected regret for the...
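The basic idea stated in the abstract above (play each arm according to its probability of being the best) can be sketched for the common Bernoulli-reward case with Beta posteriors. This is an illustrative sketch, not code from the thesis: the function name `thompson_sampling`, the uniform Beta(1, 1) prior, and the example arm probabilities are all assumptions.

```python
import random


def thompson_sampling(true_probs, n_rounds, seed=0):
    """Beta-Bernoulli Thompson Sampling on a simulated bandit.

    true_probs holds the (hypothetical) success probability of each arm.
    It is used only to simulate rewards; the algorithm never reads it.
    Returns the number of times each arm was played.
    """
    rng = random.Random(seed)
    n_arms = len(true_probs)
    successes = [0] * n_arms
    failures = [0] * n_arms
    pulls = [0] * n_arms

    for _ in range(n_rounds):
        # Draw one sample per arm from its Beta posterior (Beta(1, 1) prior).
        samples = [
            rng.betavariate(successes[a] + 1, failures[a] + 1)
            for a in range(n_arms)
        ]
        # Playing the arm with the largest sample selects each arm with
        # exactly its posterior probability of being the best arm.
        arm = max(range(n_arms), key=lambda a: samples[a])

        reward = 1 if rng.random() < true_probs[arm] else 0
        successes[arm] += reward
        failures[arm] += 1 - reward
        pulls[arm] += 1
    return pulls


if __name__ == "__main__":
    print(thompson_sampling([0.1, 0.5, 0.9], n_rounds=2000))
```

As the posteriors sharpen, samples from clearly inferior arms rarely come out on top, so play concentrates on the best arm while the others are still tried occasionally.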

    Regularization Methods for Improving Data Efficiency in Reinforcement Learning

    , M.Sc. Thesis Sharif University of Technology Ahmadian Shahreza, Hamid Reza (Author) ; Alishahi, Kasra (Supervisor)
    Abstract
    Reinforcement learning is a successful model of learning that has received a lot of attention in recent years and has had significant achievements. However, methods based on reinforcement require a lot of data. Therefore, it is important to find ideas to keep learning at a high level despite the lack of data. Many of these ideas are known as statistical regularity. In this thesis, we study methods to enhance the learning rate, including methods for sharing neural network weights between value function and policy networks. In this thesis we will try to gain a more general understanding of the regularization in reinforcement learning and increase the learning rate by implementing these methods... 

    Expertness framework in multi-agent systems and its application in credit assignment problem

    , Article Intelligent Data Analysis ; Vol. 18, issue. 3 , 2014 , p. 511-528 Rahaie, Z ; Beigy, H ; Sharif University of Technology
    Abstract
    One of the challenging problems in artificial intelligence is credit assignment which simply means distributing the credit among a group, such as a group of agents. We made an attempt to meet this problem with the aid of the reinforcement learning paradigm. In this paper, expertness framework is defined and applied to the multi-agent credit assignment problem. In the expertness framework, the critic agent, who is responsible for distributing credit among agents, is equipped with learning capability, and the proposed credit assignment solution is based on the critic to learn to assign a proportion of the credit to each agent, and the used proportion should be learned by reinforcement...