Kumar / Bhardwaj / Bajaj | Reinforcement Learning: Foundations and Applications | E-Book | www.sack.de
E-Book, English, 276 pages

Kumar / Bhardwaj / Bajaj, Reinforcement Learning: Foundations and Applications

1st edition, 2025
ISBN: 978-981-5322-31-6
Publisher: De Gruyter
Format: EPUB
Copy protection: Adobe DRM




Reinforcement Learning: Foundations and Applications combines rigorous theory with real-world relevance to introduce readers to one of the most influential branches of modern Artificial Intelligence. Walking readers through the essential principles, algorithms, and techniques that define reinforcement learning (RL), the book highlights how RL enables intelligent systems to learn from interaction and optimize decision-making in domains such as robotics, autonomous control, game AI, finance, and healthcare.

The book opens with foundational RL concepts, including Markov Decision Processes, dynamic programming, and the exploration-exploitation dilemma. It then progresses to advanced material covering policy gradient methods, actor-critic architectures, deep reinforcement learning models, and multi-agent systems. Dedicated application chapters demonstrate how RL drives adaptive control, sequential decision-making, and practical problem-solving, supported by case studies, diagrams, and algorithm pseudocode. Rich with examples, research insights, and implementation guidance, this book equips readers with both the conceptual understanding and applied perspective needed to master reinforcement learning.

Key Features

  1. Blends foundational RL theory with practical, application-driven case studies.
  2. Explains both model-based and model-free reinforcement learning approaches.
  3. Covers cutting-edge methods including Deep Q-Networks, continuous control, and reward shaping.
  4. Presents clear diagrams, pseudocode, and implementation notes to support hands-on learning.
  5. Highlights current challenges, limitations, and emerging research directions in RL.



Exploring the Basics of Reinforcement Learning




Punam Rattan1, *, Ram Krishnan Raji Nair1, Korhan Cengiz2
1 School of Computer Application, Lovely Professional University, Phagwara, Punjab, India
2 Department of Information Technologies, Faculty of Informatics and Management, University of Hradec Kralove, Hradec Kralove, Czech Republic

Abstract


Reinforcement learning is a type of learning model for optimizing sequential decisions, that is, decisions made repeatedly over time, such as the daily stock replenishment decisions in inventory control. Reinforcement Learning (RL) is comparable to human learning in that people acquire abilities that improve their performance on challenging tasks, such as test-taking, gymnastics, and swimming; these human action skills often inspire RL. In real-world applications, the objective is to determine the best method for managing uncertainty while making successive decisions over time in a dynamic system. A policy is a plan for making such decisions consistently over time, and the purpose of RL is to determine the best way for a dynamic system to behave under various conditions. This first chapter covers the basic ideas behind reinforcement learning and discusses in detail its numerous nuances, characteristics, and challenges.

Keywords: Artificial intelligence, Deep Q-network, Machine learning, Natural language processing, Reinforcement learning.

* Corresponding author Punam Rattan: School of Computer Application, Lovely Professional University, Phagwara, Punjab, India; E-mail: punamrattan@gmail.com

INTRODUCTION


RL, a branch of Machine Learning (ML), employs trial-and-error methods to maximize the rewards accumulated from the feedback received for individual actions, enabling Artificial Intelligence (AI) based systems to operate in dynamic environments. RL involves taking appropriate actions in a particular situation so as to maximize the overall benefit, and many software programs and devices use it to determine the best course of action or behavior in a specific situation. In contrast to supervised learning, where the model is trained with the correct answers already present in the training set, RL trains the model without pre-labeled answers, relying on the reinforcement agent's judgment to determine how to accomplish the given task [1]. Even in the absence of a training dataset, the agent eventually learns from its own experience.

The idea that a positive reward reinforces an optimal behaviour or action lies at the heart of reinforcement learning. Machines and software agents use RL algorithms to determine the optimal behaviour based on feedback from the environment. RL algorithms can continuously adapt to their environment over time, depending on the complexity of the task, with the objective of maximizing cumulative reward. Thus, like an unsteady child, a robot learning to walk through RL will attempt several approaches to reach the goal, receive feedback on the effectiveness of those approaches, and adjust until it can walk. For example, if the robot falls when it takes a large step forward, it tries a smaller step to see whether that is the key to remaining upright. Over many iterations it keeps learning and eventually gains the ability to walk. In this instance, standing upright is the reward, while falling is the penalty. The robot is encouraged toward optimal actions by the feedback it receives for its activities [2]. RL algorithms evaluate data and choose the optimal course of action; after each action, the algorithm receives feedback that helps it determine whether the choice it made was correct, neutral, or incorrect. This makes RL a useful technique for automated systems that must make many small decisions without human intervention [3]. RL is a self-governing, self-teaching system that essentially gains knowledge through trial and error: it acts with the goal of maximizing rewards, or, put another way, it learns by doing in order to achieve the best outcomes.
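The trial-and-error loop described above can be sketched with a minimal example. The following is an illustrative epsilon-greedy agent on a hypothetical multi-armed bandit; the arm means, step count, and exploration rate are invented for illustration, not taken from the text. The agent tries actions, receives noisy reward feedback, and gradually shifts toward the action that works best.

```python
import random

def run_bandit(true_means, steps=5000, eps=0.1, seed=0):
    """Epsilon-greedy trial and error on a multi-armed bandit."""
    rng = random.Random(seed)
    n = len(true_means)
    counts = [0] * n        # how often each arm has been tried
    estimates = [0.0] * n   # running average reward per arm
    for _ in range(steps):
        # explore a random arm with probability eps, else exploit the best estimate
        if rng.random() < eps:
            arm = rng.randrange(n)
        else:
            arm = max(range(n), key=lambda i: estimates[i])
        reward = rng.gauss(true_means[arm], 1.0)  # noisy feedback from the environment
        counts[arm] += 1
        estimates[arm] += (reward - estimates[arm]) / counts[arm]  # incremental mean
    return estimates

# The agent should discover that arm 1 (true mean 0.8) pays best.
est = run_bandit([0.2, 0.8, 0.5])
```

Just as with the walking robot, no arm is labeled "correct" in advance; the preference for the best arm emerges purely from accumulated feedback.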

RL works in a mathematical framework consisting of the following components as shown in Fig. (1):

Fig. (1)
Reinforcement process [4].
  1. State Space: All available information and problem features that are useful for making a decision. This includes fully known or measured variables (for example, the current levels of stock on hand in inventory control) as well as unmeasured variables for which you might only have a belief or estimate (for example, a forecast of demand for the future day or week).
  2. Action Space: Decisions that one can take in each state of the system.
  3. Reward Signal: A scalar signal that provides feedback about performance and, therefore, the opportunity to learn which actions are beneficial in any given state. Learning must account for both immediate and long-term gains, because actions taken in any state lead to future states where further actions are taken, and so on.
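These three components can be made concrete with a small sketch in the spirit of the stock-replenishment example above. The code below models a hypothetical single-product inventory problem; all numbers (capacity, prices, costs) are invented for illustration.

```python
# Hypothetical inventory-control sketch (all numbers invented for illustration).
MAX_STOCK = 10          # state space: stock on hand, 0..10 units
ACTIONS = range(0, 6)   # action space: order 0..5 units each day

def step(stock, order, demand):
    """One day's transition: receive the order, meet demand, compute the reward."""
    stock = min(stock + order, MAX_STOCK)   # received stock cannot exceed capacity
    sold = min(stock, demand)               # can only sell what is on hand
    next_stock = stock - sold               # leftover stock becomes the next state
    # reward signal: revenue per unit sold minus ordering cost and holding cost
    reward = 2.0 * sold - 1.0 * order - 0.1 * next_stock
    return next_stock, reward

next_stock, reward = step(stock=3, order=2, demand=4)
```

Note that the reward couples immediate gains (today's sales) with long-term ones: the leftover stock carried into the next state shapes what tomorrow's decision can earn.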

BASIC IDEAS IN REINFORCEMENT LEARNING


The following components are present in every RL problem, as shown in Fig. (2).

Fig. (2)
Components of the reinforcement process [5].
  1. Agent: The program in charge of the object of interest (a robot, for example) is called an agent.
  2. Environment: This is a programmatic definition of the external world. The environment consists of everything that the agent or agents interact with, and it is designed to stand in for a real-world setting from the agent's point of view [6]. It also makes it possible to assess an agent's performance, i.e., whether it would function properly in an actual application.
  3. Rewards: The reward gives us a score indicating the algorithm's performance in relation to its surroundings. Here it is shown as either 0 or 1: a “1” indicates that the policy network made the proper decision, and a “0” indicates that it made a poor one. Put differently, rewards represent gains and losses.
  4. Policy: The policy is the algorithm that the agent uses to choose what to do. This is the component that may or may not use a model.
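How these four pieces fit together can be sketched with a toy line-world of our own invention (the environment, the goal position, and the fixed always-move-right policy are illustrative assumptions): the agent consults its policy, the environment responds with the next state and a 0/1 reward, and the loop ends when the goal is reached.

```python
class GoalEnv:
    """Toy environment: a line of positions 0..4; position 4 is the goal."""
    def __init__(self):
        self.pos = 0
    def step(self, action):
        """Apply an action (+1 right, -1 left); reward is 1 only at the goal."""
        self.pos = max(0, min(4, self.pos + action))
        reward = 1 if self.pos == 4 else 0
        done = self.pos == 4
        return self.pos, reward, done

def policy(state):
    """A fixed (model-free) policy: always move right."""
    return +1

env = GoalEnv()                              # the environment
state, total_reward, done = 0, 0, False
while not done:
    action = policy(state)                   # the agent consults its policy
    state, reward, done = env.step(action)   # the environment responds
    total_reward += reward                   # the reward scores the behaviour
```

In a real RL system the policy would be learned rather than fixed, but the interaction loop itself looks exactly like this.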

ONLINE VS. OFFLINE LEARNING


An agent can get data for learning rules in two broad ways: online and offline, as shown in Fig. (3).

Fig. (3)
Online vs. offline reinforcement learning.
  1. Online: Data are collected directly by the agent through interaction with its environment. The agent gathers and processes this data iteratively while it is still engaging with the environment [7].
  2. Offline: An agent can also learn about an environment from logged data, even without direct access to that environment. This is referred to as offline learning. A significant portion of research has focused on offline learning because of the practical challenges of interacting directly with an environment to train models [8].
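The distinction can be sketched as follows: in the online phase an agent interacts with the environment and logs every transition, while in the offline phase a learner estimates action values purely from that log, with no further environment access. The chain environment and the random behaviour policy are illustrative assumptions.

```python
import random

def env_step(state, action):
    """Toy chain environment: positions 0..4, reward 1.0 only on reaching 4."""
    next_state = max(0, min(4, state + action))
    return next_state, (1.0 if next_state == 4 else 0.0)

def online_collect(n_steps, seed=0):
    """Online: the agent interacts with the environment and logs each transition."""
    rng = random.Random(seed)
    log, state = [], 0
    for _ in range(n_steps):
        action = rng.choice([-1, +1])        # a random behaviour policy
        next_state, reward = env_step(state, action)
        log.append((state, action, reward, next_state))
        state = next_state
    return log

def offline_action_values(log):
    """Offline: estimate the average reward of each action from the log alone,
    with no further access to the environment."""
    totals, counts = {}, {}
    for _, action, reward, _ in log:
        totals[action] = totals.get(action, 0.0) + reward
        counts[action] = counts.get(action, 0) + 1
    return {a: totals[a] / counts[a] for a in totals}

log = online_collect(200)            # online phase: interaction produces a log
values = offline_action_values(log)  # offline phase: learning from the log only
```

Real offline RL methods learn full policies from such logs rather than simple averages, but the division of labour (interaction produces data; learning consumes it) is the same.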

RL IMPLEMENTATION APPROACHES


RL can be implemented in ML in three primary ways:

  1. Value-based: The value-based method seeks the maximum value of a state under all policies, so that the agent can anticipate the long-term return from any state under a policy p. The agent uses the learned value function to act in a manner that maximizes the reward at each stage [9].
  2. Policy-based: This technique searches directly for the optimal policy yielding the largest future benefits, without relying on a value function. There are two main categories of policies: deterministic, where the policy produces the same action in every occurrence of a state, and stochastic, where probability dictates the generated action.
  3. Model-based: In this method, the agent investigates the environment and gains knowledge about it by building a virtual model of it. Since the model representation varies depending on the environment, there is no single solution or algorithm for this technique.
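As an illustration of the value-based approach, here is a minimal tabular Q-learning sketch on an invented five-state line world (all hyperparameters are arbitrary choices): the agent learns a value Q(s, a) for each state-action pair and then acts greedily with respect to those values.

```python
import random

def q_learning(n_states=5, episodes=300, alpha=0.5, gamma=0.9, eps=0.2, seed=0):
    """Tabular Q-learning on a line world: start at 0, goal at the last state."""
    rng = random.Random(seed)
    goal = n_states - 1
    actions = (-1, +1)
    Q = {(s, a): 0.0 for s in range(n_states) for a in actions}
    for _ in range(episodes):
        s = 0
        while s != goal:
            # epsilon-greedy action selection over the current value estimates
            if rng.random() < eps:
                a = rng.choice(actions)
            else:
                a = max(actions, key=lambda x: Q[(s, x)])
            s2 = max(0, min(goal, s + a))
            r = 1.0 if s2 == goal else 0.0
            # Q-learning update: move toward reward + discounted best next value
            best_next = max(Q[(s2, x)] for x in actions)
            Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])
            s = s2
    return Q

Q = q_learning()
# Read the greedy policy off the learned values for the non-goal states
greedy = {s: max((-1, +1), key=lambda a: Q[(s, a)]) for s in range(4)}
```

Note that nothing here models the environment's dynamics; the agent learns values purely from sampled transitions, which is what makes this a model-free, value-based method.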

REINFORCEMENT TYPES


There are two types of reinforcement, namely Positive and Negative.

  1. Positive: Positive reinforcement occurs when an event that follows a certain behaviour makes the behaviour stronger and more frequent....


