An actor is a machine resource, such as a CPU or GPU, that runs a self-play agent. A self-play agent is an algorithm that autonomously generates experiences of interaction with the environment. Those experiences are stored in the avatar's memory to feed the learner.
Interaction with the Environment
Time Step
The image at the top represents an interaction between the Echo brain and the environment at a given time step. Constructing an interaction with the environment requires understanding the observation and reward signals the avatar receives from the environment, and the actions it can execute.
Environment
The environment can be anything, and refers to the problem that the avatar must solve. The environment produces observations and rewards.
For example, if the avatar has to learn how to trade in the stock market, the environment is the financial market. If the avatar has to learn how to optimize the pump scheduling of a water distribution network, then the environment is the water distribution network. If the avatar has to optimize the energy consumption for cooling in a data center, then the environment is the dynamics of the data center.
In most cases the environment will be stochastic, meaning that some aspects of the environment are uncertain. Echo works well in such environments too: the avatar learns to adapt its behavior when sudden changes occur in the environment.
Observations
An observation contains information about the environment at a given time step. In Echo, the observation may be partial: the avatar can learn from an incomplete view of the environment, in which some information is hidden from it.
Rewards
The reward is a signal from the environment that indicates how well the avatar is doing. In some cases, the reward is known only at the end of the experience; in others, it is received at each time step.
In most cases, it is possible to configure a reward system in which the avatar receives +1 if a condition is met and -1 otherwise, computed at the end of the experience. In other cases, the reward may be computed at every time step. The best reward signal for the problem at hand is designed at the implementation stage.
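As a minimal sketch of the +1 / -1 scheme described above, the following Python snippet builds the reward sequence for one experience. The function names are hypothetical and not part of Echo, and giving a reward of 0 at every intermediate time step is an assumption made for illustration.

```python
def terminal_reward(condition_met: bool) -> int:
    """Return +1 if the success condition was met, -1 otherwise."""
    return 1 if condition_met else -1


def end_of_experience_rewards(num_steps: int, condition_met: bool) -> list:
    """Build a reward sequence where only the last time step carries a signal.

    Assumption: intermediate time steps receive a reward of 0.
    """
    if num_steps < 1:
        raise ValueError("an experience needs at least one time step")
    return [0] * (num_steps - 1) + [terminal_reward(condition_met)]
```

For instance, end_of_experience_rewards(3, True) yields two zeros followed by +1. A per-step reward scheme would instead fill every position of the list.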
Actions
In this version of Echo, actions can be anything, as long as they are discrete. Continuous actions could also be implemented, but this has not been tested yet.
For example, in the Trading Stocks case, the action space can be composed of the actions "do nothing", "buy" and "sell", which can be executed in any order. This allows the avatar to learn how to surf the financial market, so that it collects profits whether the price trend is going up or down. If prices are going up, the avatar executes the action "buy" first and closes the transaction with a "sell" action; this is known as taking a long position. Conversely, if prices are going down, the avatar executes the action "sell" first and closes the transaction with a "buy" action; this is known as a short sale. The action "do nothing" has no restrictions, unless you want to introduce boundaries on the avatar's behavior.
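The trading example can be sketched in Python as a discrete action space plus a helper that computes the profit of a completed transaction. The names Action and round_trip_profit are hypothetical, and the calculation assumes one unit traded with no fees or slippage.

```python
from enum import IntEnum


class Action(IntEnum):
    """Discrete action space for the trading example."""
    DO_NOTHING = 0
    BUY = 1
    SELL = 2


def round_trip_profit(opening_action: Action, open_price: float, close_price: float) -> float:
    """Profit per unit for a transaction closed with the opposite action.

    Opening with BUY is a long position: it profits when the price rises.
    Opening with SELL is a short sale: it profits when the price falls.
    Assumption: no fees, no slippage, one unit traded.
    """
    if opening_action == Action.BUY:
        return close_price - open_price
    if opening_action == Action.SELL:
        return open_price - close_price
    raise ValueError("a transaction must be opened with BUY or SELL")
```

With these definitions, a long position opened at 100 and closed at 110 and a short sale opened at 100 and closed at 90 both return a profit of 10 per unit.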
Interval
Echo receives an observation from the environment at a given interval and evaluates it to decide on an action. The interval usually refers to a period of time, for example 1 second, 1 minute or 1 hour. You therefore need to define a constant period of time between time steps. This definition determines how fast observations arrive from the environment when the avatar is operating, and therefore how often an evaluation is performed to decide on an action. The historical data used for learning must be sampled at the same interval.
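One way to verify that historical data respects the chosen interval is to check the spacing between consecutive timestamps. The sketch below is illustrative only; the function name is hypothetical and timestamps are assumed to be expressed in seconds.

```python
def has_constant_interval(timestamps: list, interval_seconds: float) -> bool:
    """Return True if every pair of consecutive timestamps is exactly one interval apart."""
    return all(
        later - earlier == interval_seconds
        for earlier, later in zip(timestamps, timestamps[1:])
    )
```

A data set sampled every minute, such as timestamps 0, 60 and 120, passes the check with interval_seconds=60, while a data set with a gap or an irregular sample does not.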
End Condition
Define a condition that stops the loop of interactions with the environment. In general, a stop condition can be anything; it is usually specified as reaching a maximum number of time steps, combined with other conditions that stop the loop earlier.
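A stop condition of this kind could be sketched as follows. The function name and the idea of passing early-stop checks as callables over the current observation are assumptions made for illustration, not Echo's actual interface.

```python
from typing import Callable, Iterable


def should_stop(
    step: int,
    max_steps: int,
    observation: float,
    early_stops: Iterable[Callable[[float], bool]] = (),
) -> bool:
    """Stop when the step budget is exhausted or any early-stop check fires."""
    if step >= max_steps:
        return True
    return any(check(observation) for check in early_stops)
```

In a trading setting, for example, a hypothetical early-stop check could be `lambda balance: balance <= 0.0`, which ends the loop as soon as the account is depleted, even if the maximum number of time steps has not been reached.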
Episode
An episode is a sequence of time steps.
Let's define an episode as a full trajectory of time steps of the avatar interacting with the environment, until a stop condition has been met. With the above definitions, the avatar is ready to complete episodes, which are sequences in which each time step records an observation, an action and a reward.
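Putting the definitions above together, an episode loop could look like the following sketch. Everything here is hypothetical: RandomWalkEnv is a toy stochastic environment invented for the example, the reset/step interface is an assumption, and the only stop condition is a maximum number of time steps.

```python
import random
from typing import Callable, List, NamedTuple, Tuple


class TimeStep(NamedTuple):
    observation: float
    action: int
    reward: float


class RandomWalkEnv:
    """Toy stochastic environment: the observation is a random-walk value."""

    def __init__(self, seed: int = 0) -> None:
        self._rng = random.Random(seed)
        self._value = 0.0

    def reset(self) -> float:
        self._value = 0.0
        return self._value

    def step(self, action: int) -> Tuple[float, float]:
        """Move the walk by +1 or -1; reward +1 if the action (1 = up, 0 = down) matched."""
        delta = self._rng.choice([-1.0, 1.0])
        self._value += delta
        guessed_up = action == 1
        reward = 1.0 if guessed_up == (delta > 0) else -1.0
        return self._value, reward


def run_episode(env: RandomWalkEnv, policy: Callable[[float], int], max_steps: int) -> List[TimeStep]:
    """Interact with the environment until max_steps is reached, recording each time step."""
    episode = []
    observation = env.reset()
    for _ in range(max_steps):
        action = policy(observation)
        next_observation, reward = env.step(action)
        episode.append(TimeStep(observation, action, reward))
        observation = next_observation
    return episode
```

Calling run_episode(RandomWalkEnv(seed=1), lambda obs: 1, max_steps=5) returns a list of five TimeStep records, the first of which holds the initial observation 0.0.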
Experience
An experience is a sequence of episodes. These experiences are autonomously generated by a self-play agent, which is an algorithm that is constantly interacting with the environment.
In most cases, an experience is composed of two episodes. This follows from the way Echo is designed to solve complex problems: it uses the game theory of Nobel Prize-winning mathematician Dr. John Nash, with two players in a zero-sum game, to find an equilibrium point where the avatar optimizes its performance.
Memory
The memory is a sequence of experiences. The experiences generated by the self-play agent are stored in the avatar's memory. This memory has a fixed size, large enough that the avatar can always pick fresh experiences to learn from. To stay within the maximum number of experiences allowed in memory, old experiences are constantly removed to make space for new ones.
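A fixed-size memory of this kind can be sketched with Python's collections.deque, which discards the oldest item automatically once maxlen is reached. The Memory class and its methods are hypothetical names, and uniform random sampling is an assumption; the text does not say how the learner picks experiences.

```python
import random
from collections import deque


class Memory:
    """Fixed-size store of experiences; the oldest entries are evicted first."""

    def __init__(self, capacity: int) -> None:
        self._buffer = deque(maxlen=capacity)

    def add(self, experience) -> None:
        # When the buffer is full, deque drops the oldest experience automatically.
        self._buffer.append(experience)

    def sample(self, batch_size: int) -> list:
        """Pick up to batch_size experiences uniformly at random (an assumption)."""
        size = min(batch_size, len(self._buffer))
        return random.sample(list(self._buffer), size)

    def __len__(self) -> int:
        return len(self._buffer)
```

With a capacity of 3, adding the experiences 0 through 4 leaves only 2, 3 and 4 in memory: the two oldest have been removed to make space.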
Self-Play
A self-play agent is an autonomous generator of experiences that are stored in memory, to feed the learner.
Actors
An actor is a machine that runs a self-play agent to generate experiences. Many actors are used to generate experiences that are stored in the shared memory. In this way, the memory is constantly renewed so the learner always has fresh experiences to learn from.
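The idea of many actors feeding one shared memory can be illustrated on a single machine with threads. This is only a sketch under stated assumptions: in a real deployment each actor would be a separate machine resource, the experiences here are placeholder tuples, and run_actors is a hypothetical name.

```python
import threading
from collections import deque


def run_actors(num_actors: int, experiences_per_actor: int, capacity: int) -> list:
    """Run several actor threads that append experiences to one bounded shared memory."""
    memory = deque(maxlen=capacity)
    lock = threading.Lock()

    def actor(actor_id: int) -> None:
        for index in range(experiences_per_actor):
            experience = (actor_id, index)  # placeholder for a real experience
            with lock:  # serialize writes to the shared memory
                memory.append(experience)

    threads = [threading.Thread(target=actor, args=(k,)) for k in range(num_actors)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return list(memory)
```

With 4 actors producing 10 experiences each and a capacity of 25, the shared memory ends up holding the 25 most recently written experiences, and the rest have been evicted.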
Learner
To improve the performance of the avatar, the learner takes the experiences from memory, processes them and adjusts the responses of the deep neural network.
Learner's Objective
In a two-agent competitive setting, the learner seeks an equilibrium point that represents optimum performance: the avatar decides which action to execute at every time step so that, by the end of an episode, it maximizes the cumulative reward it receives from the environment.
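The quantity being maximized can be sketched as the sum of the rewards collected over an episode. The function names below are hypothetical, and the plain undiscounted sum is an assumption: the text does not say whether Echo discounts or weights rewards. The second function only illustrates the general zero-sum property, in which one player's gain is the other player's loss.

```python
def episode_return(rewards: list) -> float:
    """Cumulative reward of one episode (assumption: plain undiscounted sum)."""
    return float(sum(rewards))


def opponent_return(own_return: float) -> float:
    """In a two-player zero-sum game, the two players' returns sum to zero."""
    return -own_return
```

For an episode with rewards 1, -1 and 1, the cumulative reward is 1.0 and the opponent's return under the zero-sum assumption is -1.0.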
When the avatar has been trained, tested and deployed, Echo is ready for launch. Discover details on the Echo avatar once it has been deployed.
Echo Brains
Copyright © 2023 Echo Brains - All rights reserved.