Echo Brains


Learning - What Actors Do

An actor is a compute resource

An actor is a machine resource, such as a CPU or GPU, that runs a self-play agent. A self-play agent is an algorithm that autonomously generates experiences of interaction with the environment. Those experiences are stored in the avatar's memory to feed the learner.


Interaction with the Environment 


  • Echo receives an observation from the environment
  • Echo processes the observation and decides to execute an action
  • The action somehow influences the environment
  • In the next time step, the environment responds with a new observation and a reward signal
  • Echo processes the observation and reward signal and decides to execute another action.
  • And so on, until a stop condition is met.

All in all, the interaction of the avatar with the environment consists of a loop of receiving observations and reward signals from the environment and executing actions at a discrete interval, as sketched below.
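
As a concrete illustration, here is a minimal sketch of this loop in Python. The `env` and `agent` interfaces are hypothetical stand-ins, not Echo's actual API:

```python
# Minimal sketch of the observation-action-reward loop.
# `env` and `agent` follow a hypothetical interface, not Echo's real API.

def run_episode(env, agent, max_steps=1000):
    observation = env.reset()                      # first observation
    total_reward = 0.0
    for _ in range(max_steps):                     # stop: maximum time steps
        action = agent.act(observation)            # Echo decides on an action
        observation, reward, done = env.step(action)  # environment responds
        total_reward += reward
        if done:                                   # an earlier stop condition was met
            break
    return total_reward
```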


Time Step


The image at the top represents an interaction between the Echo brain and the environment at a given time step. Constructing an interaction with the environment requires understanding the observations and reward signals the avatar receives from the environment, and the actions it can execute.

Environment


The environment can be anything; it refers to the problem that the avatar must solve. The environment produces observations and rewards.


For example, if the avatar has to learn how to trade in the stock market, the environment is the financial market. If the avatar has to learn how to optimize the pump scheduling of a water distribution network, then the environment is the water distribution network. If the avatar has to optimize the energy consumption for cooling in a data center, then the environment is the dynamics of the data center. 


In most cases the environment will be stochastic, meaning that some aspects of the environment are uncertain. Echo works well in such environments too, so the avatar learns to adapt its behavior when sudden changes occur in the environment.


Observations


An observation contains information about the environment at a given time step. In Echo, the observation may be partial, meaning that the avatar is also able to learn from a partial view of the environment, where some information is hidden and never revealed to it.
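
As an illustration, a partial observation exposes only some fields of the true environment state; the field names below are invented for this example:

```python
# Hypothetical example: the true state holds more than the avatar sees.
full_state = {
    "price": 101.25,       # visible to the avatar
    "volume": 54_300,      # visible to the avatar
    "pending_orders": 17,  # hidden: never revealed to the avatar
}

VISIBLE_FIELDS = ("price", "volume")

def observe(state):
    """Return the partial observation the avatar actually receives."""
    return {field: state[field] for field in VISIBLE_FIELDS}

print(observe(full_state))  # {'price': 101.25, 'volume': 54300}
```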


Rewards


The reward is a signal from the environment that indicates how well the avatar is doing. In some cases the reward is known only at the end of the experience; in others it is received at each time step.


In most cases, it is possible to configure a reward system in which the avatar receives +1 if a condition is met and -1 otherwise, computed at the end of the experience. In other cases, the reward may be computed at every time step. The best reward signal for the problem at hand is designed at the implementation stage.
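
For instance, the terminal +1/-1 scheme described above could be sketched like this; the success condition itself is problem-specific and left abstract:

```python
# Sketch of a terminal reward: zero at every step, +1/-1 at the end.
def reward(episode_over: bool, condition_met: bool) -> float:
    if not episode_over:
        return 0.0                         # intermediate steps carry no signal
    return 1.0 if condition_met else -1.0  # judged at the end of the experience
```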


Actions


For this version of Echo, actions can be anything, but they must be discrete. It is also possible to implement continuous actions, but that has not been tested yet.


For example, in the Trading Stocks case, the action space can be composed of the actions "do nothing", "buy", and "sell", which can be executed in no particular order. In this way you allow the avatar to learn how to surf the financial market, collecting profits whether the price trend is going up or down. If prices are going up, the avatar executes the "buy" action first and closes the transaction with a "sell" action; this is known as taking a long position. Conversely, if prices are going down, the avatar executes the "sell" action first and closes the transaction with a "buy" action; this is known as a short sale. The "do nothing" action would have no restrictions, unless you want to introduce boundaries on the avatar's behavior.
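
A discrete action space for this trading example could be encoded as a simple enumeration; this is an illustrative sketch, not Echo's internal representation:

```python
from enum import Enum

class TradeAction(Enum):
    DO_NOTHING = 0
    BUY = 1
    SELL = 2

# A long position opens with BUY and closes with SELL;
# a short sale opens with SELL and closes with BUY.
long_position = (TradeAction.BUY, TradeAction.SELL)
short_sale = (TradeAction.SELL, TradeAction.BUY)
```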


Interval


Echo receives an observation from the environment at a given interval and evaluates it to decide on an action. The interval usually refers to a period of time, for example 1 second, 1 minute, or 1 hour; it defines a constant period of time between time steps. This definition conditions the speed at which observations from the environment are received while the avatar is operating, and therefore how often an evaluation is performed to decide on an action. The historical data used for learning must come at the same interval.


End Condition


Define a condition to stop the loop of interactions with the environment. In general, a stop condition can be anything; it is usually specified as reaching a maximum number of time steps, combined with other conditions that stop the episode earlier.
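
Such a stop condition, combining a step budget with an earlier problem-specific check, might look like this (the early-stop predicate is a placeholder):

```python
MAX_STEPS = 1_000  # illustrative cap on episode length

def early_stop(state) -> bool:
    return False   # problem-specific check; always False in this sketch

def should_stop(step: int, state) -> bool:
    # Stop when the step budget is exhausted, or earlier if the
    # problem-specific condition holds.
    return step >= MAX_STEPS or early_stop(state)
```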


Episode


An episode is a sequence of time steps.

Let's define an episode as a full trajectory of time steps of the avatar interacting with the environment, until a stop condition has been met. With the above definitions, the avatar is ready to complete episodes, which are sequences of records containing the configuration of each time step (observation, action, reward).
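
One way to represent time steps and episodes, purely as a sketch:

```python
from dataclasses import dataclass, field
from typing import Any, List

@dataclass
class TimeStep:
    observation: Any
    action: Any
    reward: float

@dataclass
class Episode:
    steps: List[TimeStep] = field(default_factory=list)

    def total_reward(self) -> float:
        return sum(step.reward for step in self.steps)
```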


Experience


An experience is a sequence of episodes. These experiences are autonomously generated by a self-play agent, which is an algorithm that is constantly interacting with the environment.


In most cases, an experience will be composed of two episodes, because of the way Echo is designed to solve complex problems: it uses the theory of games of Nobel Prize-winning mathematician Dr. John Nash, with two players in a zero-sum game, to find an equilibrium point where the avatar optimizes its performance.
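
Under that design, an experience could be sketched as a pair of episodes, one per player of the zero-sum game. This pairing is an assumption drawn from the description above, reusing the `Episode` class from the previous sketch:

```python
from dataclasses import dataclass

@dataclass
class Experience:
    # One episode per player of the two-player zero-sum game.
    player_a: Episode
    player_b: Episode
```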


Memory


The memory is a sequence of experiences. The experiences generated by the self-play agent are stored in the avatar's memory. This memory has a fixed size, large enough to store sufficient data that the avatar can always pick fresh experiences to learn from. To maintain the maximum number of experiences allowed in memory, old experiences are constantly removed to make space for new ones.
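
A fixed-size memory that silently discards the oldest experiences can be sketched with a bounded deque; the capacity is arbitrary here:

```python
import random
from collections import deque

MEMORY_CAPACITY = 10_000  # illustrative fixed size

# When the deque is full, appending a new experience
# automatically drops the oldest one.
memory = deque(maxlen=MEMORY_CAPACITY)

def store(experience):
    memory.append(experience)

def sample(batch_size: int):
    """Pick a random batch of fresh experiences for the learner."""
    return random.sample(list(memory), batch_size)
```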


Self-Play


A self-play agent is an autonomous generator of experiences that are stored in memory, to feed the learner.


Actors


An actor is a machine that runs a self-play agent to generate experiences. Many actors are used to generate experiences that are stored in the shared memory. In this way, the memory is constantly renewed so the learner always has fresh experiences to learn from.
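
Many actors writing into one shared memory can be illustrated with threads and a shared deque; in the real system the actors presumably run on separate machines, so this is only a local sketch:

```python
import threading
import time
from collections import deque

shared_memory = deque(maxlen=10_000)  # memory shared by all actors
stop = threading.Event()

def play_experience(actor_id: int):
    return {"actor": actor_id}        # stand-in for real self-play output

def actor(actor_id: int):
    """One actor: run a self-play agent and store its experiences."""
    while not stop.is_set():
        shared_memory.append(play_experience(actor_id))

threads = [threading.Thread(target=actor, args=(i,)) for i in range(4)]
for t in threads:
    t.start()
time.sleep(0.1)   # let the actors generate a few experiences
stop.set()        # signal shutdown
for t in threads:
    t.join()
print(len(shared_memory), "experiences in shared memory")
```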


Learner


To improve the performance of the avatar, the learner takes experiences from memory, processes them, and adjusts the responses of the deep neural network.
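
At its simplest, the learner's loop samples a batch from memory, computes a loss, and applies an update. The `network.loss` and `optimizer.step` calls below are placeholders; Echo's actual loss and architecture are not described here:

```python
import random

def learner_loop(memory, network, optimizer, batch_size=32, steps=1_000):
    for _ in range(steps):
        if len(memory) < batch_size:
            continue                                     # wait for experiences
        batch = random.sample(list(memory), batch_size)  # fresh batch
        loss = network.loss(batch)                       # placeholder loss
        optimizer.step(loss)                             # adjust the network
```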


Learner's Objective


In a two-agent competitive setting, to find an equilibrium point that represents optimum performance, the avatar makes decisions about which actions to execute at every time step, such that by the end of an episode it maximizes the cumulative reward it receives from the environment.
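
In standard reinforcement-learning notation (a standard formulation, not quoted from Echo's documentation), that objective is to find a policy that maximizes the expected cumulative reward over an episode:

```latex
\max_{\pi} \; \mathbb{E}_{\pi}\left[ \sum_{t=0}^{T} r_t \right]
```

where \pi is the avatar's policy, r_t is the reward received at time step t, and T is the final time step of the episode.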

Echo's Operation

When the avatar has been trained, tested, and deployed, Echo is ready for launch. Discover the details of the Echo avatar's operation once it has been deployed.
