The project started with an idea: extend graphic editors and animation software so that users can create animations faster. The idea of graphic-editor helpers is by no means new. It has been explored by Witkin and co-authors (see, for example, “Snakes: Active contour models”), though their implementation is mostly physics-based. We are aware that the agent needs to explore its world in order to integrate user experience with the application. Otherwise, learning on static images (for instance, program screenshots) in typical Machine Learning fashion (1) won’t bring our understanding closer to AI and (2) will limit what an application can offer to the user.
In this scenario, the agent’s world is a graphic canvas, a 3D scene. The agent can move objects, draw, delete, and execute any command available to the user. Working in 3D from the beginning is a tough problem. Luckily, some of the tasks boil down to a 1D case in which an agent moves in a binary world of black and white pixels. In that case, we can formulate the task as moving along an infinite Turing tape. The infiniteness is achieved by not bounding the screen to a fixed H×W resolution but rather allowing the agent to move outside of the visible box, where the agent’s observations are constant: always 1 or always 0, depending on how you define black and white pixels.
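To make the 1D setting concrete, here is a minimal sketch in Python. It is an illustration under assumptions, not the project's implementation: the class name `TapeWorld`, its fields, and the choice of `0` as the default background value are all hypothetical. It shows a bounded strip of binary pixels embedded in an unbounded tape, where any observation made outside the strip returns a fixed background value.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class TapeWorld:
    """Hypothetical 1D binary canvas embedded in an unbounded tape."""
    pixels: List[int]      # the bounded screen region, each entry 0 or 1
    background: int = 0    # assumed value observed outside the screen region
    position: int = 0      # agent's cell index; may be negative or past the end

    def _inside(self) -> bool:
        return 0 <= self.position < len(self.pixels)

    def observe(self) -> int:
        """Return the pixel under the agent, or the background outside the screen."""
        return self.pixels[self.position] if self._inside() else self.background

    def move(self, step: int) -> int:
        """Shift the agent by `step` cells (any integer) and return the new observation."""
        self.position += step
        return self.observe()

    def draw(self, value: int) -> None:
        """Write a pixel under the agent; writes outside the screen are ignored."""
        if self._inside():
            self.pixels[self.position] = value


world = TapeWorld(pixels=[0, 1, 1, 0])
first = world.move(1)     # step onto the second pixel
outside = world.move(-5)  # walk off the left edge of the screen
```

Ignoring writes outside the screen is one possible design choice. An alternative would be to grow the strip on demand, which keeps the tape writable everywhere at the cost of unbounded memory.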
The problem of predicting the i-th symbol on a Turing tape can be cast as a data compression problem, and the resulting compression model can in turn be used to build a higher-order Markov decision process.
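The link between prediction and compression can be sketched as follows. This is a simplified illustration, not the method used in the project: it is a plain order-k Markov predictor (no actions or rewards, so not a full decision process), with add-one (Laplace) smoothing chosen here for simplicity. The names `ContextModel` and `code_length_bits` are hypothetical. The sketch predicts each symbol from the previous k symbols and sums -log2 of the probability assigned to the symbol that actually occurred, which is the code length an ideal arithmetic coder would approach. The better the predictions, the fewer bits are needed.

```python
import math
from collections import defaultdict
from typing import List, Sequence, Tuple


class ContextModel:
    """Adaptive order-k model over binary symbols with add-one smoothing."""

    def __init__(self, order: int = 2) -> None:
        self.order = order
        # counts[context] = [count of 0s, count of 1s], starting at one pseudo-count each
        self.counts = defaultdict(lambda: [1, 1])

    def _context(self, history: Sequence[int]) -> Tuple[int, ...]:
        # Use the last `order` symbols; an order of 0 means an empty context.
        return tuple(history[-self.order:]) if self.order > 0 else ()

    def prob_one(self, history: Sequence[int]) -> float:
        """Estimated probability that the next symbol is 1."""
        zeros, ones = self.counts[self._context(history)]
        return ones / (zeros + ones)

    def update(self, history: Sequence[int], symbol: int) -> None:
        """Record that `symbol` followed the current context."""
        self.counts[self._context(history)][symbol] += 1


def code_length_bits(tape: List[int], order: int = 2) -> float:
    """Ideal code length of `tape` in bits when predicting symbols one at a time."""
    model = ContextModel(order)
    bits = 0.0
    for i, symbol in enumerate(tape):
        history = tape[:i]
        p_one = model.prob_one(history)
        p = p_one if symbol == 1 else 1.0 - p_one
        bits -= math.log2(p)
        model.update(history, symbol)
    return bits


periodic = [0, 1] * 50                       # a highly regular 100-symbol tape
bits_needed = code_length_bits(periodic, 2)  # far below the 100 bits of a raw encoding
```

On the periodic tape above, the model quickly learns that each two-symbol context has a single continuation, so the total code length stays well under one bit per symbol.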