Researchers from the University of Warsaw, Google AI and deepsense.ai are taking on a new reinforcement learning challenge on Cloud TPU hardware accelerators. The goal of the experiment is to train an artificial intelligence end to end to play video games, with the whole process running inside a single computation graph.
A team from the University of Warsaw, made up of Piotr Miłoś, Błażej Osiński and Henryk Michalewski, has started a collaboration on reinforcement learning research with Łukasz Kaiser from the Google Brain team and with researchers from deepsense.ai. This project is connected to a research programme on RL that deepsense.ai started last year.
In the experiment, an artificial intelligence will be trained end to end to play video games fully inside a computation graph. If the game simulator is also part of the graph, tasks such as training an AI to play video games could run even faster than what deepsense.ai’s team achieved last year. The intent is to run the training process entirely on Cloud TPUs, which are new machine learning accelerators designed by Google. Keeping everything on the accelerator will save the time previously spent on communication between the accelerators and a host computer.
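The idea of keeping the simulator inside the graph can be illustrated with a minimal sketch. It is not the team's implementation: it uses JAX rather than the project's own tooling, and the toy simulator `env_step`, the linear policy `policy_logits`, the starting position and the constant `NUM_STEPS` are all hypothetical names and values chosen for illustration. The point is only that the policy and the simulator are compiled together, so a whole episode runs on the accelerator without returning to the host between steps.

```python
import jax
import jax.numpy as jnp

NUM_STEPS = 16  # hypothetical episode length


def env_step(position, action):
    """Toy simulator (an assumption, not a real game): walk along a line.

    Action 1 moves right, action 0 moves left; the reward is higher
    the closer the agent gets to the origin.
    """
    new_position = position + jnp.where(action == 1, 1.0, -1.0)
    reward = -jnp.abs(new_position)
    return new_position, reward


def policy_logits(params, position):
    # Two-action linear policy on the scalar observation.
    return params["w"] * position + params["b"]


def rollout_return(params, key):
    """Play one episode; policy and simulator live in the same traced graph."""

    def body(position, step_key):
        logits = policy_logits(params, position)
        action = jax.random.categorical(step_key, logits)
        new_position, reward = env_step(position, action)
        return new_position, reward

    step_keys = jax.random.split(key, NUM_STEPS)
    start = jnp.asarray(3.0, dtype=jnp.float32)
    _, rewards = jax.lax.scan(body, start, step_keys)
    return rewards.sum()


# One compiled call covers the whole episode, simulator included.
compiled_rollout = jax.jit(rollout_return)

params = {"w": jnp.zeros(2), "b": jnp.zeros(2)}
total_reward = compiled_rollout(params, jax.random.PRNGKey(0))
```

Because `jax.lax.scan` expresses the episode loop as part of the compiled program, the host only supplies the parameters and a random key and receives a single scalar back. A real game simulator would be far more complex, but the division of work is the same.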
The main experiments are being run on Cloud TPUs via the TensorFlow Research Cloud (TFRC) programme and are supported by Google Warsaw’s Antonio Gulli, Ignacy Kowalczyk and Maciej Pytel, who are helping the team deploy its experiments on the Google Cloud Platform. TFRC gives ML researchers access to second-generation Cloud TPUs, each of which delivers 180 teraflops of machine learning acceleration.
Henryk Michalewski, a research team leader on the project, offered his appreciation. “Many thanks to Google for sharing early access to Cloud TPUs with us, as well as to Antonio Gulli and Ignacy Kowalczyk for providing the Google Cloud Platform power to deploy our experiments. With such a strong infrastructure, we’re perfectly equipped to tackle our ambitious goal and leverage the research on reinforcement learning efficiency we started last year.”
In 2017, Michalewski’s team from deepsense.ai published a paper describing a new method they had developed to train a robotic arm to grip a can of Coke. Their work was recognised by the General Chairs of the Conference on Robot Learning (CoRL) as one of 11 noteworthy papers in the reinforcement learning and robotics category. The team presented the paper in November at Google’s headquarters in Mountain View. The method could be used, for example, to train humanoid robots to combine single steps into a walk or a run.