Using virtual data to solve real problems

It seems one of the next frontiers in advancing machine learning is data generation.

It is no secret that deep learning has been advancing object classification, language translation, speech recognition, and virtually any other task where large amounts of labeled data are available or can be easily collected. However, not all tasks enjoy such ready availability of labeled data, and for some of them such data can be almost impossible to acquire. Imagine, for example, annotating every single pixel in every frame of a video to indicate which object in the scene it belongs to (a lamp, a ball, a road), and then repeating this process for thousands or even millions of videos.

It may seem surprising to those outside the field, but one of the most common ways to obtain labeled data for training machine learning systems today is Amazon’s Mechanical Turk, a platform where human workers are paid to manually label training data for computers. Needless to say, human labor is not (nor should it be) cheap.

On the other hand, if there is one thing computers truly excel at, it is replacing humans in the most repetitive and tedious tasks. But wait: if that is the case, are there ways we could use computers to generate the training data needed to train other computers?

Well, the answer is yes. And while techniques for doing so have been known for a long time, they have received ever-increasing interest in recent years, becoming one of the recurring topics at the Thirtieth Annual Conference on Neural Information Processing Systems (NIPS 2016), one of the most important conferences in artificial intelligence and machine learning, which took place in Barcelona just a few days before this post was first published.

Virtual Worlds and Human Actions for Video Understanding

On Wednesday, 7 December, we presented a demonstration at NIPS 2016 showing how synthetic human action videos can be generated using game engines, procedural animation, and limited-time physics simulations. To the best of our knowledge, this is the first time synthetic, physically plausible human action videos have been procedurally generated from the ground up to successfully train vision-based human action recognition systems.
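
To give a flavour of what "procedural generation" means here, the sketch below samples random scene parameters (actor, environment, camera, lighting, simulation length) for each clip and writes out a labeled manifest that a game engine could then render into videos. This is a minimal illustration only: the action list, parameter names, and the sample_clip_params/generate_manifest helpers are hypothetical stand-ins, not the actual pipeline behind the demo.

```python
import json
import random

# Illustrative action classes the generator should cover (hypothetical labels).
ACTIONS = ["walk", "run", "jump", "wave", "kick"]

def sample_clip_params(action: str, rng: random.Random) -> dict:
    """Procedurally sample the parameters of one synthetic video clip."""
    return {
        "action": action,  # ground-truth label, known by construction
        "actor_model": rng.choice(["male_01", "female_01", "child_01"]),
        "environment": rng.choice(["street", "park", "indoor_office"]),
        "camera": {
            "distance_m": round(rng.uniform(2.0, 8.0), 2),
            "height_m": round(rng.uniform(0.5, 3.0), 2),
            "azimuth_deg": round(rng.uniform(0.0, 360.0), 1),
        },
        "lighting": {"sun_elevation_deg": round(rng.uniform(10.0, 80.0), 1)},
        "physics_seconds": round(rng.uniform(2.0, 4.0), 1),  # limited-time physics simulation
        "seed": rng.randrange(2**31),  # lets the engine reproduce the clip exactly
    }

def generate_manifest(clips_per_action: int, seed: int = 0) -> list[dict]:
    """Build a labeled manifest; a game engine would render each entry to video."""
    rng = random.Random(seed)
    return [
        sample_clip_params(action, rng)
        for action in ACTIONS
        for _ in range(clips_per_action)
    ]

if __name__ == "__main__":
    manifest = generate_manifest(clips_per_action=100)
    with open("synthetic_clips.json", "w") as f:
        json.dump(manifest, f, indent=2)
    print(f"Wrote {len(manifest)} labeled clip specifications")
```

Note that because each clip is constructed rather than collected, its action label is known by construction, so no human annotation step is needed; that is precisely the appeal of synthetic data.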