In many domains, data is generated at a fast pace. A clear example is Internet of Things (IoT) applications, where connected sensors yield large amounts of data in short periods. To build predictive models from this data, you need to either settle for traditional offline learning or attempt to learn from the data incrementally. A significant drawback of the offline approach is that it’s slow to react to changes in the domain, and these changes can have a catastrophic impact on the model’s predictive performance, since the patterns on which the model was trained are no longer valid. An online approach, where the model is trained incrementally, can potentially fix this; however, the untold story is that the challenges of offline learning are still present (and even amplified) when processing data online. These challenges include, but are not limited to, raw data preprocessing, efficient incremental model updates, algorithms that detect changes and react to them, and dealing with large amounts of unlabeled and delayed-labeled data.
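To make the contrast between offline and incremental learning concrete, here is a minimal sketch in plain Python. It is only an illustration under stated assumptions: the stream is synthetic, with a target that jumps abruptly from 1.0 to 5.0, and the names `make_stream`, `OfflineMean`, `OnlineMean`, and `evaluate` are hypothetical helpers, not part of MOA, StreamDM, or scikit-multiflow. The "models" are deliberately trivial mean predictors, so the only thing being compared is whether the model keeps updating.

```python
def make_stream(n_before=200, n_after=200):
    """Hypothetical synthetic stream: the target jumps from 1.0 to 5.0
    (an abrupt change in the domain)."""
    return [1.0] * n_before + [5.0] * n_after


class OfflineMean:
    """Toy offline model: fitted once on an initial batch, then frozen."""

    def __init__(self, batch):
        self.value = sum(batch) / len(batch)

    def predict(self):
        return self.value


class OnlineMean:
    """Toy incremental model: exponentially weighted mean, one update per instance."""

    def __init__(self, alpha=0.1):
        self.alpha = alpha  # assumed learning rate; larger values forget faster
        self.value = 0.0

    def predict(self):
        return self.value

    def learn_one(self, y):
        # Move the estimate a fraction alpha toward the newest observation.
        self.value += self.alpha * (y - self.value)


def evaluate(stream, train_size=100):
    """Return the mean absolute error of both models over the whole stream."""
    offline = OfflineMean(stream[:train_size])
    online = OnlineMean()
    offline_err = 0.0
    online_err = 0.0
    for y in stream:
        # Test-then-train: each instance is used for prediction first,
        # and only afterwards to update the incremental model.
        offline_err += abs(y - offline.predict())
        online_err += abs(y - online.predict())
        online.learn_one(y)
    n = len(stream)
    return offline_err / n, online_err / n


offline_mae, online_mae = evaluate(make_stream())
print(f"offline MAE: {offline_mae:.3f}, online MAE: {online_mae:.3f}")
```

In this toy setting the frozen model keeps predicting the old value after the change, while the incremental one recovers within a few dozen instances. Real streams add the difficulties listed above (preprocessing, explicit change detection, missing or delayed labels), which this sketch does not address.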
Heitor Murilo Gomes
I am currently a senior research fellow in the machine learning group at the University of Waikato. My main research area is machine learning, especially evolving data streams, concept drift, ensemble methods, and big data streams. I contribute to the open source data stream mining projects MOA (Java), StreamDM (Spark Streaming), and scikit-multiflow (Python).