Machine learning for streaming data: Practical insights
In many domains, data is generated at a fast pace. A clear example is Internet of Things (IoT) applications, where connected sensors yield large amounts of data in short periods. To build predictive models from this data, you must either settle for traditional offline learning or attempt to learn from the data incrementally. A significant drawback of the offline approach is that it is slow to react to changes in the domain, and these changes can have a catastrophic impact on the model's predictive performance, since the patterns on which the model was trained are no longer valid. An online approach, where the model is trained incrementally, can potentially fix this; however, the untold story is that the challenges of offline learning are still present (and are even amplified) when processing the data online. These challenges include, but are not limited to, raw data preprocessing, efficient incremental updates to models, algorithms that detect changes and react to them, and dealing with large amounts of unlabeled and delayed-labeled data.
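
To make the contrast between the two approaches concrete, here is a minimal sketch in plain Python. It is an illustration under stated assumptions, not a method taken from this text: the synthetic stream, the abrupt label flip halfway through, and the names `OnlineLogisticRegression`, `make_stream`, and `run` are all hypothetical. One model is trained only on data seen before the change and then frozen (the offline setting), while the other is evaluated on each example and then updated with it (the online setting).

```python
import math
import random


class OnlineLogisticRegression:
    """One-feature logistic regression updated one example at a time (SGD)."""

    def __init__(self, learning_rate=0.5):
        self.w = 0.0
        self.b = 0.0
        self.learning_rate = learning_rate

    def predict(self, x):
        # Predict class 1 when the linear score is positive.
        return 1 if self.w * x + self.b > 0 else 0

    def learn_one(self, x, y):
        # Clamp the score so math.exp cannot overflow.
        z = max(-30.0, min(30.0, self.w * x + self.b))
        p = 1.0 / (1.0 + math.exp(-z))
        error = y - p  # gradient of the log-likelihood w.r.t. the score
        self.w += self.learning_rate * error * x
        self.b += self.learning_rate * error


def make_stream(n=1000, change_at=500, seed=0):
    """Yield (x, label) pairs; the labeling rule is inverted at `change_at`."""
    rng = random.Random(seed)
    for i in range(n):
        x = rng.random()
        label = 1 if x > 0.5 else 0
        if i >= change_at:
            label = 1 - label  # assumed abrupt change in the domain
        yield x, label


def run(n=1000, change_at=500):
    offline = OnlineLogisticRegression()  # frozen once the change happens
    online = OnlineLogisticRegression()   # keeps learning throughout
    hits = {"offline": 0, "online": 0}
    for i, (x, y) in enumerate(make_stream(n, change_at)):
        if i >= change_at:
            # Score both models on the example before anyone learns from it.
            hits["offline"] += offline.predict(x) == y
            hits["online"] += online.predict(x) == y
        else:
            offline.learn_one(x, y)
        online.learn_one(x, y)
    after = n - change_at
    return hits["offline"] / after, hits["online"] / after


offline_acc, online_acc = run()
print(f"after the change: offline={offline_acc:.2f}, online={online_acc:.2f}")
```

Both models are identical at the moment of the change, so any gap in accuracy afterwards comes only from the online model continuing to update. The sketch assumes every label arrives immediately and leaves out preprocessing, explicit change detection, and delayed or missing labels, which are the harder problems listed above.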