Lecture at the IoT Stream Data Mining course
Lecture, Telecom Paris, Paris, France
Lecture at the IoT Stream Data Mining course (Paris, France) as part of the second-year Master's program in Data and Knowledge at Université Paris-Saclay, 2019-2020.
Lecture, Telecom Paris, Paris, France
Lecture at the IoT Stream Data Mining course (Paris, France) as part of the second-year Master's program in Data and Knowledge at Université Paris-Saclay, 2019-2020.
Seminar, INESC-TEC, Porto, Portugal
Presented the paper “Streaming Random Patches for Evolving Data Stream Classification.”
Talk, Javits Center, New York, USA
In many domains, data is generated at a fast pace. A clear example is Internet of Things (IoT) applications, where connected sensors yield large amounts of data in short periods. To build predictive models from this data, you need to either settle for traditional offline learning or learn from the data incrementally. A significant drawback of offline learning is that it is slow to react to changes in the domain, and these changes can have a catastrophic impact on the model's predictive performance, since the patterns on which the model was trained are no longer valid. An online approach, where the model is trained incrementally, can potentially fix this; however, the untold story is that the challenges of offline learning are still present (and are even amplified) when processing the data online. These challenges include, but are not limited to, raw data preprocessing, efficient incremental model updates, algorithms to detect and react to changes, and dealing with large amounts of unlabeled and delayed-labeled data.
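To make the online setting described above concrete, here is a minimal Python sketch, not code from the talk: a model is evaluated prequentially (test-then-train), updated one instance at a time, and monitored by a simplified Drift Detection Method (DDM, Gama et al., 2004) that keeps only the drift level and omits the warning level. The toy nearest-centroid learner, the synthetic stream whose labelling rule flips abruptly at `t = 2000`, and all names (`NearestCentroid`, `DDM`, `stream`, `prequential`) are hypothetical choices for illustration. On a detected drift the model is simply discarded and retrained from scratch.

```python
import math
import random


class NearestCentroid:
    """Toy incremental 1-D nearest-centroid classifier (hypothetical, illustration only)."""

    def __init__(self):
        self.sums = {0: 0.0, 1: 0.0}
        self.counts = {0: 0, 1: 0}

    def predict(self, x):
        seen = [c for c in (0, 1) if self.counts[c] > 0]
        if not seen:
            return 0  # no data yet: arbitrary default
        return min(seen, key=lambda c: abs(x - self.sums[c] / self.counts[c]))

    def learn(self, x, y):
        # O(1) update per instance; past instances are never stored or revisited
        self.sums[y] += x
        self.counts[y] += 1


class DDM:
    """Simplified Drift Detection Method: drift level only, no warning level."""

    def __init__(self, min_instances=30):
        self.min_instances = min_instances
        self.n = 0
        self.p = 0.0                    # running error rate
        self.p_min = float("inf")
        self.s_min = float("inf")

    def update(self, error):
        """Feed 1 for a misclassification and 0 otherwise; return True on drift."""
        self.n += 1
        self.p += (error - self.p) / self.n
        s = math.sqrt(self.p * (1 - self.p) / self.n)
        if self.n < self.min_instances:
            return False
        if self.p + s < self.p_min + self.s_min:
            self.p_min, self.s_min = self.p, s   # lowest error level seen so far
        return self.p + s > self.p_min + 3 * self.s_min


def stream(n, drift_at, noise=0.1, seed=7):
    """Hypothetical stream: the labelling rule flips abruptly at t = drift_at."""
    rng = random.Random(seed)
    for t in range(n):
        x = rng.random()
        y = int(x > 0.5) if t < drift_at else int(x <= 0.5)
        if rng.random() < noise:
            y = 1 - y                   # label noise
        yield t, x, y


def prequential(n=4000, drift_at=2000):
    model, detector = NearestCentroid(), DDM()
    correct, drifts = [], []
    for t, x, y in stream(n, drift_at):
        hit = model.predict(x) == y     # test first ...
        correct.append(hit)
        model.learn(x, y)               # ... then train on the same instance
        if detector.update(0 if hit else 1):
            drifts.append(t)
            model, detector = NearestCentroid(), DDM()  # discard the outdated model
    return correct, drifts


if __name__ == "__main__":
    correct, drifts = prequential()
    print("drifts signalled at:", drifts)
    print("accuracy after t=2500:", sum(correct[2500:]) / len(correct[2500:]))
```

Without the reset, the class centroids would keep averaging the old and new concepts, so accuracy after the flip would stay poor. Resetting on drift is the crudest possible reaction; it is used here only to keep the sketch short.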
Talk, ExCel London, London, UK
We show how to build random forest models from streaming data by training, predicting with, and adapting the model in real time as the data stream evolves. The implementation is available in the open-source library StreamDM, built on top of Apache Spark.
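One ingredient shared by streaming ensembles is online bagging: instead of bootstrap resampling, each member trains on every arriving instance with a weight drawn from a Poisson(λ) distribution (λ = 1 in Oza and Russell's original online bagging). The following is a hedged plain-Python illustration of that idea, not the StreamDM implementation. It replaces Hoeffding trees with a hypothetical weighted nearest-centroid learner, gives each member one random feature subset (a random-subspace simplification; streaming random forests such as Adaptive Random Forest instead sample features at each tree split), uses λ = 6 as in Adaptive Random Forest, and omits per-member drift detectors. All class and function names are made up for the example.

```python
import math
import random
from collections import Counter


def poisson(lam, rng):
    """Knuth's method for sampling Poisson(lam); adequate for small lam."""
    threshold = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1


class WeightedCentroid:
    """Hypothetical base learner: weighted nearest centroid on a feature subset."""

    def __init__(self, features):
        self.features = features
        self.sums = {}                  # class -> weighted feature sums
        self.weights = {}               # class -> total weight

    def learn(self, x, y, w):
        if w == 0:
            return                      # instance not "sampled" for this member
        sums = self.sums.setdefault(y, [0.0] * len(self.features))
        for i, f in enumerate(self.features):
            sums[i] += w * x[f]
        self.weights[y] = self.weights.get(y, 0) + w

    def predict(self, x):
        if not self.weights:
            return None

        def sq_dist(c):
            return sum((x[f] - s / self.weights[c]) ** 2
                       for f, s in zip(self.features, self.sums[c]))

        return min(self.weights, key=sq_dist)


class OnlineSubspaceBagging:
    """Online bagging (Poisson weights) with one random feature subset per member."""

    def __init__(self, n_features, n_members=10, subspace_size=2, lam=6.0, seed=1):
        self.rng = random.Random(seed)
        self.lam = lam
        self.members = [
            WeightedCentroid(sorted(self.rng.sample(range(n_features), subspace_size)))
            for _ in range(n_members)
        ]

    def learn(self, x, y):
        for member in self.members:
            member.learn(x, y, poisson(self.lam, self.rng))

    def predict(self, x):
        preds = [m.predict(x) for m in self.members]
        votes = Counter(p for p in preds if p is not None)  # majority vote
        return votes.most_common(1)[0][0] if votes else None


def run(n=3000, seed=3):
    """Prequential accuracy on a hypothetical stream: y = [x0 + x1 > 1], x2 is noise."""
    rng = random.Random(seed)
    model = OnlineSubspaceBagging(n_features=3)
    hits = 0
    for _ in range(n):
        x = [rng.random() for _ in range(3)]
        y = int(x[0] + x[1] > 1.0)
        hits += model.predict(x) == y   # test ...
        model.learn(x, y)               # ... then train
    return hits / n


if __name__ == "__main__":
    print("prequential accuracy:", run())
```

Because each member sees every instance with a random integer weight rather than a stored bootstrap sample, the ensemble is updated one instance at a time, which is what makes bagging feasible on an unbounded stream.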
Talk, Javits Center, New York, USA
The main difference between the batch machine learning implementations in Spark (MLlib and Spark ML) and StreamDM is that the latter focuses on algorithms that can be trained and adapted incrementally. This can be a major advantage in some domains, as it allows the learning models to be updated automatically. StreamDM is currently under development by Huawei Noah’s Ark Lab and Télécom ParisTech.
Tutorial, Croke Park Conference Centre, Dublin, Ireland
The volume of data is rapidly increasing due to advances in information and communication technology, and most of this data arrives in the form of streams. Learning from this ever-growing amount of data requires flexible learning models that self-adapt over time. In addition, these models must satisfy several constraints: (pseudo) real-time processing, high velocity, and dynamic, multi-form changes such as concept drift and novelty. The tutorial was combined with a workshop on the same topic.
Tutorial, Barra da Tijuca Beach Windsor, Rio de Janeiro, Brazil
The main goal of this tutorial is to introduce attendees to the theory and practice of big data stream mining. We will use the StreamDM framework to illustrate the concepts and to demonstrate how data stream mining pipelines can be deployed with StreamDM.
Tutorial, ExCel London, London, UK
An overview of StreamDM, an open-source real-time analytics library built on top of Spark Streaming and developed at Huawei Noah’s Ark Lab and Télécom ParisTech.
Talk, Harbin Institute of Technology, Shenzhen, China
Invited talk at the Harbin Institute of Technology (HIT) (Shenzhen, China): Ensemble Learning for Evolving Data Streams.