DataSense: Data intelligence
Contact: C. Hudelot (MICS Laboratory of ECP (Ecole Centrale Paris, CentraleSupélec), Université Paris-Saclay) & B. Thirion (INRIA, Université Paris-Saclay)
DataSense, reference garden
Table of contents
- Task 1: Scalable, expressive and secure tools for large-scale data
- Task 2: Making sense of complex, heterogeneous data
- Task 3: Machine learning: meta-learning and multi-task
- Task 4: Distributed decision making: partially observable dynamic game and multi-objective policy optimization
- Task 5: Interaction and Visualization
- Related PhD theses funded by DigiCosme
- Other related PhD theses
Pervasive, overwhelming information is gradually reshaping the way individuals and societies think, learn, decide and interact. The so-called “perils and promises of big data” call for integrated research efforts as they raise unprecedented scientific, ethical and cognitive issues. First, the feasibility of many tasks radically changes when sufficient data is available. Information finding through the Web is a prime example, and so is machine translation: the wealth of multi-lingual corpora enabled a new computational approach to translation, through statistical alignment of text fragments. Second, new goals become reachable, through a smart exploitation of casual data. The early detection of flu outbreaks from the analysis of Google queries offers an example of such opportunistic uses of existing data. Third, the data deluge might bring into question some of our values or practices, such as data privacy and freedom. Likewise, the ability to analyze massive amounts of data provides experimental scientists with unprecedented opportunities; but does this modify the nature of scientific methodology?
DataSense targets five out of the many questions raised by the explosion of data: how to handle larger and larger amounts of data; how to make sense of, learn from and decide with data; and how to leverage human expertise in data-intensive tasks. Principled, well-grounded, unfoldable models are needed to harness data and master its ever-increasing volume and complexity. These models must accommodate the primary requirements of the digital New World: robustness and effectiveness, e.g., through massive distributed processing including cloud environments (see ComEx), and safety, e.g., in terms of data privacy and access control (see SciLex). Further, data should support the production of new knowledge through the enabling technology of statistical machine learning, benefiting from the strong interactions between ICT and Mathematics on Campus de Saclay. Going one step further, while the production of new knowledge is indeed a goal per se, it is also a means for optimal decision making, again at the crossroads of ICT and Mathematics.
Finally, the bandwidth of interaction between human users and machine-hosted data must increase, requiring significant advances in two regards. On the one hand, visual and non-visual rendering of massive data must be improved to better support human expertise in human-machine interaction and human-human communication. On the other hand, users’ expectations, profiles and capabilities must be modeled to support the social intelligence of the machine.
Task 1: Scalable, expressive and secure tools for large-scale data
Partners involved: DAVID, LI-PaRAD, LRI, CEA LIST, LIX, INRIA Saclay.
Task 2: Making sense of complex, heterogeneous data
Partners involved: LRI, LIMSI, CEA LIST, LSV, LIX, INRIA Saclay, LTCI, E3S.
Task 3: Machine learning: meta-learning and multi-task
Partners involved: LRI, LIMSI, CEA LIST, INRIA Saclay, L2S, LTCI.
Collaborations: Lab. Maths Orsay (P. Massart, P. Pansu); CMLA-Cachan (N. Vayatis), AgroParisTech (A. Cornuéjols), Laboratoire Accélérateur Linéaire, UPSud (B. Kégl).
Task 4: Distributed decision making: partially observable dynamic game and multi-objective policy optimization
Contact: Marc Schoenauer (INRIA, Saclay).
Partners involved: LRI, CEA LIST, INRIA Saclay, LTCI, L2S.
Task 5: Interaction and Visualization
Contact: M. Beaudouin-Lafon (LRI).
Partners involved: LRI, LIMSI, CEA LIST, INRIA Saclay, LTCI.