RESEARCH DAYS 2016
DataSense Programme, April 12, 14.30, room F3.09
14:30-16:00: DataSense, Session 1
14h30: Isabelle Guyon
Title: Practical teaching of data science: the role of challenges
Abstract: In the big data era, solving data analytics problems requires a workforce of interdisciplinary data science leaders who can translate scientific questions into data analysis problems, supervise their execution, and rigorously analyze the results to form appropriate conclusions. The McKinsey Global Institute estimates that by 2018, the United States alone could face a shortage of 140,000 to 190,000 people with deep analytical skills and an even larger shortage of 1.5 million managers/leaders with the know-how to use the analysis of big data to make effective decisions. Many university programs are gearing up to train students with deep analytical skills who can make valuable contributions to research needs. But training individuals to lead projects that produce scientifically rigorous, reproducible research results using data is much more challenging. Challenges offer a way for graduate students to gain mentored, practical training and experience in data science project leadership.
This spring, 30 master's students from the Paris-Saclay big data master's program, in groups of 5 to 6, were enrolled as competition organizers and created 5 mini-challenges (https://sites.google.com/a/chalearn.org/saclay/). Over the course of five weeks, they learned to formulate a scientific question from data downloaded from the Internet, develop a challenge protocol, select evaluation metrics, implement baseline software, implement their challenges on the Codalab platform, and prepare advertising material in the form of a short video. Their final products were used as projects in a class of 60 undergraduate students at UPSud. This innovative educational methodology contrasts with prior uses of challenges in teaching, which were limited to challenge solving. Teaching students to organize challenges inverts the traditional curriculum. Students work in “consulting company teams” to tackle real-world applied analytics problems; they not only engage in the full data life cycle (preparation, exploration, modeling, and interpretation), but also learn to formulate new problems they are interested in and induce others to work on them.
15h: Guillaume Doquet
Title: Feature-based Transfer Reinforcement Learning
Abstract: Learning an accurate policy can sometimes be a costly, time-consuming process. The idea of Transfer Reinforcement Learning is to leverage the policy learned in a source context to facilitate or speed up the learning of an accurate policy in a (sufficiently similar) target context. The novelty of the proposed Feature-Based Transfer Reinforcement Learning is to i) exploit the trajectories of the source controller; ii) define a few features characterizing the good trajectories; iii) relate these features to the parametric description of the source controller. The assumption of "sufficient similarity" between the source and the target tasks makes it possible to use the features to navigate the parametric policy space and reach a satisfactory target policy faster. The FeaTREL algorithm is validated on the ball-in-cup task (so-called "Bilboquet"). The agent, knowing how to play the game with a given rope length, wants to learn how to play with a new rope length as fast as possible. Promising results are demonstrated.
15h15: Agnes Delaborde
Title: Memory footprints in human robot interaction
Abstract: The authors of the present study propose directions for a Human-Robot Interaction (HRI) architecture in which the selective release of footprints and logs extracted from the robot's memory would improve the traceability of the robot's internal decision-making. This could, for example, offer a guarantee of transparency in the event of faults or disputes. The description of the proposed architecture is based on the authors' studies of a social HRI system designed in the context of the French robotics project ROMEO (Bpifrance).
The HRI system considered by the authors computes the emotional state of the user (extracted from speech) and updates his/her emotional and interactional profile. From this profile, the robot selects a social attitude (expressed through its utterances).
Like the human memory system, the robotic system relies on a working memory that organizes the input data in a processable way: for each speaking turn, it produces a vector composed of the identity of the speaker, the class of the sound (laughter, speech, robot's own voice), the duration of the turn, and the emotional data of the turn. The information is then processed by the memory and added to a stack, but also interpreted so as to obtain an overall representation of the emotional behaviors of the user. Two levels of memory data can thus be distinguished: low-level data (the vector) and interpretations performed upon this data. These elements of information are the basis for decision-making, either in terms of action taking (expressing a social attitude) or for an update of the user profile. While the secrecy of the algorithms behind the decision-making is preserved, its inputs and outputs can be stored in memory and released to help determine liability.
Following this concrete study of an existing HRI system, the authors plan to generalize the types of memory data that could be stored and made available by a social robot, by defining both what can possibly be released and what could be required by legal experts for the determination of liabilities.
This study is carried out in the context of the working group Interactive Robotics ("GT Robotique Interactive") and of the ISN, within the framework of the TE2R project (LIDEX Paris-Saclay & ISN, "Traces, Explications et Responsabilité du Robot" – "Footprints, explanation and liability of the robot"), in collaboration with the CERDI laboratory (Centre d'Étude et de Recherche en Droit de l'Immatériel).
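The two levels of memory data described in the abstract above (one low-level vector per speaking turn, plus interpretations built on top of it) can be illustrated with a minimal sketch. This is not the authors' implementation: the class names, the single `valence` feature standing in for "emotional data", the averaging rule, and the two attitude labels are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from enum import Enum
from statistics import mean


class SoundClass(Enum):
    LAUGHTER = "laughter"
    SPEECH = "speech"
    ROBOT_VOICE = "robot_voice"


@dataclass(frozen=True)
class SpeakingTurn:
    """Low-level memory record: one vector per speaking turn."""
    speaker: str
    sound_class: SoundClass
    duration_s: float
    valence: float  # hypothetical emotional feature in [-1, 1]


@dataclass
class WorkingMemory:
    turns: list = field(default_factory=list)  # low-level records, in order
    log: list = field(default_factory=list)    # decision inputs and outputs

    def push(self, turn):
        self.turns.append(turn)

    def interpret(self, speaker):
        """Interpretation level: mean valence over a speaker's speech turns."""
        values = [t.valence for t in self.turns
                  if t.speaker == speaker and t.sound_class is SoundClass.SPEECH]
        return mean(values) if values else 0.0

    def decide(self, speaker):
        """Choose a social attitude; log only the input and the output."""
        overall = self.interpret(speaker)
        attitude = "encouraging" if overall < 0 else "friendly"  # assumed rule
        self.log.append({"speaker": speaker, "input": overall, "output": attitude})
        return attitude


memory = WorkingMemory()
memory.push(SpeakingTurn("user", SoundClass.SPEECH, 2.5, -0.5))
memory.push(SpeakingTurn("robot", SoundClass.ROBOT_VOICE, 1.0, 0.0))
memory.push(SpeakingTurn("user", SoundClass.SPEECH, 1.5, -0.25))
attitude = memory.decide("user")
print(attitude, memory.log)
```

In this sketch the log records what went into and came out of a decision without exposing the decision rule itself, which mirrors the idea of preserving algorithmic secrecy while keeping decisions traceable.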
15h30: Alexandre Gramfort
Title: When machines look at neurons: learning from neuroscience time series
Abstract: Understanding how the brain works in healthy and pathological conditions is considered one of the challenges of the 21st century. After the first electroencephalography (EEG) measurements in 1929, the 1990s saw the birth of modern functional brain imaging with the first functional MRI (fMRI) and full-head magnetoencephalography (MEG) systems. By noninvasively offering unique insights into the living brain, imaging has revolutionized both clinical and cognitive neuroscience over the last twenty years. More recently, the field of brain imaging and electrophysiology has embraced a new set of tools. With statistical machine learning, new applications have emerged, ranging from brain-computer interaction systems to "mind reading". In this talk, I will briefly explain what the different techniques can offer and show how modern computational and machine learning tools can help uncover neural activations in multivariate time series.
16:30-18:15: DataSense, Session 2
16h30: Dimo Brockhoff
Title: COCO: A Platform for Comparing Continuous Optimizers Effortlessly
Abstract: Numerical optimization problems are at the core of many present-day industrial design and development tasks, in domains as varied as medicine, economics, and engineering. Numerical black-box optimization methods, which treat a problem as a black box where the only available information is the function value obtained at some query points, are the methods of choice when models are non-differentiable, non-convex, multimodal, noisy, or too complex to be mathematically tractable. These properties appear frequently, in particular when the objective function is derived from a numerical simulation whose source code might not even be available. Algorithm benchmarking is a mandatory but tedious (because repetitive) step when designing new optimization algorithms and when recommending efficient and robust algorithms for practical purposes. The Comparing Continuous Optimizers platform (COCO) aims at automating this benchmarking in the case of numerical black-box problems. In addition to providing example scripts for conducting benchmarking experiments in various languages, COCO generates plots and tables for performance assessment, including statistical tests, and presents the results in HTML and LaTeX/PDF format with only a few commands. This frees up time for algorithm designers and practitioners alike, who can spend more of it on the design of better methods or on the application of the best available approaches to their problems. In this talk, we will see, in a tutorial-like style, how to use the COCO software and how to interpret its main plots when comparing algorithm performance. We will also see benchmarking results that have been collected in recent years for about 140 algorithm variants from various domains, including state-of-the-art methods for unconstrained single-objective optimization such as BFGS, NEWUOA, Nelder-Mead simplex, or CMA-ES. A glimpse of the newly provided bi-objective test suite will conclude the talk.
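To make the black-box benchmarking setting concrete, here is a minimal, self-contained sketch in plain Python. It does not use the COCO software or its API: the `BlackBox` wrapper, the sphere test function, the two toy optimizers, and all parameter values (dimension, budget, target, step-size factors) are assumptions chosen for illustration. The sketch records, for each run, how many function evaluations an optimizer needs to reach a target value.

```python
import random


class BlackBox:
    """Wraps an objective so that an optimizer only sees function values."""

    def __init__(self, func, target):
        self.func = func
        self.target = target
        self.evaluations = 0
        self.hit_at = None  # evaluation count when the target was first reached

    def __call__(self, x):
        self.evaluations += 1
        value = self.func(x)
        if self.hit_at is None and value <= self.target:
            self.hit_at = self.evaluations
        return value


def sphere(x):
    return sum(xi * xi for xi in x)


def random_search(problem, dim, budget, rng):
    """Sample uniformly in [-5, 5]^dim and keep the best value seen."""
    best = float("inf")
    for _ in range(budget):
        x = [rng.uniform(-5, 5) for _ in range(dim)]
        best = min(best, problem(x))
    return best


def one_plus_one_es(problem, dim, budget, rng, sigma=1.0):
    """(1+1)-ES with a simple success-based step-size adaptation."""
    parent = [rng.uniform(-5, 5) for _ in range(dim)]
    f_parent = problem(parent)
    for _ in range(budget - 1):
        child = [p + sigma * rng.gauss(0, 1) for p in parent]
        f_child = problem(child)
        if f_child <= f_parent:
            parent, f_parent = child, f_child
            sigma *= 1.5            # success: enlarge the step
        else:
            sigma *= 1.5 ** -0.25   # failure: shrink the step
    return f_parent


def benchmark(optimizer, dim=3, budget=2000, target=1e-3, runs=5):
    """Per run, the number of evaluations needed to hit the target, or None."""
    hits = []
    for seed in range(runs):
        problem = BlackBox(sphere, target)
        optimizer(problem, dim, budget, random.Random(seed))
        hits.append(problem.hit_at)
    return hits


results = {
    "random search": benchmark(random_search),
    "(1+1)-ES": benchmark(one_plus_one_es),
}
for name, hits in results.items():
    print(name, hits)
```

Counting evaluations to a target, rather than wall-clock time, keeps the comparison independent of the machine and of the implementation language; a run that never reaches the target within the budget is reported as `None`.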
17h: Aurélie Névéol
Title: Cross-document event coreference in Electronic Health Record
Abstract: References to phenomena occurring in the world and their temporal characterization can be found in a variety of natural language utterances. For this reason, temporal analysis is a key issue in natural language processing. This project aims to analyze documents in the field of medicine, namely clinical narratives, from a strongly temporal and chronological perspective. We develop methods and tools to automatically extract significant medical events, linked to relevant temporal information, from the Electronic Health Records of patients. We aim to create a timeline of the patient's medical history by merging the information extracted from multiple documents concerning the patient. This work addresses the complex problem of temporal analysis of clinical documents with an original approach that leverages information from multiple documents. It is guided by linguistic principles and by the application to retrospective analysis of patient care, which should facilitate event normalization.
17h15: Olivier Ferret
Title: Multilingual Semantic Representations WG
Abstract: The semantic representation of words, phrases or even larger units such as sentences or texts has been a central issue in the field of Natural Language Processing (NLP) for a long time and is the focus of the Digicosme Working Group on Multilingual Semantic Representations. One approach for building such representations is to rely on handcrafted symbolic knowledge, typically represented in resources such as lexico-semantic WordNet-like networks. Another approach consists in building semantic representations from corpora, generally following a distributional approach. This second trend, which has been particularly active over the past few years, is central in Deep Learning work concerning the building of the distributed lexical representations called word embeddings. These lexical representations have been demonstrated to be particularly effective in capturing semantic similarities and replacing traditional features in classifiers performing various NLP tasks. In this talk, we will give an overview of the issues raised by these different approaches as they have been considered in the context of the Digicosme Working Group on Multilingual Semantic Representations, for both monolingual and multilingual semantic representations.
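As a small illustration of how word embeddings can capture semantic similarity, the sketch below compares word vectors with cosine similarity, a common choice for this purpose. The three-dimensional vectors are made up for the example; real embeddings are learned from corpora and typically have hundreds of dimensions.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two non-zero vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# Hypothetical toy embeddings, for illustration only.
embeddings = {
    "cat": [0.9, 0.8, 0.1],
    "dog": [0.8, 0.9, 0.2],
    "car": [0.1, 0.2, 0.9],
}

sim_cat_dog = cosine_similarity(embeddings["cat"], embeddings["dog"])
sim_cat_car = cosine_similarity(embeddings["cat"], embeddings["car"])
print(f"cat/dog: {sim_cat_dog:.3f}, cat/car: {sim_cat_car:.3f}")
```

With these toy vectors, "cat" is closer to "dog" than to "car", which is the kind of relation a classifier can exploit when embeddings replace traditional features.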