The following posters have been selected for presentation during the 2020 edition of the school:
Outstanding poster awards
Clonal evolution in colorectal cancer
Natalia Garcia Martin
Simulation of a black hole using deep learning
Safe online bid optimization with ROI constraints
Smartcaching at CMS: towards an AI-based model
Physics-informed Machine Learning simulator for wildfire propagation
Contrastive learning of cardiac signals
Towards auditability for fairness in DL
Population graph GNNs for brain age prediction
Elderly human activity recognition: Time and frequency domain features analysis
Self-supervised 3D Human Pose Estimation via Part Guided Novel Image Synthesis (Seth Siddharth). [Meet link, Jan 12th]
The camera-captured human pose is an outcome of several sources of variation. Performance of supervised 3D pose estimation approaches comes at the cost of dispensing with variations, such as shape and appearance, that may be useful for solving other related tasks. As a result, the learned model not only inculcates task-bias but also dataset-bias because of its strong reliance on the annotated samples, which also holds true for weakly-supervised models. Acknowledging this, we propose a self-supervised learning framework to disentangle such variations from unlabeled video frames. We leverage the prior knowledge on the human skeleton and pose in the form of a single part-based 2D puppet model, human pose articulation constraints, and a set of unpaired 3D poses. Our differentiable formalization, bridging the representation gap between the 3D pose and spatial part maps, not only facilitates the discovery of interpretable pose disentanglement but also allows us to operate on videos with diverse camera movements.
Reinforced Knowledge Distillation for Visual Object Tracking (Dunnhofer Matteo). [Meet link, Jan 13th]
Many different principles have been exploited to track a generic object in videos, contributing to numerous visual tracking algorithms. Despite the remarkable results achieved, the visual object tracking community has developed separate, specific models for robust tracker fusion, accurate online target adaptation, or fast tracking. In this poster, a novel tracking methodology that jointly achieves all such goals is proposed. Our solution takes advantage of existing visual trackers as a source of information. A compact student model is trained via the marriage of knowledge distillation (KD) and reinforcement learning (RL). The former allows the tracking knowledge of existing trackers to be transferred and compressed. The latter enables learning of evaluation measures which are then used to exploit the teachers online. After learning, student and teachers are exploited to design a robust tracking solution which comes with three modalities that can be activated depending on the accuracy/computational resources/speed requirements. Moreover, we show that our framework is suitable to adapt the knowledge of deep regression trackers for new tracking domains, in a weakly-supervised setting. In this case, RL is used to express weak supervision as a scalar application-dependent and temporally-delayed feedback, while KD is employed to guarantee learning stability.
Recent work on self-supervised representation learning employs various pretext tasks during pre-training to learn the structure of the input domain. These methods have yielded performance close to fully supervised methods on a downstream classification task; however, their transferability to other downstream tasks involving high-level semantics remains to be explored. In this work, we investigate the representations learned through four state-of-the-art self-supervised learning approaches (AMDIM, SimCLR, and more recent works SWAV and BYOL) by learning a multimodal task. In particular, i) we explore the transferability of these representations for visual grounding, which requires a more fine-grained understanding of the image content and the inter-relation of vision and language semantics; ii) we compare the performance of these approaches with a fully supervised method and provide baselines on localization and cross-modal retrieval tasks on the MS COCO and Flickr30K datasets; and iii) we analyze the features learned by these approaches through deep cross-modal feature inversion.
Combined Color Semantics and Deep Learning for the Automatic Detection of Dolphin Dorsal Fins (Losapio Gianvito). [Meet link, Jan 13th]
Photo-identification of animals is a widely used non-invasive technique in biological studies for identifying single individuals relying only on specific unique visual characteristics. This information is essential to infer knowledge about the spatial distribution, site fidelity, abundance or habitat use of a whole population. Today there is a large demand for algorithms that can help domain experts in the analysis of large image datasets. For this reason, the problem of identifying and cropping the relevant portion of an image is not negligible in any photo-identification pipeline. Our paper approaches the problem of automatically cropping cetacean images with a hybrid technique based on domain analysis and deep learning. Domain knowledge is applied for proposing relevant regions with the aim of highlighting the dorsal fins. Later, a binary classification of fin vs. no fin is performed by a custom, resource-efficient convolutional neural network that was specifically designed for this purpose. Results obtained on real images demonstrate the feasibility of the proposed approach in the automated processing of large datasets of Risso’s dolphin photos, enabling its use in more complex large-scale studies. Moreover, the results of this study suggest extending this methodology to biological investigations of different species.
Poster + Instructions for the live demo here:
Spectral geometric methods have brought revolutionary changes to the field of geometry processing -- however, when the data to be processed exhibits severe partiality, such methods fail to generalize. As a result, there exists a big performance gap between methods dealing with complete shapes, and methods that address the case of missing geometry. In this paper, we propose a possible way to fill this gap. Specifically, we introduce the first method to compute shape compositions without first solving for a dense correspondence between the given partial shapes. We do so by operating in a purely spectral domain, where we define a union operation between short sequences of eigenvalues. Working with eigenvalues allows us to deal with unknown correspondence, different sampling, and different discretization (point clouds and meshes alike), making this operation especially robust and general. Our approach is data-driven, and can generalize to isometric and non-isometric deformations of the surface, as long as these stay within the same semantic class (e.g., human bodies), as well as to partiality artifacts not seen at training time.
Automatic linguistic description of objects, people, and their interactions in an open virtual world (Avram Andrei-Marius). [Meet link, Jan 12th]
The project tries to create a common representation between language and vision that will take the form of a graph, where each node represents an atomic event, linked with other nodes through actions in space-time. At this moment, we use a game engine to create a controlled environment from which the graph, the videos and the textual descriptions can be generated. Once we generate the dataset, we want to explore the capabilities of a neural network to capture the graph structure from videos and generate the corresponding descriptions.
How We Went beyond Word Sense Inventories and Learned to Gloss (Bevilacqua Michele). [Meet link, Jan 12th]
Mainstream computational lexical semantics embraces the assumption that word senses can be represented as discrete items of a predefined inventory. In this paper we show this need not be the case, and propose a unified model that is able to produce contextually appropriate definitions. In our model, Generationary, we employ a novel span-based encoding scheme which we use to fine-tune an English pre-trained Encoder-Decoder system to generate glosses. We show that, even though we drop the need to choose from a predefined sense inventory, our model can be employed effectively: not only does Generationary outperform previous approaches in the generative task of Definition Modeling in many settings, but it also matches or surpasses the state of the art in discriminative tasks such as Word Sense Disambiguation and Word-in-Context. Finally, we show that Generationary benefits from training on data from multiple inventories, with strong gains on various zero-shot benchmarks, including a novel dataset of definitions for free adjective-noun phrases.
Predicting responses to survey questions from question text embeddings (Fang Qixiang). [Meet link, Jan 13th]
Social constructs like personal values and political orientation are traditionally measured with survey questions. Well-designed survey questions should thus be able to elicit responses that accurately capture the constructs of interest. Research shows that such responses are influenced not only by the meaning (i.e. the underlying construct of interest) but also by the form of the questions (e.g. language style; question length). It has been, however, a challenge to incorporate these two aspects in prediction models of survey responses. The goal of our project is to look into the possibility of using language models and sentence vectors to tackle this problem. We will focus on three tasks: 1) construction of a corpus of survey questions with which a language model can be pre-trained; 2) representation of survey questions as vectors; 3) prediction of survey question responses.
Predicting Deal Closure in a Sales CRM using Hierarchical Email Embeddings (Gupta Vishal). [Meet link, Jan 13th]
Emails are the most common form of communication in a sale and can be used to actively determine the customer's interest in purchasing a product/service. Statistically, deals with more email replies from the customer are more likely to be won. Our project, Deal sentiment at Freshworks, which is part of the Freshsales CRM, involves predicting sentiment from customers' and agents' emails and using it to estimate the probability of winning the deal. Additionally, we have explored hierarchical embeddings to deploy sentiment models across accounts from multiple domains and accounts with minimal data.
Are We on the Right Track? The Possibilities and Limitations of Existing Computational Methods for Framing Analysis (Klamm Christopher). [Meet link, Jan 15th]
Reading a news article that deals with a specific topic, represents an individual perspective, and includes a conscious choice of words can have a great impact on our interpretation and perception of the world. The analysis of the use of communication strategies and their impact on our world view is the task of framing analysis, which is a well-established method in social sciences. In recent years, computational framing analysis has become a heterogeneous sub-domain in NLP, which is trying to address social science related problems and enrich existing methods through NLP. This trend led to a diverse sub-domain with social and computational relevance. In this work, we create a framework for a comprehensive and systematic analysis of existing computational methods for framing analysis. To illustrate the differences, we apply these methods to a heterogeneous set of news articles to show the effects of different framing concepts on framing analysis.
NEO: A Tool for Taxonomy Enrichment with New Emerging Occupations (Seveso Andrea). [Meet link, Jan 16th]
Taxonomies provide a structured representation of semantic relations between lexical terms, acting as the backbone of many applications. This is the case of the online labour market, as the growing use of Online Job Vacancies (OJVs) enables the understanding of how the demand for new professions and skills changes in near-real-time. Therefore, OJVs represent a rich source of information to reshape and keep labour market taxonomies updated to fit the market expectations better. However, manually updating taxonomies is time-consuming and error-prone. This inspired NEO, a Web-based tool for automatically enriching the standard occupation and skill taxonomy (ESCO) with new occupation terms extracted from OJVs. NEO - which can be applied to any domain - is framed within the research activity of an EU grant collecting and classifying OJVs over all 27+1 EU Countries. As a contribution, NEO (i) proposes a metric that allows one to measure the pairwise semantic similarity between words in a taxonomy; (ii) suggests new emerging occupations from OJVs along with the most similar concept within the taxonomy, by employing word-embedding algorithms; (iii) proposes GASC measures (Generality, Adequacy, Specificity, Comparability) to estimate the adherence of the new occupations to the most suited taxonomic concept, enabling the user to approve the suggestion and to inspect the skill-gap. Our experiments on 2M+ real OJVs collected in the UK in 2018, sustained by a user-study, confirm the usefulness of NEO for supporting the taxonomy enrichment task with emerging jobs. A demo of a deployed instance of NEO is also provided.
Distributed representations (e.g. those learned by autoencoders, GNNs and Word2Vec) encode information from images, natural language and other media into vectors, which in turn can be used to derive new information (e.g. word similarity in NLP or link prediction in KGE). Since distributed representations can be used to encode both entities and concepts, my work focuses on mapping entity representations onto their respective concept representations. This mapping operation is inspired by human psychology: e.g. children learn the concept of “dog” only after having seen different dogs and abstracting their common characteristics. My PhD research focuses on designing mapping functions (or projections) from entity representations to concept representations, and on designing induction functions to obtain a single, valid concept representation starting from different entity representations. I am currently working on a process that starts from contextual representations of named entities and projects them into different conceptual representation spaces, which are derived from ontologies using different approaches and express different semantics (e.g. hierarchical relations between concepts with hyperbolic graph embeddings). The process can be applied to many downstream tasks, such as entity typing, coreference resolution, named entity linking, metaphor detection and table annotation.
Graph Neural Networks
Improving the Aggregation in Graph Networks: can nodes understand their neighbourhood? (Corso Gabriele) [Meet link, Jan 12th]
Graph Neural Networks (GNNs) have been shown to be effective models for different predictive tasks on graph-structured data. This poster will combine the studies on the Principal Neighbourhood Aggregation (NeurIPS 2020) and the Directional Graph Networks (oral at DiffGeo4DL workshop @ NeurIPS 2020). We will examine the expressive power of graph neural networks showing the limitations when it comes to the continuous feature spaces and directional kernels. Each of these will motivate improvements to the aggregation method of GNNs which will lead us to fully generalize CNNs. Empirical results from molecular chemistry and computer vision benchmarks will validate our findings.
Convolutional Neural Networks (CNNs) have been very successful at solving a variety of computer vision tasks such as object classification and detection, semantic segmentation, and activity understanding, to name just a few. One key enabling factor for their great performance has been the ability to train very deep networks. Despite their huge success in many tasks, CNNs do not work well with non-Euclidean data, which is prevalent in many real-world applications. Graph Convolutional Networks (GCNs) offer an alternative that allows for non-Euclidean data input to a neural network. While GCNs already achieve encouraging results, they are currently limited to architectures with a relatively small number of layers, primarily due to vanishing gradients during training. This work transfers concepts such as residual/dense connections and dilated convolutions from CNNs to GCNs in order to successfully train very deep GCNs. We show the benefit of using deep GCNs (with as many as 112 layers) experimentally across various datasets and tasks.
Uncertainty Estimation and Generative Models
Automatic Music Arrangement Using Generative Adversarial Networks (Barnabò Giorgio). [Meet link, Jan 12th]
When talking about computer-based music generation, there are two main threads of research: the construction of autonomous music-making systems, and the design of computer-based environments to assist musicians. However, even though creating accompaniments for melodies is an essential part of every producer’s and songwriter’s work, little effort has been made in the field of automatic music arrangement in the audio domain. In this contribution, we propose a novel framework for automatic music accompaniment in the Mel-frequency domain. Using several songs converted into Mel-spectrograms (a two-dimensional time-frequency representation of audio signals), we were able to automatically generate original arrangements for both bass and voice lines. Treating music pieces as images (Mel-spectrograms) allowed us to reformulate our problem as an unpaired image-to-image translation problem, and to tackle it with CycleGAN, a well-established framework. Moreover, the choice to deploy raw audio and Mel-spectrograms enabled us to more effectively model long-range dependencies, to better represent how humans perceive music, and to potentially draw sounds for new arrangements from the vast collection of music recordings accumulated in the last century. Our approach was tested on two different downstream tasks: creating credible and on-time drums given a bass line, and arranging an a cappella song into a full song. In the absence of an objective way of evaluating the output of music generative systems, we also defined a possible metric for the proposed task, partially based on human (and expert) judgement.
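As a rough illustration of why a Mel-frequency axis is said to reflect human perception, the sketch below converts between Hz and mels and places band edges evenly on the mel scale. It assumes the common HTK-style formula; the abstract does not say which mel variant or how many bands were used, so the function names and parameter values here are purely illustrative.

```python
import math

def hz_to_mel(hz):
    """Convert a frequency in Hz to mels (HTK-style formula, assumed here)."""
    return 2595.0 * math.log10(1.0 + hz / 700.0)

def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)

def mel_band_edges(f_min, f_max, n_bands):
    """Return n_bands + 2 band-edge frequencies, evenly spaced in mels."""
    lo, hi = hz_to_mel(f_min), hz_to_mel(f_max)
    step = (hi - lo) / (n_bands + 1)
    return [mel_to_hz(lo + i * step) for i in range(n_bands + 2)]

# Illustrative values: 8 bands between 0 Hz and 8 kHz.
edges = mel_band_edges(0.0, 8000.0, n_bands=8)
# Band widths in Hz grow with frequency: fine resolution at low pitches,
# coarse resolution at high pitches.
widths = [b - a for a, b in zip(edges, edges[1:])]
```

Each row of a Mel-spectrogram would correspond to one such band, which is what lets the spectrogram be treated as an image whose vertical axis is perceptually scaled.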
Posterior Network: Uncertainty Estimation without OOD Samples via Density-Based Pseudo-Counts (Charpentier Bertrand). [Meet link, Jan 13th]
Accurate estimation of aleatoric and epistemic uncertainty is crucial to build safe and reliable systems. Traditional approaches, such as dropout and ensemble methods, estimate uncertainty by sampling probability predictions from different submodels, which leads to slow uncertainty estimation at inference time. Recent works address this drawback by directly predicting parameters of prior distributions over the probability predictions with a neural network. While this approach has demonstrated accurate uncertainty estimation, it requires defining arbitrary target parameters for in-distribution data and makes the unrealistic assumption that out-of-distribution (OOD) data is known at training time. In this work we propose the Posterior Network (PostNet), which uses Normalizing Flows to predict an individual closed-form posterior distribution over predicted probabilities for any input sample. The posterior distributions learned by PostNet accurately reflect uncertainty for in- and out-of-distribution data -- without requiring access to OOD data at training time. PostNet achieves state-of-the-art results in OOD detection and in uncertainty calibration under dataset shifts.
Reliably quantifying the confidence of deep neural classifiers is a challenging yet fundamental requirement for deploying such models in safety-critical applications. In this paper, we introduce a novel target criterion for model confidence, corresponding to True Class Probability (TCP). We show that TCP offers better properties for confidence estimation than standard Maximum Class Probability (MCP). Since the true class is by essence unknown at test time, we propose to learn TCP criterion from data with an auxiliary neural network, introducing a specific learning scheme adapted to this context. We evaluate our approach on the task of failure prediction and with self-training strategies for domain adaptation, which both necessitate appropriate confidence estimates. Extensive experiments are conducted for validating the relevance of the proposed approach in each task. We study various network architectures, small and large scale datasets for image classification and semantic segmentation. In every tested benchmark, our approach outperforms strong baselines. We complete these results with a study of relevant variations and qualitative results to assess the quality of confidence estimates.
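To make the difference between the two criteria concrete, the following minimal sketch computes MCP and TCP from a softmax output. The logits and labels are made-up values and the helper names are hypothetical; the sketch does not include the auxiliary network that the abstract proposes for estimating TCP when the true class is unknown.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def mcp(probs):
    """Maximum Class Probability: probability of the predicted class."""
    return max(probs)

def tcp(probs, true_class):
    """True Class Probability: probability assigned to the true class."""
    return probs[true_class]

# Made-up logits for a 3-class problem; the model predicts class 0.
probs = softmax([4.0, 1.0, 0.5])

# Case 1: the true class is 0, so the prediction is correct.
mcp_correct, tcp_correct = mcp(probs), tcp(probs, 0)
# Case 2: same output, but the true class is 1, so the prediction is wrong.
mcp_wrong, tcp_wrong = mcp(probs), tcp(probs, 1)
```

For the correct prediction the two values coincide. For the misclassified sample MCP stays high while TCP is low, which is the property that makes TCP a better target for failure prediction, at the price of requiring the label.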
We extend the framework of variational autoencoders to represent transformations explicitly in the latent space. In the family of hierarchical graphical models that emerges, the latent space is populated by higher order objects that are inferred jointly with the latent representations they act on. To explicitly demonstrate the effect of these higher order objects, we show that the inferred latent transformations reflect interpretable properties in the observation space. Furthermore, the model is structured in such a way that in the absence of transformations, we can run inference and obtain generative capabilities comparable with standard variational autoencoders. Finally, utilizing the trained encoder, we outperform the baselines by a wide margin on a challenging out-of-distribution classification task.
Adversarially Learned One-Class Classifier for Novelty Detection and similar tasks (Khalooei Mohammad). [Meet link, Jan 15th]
This work was inspired by the success of generative adversarial networks (GANs) for training deep models in unsupervised and semi-supervised settings. We propose an end-to-end architecture for one-class classification. The architecture is composed of two deep networks, each of which is trained by competing with the other while collaborating to understand the underlying concept in the target class, and then to classify the testing samples. One network works as the novelty detector, while the other supports it by enhancing the inlier samples and distorting the outliers. The intuition is that enhanced inliers and distorted outliers are much easier to separate than the original samples.
Instant recovery of shape from spectrum via latent space connections (Rampini Arianna). [Meet link, Jan 16th]
Constructing compact encodings of geometric shapes lies at the heart of 2D and 3D Computer Vision. A desirable property in many applications is to be able to recover the shape from its (latent) encoding, and various auto-encoder architectures have been designed to solve this problem. At the same time, a classical approach in the domain of spectral geometry is to encode a shape using the sequence of eigenvalues (spectrum) of its Laplacian operator. Unfortunately, although encoding shapes via their Laplacian spectra is straightforward, the inverse problem of recovering the shape from the eigenvalues is very difficult. Thus the idea of combining the strengths of data-driven auto-encoders with those of spectral methods. We introduce the first learning-based method for recovering shapes from Laplacian spectra. Our model takes the form of an auto-encoder, enriched with a cycle-consistent module to map latent vectors to sequences of eigenvalues. Our data-driven approach replaces the need for ad-hoc regularizers required by prior methods, while providing more accurate results at a fraction of the computational cost. Our learning model applies without modifications across different dimensions (2D and 3D shapes alike), representations (meshes, contours and point clouds), as well as across different shape classes, and admits arbitrary resolution of the input spectrum without affecting complexity. The increased flexibility allows us to address notoriously difficult tasks in 3D vision and geometry processing and to provide a proxy to differentiable eigendecomposition, as we showcase in applications of mesh super-resolution, shape exploration, and style transfer.
Bayesian Triplet Loss: Uncertainty Quantification in Image Retrieval (Warburg Frederik). [Meet link, Jan 16th]
Uncertainty quantification in image retrieval is crucial for downstream decisions, yet it remains a challenging and largely unexplored problem. Current methods for estimating uncertainties are poorly calibrated, computationally expensive, or based on heuristics. We present a new method that views image embeddings as stochastic features rather than deterministic features. Our two main contributions are (1) a likelihood that matches the triplet constraint and that evaluates the probability of an anchor being closer to a positive than a negative; and (2) a prior over the feature space that justifies the conventional l2 normalization. To ensure computational efficiency, we derive a variational approximation of the posterior, called the Bayesian triplet loss, that produces state-of-the-art uncertainty estimates and matches the predictive performance of current state-of-the-art methods.
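The quantity in contribution (1), the probability that an anchor is closer to a positive than to a negative, can be illustrated with a simple Monte Carlo estimate. This is a sampling-based stand-in, not the variational approximation described in the abstract; it assumes isotropic Gaussian embeddings, and all means, standard deviations and function names are made up for illustration.

```python
import random

def sample_embedding(mean, std, rng):
    """Draw one sample from an isotropic Gaussian embedding."""
    return [rng.gauss(m, std) for m in mean]

def sq_dist(u, v):
    """Squared Euclidean distance between two vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))

def triplet_probability(anchor, positive, negative, margin=0.0,
                        n_samples=20000, seed=0):
    """Estimate P(||a - p||^2 + margin < ||a - n||^2) by sampling.

    Each of anchor, positive, negative is a (mean, std) pair that
    describes a stochastic embedding.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_samples):
        a = sample_embedding(*anchor, rng)
        p = sample_embedding(*positive, rng)
        n = sample_embedding(*negative, rng)
        if sq_dist(a, p) + margin < sq_dist(a, n):
            hits += 1
    return hits / n_samples

positive = ([0.9, 0.1], 0.1)
negative = ([-1.0, 0.0], 0.1)

# A low-variance anchor near the positive: the ordering is almost certain.
p_confident = triplet_probability(([1.0, 0.0], 0.1), positive, negative)
# Same anchor mean with a much larger variance: the ordering is less certain.
p_uncertain = triplet_probability(([1.0, 0.0], 2.0), positive, negative)
```

The second call shows how embedding variance translates into retrieval uncertainty: the means are unchanged, but the estimated probability drops well below one.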
Reinforcement Learning and Time Series
Systematic Generalisation through Task Temporal Logic and Deep Reinforcement Learning (Gonzalez Leon Borja). [Meet link, Jan 12th]
This poster presents a neuro-symbolic agent that combines deep reinforcement learning (DRL) with temporal logic (TL), and achieves systematic out-of-distribution generalisation in tasks that involve following a formally specified instruction. Specifically, the agent learns general notions of negation and disjunction, and successfully applies them to previously unseen objects without further training. To this end, we also introduce Task Temporal Logic (TTL), a learning-oriented formal language, whose atoms are designed to help the training of a DRL agent targeting systematic generalisation. To validate this combination of logic-based and neural-network techniques, we provide experimental evidence for the kind of neural-network architecture that most enhances the generalisation performance of the agent. Our findings suggest that the right architecture can significantly improve the ability of the agent to generalise in systematic ways, even with abstract operators, such as negation, which previous research has struggled with.
Deep learning abilities to classify intricate variations in temporal dynamics of multivariate time series (Liotet Pierre). [Meet link, Jan 14th]
The aim of this work is to investigate the ability of deep learning (DL) architectures to learn temporal dynamics in multivariate time series. The methodology consists in using well-known synthetic stochastic processes for which changes in joint temporal dynamics can be controlled. This makes it possible to compare deep learning against classical machine learning techniques relying on documented hand-crafted wavelet-based features. First, we assess the performance of several different DL architectures and show the relevance of convolutional neural networks (CNN). Second, we test the robustness of CNN performance in classifying subtle changes in multivariate temporal dynamics with respect to learning conditions (dataset size, time series sample size, transfer learning).
Interactive policy optimization for deep multi-agent reinforcement learning (Nema Videh). [Meet link, Jan 15th]
Multi-agent systems are gaining a lot of importance in machine learning. This includes a wide range of work in deep multi-agent reinforcement learning (RL). Here, one of the core differences from single-agent RL is the non-stationarity of the environment, i.e., the other agents' policies also change as an agent updates its policy. Traditional methods like independent RL lead to undesirable behavior and might even lead to divergence, a classic example of which is gradient descent ascent (GDA) diverging from the Nash equilibrium in a bilinear zero-sum game. Such methods fail as they assume the other agents to be a part of the environment and not learning at all. This problem is magnified for more complicated general-sum high-dimensional environments with more than two agents. A few works like Learning with Opponent Learning Awareness (LOLA) take a one-step look-ahead at the learning of other agents. However, can we generalize this? Yes, we can. Using the idea of Competitive Gradient Descent (CGD), we can learn with full awareness. CGD is the natural generalization of GDA to multi-player games. It takes the interaction among the agents into account by including an additional bilinear interaction term in the updates. When viewed in terms of the Neumann series, the CGD update corresponds to infinitely recursive game-theoretic reasoning. This poster presents an extension of this optimization framework to practical policy-gradient (both stochastic and deterministic) and trust-region RL algorithms that can be used effectively in multi-agent settings. The experimental investigations cover (1) various zero-sum environments such as the bilinear game, matching pennies, rock-paper-scissors and Markov soccer, where it converges better than traditional methods and encourages interesting interactive behavior in the sequential settings; and (2) general-sum settings such as iterated and sequential social dilemmas, involving ideally cooperating agents that might defect due to social greed or fear. The CGD-type approaches converge to socially optimal equilibria, whereas traditional approaches lead to sub-optimal selfish equilibria. As an extended scope, this multi-agent optimization framework can also be generalized to more than two agents and applied to defend from adversarial policies in simulated robotic environments.
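The divergence of GDA on a bilinear zero-sum game, and the stabilising effect of the interaction term, can be reproduced on the scalar game f(x, y) = x * y, where x minimises f and y maximises it. The sketch below is only an illustration: the CGD step is our specialisation of the competitive gradient descent update to this one-dimensional game, and the step size, starting point and function names are arbitrary choices, not values from the poster.

```python
import math

def gda_step(x, y, lr):
    """Simultaneous gradient descent ascent on f(x, y) = x * y.

    x takes a descent step on f, y takes an ascent step on f.
    """
    return x - lr * y, y + lr * x

def cgd_step(x, y, lr):
    """Competitive gradient descent on the same scalar bilinear game.

    The extra lr * x and lr * y terms come from the mixed derivative
    d^2 f / (dx dy) = 1, and the 1 / (1 + lr^2) factor is the scalar
    form of the matrix inverse in the general update.
    """
    scale = 1.0 / (1.0 + lr * lr)
    dx = -lr * scale * (y + lr * x)
    dy = lr * scale * (x - lr * y)
    return x + dx, y + dy

def run(step, steps=200, lr=0.2, start=(1.0, 1.0)):
    """Iterate an update rule and return the final distance to (0, 0)."""
    x, y = start
    for _ in range(steps):
        x, y = step(x, y, lr)
    return math.hypot(x, y)

# The unique Nash equilibrium of this game is (0, 0).
gda_distance = run(gda_step)   # grows: GDA spirals outwards
cgd_distance = run(cgd_step)   # shrinks: CGD spirals inwards
```

On this game each GDA step multiplies the squared distance to the equilibrium by 1 + lr^2, while each CGD step divides it by the same factor, so the two rules move in opposite directions from the same starting point.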
Safe Online Bid Optimization with Return-On-Investment Constraints (Romano Giulia). [Meet link, Jan 15th]
Online advertising campaigns collect a worldwide expenditure of more than 100 billion USD per year. Their automatic optimization is today one of the most challenging tasks in artificial intelligence. Customarily, the advertiser's goal is a tradeoff between achieving high volumes, maximizing the sales of the products to advertise, and high profitability, that is maximizing the Return-on-Investment (ROI). The business units of the companies need simple ways to address this tradeoff by maximizing the volumes while guaranteeing that a lower bound on the ROI is met. Although ROI constraints are crucial for most businesses, they are oftentimes violated by state-of-the-art methods for bid optimization, and they raise several major issues when designing ad hoc algorithms. In particular, we show that the problem of finding revenue-maximizing bids satisfying budget and ROI constraints cannot be approximated within any factor unless P=NP. However, we show that it is possible to design a pseudo-polynomial-time algorithm to find an optimal solution for the problem. Furthermore, we show that no online learning algorithm can violate the ROI constraints less than a linear number of times while guaranteeing a sublinear pseudo-regret. We show that an adaptation of the GCB algorithm provides a sublinear bound on the regret but rarely satisfies the ROI constraints. Then, we propose the GCBsafe algorithm to keep the probability of violating the ROI constraints under a predefined confidence δ at the cost of a linear pseudo-regret bound. Finally, we experimentally evaluate the performances of GCB and GCBsafe in terms of pseudo-regret/constraint-violation tradeoff, and we analyze the sensitivity of the algorithms.
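The constrained problem described above can be made concrete with a tiny exhaustive search. This is not the pseudo-polynomial-time algorithm or the GCB/GCBsafe learners from the abstract; it is a brute-force sketch that assumes ROI is defined as total revenue divided by total cost, and the per-campaign (cost, revenue) tables are hypothetical numbers.

```python
from itertools import product

def best_bids(campaigns, budget, roi_min):
    """Exhaustively search one (cost, revenue) option per campaign.

    Returns (total_revenue, chosen_indices) for the revenue-maximizing
    choice with total cost <= budget and revenue / cost >= roi_min,
    or None if no choice is feasible.
    """
    best = None
    for choice in product(*(range(len(c)) for c in campaigns)):
        cost = sum(campaigns[j][i][0] for j, i in enumerate(choice))
        revenue = sum(campaigns[j][i][1] for j, i in enumerate(choice))
        if cost > budget:
            continue
        # Written as revenue >= roi_min * cost to avoid dividing by zero.
        if revenue < roi_min * cost:
            continue
        if best is None or revenue > best[0]:
            best = (revenue, choice)
    return best

# Hypothetical options per campaign: (expected cost, expected revenue).
campaigns = [
    [(0, 0), (10, 40), (30, 70)],
    [(0, 0), (20, 50), (40, 60)],
]

unconstrained = best_bids(campaigns, budget=100, roi_min=0.0)
constrained = best_bids(campaigns, budget=100, roi_min=3.0)
```

With these numbers the budget-only optimum picks the most expensive option in both campaigns, while requiring an ROI of at least 3 forces cheaper bids and a lower total revenue, which is the volume/profitability tradeoff the abstract refers to.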
Hierarchical reinforcement learning for efficient exploration and transfer (Steccanella Lorenzo). [Meet link, Jan 16th]
Sparse-reward domains are challenging for reinforcement learning algorithms since significant exploration is needed before encountering reward for the first time. Hierarchical reinforcement learning can facilitate exploration by reducing the number of decisions necessary before obtaining a reward. In this paper, we present a novel hierarchical reinforcement learning framework based on the compression of an invariant state space that is common to a range of tasks. The algorithm introduces subtasks which consist in moving between the state partitions induced by the compression. Results indicate that the algorithm can successfully solve complex sparse-reward domains, and transfer knowledge to solve new, previously unseen tasks more quickly.
To reason about and interact with their environment, robots require accurate scene explanations based on objects and their 6D pose. We propose to consider the physical plausibility of objects under their estimated pose to increase accuracy and reliability of such scene explanations. Based on results in object pose estimation, object pose refinement and robotic grasping, we show the benefit of considering physical plausibility at both object and scene level.
Dense 3D Reconstruction and Pose Estimation with Convolutional Neural Networks (Hoang Dinh-Cuong). [Meet link, Jan 14th, ONLY 2pm-2.30pm!]
We present a system for accurate 3D instance-aware semantic reconstruction and 6D pose estimation, using an RGB-D camera. Our framework couples convolutional neural networks (CNNs) and a state-of-the-art dense Simultaneous Localisation and Mapping (SLAM) system, ElasticFusion, to achieve both high-quality semantic reconstruction and robust 6D pose estimation for relevant objects. While the main trend in CNN-based 6D pose estimation has been to infer an object’s position and orientation from single views of the scene, our approach explores performing pose estimation from multiple viewpoints, under the conjecture that combining multiple predictions can improve the robustness of an object detection system. The resulting system is capable of producing high-quality object-aware semantic reconstructions of room-sized environments, as well as accurately detecting objects and their 6D poses.
Multi-modal robotic visual-tactile detection of surface cracks (Palermo Francesca). [Meet link, Jan 14th]
We present results for an innovative approach involving vision and tactile sensing to detect and characterise surface cracks. The proposed algorithm localises surface cracks in a remote environment through videos/photos taken by an on-board robot camera, followed by automatic tactile inspection of the surfaces. Faster R-CNN deep learning-based object detection is used to identify the location of potential cracks. A random forest classifier is used for tactile identification of the cracks to confirm their presence. Offline and online experiments comparing vision-only and combined vision and tactile crack detection are demonstrated. Two experiments have been developed to test the efficiency of the multi-modal approach: online detection accuracy, and the time required to explore a surface and localise a crack. A total of 10 trials have been conducted to compare the accuracy of the multi-modal approach with the vision-only method. When using both modalities cooperatively, the model correctly detects 92.85% of the cracks, while this drops to 46.66% when using only vision information. Exploring a surface using only tactile sensing requires around 199 seconds. This time is reduced to 31 seconds when using vision and tactile sensing together. This approach may also be implemented in extreme environments (e.g. in nuclear plants), since gamma radiation does not interfere with the basic sensing mechanism of fibre optic-based sensors.
Low Dimensional State Representation Learning with Reward-shaped Priors (Botteghi Nicolò). [Meet link, Jan 15th]
Reinforcement Learning has been able to solve many complicated robotics tasks in an end-to-end fashion, without any need for feature engineering. However, learning the optimal policy directly from the sensory inputs, i.e., the observations, often requires processing and storing a huge amount of data. In the context of robotics, the cost of data from real robotics hardware is usually very high, so solutions that achieve high sample efficiency are needed. We propose a method that aims at learning a mapping from the observations into a lower-dimensional state space. This mapping is learned with unsupervised learning using loss functions shaped to incorporate prior knowledge of the environment and the task. Using the samples from the state space, the optimal policy is quickly and efficiently learned. We test the method on several mobile robot navigation tasks in a simulation environment and also on a real robot.
When unmanned aerial vehicles (UAVs) fly autonomous missions, they typically rely on global navigation satellite systems (GNSS) like GPS for global position estimation. However, GNSS signals can be easily jammed. We propose a camera-based method that uses onboard imagery and data from OpenStreetMap as a backup system for GNSS. First, the aerial imagery from the onboard camera is translated into a map-like representation. Then we match it with a reference map to infer the vehicle’s position. Experiments over a typically sized mission area are performed and exhibit a localization accuracy better than 5 m. Our results show that the proposed method can serve as a backup to GNSS where suitable landmarks like buildings and roads are available.
Learn to Path: Using neural networks to predict Dubins path characteristics for aerial vehicles in wind (Turricelli Erick). [Meet link, Jan 16th]
For asymptotically optimal sampling-based path planners such as RRT*, the quality of paths generated improves as the number of samples added to the motion tree increases. However, each additional sample often requires a nearest-neighbor search. If this search is computationally expensive, fewer states can be sampled and path quality degrades. In the case of planning time-optimal Dubins airplane paths for aerial vehicles in wind, cost calculations are performed with an iterative solver, resulting in slow nearest-neighbor searches. This motivates finding an alternative, faster cost approximation using a lightweight neural network. In this paper, we demonstrate a learned approach which grows the RRT* motion tree at double the rate of previous, iterative methods and achieves up to 10.8% shorter paths in the same search time as an iterative solver.
Generalising via meta-examples for continual learning in the wild (Bertugli Alessia, Stefano Vincenzi). [Meet link, Jan 12th]
Learning quickly and continually is still an ambitious task for neural networks. Indeed, many real-world applications do not reflect the learning setting where neural networks shine, as data are usually few, mostly unlabelled and come as a stream. To narrow this gap, we introduce FUSION - Few-shot UnSupervIsed cONtinual learning - a novel strategy which aims to deal with neural networks that "learn in the wild", simulating a real distribution and flow of unbalanced tasks. We equip FUSION with MEML - Meta-Example Meta-Learning - a new module that simultaneously alleviates catastrophic forgetting and favours the generalisation and future learning of new tasks. To encourage feature reuse during the meta-optimisation, our model exploits a single inner loop per task, taking advantage of an aggregated representation achieved through the use of a self-attention mechanism. To further enhance the generalisation capability of MEML, we extend it by adopting a technique that creates various augmented tasks and optimises over the hardest. Experimental results on few-shot learning benchmarks show that our model exceeds the other baselines in both the FUSION and fully supervised cases. We also explore how it behaves in standard continual learning, where it consistently outperforms state-of-the-art approaches.
Continual Language Understanding: Improving BERT One Task at a Time (Coria Juan Manuel). [Meet link, Jan 13th]
Continually training a neural network on a sequence of data is a challenging problem due to a well-known phenomenon called catastrophic forgetting or interference, in which the model's performance on previous tasks drastically decreases with the length of the sequence. Many existing solutions deal with simplified scenarios on homogeneous tasks like MNIST splits and permutations, addressing catastrophic forgetting mostly by controlling either network capacity or the representation space with manually designed algorithms that can potentially fail in more complex settings. Following recent work on meta-learning in the domain of computer vision, we address this problem by training a language model to learn a sequence of heterogeneous NLP tasks and reduce forgetting. We believe this approach can take us one step closer to reliable continual learning systems that are better adapted to non-trivial language understanding problems.
The ability to learn in dynamic, nonstationary environments without forgetting previous knowledge, also known as Continual Learning (CL), is a key enabler for scalable and trustworthy deployments of adaptive solutions. While the importance of continual learning is largely acknowledged in machine vision and reinforcement learning problems, it remains mostly under-documented for sequence processing tasks. This work proposes a Recurrent Neural Network (RNN) model for CL that is able to deal with concept drift in the input distribution without forgetting previously acquired knowledge. We also implement and test a popular CL approach, Elastic Weight Consolidation (EWC), on top of two different types of RNNs. Finally, we compare the performance of our enhanced architecture against EWC and RNNs on a set of standard CL benchmarks, adapted to the sequential data processing scenario. Results show the superior performance of our architecture and highlight the need for special solutions designed to address CL in RNNs.
Papers: https://ieeexplore.ieee.org/document/9207550, https://arxiv.org/abs/2004.04077
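For readers unfamiliar with the EWC baseline mentioned above, the sketch below shows the standard quadratic EWC penalty with a diagonal Fisher approximation. It is a generic illustration, not code from the linked papers; the function names, the flat parameter vectors, and the numbers are assumptions made for the example.

```python
import numpy as np


def ewc_penalty(params, old_params, fisher_diag, strength):
    """Quadratic EWC penalty: (strength / 2) * sum_i F_i * (theta_i - theta_old_i)^2.

    fisher_diag holds the diagonal Fisher information estimated on the
    previous task; large entries mark parameters that should move little.
    """
    diff = params - old_params
    return 0.5 * strength * float(np.sum(fisher_diag * diff ** 2))


def ewc_loss(new_task_loss, params, old_params, fisher_diag, strength):
    """Loss on the new task plus the penalty anchoring important parameters."""
    return new_task_loss + ewc_penalty(params, old_params, fisher_diag, strength)


# Hypothetical flat parameter vectors for illustration.
theta_old = np.array([1.0, -2.0, 0.5])   # parameters after the previous task
fisher = np.array([4.0, 0.0, 1.0])       # importance of each parameter
theta = np.array([1.5, 3.0, 0.5])        # current parameters on the new task

penalty = ewc_penalty(theta, theta_old, fisher, strength=2.0)
total = ewc_loss(0.3, theta, theta_old, fisher, strength=2.0)
```

In this toy case the second parameter moves a long way but contributes nothing to the penalty because its Fisher value is zero, while the small move of the first, important parameter is penalised.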
Many real-world prediction problems involve structured tasks across multiple modalities. We propose to extend previous work in modular meta-learning to the multimodal setting. Specifically, we present an algorithmic approach to apply task-aware modulation to a modular meta-learning system that decomposes structured multimodal problems into a set of modules that can be reassembled to learn new tasks. We also propose a series of experiments to compare this approach with state-of-the-art modular and multimodal meta-learning approaches on multimodal function prediction and image classification tasks.
Meta-learnt priors slow down catastrophic forgetting in neural networks (Spigler Giacomo). [Meet link, Jan 16th]
Current training regimes for deep learning usually involve exposure to a single task / dataset at a time. Here we start from the observation that in this context the trained model is not given any knowledge of anything outside its (single-task) training distribution, and thus has no way to learn parameters (i.e., feature detectors or policies) that could be helpful to solve other tasks, or to limit future interference with the acquired knowledge, and thus catastrophic forgetting. We show that catastrophic forgetting can be mitigated in a meta-learning context, by exposing a neural network to multiple tasks in a sequential manner during training. Finally, we present SeqFOMAML, a meta-learning algorithm that implements these principles, and we evaluate it on sequential learning problems composed of Omniglot and MiniImageNet classification tasks.
Machine Learning for Healthcare
The knowledge of the brain white matter (WM) anatomy is spreading thanks to the advances in neuroimaging. Tracking algorithms allow the reconstruction of axonal pathways composing the WM as 3D polylines, namely streamlines. The grouping of anatomically similar streamlines into bundles establishes the most accurate knowledge of the WM anatomy. However, while such knowledge has recently led to the publication of WM bundle atlases, a volumetric atlas (parcellation) of the WM is still lacking. In this work, we aim to derive a first WM parcellation exploiting the anatomical principles borrowed directly from the streamline trajectories. We propose a geometric deep learning method that, given a set of streamlines, generates two outputs: (i) the classification of each streamline into a bundle class, and (ii) a latent dictionary encoding the information of direction and location of the 3D points of the streamline. The learned latent dictionary can be mapped back to the 3D space, providing both an interpretable and a discriminative parcellation of the white matter. Streamlines represented with the dictionary elements are distinguishable by the bundle class they belong to.
Decoding Raman Spectroscopy Towards the Diagnosis of SARS-COV-2 Infection and Other Diseases (Bertazioli Dario). [Meet link, Jan 13th]
Raman Spectroscopy promises the ability to encode in spectral data the significant differences between saliva samples belonging to patients affected by a disease and samples of healthy patients (controls). Based on this assumption, an efficient and non-invasive method for pathology screening could be developed. However, the interpretation of the Raman spectral fingerprint is a difficult and time-consuming procedure even for domain experts. We implement and test a deep learning CNN-based pipeline able to classify spectral data according to whether they come from disease-affected patients or controls. The pipeline has been tested for the detection of SARS-COV-2 infection, and for the screening of several neurodegenerative diseases such as Amyotrophic Lateral Sclerosis, Alzheimer's disease, and Parkinson's disease. The challenge of this work lies in dealing with a small amount of data while also reducing the tasks delegated to humans, i.e. by avoiding the typical preprocessing required for interpreting the spectral data in an ordinary approach. The results are relevant, with the system being able to discriminate positive (COVID, Alzheimer's or Parkinson's-affected) patients with satisfactory performance, although a generalization study on a larger amount of data could be required for proper clinical settings.
Methods for subclonal reconstruction can provide key insights into tumour evolution by identifying subclonal driver mutations, patterns of parallel evolution and differences in mutational signatures between cellular populations (Dentro et al., 2017). The majority of CRC tumours develop through the well-established adenoma-carcinoma sequence of events initiated by genetic alterations in the APC tumour suppressor through aberrant Wnt signaling, and further promoted by oncogenic mutations in KRAS, BRAF, and PIK3CA, and loss or mutations in tumour suppressors such as TP53 (Fletcher et al., 2018). Our aim is to understand if and how alternative genetic pathways lead to different disease subtypes and if treatment response is affected by evolutionary profile. We extend methodology for copy number calling and subclonal reconstruction, currently able to analyse WGS and WES with a matched normal sample, in order to incorporate the analysis of samples with targeted sequencing and no matched normal to analyse a set of 355 colorectal samples. The cancer cell fraction (CCF) values can be estimated from copy number and variant calls of the samples. CCF allows us to identify clusters of clonal and subclonal mutations using Bayesian nonparametric methods. Additionally, identification of mutations in cancer driver genes enables further investigation into the chronological order of key mutation events, shedding light on the evolutionary history of the tumour. Existing ranking models such as Plackett-Luce are employed to infer evolutionary trajectories. We search for driver events that result in the characterisation of distinct cancer subtypes. The evolutionary subtypes inferred from DPClust should be able to discern between a majority of samples following the main evolutionary pathway and a subset of samples with a separate characterisation. This is clinically relevant for the early detection of aggressive subtypes based on the genetic profile as well as for treatment stratification.
CLOCS: Contrastive Learning of Cardiac Signals Across Space, Time, and Patients (Kiyasseh Dani). [Meet link, Jan 14th]
The healthcare industry generates troves of unlabelled physiological data. This data can be exploited via contrastive learning, a self-supervised pre-training mechanism that encourages representations of instances to be similar to one another. We propose a family of contrastive learning methods, CLOCS, that encourages representations across time, leads, and patients to be similar to one another. We show that CLOCS consistently outperforms the state-of-the-art approach, SimCLR, on both linear evaluation and fine-tuning downstream tasks. We also show that CLOCS achieves strong generalization performance with only 25% of labelled training data. Furthermore, our training procedure naturally generates patient-specific representations that can be used to quantify patient-similarity.
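The idea of pulling representations of related instances together can be sketched with a generic InfoNCE-style contrastive loss. This is not the CLOCS objective itself, whose exact form is not given here; it is a minimal NumPy illustration that assumes row i of each input holds the embedding of two related views (for example two time segments or two leads of the same patient), with all other rows acting as negatives. The function name, temperature value, and random embeddings are hypothetical.

```python
import numpy as np


def info_nce_loss(view_a, view_b, temperature=0.1):
    """Generic contrastive loss over a batch of paired embeddings.

    view_a[i] and view_b[i] form a positive pair; view_b[j] for j != i are
    treated as negatives for view_a[i].
    """
    # L2-normalise so the dot product is a cosine similarity.
    a = view_a / np.linalg.norm(view_a, axis=1, keepdims=True)
    b = view_b / np.linalg.norm(view_b, axis=1, keepdims=True)
    logits = a @ b.T / temperature
    # Subtract the row maximum for a numerically stable log-softmax.
    logits = logits - logits.max(axis=1, keepdims=True)
    log_softmax = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # Cross-entropy where the correct "class" for row i is column i.
    return float(-np.mean(np.diag(log_softmax)))


rng = np.random.default_rng(0)
segment_1 = rng.normal(size=(8, 16))                  # stand-in embeddings
aligned = segment_1 + 0.01 * rng.normal(size=(8, 16))  # near-identical views
shuffled = aligned[::-1]                               # deliberately mismatched pairs

loss_aligned = info_nce_loss(segment_1, aligned)
loss_shuffled = info_nce_loss(segment_1, shuffled)
```

The loss is low when each embedding is closest to its own positive and high when the pairs are mismatched, which is the training signal an encoder would receive during pre-training.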
Incorporating network based protein complex discovery into automated model construction (Scherer Paul). [Meet link, Jan 16th]
We propose a method for gene expression based analysis of cancer phenotypes incorporating network biology knowledge through unsupervised construction of computational graphs. The structural construction of the computational graphs is driven by the use of topological clustering algorithms on protein-protein networks which incorporate inductive biases stemming from network biology research in protein complex discovery. This structurally constrains the hypothesis space over the possible computational graph factorizations whose parameters can then be learned through supervised or unsupervised task settings. The sparse construction of the computational graph enables differential protein complex activity analysis whilst also allowing the individual contributions of the genes/proteins involved in each protein complex to be interpreted. In our experiments analysing a variety of cancer phenotypes, we show that the proposed methods outperform SVM, a fully connected MLP, and randomly connected MLPs in all tasks. Our work introduces a scalable method for incorporating large interaction networks as prior knowledge to drive the construction of powerful computational models amenable to introspective study.
AI-based second harmonic generation signal multiphasor analysis for collagen micro-architecture investigation in tumor sections (Scodellaro Riccardo). [Meet link, Jan 16th]
Collagen organization changes with the pathological condition of the tissue, as in cancer, and can be monitored through second harmonic generation imaging, a label-free method sensitive to the fibril microstructure. As a consequence, collagen can be exploited as a marker for early tumor diagnosis. Coupling a phasor-based method with an unsupervised machine learning algorithm, our protocol is able to map crucial features of the collagen fibrils pixel by pixel and highlight different collagen organizations. Based on these maps, our protocol can automatically discriminate, on fixed tumor sections, the tumor area from the surrounding tissue with an accuracy of about 90%, opening the possibility to effectively assist histopathologists in cancer diagnosis.
Many common neurological and neurodegenerative disorders, such as Alzheimer’s disease, dementia and multiple sclerosis, have been associated with abnormal patterns of apparent ageing of the brain. Discrepancies between the estimated brain age and the actual chronological age (brain age gaps) can be used to understand the biological pathways behind the ageing process, assess an individual’s risk for various brain disorders and identify new personalised treatment strategies. By flexibly integrating minimally preprocessed neuroimaging and non-imaging modalities into a population graph data structure, we train two types of graph neural network (GNN) architectures to predict brain age in a clinically relevant fashion as well as investigate their robustness to noisy inputs and graph sparsity. The multimodal population graph approach has the potential to learn from the entire cohort of healthy and affected subjects of both sexes at once, capturing a wide range of confounding effects and detecting variations in brain age trends between different sub-populations of subjects.
In this work we explore the applicability of Generative Adversarial Networks (GAN) in Recommender Systems. More specifically, we present GANMF, a GAN tasked to estimate the user and item latent factors in a matrix factorization (MF) approach for the generic Top-N recommendation problem. We show through extensive experiments that GANMF is on par with, and on some datasets better than, traditional MF techniques like PureSVD and WRMF. Moreover, we perform an ablation study on the components of GANMF in order to understand the importance of each of them.
DBVAE: Deep Belief Variational Autoencoder for top-N Recommender Systems (Ray Ruchira). [Meet link, Jan 14th]
Countless e-commerce businesses employ recommender systems to address the exponential growth of their products and customers. Collaborative filtering (CF) has become a widespread approach to provide personalized recommendations to users (especially movies, songs and online shopping items). This technique attempts to learn user-item relationships based on historical implicit/explicit data. Matrix factorization (MF) is a class of collaborative filtering techniques which outperforms other variants of the CF approach. However, the sparsity of data prevents them from learning user-item representation efficiently. In this paper, we propose a Deep Belief Network to extract the underlying features of the user-item matrix. Subsequently, these features are fed into a variational autoencoder to provide top-N recommendations. We conducted experiments with popular real-world datasets to show the effectiveness of the proposed model. The proposed model outperforms state-of-the-art recommendation approaches in standard evaluation metrics.
Safety and Fairness
Deep learning has produced big advances in artificial intelligence, but trained neural networks often reflect and amplify bias in their training data, and thus produce unfair predictions. We propose a novel measure of individual fairness, called prediction sensitivity, that approximates the extent to which a particular prediction is dependent on a protected attribute. We show how to compute prediction sensitivity using standard automatic differentiation capabilities present in modern deep learning frameworks, and present preliminary empirical results suggesting that prediction sensitivity may be effective for measuring bias in individual predictions.
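The gradient-based idea behind this measure can be sketched on a small model. The exact definition of prediction sensitivity used in the poster is not given here, so the example below assumes it is the absolute partial derivative of the prediction with respect to the protected input feature. To stay dependency-free it uses the closed-form gradient of a logistic model instead of an automatic differentiation framework, and checks it against a finite difference. The weights, input, and function names are hypothetical.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def predict(weights, bias, x):
    """Logistic model: probability of the positive class for one input."""
    return sigmoid(x @ weights + bias)


def prediction_sensitivity(weights, bias, x, protected_index):
    """|d prediction / d x[protected_index]| for the logistic model above.

    For p = sigmoid(w . x + b), the derivative with respect to x_k is
    p * (1 - p) * w_k.
    """
    p = predict(weights, bias, x)
    return float(np.abs(p * (1.0 - p) * weights[protected_index]))


# Hypothetical model in which the last feature is the protected attribute.
weights = np.array([0.8, -0.3, 1.5])
bias = -0.2
x = np.array([0.5, 1.0, 1.0])

sens = prediction_sensitivity(weights, bias, x, protected_index=2)

# Central finite difference as an independent check of the gradient.
eps = 1e-6
x_plus, x_minus = x.copy(), x.copy()
x_plus[2] += eps
x_minus[2] -= eps
fd = abs(predict(weights, bias, x_plus) - predict(weights, bias, x_minus)) / (2 * eps)
```

With a deep network the same quantity would be obtained by asking the framework for the gradient of the output with respect to the input and reading off the protected component. A model that ignores the protected attribute entirely would score zero.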
Explainable Artificial Intelligence (XAI) is booming in academia and industry, mainly thanks to the proliferation of darker, more complex black-box solutions which are replacing their more transparent ancestors. Believing that the overall performance of an XAI system can be augmented by considering the end-user as a human being, we are studying ways to improve explanations by making them more informative and easier to use on the one hand, and interactive and customizable on the other.
Benchmarking Bias Mitigation Algorithms in Representation Learning through Fairness Metrics (Reddy Charan). [Meet link, Jan 15th]
Despite the growing attention that machine learning researchers and practitioners pay to fairness, there is still no common framework for analyzing and comparing the capabilities of proposed models in deep representation learning. In this paper, we evaluate different fairness methods trained with deep neural networks on a common synthetic dataset to obtain a better insight into the working of these methods. In particular, we train about 2000 different models in various setups, including unbalanced and correlated data configurations, to verify the limits of the current models and better understand in which setups they are subject to failure. In doing so, we present a dataset and a large subset of the fairness metrics proposed in the literature, and rigorously evaluate recent promising debiasing algorithms in a common framework, hoping the research community will take this benchmark as a common entry point for fair deep learning.
Regularisation Can Mitigate Poisoning Attacks: A Novel Analysis Based on Multiobjective Bilevel Optimisation (Carnerero Cano Javier). [Meet link, Jan 16th]
Machine Learning (ML) algorithms are vulnerable to poisoning attacks, where a fraction of the training data is manipulated to deliberately degrade the algorithms' performance. Optimal poisoning attacks, which can be formulated as bilevel optimisation problems, help to assess the robustness of learning algorithms in worst-case scenarios. However, current attacks against algorithms with hyperparameters typically assume that these hyperparameters remain constant ignoring the effect the attack has on them. We show that this approach leads to an overly pessimistic view of the robustness of the algorithms. We propose a novel optimal attack formulation that considers the effect of the attack on the hyperparameters by modelling the attack as a multiobjective bilevel optimisation problem. We apply this novel attack formulation to ML classifiers using L2 regularisation and show that, in contrast to results previously reported, L2 regularisation enhances the stability of the learning algorithms and helps to mitigate the attacks. Our empirical evaluation on different datasets confirms the limitations of previous strategies, evidences the benefits of using L2 regularisation to dampen the effect of poisoning attacks and shows how the regularisation hyperparameter increases with the fraction of poisoning points.
Video + Poster + Slides: https://www.dropbox.com/sh/nccc84y07tsxcdf/AABKjPunyfjN0nFMaKAjtPJoa?dl=0
Applied Machine Learning
Elderly Human Activity Recognition: Time and Frequency Domain Features Analysis (Abdu Haruna). [Meet link, Jan 12th]
Mobile devices such as smartphones and smartwatches are the devices most commonly worn or carried by people, irrespective of age, gender, or health status. The sensors embedded in these devices (such as accelerometers) generate useful data for activity recognition. Because elderly people have difficulty performing some physical activities, the sensors generate noisy data in which the noise frequency is often higher than that of the intended signal, and statistical features alone cannot represent the subject’s posture correctly, which prevents activity recognition models from generalizing. The aim of this research is to analyse the time and frequency domain features relevant to recognizing elderly human activities, including transitions. Tri-axial accelerometer readings and their signal magnitude were used as the time domain data, and the frequency domain data was extracted by performing spectral analysis using the Fast Fourier Transform (FFT). A fixed-size sliding window with 50% overlap was used to segment the raw sensor data. Relevant features were extracted and selected using an Extremely Randomized Trees Classifier (Extra Trees Classifier). A random forest classifier was used to classify the activities. An accuracy of 95% was achieved when both time and frequency domain features were used, compared with only 82% and 88% when using time domain and frequency domain features alone, respectively.
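The segmentation and feature extraction steps described above can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the window length, sampling rate, chosen features, and synthetic signal are all assumptions, and the feature selection and random forest stages are omitted.

```python
import numpy as np


def sliding_windows(signal, window_size, overlap=0.5):
    """Split a (n_samples, n_axes) array into fixed-size overlapping windows."""
    step = max(1, int(window_size * (1.0 - overlap)))
    starts = range(0, len(signal) - window_size + 1, step)
    return np.stack([signal[s:s + window_size] for s in starts])


def window_features(window, sampling_rate):
    """Two time-domain and two frequency-domain features for one window."""
    # Signal magnitude of the tri-axial accelerometer at each time step.
    magnitude = np.linalg.norm(window, axis=1)
    mean, std = magnitude.mean(), magnitude.std()
    # Spectrum of the mean-removed magnitude via the real-input FFT.
    spectrum = np.abs(np.fft.rfft(magnitude - mean))
    freqs = np.fft.rfftfreq(len(magnitude), d=1.0 / sampling_rate)
    dominant_freq = freqs[np.argmax(spectrum)]
    spectral_energy = np.sum(spectrum ** 2) / len(magnitude)
    return np.array([mean, std, dominant_freq, spectral_energy])


# Synthetic 10 s recording at an assumed 50 Hz: a 2 Hz oscillation on the z axis.
sampling_rate = 50
t = np.arange(500) / sampling_rate
accel = np.stack([np.zeros_like(t), np.zeros_like(t), 2.0 + np.sin(2 * np.pi * 2.0 * t)], axis=1)

windows = sliding_windows(accel, window_size=100, overlap=0.5)
features = np.array([window_features(w, sampling_rate) for w in windows])
```

Each row of `features` could then be passed to a feature selector and a classifier. With 500 samples, a window of 100, and 50% overlap, consecutive windows start 50 samples apart.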
Physics-Informed Machine Learning Simulator for Wildfire Propagation (Azeglio Simone). [Meet link, Jan 12th]
The impact of anthropogenic climate change is increasing the frequency and the severity of wildfires in many regions of the world. These events cause extensive damage to ecosystems, as well as producing serious socioeconomic consequences. Scientific models able to predict wildfire evolution have been gaining significant interest since the 1940s, not only from a research point of view, but also for practical needs in emergency situations. A wildfire is by definition a complex physical system, since its behaviour spans different scales and its evolution is not solely related to boundary conditions - such as weather, vegetation, soil conditions etc. - but is strongly dependent on fluid dynamics phenomena that cannot be predicted on the basis of environmental data alone. This prevents pure Machine Learning (ML) models from succeeding. For this reason, most successful models are built upon a physics-based Computational Fluid Dynamics (CFD) core, and one of the most trusted, state-of-the-art implementations is the “FIRE” module, part of the WRF simulator, employed by several governmental institutions. To dramatically improve WRF-Fire performance, we would like to employ Scientific Machine Learning (Sci-ML) techniques such as “physics-informed model discovery and learning”. Practically, it means modelling Partial Differential Equations (PDEs) with some peculiar ML architectures that will be presented in more detail below. This innovative approach would not be “data-driven”, but “physics-informed”.
Preprint: https://arxiv.org/abs/2012.06825
Code: https://github.com/MachineLearningJournalClub/MLJC-UniTo-ProjectX-2020-public/
Slides + example: https://github.com/MachineLearningJournalClub/MLJC-UniTo-ProjectX-2020-public/tree/rel2/M2LSchoolSlides
On the extraction and interpretation of heterogeneous tabular data (Bonfitto Sara). [Meet link, Jan 13th]
Nowadays, spreadsheets are one of the most used means of representing table-structured data. However, they do not impose any restriction on the table structure or on its content. This makes their automatic processing and their integration with other information sources hard. The poster presents our approach for extracting data from spreadsheets and transforming them into meaningful information. The approach takes into account the heterogeneity of the values and the errors that can occur in single columns, and proposes different graphical facilities that we are developing to support the user in facing these kinds of problems.
To probe the evolution of the universe, estimations of galaxy redshifts are typically required. The most accurate measure of redshift is known as the spectroscopic redshift, but it is expensive to acquire and not available for all detected galaxies. In this work, we study how data augmentation can be used to supplement deep learning methods for predicting redshift from multi-band galaxy images. In particular, we show how altering the relative colour between channels is related to the predictive ability of the neural network. Finally, we also investigate how we can use these augmentations in conjunction with contrastive learning methods and compare the results to fully supervised learning.
A Machine Learning approach to monitor Public Lighting energy consumptions (Danese Valeria). [Meet link, Jan 13th]
The goal of this work is to provide a monthly monitoring of the Public Lighting contracts in Italy owned by the global energy player Engie in order to promptly detect energy frauds or technical malfunctions. Such situations can be noticed by analyzing meter readings of each Point Of Distribution (POD). Indeed, a POD is made up of several lighting poles whose theoretical consumption depends on geographical location and some technical parameters which are not available. Using only meter readings as input, we provide a Machine Learning approach that estimates each POD expected consumption and reports anomalous values. Engie Efficiency Managers have started to validate algorithm outputs by doing practical verifications. The first results show that the system has a good ability to detect anomalous consumption cases.
Inland surface waters expand with floods and contract with droughts, so there is no one map of our streams. Current satellite approaches are limited to monthly observations that map only the widest streams. These are fed by smaller tributaries that make up much of the dendritic surface network but whose flow is unobserved. A complete map of our daily waters can give us an early warning for where droughts are born: the receding tips of the flowing network. Mapping them over years can give us a map of impermanence of our waters, showing where to expect water, and where not to. To that end, we feed the latest high-res sensor data to multiple deep learning models in order to map these flowing networks every day, stacking the time-series maps over many years. Specifically, (i) we enhance water segmentation to 50 cm/pixel resolution, a 60-fold improvement over previous state-of-the-art results. Our U-Net trained on 30-40 cm WorldView3 images can detect streams as narrow as 1-3 m (30-60 times over SOTA). Our multi-sensor, multi-res variant, WasserNet, fuses a multi-day window of 3 m PlanetScope imagery with 1 m LiDAR data, to detect streams 5-7 m wide. Both U-Nets produce a water probability map at the pixel level. (ii) We integrate this water map over a DEM-derived synthetic valley network map to produce a snapshot of flow at the stream level. (iii) We apply this pipeline, which we call Pix2Streams, to a 2-year daily PlanetScope time series of three watersheds in the US to produce the first high-fidelity dynamic map of stream flow frequency. The end result is a new map that, if applied at the national scale, could fundamentally improve how we manage our water resources around the world.
Personal Informatics System for Robust Recognition of Human’s Behavior (Gashi Shkurta). [Meet link, Jan 14th]
Mobile and wearable devices such as smartphones and smartwatches enable the continuous and unobtrusive collection of sensor data, e.g., physical activity or heart rate, outside the laboratory, in everyday life settings. These data can be used to derive insights into users’ behavior in everyday life and to advance the personalization and effectiveness of applications that aim to promote health and well-being. The class of systems that help people collect and reflect on personal information is known as personal informatics systems, and such systems have a high potential to also enhance people’s performance and well-being at the workplace. Despite the significant amount of research in this field, the deployment of such systems in real-life settings is still complex and prone to errors. This is because the quality of the data collected in natural settings is significantly hampered by noise in the signals, missing data, or varying contexts. Signal enhancement and noise-robust methods are therefore needed for an accurate assessment of people’s behavior. While several factors might impact the robustness of personal informatics systems in natural settings, we investigate three prominent forms of heterogeneity that these systems are expected to encounter in the real world: (1) noisy data, (2) missing data and (3) concurrent behaviors. In this poster, I will discuss how we applied machine learning techniques to data sets collected in real-world settings to overcome these forms of heterogeneity.
Unobtrusive Recognition of Knowledge Workers’ Behavior and Affect Using Sensor Data (Di Lascio Elena). [Meet link, Jan 14th]
The ubiquity of personal devices such as smartphones and smartwatches has enabled the continuous collection of sensor data, e.g., location or heart rate, in real-life settings. By processing these data with machine learning and data analytics techniques, it is possible to derive information about people’s behavior, activities, and affect in real-life settings. This information can then be used by personal informatics systems to guide intervention strategies that promote awareness and motivate behavioral changes towards a healthier and happier life. Recently there has been a growing interest in using personal informatics systems to support knowledge workers, with the goal of improving their productivity and well-being. Personal informatics systems in the workplace could, for example, suggest that workers take breaks when prolonged work is detected, or help workers schedule their workdays based on their detected levels of productivity, engagement, and stress. While this certainly sounds intriguing, many obstacles must still be overcome to obtain reliable systems that could support people at work. In this poster, I will report on our experience in using sensor data collected from personal devices to automatically and unobtrusively infer workers’ behavior and affect during work activities.
Intention and trajectory prediction of road users Using sequential deep learning algorithms (Achaji Lina). [Meet link, Jan 15th]
In our work, we are investigating deep learning methods to solve the intention and trajectory prediction problem for road users in front of autonomous cars. However, while deep learning techniques are achieving remarkable success on well-defined structured data, the information we consume in our environment has a network structure. To understand one element of this information, we need to understand how it is endorsed and referenced by the other elements of a vast network of links. A fundamental point here is that when we formulate our model within a network framework, we should evaluate our actions not in isolation, but with the expectation that the world will respond to what we do, because each individual's actions have implicit consequences for the outcomes of everyone in the system. To this end, we are researching graph neural networks and graph scene parsing techniques, along with sequential deep learning algorithms such as LSTMs, to solve the trajectory and intention prediction problem.
Handling Non-Stationary Experts in Inverse Reinforcement Learning: A Water System Control Case Study (Likmeta Amarildo). [Meet link, Jan 15th]
One of the challenges in applying Reinforcement Learning (RL) in real-world scenarios is the absence of a formalized reward signal, especially in the presence of multiple, possibly conflicting, objectives. However, observational data from many real systems are nowadays available, providing demonstrations from experts (e.g., human operators) that can be used in Inverse Reinforcement Learning (IRL) to formalize the observed task in an RL fashion. In this paper, we address the problem of inferring the preferences behind the historical operation of Lake Como. In this case study, no interaction with the environment is allowed, and only a fixed dataset of demonstrations is available. Moreover, the expert is non-stationary, since its intentions change over the decades when exposed to changing external forces. For this reason, we propose an extension of the batch model-free algorithm Σ-GIRL to the non-stationary case. For the Lake Como scenario we provide a formalization, experiments, and a discussion to interpret the obtained results.
Video presentation: https://slideslive.com/38943284
Analyzing the mismatch between the education system and the job market (Makdoun Ibtissam). [Meet link, Jan 16th]
Analyzing the mismatch between the education system and the job market is an active area of study that aims to align the education system with the needs of the job market and close the gap. Most studies analyze the mismatch in terms of vertical and horizontal mismatch. Vertical mismatch occurs when the level of education or qualification is lower or higher than required, while horizontal mismatch occurs when the type of education or skills is inappropriate for the job. Previous works therefore overlook the sector or discipline mismatch, which can result in high unemployment among higher education graduates. We note that unemployment among higher education graduates can have several effects on job satisfaction and wages. We propose a new methodology to study and compare the disciplines needed by the job market and those supplied by the education system. The results will help us define the main disciplines needed by the job market and inform universities about them to help them align their curricula.
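The two definitions above can be made concrete with a small sketch. It is only an illustration of the vertical/horizontal distinction, not the methodology proposed in the poster; the function name, the integer encoding of education levels, and the field labels are all hypothetical.

```python
def classify_mismatch(held_level, required_level, held_field, required_field):
    """Label a worker/job pair with the mismatch types defined above.

    Levels are hypothetical integers (e.g. 1 = bachelor, 2 = master, 3 = PhD);
    fields are hypothetical discipline labels.
    """
    labels = []
    # Vertical mismatch: the education level is higher or lower than required.
    if held_level > required_level:
        labels.append("vertical (overeducated)")
    elif held_level < required_level:
        labels.append("vertical (undereducated)")
    # Horizontal mismatch: the type of education does not fit the job.
    if held_field != required_field:
        labels.append("horizontal")
    return labels or ["matched"]


# A master's graduate in physics working in a bachelor-level finance job.
example = classify_mismatch(2, 1, "physics", "finance")
```

A pair can carry both labels at once, which is why the function returns a list rather than a single category.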
Generating Crochet Patterns With Recurrent Neural Networks (Nakiranda Proscovia). [Meet link, Jan 16th]
Crocheting is a process of creating fabric by interlocking loops of yarn, thread, or strands of other materials using a crochet hook. It has been used to make clothing, décor, and accessories, and recently in architecture and mathematics to explain different geometrical concepts. It can be used to design objects of up to three dimensions. To produce a given object shape, one has to follow algorithmic instructions called crochet patterns. Coming up with new, innovative patterns is a non-trivial and challenging task, especially for beginners. Recently, software tools for creating crochet patterns have emerged. However, most existing software is limited to generating two-dimensional patterns and requires a sketch from the user as first input. This paper proposes the use of a machine learning method, specifically recurrent neural networks (RNNs), to generate three-dimensional crochet patterns. The model is trained on existing crochet patterns and learns to generate new crochet patterns automatically. In particular, the model is based on a special kind of RNN known as a Long Short-Term Memory (LSTM) network. LSTM networks have enhanced memory capability, creating the possibility of using them for learning and generating crochet patterns.
Deep Learning Framework in the Remote Sensing Application for Smart City Development: A case of Raleigh, NC (Ogungbire Abimbola). [Meet link, Jan 16th]
Smart cities are cities that provide the essential infrastructure and afford the average citizen a decent life. Rainwater harvesting is an example of a smart city solution. In this poster, I will propose a method to quantify the total impervious surface of Raleigh, North Carolina. This quantification will help assess the potential of smart solutions in this city. The method will be automated to extract urban rooftops and other relevant impervious surfaces for the purpose of rainwater harvesting.
Indoor positioning through sequence modelling: a simple and effective approach (Saccomanno Nicola). [Meet link, Jan 16th]
My research is focused on the development of new approaches to improve the state of the art in indoor positioning, mainly considering the widely adopted technique of WiFi fingerprinting. Specifically, the idea is to develop approaches that solve some well-known issues in this field while providing accuracy comparable to or better than current state-of-the-art approaches. To achieve this, we are investigating deep learning techniques. A first step that produced exciting results was the application of LSTM models to a particular representation of WiFi fingerprints. We showed that by exploiting ranking-based fingerprints, which allow us to almost entirely ignore the exact signal strength, we were able to achieve performance comparable to SOTA solutions while also providing robustness to signal strength fluctuations (noise). Investigation and analysis of the application of attention-based models (i.e., LSTM with attention and transformers) are currently ongoing.
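To illustrate why a ranking-based fingerprint is insensitive to the exact signal strength, here is a minimal sketch. It assumes one simple possible encoding (access points ordered from strongest to weakest signal); the actual representation used in this work is not specified in the abstract, and the function name, access point identifiers, and RSSI values are hypothetical.

```python
def rank_fingerprint(rssi_by_ap):
    """Return access point ids ordered from strongest to weakest RSSI.

    rssi_by_ap maps a (hypothetical) access point id to its RSSI in dBm.
    Ties are broken by id so the output is deterministic.
    """
    ordered = sorted(rssi_by_ap.items(), key=lambda item: (-item[1], item[0]))
    return [ap for ap, _ in ordered]


# One scan, and the same scan with every reading attenuated by 7 dB.
scan = {"ap_a": -48, "ap_b": -71, "ap_c": -60}
attenuated = {ap: rssi - 7 for ap, rssi in scan.items()}

ranking = rank_fingerprint(scan)
ranking_attenuated = rank_fingerprint(attenuated)
```

Both scans yield the same ordering, so a uniform shift in signal strength leaves the fingerprint unchanged. The resulting sequence of identifiers is the kind of input a sequence model such as an LSTM can consume.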
The HL-LHC scenario for Particle Physics is forcing the WLCG community to look for solutions capable of satisfying the huge future demand for storage and computing resources. As of today, the community is starting to exploit a Data Lake-based solution that envisions a small number of Data Lakes across the world with a reduced number of storage endpoints. We expect data caching to play a central role as a technical solution to reduce the load on custodial data centers and the impact of latency. In this work, a Reinforcement Learning-based cache model is applied in the CMS context and the first results are presented.
Research Engineering and Software Tools
Competitive Analysis of the Top Gradient Boosting Machine Learning Algorithms (Ayachit Sai). [Meet link, Jan 12th]
In this paper, we compare four state-of-the-art gradient boosting algorithms, namely XGBoost, CatBoost, LightGBM and SnapBoost. All of these algorithms are forms of Gradient Boosting Decision Trees (GBDTs). They are widely used in competitive machine learning contests such as Kaggle, due to their flexibility and considerably faster training times. Since a typical vanilla GBDT is usually implemented as a black-box model, our research attempts to help improve the explainability of GBDTs. We performed an exhaustive 360-degree comparative analysis of each of the four algorithms by training and testing them on diverse datasets, leveraging IBM’s PowerAI AC922 CPU. The analysis was performed using two approaches: one trained the baseline algorithms on the datasets, and the other performed systematic hyperparameter optimization (HPO) using the HyperOpt framework. Although the HPO process is resource-intensive, the Power System architecture facilitated lower training times without compromising the algorithms’ accuracy on any of the datasets. We present the accuracy scores and training times across the four datasets for both approaches. The results imply that, despite interesting trends observed across all the datasets, there is no clear winner that excels equally in every aspect of performance.
Traditional methods of studying accretion flows onto black holes mainly consist of computationally expensive numerical simulations. This often imposes severe limitations on the dimensionality, simulation times, and resolution. Computational astrophysics is in urgent need of new tools to accelerate the calculations, thereby leading to faster results. We propose a deep learning method for black hole "weather forecasting": a data-driven approach for solving the chaotic dynamics of BH accretion flows. Our model can reproduce the results of a hydrodynamic simulation with an error < 3% and also speed up the calculations by a factor of 10^4.5, thus reducing the simulation time.
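To give a sense of scale for the quoted speed-up factor of 10^4.5, the short calculation below converts it to a plain number and applies it to a simulation runtime. The 24-hour baseline is a purely hypothetical figure chosen for illustration; the abstract does not state the actual runtime.

```python
# Speed-up factor quoted in the abstract.
speedup = 10 ** 4.5  # roughly 3.16e4

# Hypothetical runtime of a conventional simulation (not from the abstract).
baseline_hours = 24.0

# Time the same calculation would take with the quoted speed-up, in seconds.
accelerated_seconds = baseline_hours * 3600 / speedup

print(f"speed-up factor: {speedup:,.0f}x")
print(f"{baseline_hours:.0f} h baseline -> {accelerated_seconds:.1f} s")
```

Under this assumption, a day-long run would shrink to a few seconds.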