Posters

The following posters will be presented during the school this year.

Monday, 12th September

Improving Generalization in Federated Learning by Seeking Flat Minima (Debora Caldarola)

Models trained in federated settings often suffer from degraded performance and fail to generalize, especially when facing heterogeneous scenarios. In this work, we investigate such behavior through the lens of the geometry of the loss and the Hessian eigenspectrum, linking the model's lack of generalization capacity to the sharpness of the solution. Motivated by prior studies connecting the sharpness of the loss surface and the generalization gap, we show that i) training clients locally with Sharpness-Aware Minimization (SAM) or its adaptive version (ASAM) and ii) averaging stochastic weights (SWA) on the server side can substantially improve generalization in Federated Learning and help bridge the gap with centralized models. By seeking parameters in neighborhoods having uniformly low loss, the model converges towards flatter minima and its generalization significantly improves in both homogeneous and heterogeneous scenarios. Empirical results demonstrate the effectiveness of these optimizers across a variety of benchmark vision datasets (e.g. CIFAR10/100, Landmarks-User-160k, IDDA) and tasks (large-scale classification, semantic segmentation, domain generalization).
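
For readers unfamiliar with SAM, the sketch below shows the core two-step update a client could apply locally before the server averages the weights; it is a minimal illustration under simplifying assumptions (flattened parameters, a given `grad_fn` for the local client loss), not the authors' implementation.

```python
import numpy as np

def sam_step(params, grad_fn, lr=0.01, rho=0.05):
    """One Sharpness-Aware Minimization (SAM) step.

    grad_fn(params) -> gradient of the local client loss (assumed given).
    """
    g = grad_fn(params)
    # Step 1: ascend to the approximate worst-case point in a rho-ball.
    eps = rho * g / (np.linalg.norm(g) + 1e-12)
    # Step 2: descend using the gradient evaluated at the perturbed point.
    return params - lr * grad_fn(params + eps)
```

Server-side SWA would then additionally maintain a running average of the models produced across communication rounds.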

Anomaly Detection Requires Better Representations (Tal Reiss)

Anomaly detection seeks to identify unusual phenomena, a central task in science and industry. The task is inherently unsupervised as anomalies are unexpected and unknown during training. Recent advances in self-supervised representation learning have directly driven improvements in anomaly detection. In this position paper, we first explain how self-supervised representations can be easily used to achieve state-of-the-art performance in commonly reported anomaly detection benchmarks. We then argue that tackling the next generation of anomaly detection tasks requires new technical and conceptual improvements in representation learning.
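
As a concrete (hypothetical) instance of the recipe the abstract alludes to, one can embed images with any pretrained self-supervised encoder and score anomalies by distance to the normal training set:

```python
import numpy as np

def knn_anomaly_scores(train_feats, test_feats, k=2):
    """Mean distance to the k nearest normal-training features;
    a larger score means more anomalous. Features are assumed to come
    from a pretrained self-supervised encoder (not shown here)."""
    d = np.linalg.norm(test_feats[:, None, :] - train_feats[None, :, :], axis=-1)
    return np.sort(d, axis=1)[:, :k].mean(axis=1)
```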

AI for the Preservation of Cultural Heritage (Lucia Cipolina-Kun)

Within the field of Cultural Heritage, image inpainting is a conservation process that fills in missing or damaged parts of an artwork to present a complete image. Multi-modal diffusion models have brought photo-realistic results to image inpainting, where content can be generated using descriptive text prompts. Additionally, generative models produce many plausible outputs for a given prompt. Our work presents a methodology to improve the inpainting of fine art by automating the selection among inpainted candidates. We propose a discriminator model that processes the output of inpainting models and assigns a probability indicating the likelihood that the restored image belongs to a certain painter.

Learning new physics efficiently with nonparametric methods (Gianvito Losapio)

We present a machine learning approach for model-independent new physics searches. The corresponding algorithm is powered by recent large-scale implementations of kernel methods, nonparametric learning algorithms that can approximate any continuous function given enough data. Based on a recent proposal, the model evaluates the compatibility between experimental data and a reference model by implementing a hypothesis testing procedure based on the likelihood ratio. Model independence is enforced by avoiding any prior assumption about the presence or shape of new physics components in the measurements. We show that our approach has dramatic advantages compared to neural network implementations in terms of training times and computational resources, while maintaining comparable performance. In particular, we conduct our tests on higher-dimensional datasets, a step forward with respect to previous studies.
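
Schematically, and glossing over the specific parameterization, such a search computes a likelihood-ratio test statistic of the form

\[
t(\mathcal{D}) \;=\; 2\,\max_{\theta}\,\log\frac{\mathcal{L}(\mathcal{D}\mid H_\theta)}{\mathcal{L}(\mathcal{D}\mid H_0)},
\]

where \(H_0\) is the reference model and \(H_\theta\) a flexible alternative, here realized by a large-scale kernel expansion; a large value of \(t\) flags a discrepancy between data and reference.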

SimplePatho: a Dataset for Automatic Text Simplification of Pathology Reports (Jan Trienes)

Text simplification aims to reduce the complexity of a document while retaining most of its meaning. The goal is to make information more accessible to a wide audience, including language learners, people with reading difficulties, and children. Current work on automatic text simplification has mainly considered Wikipedia and news articles. In this work, we propose a novel dataset for text simplification consisting of pathology reports. We argue that pathology reports are a central piece of documentation that helps patients understand their clinical condition. In addition, medical reports are an interesting testbed for exploring the generalizability of state-of-the-art text simplification methods to the clinical domain. We discuss a novel annotation protocol which is used by 9 medical students to manually simplify a large corpus of 1000 German pathology reports. We analyze the simplification operations that were applied to understand how professionals simplify these texts. In addition, we prepare the corpus for use in machine learning based text simplification methods and establish baselines on this data. We plan to share the dataset with other researchers working on text simplification.

Nesterov Momentum for Dose Optimization in CT-Scan using Deep Learning (Aurelle Tchagna Kouanou)

Deep Learning (DL) and Data Science (DS) have brought several breakthroughs in biomedical image analysis by making available more consistent and robust tools for the identification, classification, reconstruction, denoising, quantification and segmentation of patterns in biomedical images. Recently, some applications of DL to Computed Tomography (CT) scans for low-dose optimization were developed. DL comes with a new vision for processing biomedical image data from CT scans. It becomes important to develop architectures and/or methods based on DL algorithms for minimizing radiation during a CT scan exam through reconstruction and processing techniques. Some architectures were proposed in the literature with stochastic parallel gradient descent (SPGD) but did not reach a very good accuracy. To improve the classical SPGD algorithm for CT scan exams, we propose in this paper an optimized algorithm based on Nesterov-accelerated adaptive momentum estimation. The proposed architecture is evaluated on the training loss and the training PSNR over several epochs.
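
For reference, one common presentation of the Nesterov-accelerated adaptive momentum (Nadam) update that the abstract builds on is

\[
m_t = \beta_1 m_{t-1} + (1-\beta_1)\, g_t, \qquad
v_t = \beta_2 v_{t-1} + (1-\beta_2)\, g_t^2,
\]
\[
\hat{m}_t = \frac{m_t}{1-\beta_1^t}, \qquad
\hat{v}_t = \frac{v_t}{1-\beta_2^t}, \qquad
\theta_{t+1} = \theta_t - \frac{\eta}{\sqrt{\hat{v}_t}+\epsilon}
\left(\beta_1 \hat{m}_t + \frac{(1-\beta_1)\, g_t}{1-\beta_1^t}\right),
\]

i.e., Adam's bias-corrected moments combined with a Nesterov-style look-ahead on the first moment; the paper's exact variant may differ.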

Towards Implementing Truly Sparse Connections in Deep RL Agents (Bram Grooten)

In this work we extend the effectiveness of dynamic sparse training to the regime of deep reinforcement learning, with the aim of going beyond simulating sparsity in neural networks with binary masks and opening the path for truly sparse implementations. We enhance previous work by also masking the estimated first and second moments of the gradients within the Adam optimizer for non-existing connections. In Humanoid-v3, considered one of the most difficult environments of the MuJoCo Gym suite, we reach 90% sparsity while outperforming dense training for TD3 and SAC.
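
The key trick can be sketched in a few lines: besides masking weights and gradients, the Adam moment buffers of pruned connections are zeroed, so no optimizer state accumulates for connections that do not exist. This is an illustrative reconstruction under simplifying assumptions, not the authors' code.

```python
import torch

def masked_adam_step(p, grad, mask, state, lr=1e-3, betas=(0.9, 0.999), eps=1e-8):
    """Adam step for a sparse layer: `mask` is 1 for existing connections.

    Initialize once with:
    state = {"m": torch.zeros_like(p), "v": torch.zeros_like(p), "t": 0}
    """
    grad = grad * mask
    state["t"] += 1
    state["m"] = betas[0] * state["m"] + (1 - betas[0]) * grad
    state["v"] = betas[1] * state["v"] + (1 - betas[1]) * grad ** 2
    state["m"] *= mask  # zero first moment of non-existing connections
    state["v"] *= mask  # zero second moment of non-existing connections
    m_hat = state["m"] / (1 - betas[0] ** state["t"])
    v_hat = state["v"] / (1 - betas[1] ** state["t"])
    p -= lr * mask * m_hat / (v_hat.sqrt() + eps)
    return p
```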

Reinforcement Learning in Multi-Objective Multi-Agent Systems (Willem Ropke)

Multi-objective games present a natural framework for studying strategic interactions between rational individuals concerned with more than one objective. We explore both the game-theoretical foundations and the learning behaviour of agents in such games.

An Information-Theoretic Approach for Unsupervised Keypoint Detection (Ali Younes)

We propose InfoKey, a novel approach for keypoint detection from videos. Our method treats keypoints as messengers transmitting information from their surrounding area. We build upon this to define two losses: the information maximization loss, which optimizes the information transmitted collectively by the keypoints, and the information transportation loss, which aims to find the optimal transport of a keypoint between two consecutive frames. Our method tackles previously unresolved challenges in unsupervised keypoint detection from videos. We compare against representative baselines on challenging datasets of everyday human tasks, unveiling the power of our approach in detecting scene structure and tracking moving entities.

Extraction of procedural knowledge from robot-assisted surgical texts (Marco Bombieri)

"Thousands of different types of surgical procedures are performed daily in hospitals around the world. These procedures are typically described in detail in books,  manuals,  academic papers and online resources abundantly available nowadays. The description of a procedure conveys the so-called procedural knowledge i.e. the knowledge possessed by an intelligent agent able to perform a task.

The automatic extraction of knowledge about intervention execution from surgical manuals is of the utmost importance to develop knowledge-based clinical decision support systems, to automatically execute some procedure’s steps or to summarise procedural information spread throughout the texts in a structured form usable as a study resource by medical students. 

This poster deals with the automatic extraction of procedural knowledge from written texts of the robotic and robotic-surgical sectors using natural language processing techniques.

Continual Learning of Dynamical Systems with Competitive Multi-Head Reservoir Computing (Leonard Bereska)

"Machine learning recently proved efficient in learning differential equations and dynamical systems from data.

However, the data is commonly assumed to originate from a single never-changing system.

In contrast, when modeling real-world dynamical processes, the data distribution often shifts due to changes in the underlying system dynamics.

Continual learning of these processes aims to rapidly adapt to abrupt system changes without forgetting previous dynamical regimes.

This work proposes an approach to continual learning based on reservoir computing, a state-of-the-art method for training recurrent neural networks on complex spatiotemporal dynamical systems.

Reservoir computing fixes the recurrent network weights - hence these cannot be forgotten - and only updates linear projection heads to the output.

We propose to train multiple competitive prediction heads concurrently.

Inspired by neuroscience's predictive coding, only the most predictive heads activate, laterally inhibiting and thus protecting the inactive heads from forgetting induced by interfering parameter updates.

We show that this multi-head reservoir minimizes interference and catastrophic forgetting on several dynamical systems, including the Van-der-Pol oscillator, the chaotic Lorenz attractor, and the high-dimensional Lorenz-96 weather model. Our results suggest that reservoir computing is a promising candidate framework for the continual learning of dynamical systems."
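
A minimal sketch of the mechanism (illustrative sizes and update rule, not the authors' implementation): the input and recurrent weights stay frozen, several linear readout heads compete, and only the currently most predictive head is updated, shielding the others from interference.

```python
import numpy as np

rng = np.random.default_rng(0)
N, D, H = 300, 3, 4                          # reservoir size, system dim, heads
W_in = rng.normal(0.0, 0.1, (N, D))          # fixed input weights
W = rng.normal(0.0, 1 / np.sqrt(N), (N, N))  # fixed recurrent weights
heads = [np.zeros((D, N)) for _ in range(H)]  # trainable linear readouts

def step(r, x):
    """Reservoir state update; the recurrent weights are never trained."""
    return np.tanh(W @ r + W_in @ x)

def competitive_update(r, x_next, lr=1e-2):
    """Winner-take-all: only the head with the smallest prediction error
    is updated (delta rule), protecting the other heads from forgetting."""
    errors = [np.linalg.norm(Wo @ r - x_next) for Wo in heads]
    k = int(np.argmin(errors))
    heads[k] += lr * np.outer(x_next - heads[k] @ r, r)
    return k

# Typical loop: r = step(r, x_t); k = competitive_update(r, x_next)
```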

Vehicle Localization and Anomaly Detection for Video Surveillance in a Dynamic Bayesian Network Framework (Giulia Slavic)

This research project proposes a Bayesian hierarchical multi-sensorial framework for self-aware artificial agents based on prominent neuroscience theories. A hierarchical Coupled Dynamic Bayesian Network (CDBN) model is used, combining information from several sensors, both proprioceptive and exteroceptive, low-dimensional and high-dimensional. Vehicle localization is performed using image data and the learned CDBN. When dealing with unexpected situations, anomalies with respect to the normal cases are detected.

Mechanical Modeling and Data-driven Control of Continuum Soft Manipulators for Environment Interaction (Carlo Alessi)

Continuum soft manipulators are hyper-redundant robotic arms made of compliant materials that can deform elastically while safely interacting with unstructured environments. However, the inherent compliance of soft materials comes at the price of structures with virtually infinite degrees of freedom and complex nonlinear dynamics. Consequently, traditional modeling and control methods are not directly applicable because they assume a kinematic chain of rigid links. Nonetheless, models based on reduced-order continuum mechanics are becoming a standard for describing the deformations of soft bodies. In addition, data-driven controllers based on machine learning can exploit their nonlinear dynamics.

Local Advantage Networks for Cooperative Multi-Agent Reinforcement Learning (Raphael Avalos)

Many recent successful off-policy multi-agent reinforcement learning (MARL) algorithms for cooperative partially observable environments focus on finding factorized value functions, leading to convoluted network structures. Building on the structure of independent Q-learners, our LAN algorithm takes a radically different approach, leveraging a dueling architecture to learn decentralized best-response policies via individual advantage functions. The learning is stabilized by a centralized critic whose primary objective is to reduce the moving-target problem of the individual advantages. The critic, whose network size is independent of the number of agents, is cast aside after learning. Evaluation on the StarCraft II multi-agent challenge benchmark shows that LAN reaches state-of-the-art performance and is more scalable with respect to the number of agents, opening up a promising new direction for MARL research.

Fully Differentiable Approach to Semiempirical Quantum Mechanics (Christian Hoelzer)

"We present a semiempirical quantum mechanical feature model based on the extended tight binding~(xTB) method. Using automatic differentiation frameworks allows to efficiently include the xTB Hamiltonian as layer into machine learning models. With the quantum mechanical feature space information about the local electronic structure become available as descriptors and also allow to include their dependence on the semiempirical parameters in the training. 

Access to the internal feature representation of molecular information within the xTB framework opens up the possibility of using machine learning for inference based on robust, quantum-physics-based descriptors.

For a fully automated re-parameterization, the method's parameters can be optimized using gradient descent, with simple or more complicated loss functions serving as objectives. In this way, the xTB implementation can be tuned for the respective use case and the accuracy of the xTB method thereby improved.

Generalisation in Cooperative Multi-Agent Reinforcement Learning (Jonathan Cook)

Substantial advances in reinforcement learning (RL) have led to a growth in research interest towards the generalisation capabilities of RL agents. Meanwhile, cooperative  multi-agent reinforcement learning (MARL) has been successfully applied to an expanding plethora of tasks, in both simulated and real-world environments. However, whilst the study of generalisation in single-agent RL is burgeoning, there has been much less research into the generalisation of cooperative MARL systems. Specifically, common practice has seen MARL systems trained and evaluated in the same environment, which is initialised identically across episodes. This restricts the model's ability to learn or be tested on any general task dynamics that could extend across different versions of an environment. This project aims to address the research deficit at the intersection of cooperative MARL and the broader landscape of RL generalisation research. We first design two fully-cooperative grid-world games, one very simple and one requiring more complex coordination, and use procedural content generation to instantiate unique environment levels. Using these games, we evaluate the compositional generalisation of prominent value decomposition and policy gradient cooperative MARL algorithms. In doing so, we show that multi-agent policy gradient algorithms efficiently learn generalisable policies, whilst state-of-the-art approaches to value decomposition exhibit a more prominent generalisation gap. We also demonstrate the effectiveness and generality of independent learning on these environments and extend this analysis to a continuous control task.

Wednesday, 14th September

Unsupervised Segmentation of Multimodal Images for Standard Cell Reverse Engineering (Sharon Lin)

In the last decade, advancements in silicon nano-fabrication have rendered integrated circuits far more difficult to reverse-engineer (RE). While this may be beneficial for manufacturers, it impairs law enforcement agencies, quality assurance, and offensive security researchers who rely on RE to ensure the safety of consumers of hardware products. One of the main challenges of modern-day hardware RE is reconstructing the original circuit design from a lone chip. Chip imaging can be done in a clean-room setting using scanning electron microscopy, but the image quality leads to insufficiently accurate segmentation of chip layers into tracks and vertical interconnect accesses (VIAs). This project describes our attempt at using unsupervised learning techniques to automate the image processing and track/VIA detection on these images, thus allowing us to construct a netlist from the resulting chip layer images in order to begin the process of hardware RE.

Visual data detection through side-scattering in a multimode optical fiber (Barak Hadad)

Light propagation in optical fibers is accompanied by random omnidirectional scattering. The small fraction of coherent guided light that escapes outside the cladding of the fiber forms a speckle pattern. Here, visual information imaged into the input facet of a multimode fiber with a transparent buffer is retrieved, using a convolutional neural network, from the side-scattered light at several locations along the fiber. This demonstration can promote the development of distributed optical imaging systems and optical links interfaced via the sides of the fiber.

Predicting hazardous materials in the Swedish building stock using data mining (Pei-Yu Wu)

Transitioning to circular construction is an inevitable trend for optimizing resource efficiency. However, the presence of hazardous materials in end-of-lifecycle buildings is incompatible with the ambition of circular construction and challenges its realization in practice. Pre-demolition audits therefore act as a crucial means to assure the quality of recovered materials. Over the years, these inventories of hazardous waste have been archived on a national scale, but are left out of building stock registers. What is their potential as input data for machine learning prediction? How can we leverage past detection records to trace the patterns of hazardous materials in the existing building stock? The thesis tries to answer these questions by mining the archived inventory data and information from relevant building registers. In search of emergent data-driven approaches for in situ hazardous material identification, the research front of construction and demolition waste management is presented. A promising hazardous material dataset and a machine learning pipeline were created as means for assessing potential detection and exposure risk. Also, the complexity of applied AI in addressing the diversity of building data is highlighted. The applied research aims to open a discussion on the necessity of establishing a standardized data collection infrastructure and assessment procedure to facilitate data-driven hazardous material management.

Unsupervised Vision-Language Grammar Induction with Shared Structure Modeling (Bo Wan)

We introduce a new task, unsupervised vision-language (VL) grammar induction. Given an image-caption pair, the goal is to extract a shared hierarchical structure for both image and language simultaneously. We argue that such structured output, grounded in both modalities, is a clear step towards the high-level understanding of multimodal information. Besides the challenges present in conventional visually grounded grammar induction tasks, VL grammar induction requires a model to capture contextual semantics and perform fine-grained alignment. To address these challenges, we propose a novel method, CLIORA, which constructs a shared vision-language constituency tree structure with context-dependent semantics for all possible phrases at different levels of the tree. It computes a matching score between each constituent and image region, trained via contrastive learning. It integrates two levels of fusion, namely at feature level and at score level, so as to allow fine-grained alignment. We introduce a new evaluation metric for VL grammar induction, CCRA, and show a 3.3% improvement over a strong baseline on Flickr30k Entities. We also evaluate our model via two derived tasks, i.e., language grammar induction and phrase grounding, and improve over the state of the art for both.

On the Trade-off between Redundancy and Local Coherence in Summarization (Ronald Cardenas Acosta)

"Extractive summarization systems are known to produce poorly coherent and, if not accounted for, highly redundant text. In this work, we tackle the problem of summary redundancy in unsupervised extractive summarization of long, highly-redundant documents. For this, we leverage a psycholinguistic theory of human reading comprehension which directly models local coherence and redundancy. Implementing this theory, our system operates at the proposition level and exploits properties of human memory representations to rank similarly content units that are coherent and non-redundant, hence encouraging the extraction of less redundant final summaries. Because of the impact of the summary length on automatic measures, we control for it by formulating content selection as an optimization problem with soft constraints in the budget of information retrieved. Using summarization of scientific articles as a case study, extensive experiments demonstrate that the proposed systems extract consistently less redundant summaries across increasing levels of document redundancy, whilst maintaining comparable performance (in terms of relevancy and local coherence) against strong unsupervised baselines according to automated evaluations."

ERIC: Emotionally Reliable & Intelligent Chatbot (Swati Rajwal)

Chatbots are conversational agents that allow users to perform tasks via voice or text-based commands in natural language. In the last decade, chatbot development has gained a lot of industrial and academic attention. Many frameworks and toolkits have emerged, which accelerates chatbot development. This paper proposes a unique logical architectural framework for chatbot design named ERIC, Emotionally Reliable and Intelligent Chatbot. The architecture is easily extendible and supports multiple communication channels for user interactions. It uses Google's Dialogflow and Assistant, knowledge base, Python-based flask application, and an NLP model. The paper first proposes the overall chatbot framework. Then, a prototype implementation of the proposed logical architecture, viz. a chatbot for children, is described. The paper also discusses how others can develop their chatbots following the proposed architecture.
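
To make the architecture concrete, the sketch below shows a minimal Flask fulfillment webhook of the kind a Dialogflow-based design uses: Dialogflow POSTs a JSON payload whose queryResult carries the matched intent, and the service replies with a fulfillmentText. Intent names and handler logic here are hypothetical, not ERIC's actual code.

```python
from flask import Flask, request, jsonify

app = Flask(__name__)

def handle_intent(intent, text):
    # Hypothetical routing, e.g., to a knowledge base or an NLP model.
    if intent == "smalltalk":
        return "Hi! How are you feeling today?"
    return "Let me look that up for you."

@app.route("/webhook", methods=["POST"])
def webhook():
    req = request.get_json(force=True)
    intent = req["queryResult"]["intent"]["displayName"]
    user_text = req["queryResult"]["queryText"]
    return jsonify({"fulfillmentText": handle_intent(intent, user_text)})

if __name__ == "__main__":
    app.run(port=5000)
```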

Robot Design Optimization for Human-Robot Collaborative Lifting Tasks (Carlotta Sartore)

"Humanoid robots are foreseen to be soon part of our daily life, performing a large variety of tasks, often in collaboration with humans. For this reason, several control architectures have been developed to address ergonomic physical human-robot interaction. However, the robot hardware design is yet to be considered as an element that can be optimized with respect to the collaborative task. This work presents a framework allowing to consider hardware parameters as optimization variables in the problem of ergonomic collaborative lifting of generic payloads, generating hints for optimal humanoid robot hardware design, considering both hardware and ergonomic specific constraints. The proposed methodology is validated on the iCub humanoid robot considering the different scenario of payload lifting tasks."

Regulation of social media and the evolution of content: a cross-platform analysis (Marina Rizzi)

Is self-regulation of social media effective in moderating online content? Or does it shift regulated and abusive content to other corners of the internet? This paper addresses these questions by analyzing the effect of social media regulation from a cross-platform perspective. I will exploit an episode of enlargement of Twitter's regulation of hate speech directed at groups based on their race and ethnicity, and I will investigate whether regulation is effective in curbing this harmful content, or whether there are spillover effects between social media platforms and the conversation shifts to unregulated ones (like Parler, a platform that has promoted itself as a service that allows you to "speak freely, without fear of being deplatformed"). I will exploit methods from causal econometrics, machine learning and natural language processing to assess the effectiveness of regulation. Preliminary results show that hate speech directed toward particular races and ethnicities decreases on Parler, relative to hate speech directed toward other groups, after the implementation of the regulation on Twitter.

Robust architecture-agnostic and noise resilient training of photonic deep learning models (Manos Kirtas)

Neuromorphic photonic accelerators for Deep Learning (DL) have increasingly gained attention over recent years due to their ability to perform ultra-fast matrix-based calculations at low power consumption, providing great potential for DL implementations across a wide range of applications. At the same time, the physical properties of the optical components hinder their application, since they introduce a number of limitations, such as easily saturated activation functions and various noise sources. As a result, photonic DL models are especially challenging to train and deploy compared with regular DL models, since traditionally used methods do not take the aforementioned constraints into account. To overcome these limitations, and motivated by the fact that information lost in one layer cannot easily be recovered when gradient-descent-based algorithms are employed, we propose a novel training method for photonic neuromorphic architectures that is capable of taking into account a wide range of limitations of the actual hardware, including noise sources and easily saturated activation mechanisms. Compared to existing works, the proposed method takes a more holistic view of the training process, focusing both on the initialization process and on the actual weight updates. The effectiveness of the proposed method is demonstrated on a variety of different problems and photonic neural network (PNN) architectures, including a noisy photonic recurrent neural network evaluated on high-frequency time series forecasting and a deep photonic feed-forward setup consisting of a transmitter, noisy channel, and receiver, used as an intensity modulation/direct detection (IM/DD) system.
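
One ingredient of such noise-aware training can be sketched as follows: model the photonic activation as a saturating nonlinearity with train-time noise injection, so the weights are learned under the same distortions the hardware introduces. This illustrates the general idea, not the authors' exact noise model.

```python
import torch
import torch.nn as nn

class NoisySaturableActivation(nn.Module):
    """Saturating transfer function with additive Gaussian noise at train
    time, standing in for an easily saturated, noisy photonic activation."""
    def __init__(self, noise_std=0.05):
        super().__init__()
        self.noise_std = noise_std

    def forward(self, x):
        y = torch.sigmoid(x)  # saturates for large |x|, like the hardware
        if self.training:
            y = y + self.noise_std * torch.randn_like(y)
        return y
```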

Δ-Machine Learning Model for the Correction of DFT-Based NMR Chemical Shifts Toward CCSD(T) Quality (Julius Stückrath)

"Nuclear magnetic resonance (NMR) spectroscopy is one of the most important and widely used analytical methods for elucidating molecular structures in chemistry, which is accompanied by a great interest in prediction techniques for its key quantity: the NMR chemical shift. For medium-sized systems, density functional theory (DFT) is a very efficient choice for calculating these quantities, but is still subject to significant inaccuracies. Thus, the use of articifial neural networks to predict or correct NMR chemical shifts has become popular in recent years. 

In this study, we have developed a general machine learning-based correction (Δ-ML) model aimed at improving NMR chemical shifts calculated with DFT toward the quality of the highly accurate coupled cluster CCSD(T) method. A large and easily expandable data set has been assembled, containing small diverse molecules for which reference NMR chemical shifts have been calculated at the CCSD(T) level of theory. Based on 7000+ data points for hydrogen and 4000+ for carbon atoms, the new workflow allows the construction of a flexible Δ-ML model for any desired DFT level of theory. In conclusion, the presented Δ-ML approach is capable of improving the prediction of NMR spectra in daily use.
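
The Δ-ML construction itself is compact: a model is trained on the residual between the accurate and the cheap method, and at inference the learned correction is added to the DFT baseline. A minimal sketch with illustrative names (the study's actual descriptors and network are more elaborate):

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

def delta_ml_fit(descriptors, shift_dft, shift_ccsdt):
    """Fit a model to the CCSD(T) - DFT residual of NMR chemical shifts."""
    model = MLPRegressor(hidden_layer_sizes=(64,), max_iter=2000)
    model.fit(descriptors, np.asarray(shift_ccsdt) - np.asarray(shift_dft))
    return model

# Prediction for a new molecule: DFT baseline plus learned correction.
# shift_pred = shift_dft_new + model.predict(descriptors_new)
```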

Multi-fidelity modeling with neural networks (Paolo Conti)

Highly accurate numerical or physical experiments are often very time-consuming or expensive to obtain. When time or budget restrictions prohibit the generation of additional data, the amount of available samples may be too limited to provide satisfactory model results. Multi-fidelity methods deal with such problems by incorporating information from other sources, which are ideally well-correlated with the high-fidelity data, but can be obtained at a lower cost. By leveraging correlations between different data sets, multi-fidelity methods often yield superior generalization when compared to models based solely on a small amount of high-fidelity data. Here, we present multi-fidelity neural network architectures for the treatment of parametrized time-dependent nonlinear problems. We show the generality of the proposed models on different kinds of high- and low-fidelity data sources and on a varied set of engineering problems.
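
One common way to realize this with neural networks is a composite architecture: a low-fidelity network is fit on the cheap data, and a second network learns the map from the input and the low-fidelity prediction to the scarce high-fidelity output. The sketch below shows the idea with illustrative sizes; it is not necessarily the poster's exact models.

```python
import torch
import torch.nn as nn

class MultiFidelityNet(nn.Module):
    def __init__(self, dim_in, hidden=64):
        super().__init__()
        self.lo = nn.Sequential(nn.Linear(dim_in, hidden), nn.Tanh(),
                                nn.Linear(hidden, 1))  # cheap-data surrogate
        self.hi = nn.Sequential(nn.Linear(dim_in + 1, hidden), nn.Tanh(),
                                nn.Linear(hidden, 1))  # learned correction

    def forward(self, x):
        y_lo = self.lo(x)
        y_hi = self.hi(torch.cat([x, y_lo], dim=-1))   # exploit correlation
        return y_lo, y_hi
```

In practice the low-fidelity branch is trained first (or jointly with a weighted loss) on the abundant cheap data, and the high-fidelity branch on the few expensive samples.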

Hyperbolic Busemann Learning with Ideal Prototypes (Mina Ghadimi Atigh)

Hyperbolic space has become a popular choice of manifold for representation learning of various datatypes, from tree-like structures and text to graphs. Building on the success of deep learning with prototypes in Euclidean and hyperspherical spaces, a few recent works have proposed hyperbolic prototypes for classification. Such approaches enable effective learning in low-dimensional output spaces and can exploit hierarchical relations amongst classes, but require privileged information about class labels to position the hyperbolic prototypes. In this work, we propose Hyperbolic Busemann Learning. The main idea behind our approach is to position prototypes on the ideal boundary of the Poincaré ball, which does not require prior label knowledge. To be able to compute proximities to ideal prototypes, we introduce the penalised Busemann loss. We provide theory supporting the use of ideal prototypes and the proposed loss by proving its equivalence to logistic regression in the one-dimensional case. Empirically, we show that our approach provides a natural interpretation of classification confidence, while outperforming recent hyperspherical and hyperbolic prototype approaches.
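
For context, the Busemann function to an ideal point \(p\) on the boundary of the Poincaré ball has the closed form

\[
b_p(x) \;=\; \log \frac{\lVert p - x \rVert^{2}}{1 - \lVert x \rVert^{2}},
\]

so proximity to a prototype can be computed even though \(p\) itself lies outside the set of valid embeddings; roughly speaking, the penalised variant adds a term that keeps embeddings from degenerating toward the boundary.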

Unsupervised Semantic Segmentation with Self-supervised Object-centric Representations (Andrii Zadaianchuk)

"We show that recent advances in self-supervised feature learning enable unsupervised object discovery and semantic segmentation with a performance that matches the state of the field on supervised semantic segmentation 10 years ago. We propose a methodology based on unsupervised saliency masks and self-supervised feature clustering to kickstart object discovery followed by training a semantic segmentation network on pseudo-labels to bootstrap the system on images with multiple objects. We present results on PASCAL VOC that go far beyond the current state of the art (47.3 mIoU), and we report for the first time results on MS COCO for the whole set of 81 classes: our method discovers 34 categories with more than 20% IoU, while obtaining an average IoU of 19.5 for all 81 categories."

Fomenting a healthier exchange of ideas: a social media analysis of shared articles using natural language processing (Federico Albanese)

"Social networks function as a space where users can share their thoughts and exchange opinions. Previous work shows that users seek to interact mainly with users with whom they share interests and opinions by forming closed communities called echo chambers. By becoming ideologically biased, these communities can lead to negative consequences such as increased polarization, political extremism, and hate speech. Therefore, it is of interest to find tools that promote the healthy exchange of ideas of different political leanings within the echo chambers.

In this context, we collected a dataset of more than 3,000 Reddit political discussions in Democratic and Republican forums, where more than 16,000 users discussed news from different mass media. Then, using natural language processing techniques, we analyze the toxicity and sentiment of comments when the news comes from media with the same or opposite leaning as the members of the forum. We found that toxicity is greatest when an article is shared in a community with the same leaning (it exalts them), rather than an opposing one. In addition, we implement a language model based on deep neural networks that allows us to see which articles of opposite leaning to a community are of interest to said community.

TS-Rep: Time series representation learning in robotics using self-supervised contrastive learning (Pratik Rajnikant Somaiya)

Learning representations of time series data coming from a robot's sensors is a challenging task in robotics due to the high dimensionality and non-stationary nature of the data. We propose a new self-supervised contrastive learning framework, TS-Rep, for time series representation learning in robotics. Unlike existing frameworks for time series, it applies a contrastive loss on both the output and intermediate layers' representations. TS-Rep consists of two parts: i) a contrastive loss on temporal features extracted from a bidirectional Long Short-Term Memory (Bi-LSTM) network to learn disentangled feature representations, and ii) a cross-correlation loss on the intermediate-layer representations to encourage the network to learn decorrelated features at intermediate stages. We validate our approach with clustering and anomaly detection on two robotics datasets from different settings, manipulation and navigation. Our learned representations show that TS-Rep groups inherently similar instances together and learns a clusterable representation. Our results indicate that TS-Rep learns better representations than state-of-the-art time series representation learning frameworks. We further train a one-class classifier on the output representation for the anomaly detection task and achieve a significant improvement over state-of-the-art anomaly detection frameworks.
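
The decorrelation ingredient can be illustrated with a Barlow-Twins-style objective; this is a sketch of the general mechanism, and TS-Rep's exact formulation may differ.

```python
import torch

def cross_correlation_loss(z1, z2, lam=5e-3):
    """Push the cross-correlation matrix of two batches of (standardized)
    embeddings toward the identity: invariant yet decorrelated features."""
    z1 = (z1 - z1.mean(0)) / (z1.std(0) + 1e-6)
    z2 = (z2 - z2.mean(0)) / (z2.std(0) + 1e-6)
    n = z1.shape[0]
    c = (z1.T @ z2) / n                       # (d x d) cross-correlation
    on_diag = (torch.diagonal(c) - 1).pow(2).sum()
    off_diag = (c - torch.diag(torch.diagonal(c))).pow(2).sum()
    return on_diag + lam * off_diag
```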

Thursday, 15th September

Exploring Exploration Strategies in Reinforcement Learning (Kelsi Blauvelt)

A challenge of reinforcement learning (RL) is the balance of exploration and exploitation strategy during training. This work visualizes common RL exploration strategies and discusses sample efficiency alongside benefits of exploration.
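
Two of the strategies such a comparison typically covers can be stated in a few lines; this is an illustrative sketch, not the poster's code.

```python
import numpy as np

rng = np.random.default_rng(0)

def epsilon_greedy(q_values, eps=0.1):
    """Explore uniformly with probability eps, otherwise exploit."""
    if rng.random() < eps:
        return int(rng.integers(len(q_values)))
    return int(np.argmax(q_values))

def boltzmann(q_values, temperature=1.0):
    """Softmax exploration: sample actions with probability proportional
    to exp(Q / T); high T explores, low T exploits."""
    q = np.asarray(q_values, dtype=float)
    p = np.exp((q - q.max()) / temperature)  # subtract max for stability
    p /= p.sum()
    return int(rng.choice(len(q), p=p))
```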

Container Localisation and Mass Estimation with an RGB-D Camera (Tommaso Apicella)

In the research area of human-robot interaction, automatically estimating the mass of a container manipulated by a person using only visual information is a challenging task. The main challenges are occlusions, different filling materials, and lighting conditions. The mass of an object constitutes key information for the robot to correctly regulate the force required to grasp the container. We propose a single RGB-D camera-based method to locate a manipulated container and estimate its empty mass, i.e., independently of the presence of content. The method first automatically selects a number of candidate containers based on their distance from the fixed frontal view, then averages the mass predictions of a lightweight model to provide the final estimate. Results on the CORSMAL Containers Manipulation dataset show that the proposed method estimates empty container mass with a score of 71.08% under different lighting and filling conditions.

DS-TOD: Efficient Domain Specialization for Task-Oriented Dialog (Chia-Chien Hung)

Recent work has shown that self-supervised dialog-specific pretraining on large conversational datasets yields substantial gains over traditional language modeling (LM) pretraining in downstream task-oriented dialog (TOD). These approaches, however, exploit general dialogic corpora (e.g., Reddit) and thus presumably fail to reliably embed domain-specific knowledge useful for concrete downstream TOD domains. In this work, we investigate the effects of domain specialization of pretrained language models (PLMs) for TOD. Within our DS-TOD framework, we first automatically extract salient domain-specific terms, and then use them to construct DomainCC and DomainReddit -- resources that we leverage for domain-specific pretraining, based on (i) masked language modeling (MLM) and (ii) response selection (RS) objectives, respectively. We further propose a resource-efficient and modular domain specialization by means of domain adapters -- additional parameter-light layers in which we encode the domain knowledge. Our experiments with prominent TOD tasks -- dialog state tracking (DST) and response retrieval (RR) -- encompassing five domains from the MultiWOZ benchmark demonstrate the effectiveness of DS-TOD. Moreover, we show that the light-weight adapter-based specialization (1) performs comparably to full fine-tuning in single domain setups and (2) is particularly suitable for multi-domain specialization, where besides advantageous computational footprint, it can offer better TOD performance.
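
The adapter idea can be sketched as a standard bottleneck module with a residual connection (in the style of Houlsby et al.), inserted into each transformer layer; during domain specialization only these small modules are trained while the PLM stays frozen. Hidden sizes below are illustrative.

```python
import torch
import torch.nn as nn

class BottleneckAdapter(nn.Module):
    """Parameter-light domain adapter: down-project, nonlinearity,
    up-project, residual. Only these weights receive gradients."""
    def __init__(self, hidden_size=768, bottleneck=64):
        super().__init__()
        self.down = nn.Linear(hidden_size, bottleneck)
        self.up = nn.Linear(bottleneck, hidden_size)
        self.act = nn.ReLU()

    def forward(self, h):
        return h + self.up(self.act(self.down(h)))
```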

Recognizing misogynous memes on the Web: biased models and tricky archetypes (Giulia Rizzi)

"Misogyny is a form of hate against women and has been spreading exponentially through the Web, especially on social media platforms. Hateful content towards women can be conveyed not only by text, but also using visual and/or audio sources or their combination, highlighting the necessity to deal with a multimodal problem. One of the predominant forms of multimodal content against women is represented by memes, which are images characterized by a pictorial content with an overlaying text introduced a-posteriori. Its main aim is originally to be funny and/or ironic, making misogyny recognition in memes even more challenging.

In this poster, both unimodal and multimodal approaches are investigated to determine which source contributes more to the detection of misogynous memes. Moreover, a bias estimation technique is proposed to identify specific elements composing a meme that could lead to unfair models, together with a bias mitigation strategy based on Bayesian Optimization. Finally, a detailed error analysis is reported to highlight challenging archetypes of memes that open up new research issues.

Robust generative image privacy (Mariia Zameshina)

"There are two main categories of privacy methods: pixel-based methods (e.g., FAWKES) and generative methods (e.g., VQGAN, StyleGAN). We introduce modifications (inspired by pixel-based methods) to generative methods so that they make private versions closer to distant target images. We compare them using privacy metrics and using image quality metrics, and then investigate the robustness of these methods to unknown image recognition methods."

An Integrated Deep Learning Model for Identifying, Classifying, Counting Wildlife Animals and Behaviour Detection in Camera-trap Images (Frank Godlove Kilima)

Camera-trapping is a common source of wildlife data for different wildlife studies. Traditionally, collected images have been analyzed manually, which is time-consuming, expensive, laborious, and prone to errors and bias, constraining wildlife studies and delaying decision making. Recently, deep learning (DL) methods have provided effective solutions for automated analyses of such large datasets. Although their predictive performance on benchmark datasets such as MS COCO, CIFAR-10 and MNIST is documented and used as a basis for choosing DL methods for wildlife studies, there is a need to better understand their performance and behaviour on realistic camera-trap images in order to improve the choice of methods for wildlife studies. This is because there are many DL methods with different levels of accuracy, processing speed, and demand for computing resources, and there exist significant differences between benchmark and camera-trap images. This study assessed and compared the predictive performance (and behaviour) of two object detectors, Faster R-CNN and Single Shot Detector (SSD), integrated with four different ResNet convolutional neural network (CNN) backbones, namely ResNet50, ResNet101, ResNet152 and Inception ResNet V2, in relation to their performance on the MS COCO benchmark dataset. These object detectors were trained on 11,019 camera-trap images of eleven wildlife animal species from the Snapshot Serengeti and Serengeti Biodiversity Program datasets using transfer learning and the TensorFlow 2 Object Detection API. Our results show that, despite a small difference in predictive performance, two of the three least performing models in the MS COCO evaluation, Faster R-CNN ResNet101 and Faster R-CNN ResNet152, were the best performing methods in our study, while the three best performing models in the MS COCO evaluation, Faster R-CNN Inception ResNet, SSD ResNet101 (RetinaNet101), and SSD ResNet152 (RetinaNet152), were not the top three methods in our study. This result indicates that the choice of DL models for wildlife studies using camera-trap images should not rely on the models' performance on benchmark datasets, but on their performance on camera-trap images, since camera-trap images are characteristically different (often messier) than benchmark images and different DL models will perform differently on them. Our results further show that a larger training dataset and a larger animal size for a particular animal class do not automatically lead to higher predictive performance than for classes with less training data and smaller body sizes, indicating that other important factors influence the predictive performance of object detectors besides the size of the training data and of the animals.

Smart Objects: Impact localization powered by TinyML (Stefanos Heikki Panagiotou)

Growing momentum in embedded systems and the wide use of sensors in everyday life have significantly motivated novel research in Internet of Things (IoT) systems and on-device machine learning (TinyML). However, limitations in the energy stock and the computational capabilities of resource-scarce devices prevent the implementation of complex ML algorithms in IoT devices, which typically have limited computing power and small memory while generating large amounts of data. This paper aims to research and exploit the emerging TinyML technology for embedding intelligence in low-power devices, towards the next-generation IoT paradigm and smart sensing, in the context of structural health monitoring (SHM). In particular, the purpose is to provide integrated SHM functionality in plastic objects and thus make them "conscious" and self-explanatory (smart objects), able to localize any impacts occurring on the structure. We implement and benchmark Random Forest and shallow neural network models on an Arduino NANO 33 BLE, using an experimental dataset of piezoelectric sensor measurements of impact events on a thin plastic plate. The classification and model footprint results (98.71% accuracy with an 8 KB flash footprint and 95.35% with 12 KB, respectively) are very promising and constitute a solid baseline for motivating our concept.
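
A typical route from such a model to the microcontroller is conversion to TensorFlow Lite with post-training quantization; the sketch below uses illustrative layer sizes and feature dimensions, not the paper's exact configuration.

```python
import tensorflow as tf

# Hypothetical shallow network; the 96-dim input and 9 output zones
# are illustrative stand-ins for the piezoelectric features and
# impact-localization classes.
model = tf.keras.Sequential([
    tf.keras.layers.Dense(32, activation="relu", input_shape=(96,)),
    tf.keras.layers.Dense(9, activation="softmax"),
])
# ... model.fit(...) on the sensor measurements ...

converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]  # quantize to shrink flash use
open("impact_model.tflite", "wb").write(converter.convert())
# The .tflite bytes are then embedded in the Arduino sketch as a C array
# and executed with TensorFlow Lite Micro on the NANO 33 BLE.
```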

Exploiting multimodality in clinical data for improved decision making (Sneha Jha)

Clinical data often exist in different forms across the lifetime of a patient: structured data in the form of laboratory readings, unstructured or semi-structured narrative data, imaging data of various kinds, and possibly audio and other observational data. Decision making often requires synthesizing information from multiple sources. Since these data exist in scattered, noisy or inaccessible formats within the healthcare workflow, machine learning models often attempt to learn patterns from large but incomplete information. Representation and modelling of such data to exploit the complementarity and redundancy of information across modalities can increase predictive power, reliability, and confidence over structured clinical data alone.

Draw me a Flower: Processing and Grounding Abstraction in Natural Language (Avshalom Manevich)

Abstraction is a core tenet of human cognition and communication. When narrating natural language instructions, humans naturally evoke abstraction to convey complex procedures in an efficient and concise way. Yet, interpreting and grounding abstraction expressed in NL has not been systematically studied in NLP, with no accepted benchmarks specifically eliciting abstraction in NL. In this work, we set the foundation for a systematic study of processing and grounding abstraction in NLP. First, we deliver a novel abstraction elicitation method and present HEXAGONS, a 2D instruction-following game. Using HEXAGONS we collected over 4k naturally occurring visually-grounded instructions rich with diverse types of abstraction. From these data, we derive an instruction-to-execution task and assess different types of neural models. Our results show that contemporary models are substantially inferior to human performance, and that models' performance is inversely correlated with the level of abstraction, showing less satisfying performance on higher levels of abstraction. These findings are consistent across models and setups, confirming that abstraction is indeed a challenging phenomenon deserving further attention and study in NLP/AI research.

Robust Interpretable Text Classification against Spurious Correlations Using AND-rules with Negation (Rohan Kumar Yadav)

The state-of-the-art natural language processing models have raised the bar for excellent performance on a variety of tasks in recent years. However, concerns are rising over their primitive sensitivity to distribution biases that reside in the training and testing data. This issue hugely impacts the performance of the models when exposed to out-of-distribution and counterfactual data. The root cause seems to be that many machine learning models are prone to learning shortcuts, modelling simple correlations rather than more fundamental and general relationships. As a result, such text classifiers tend to perform poorly when a human makes minor modifications to the data, which raises questions regarding their robustness. In this paper, we employ a rule-based architecture called the Tsetlin Machine (TM) that learns both simple and complex correlations by ANDing features and their negations. As such, it generates explainable AND-rules using negated and non-negated reasoning. Here, we explore how non-negated reasoning can be more prone to distribution biases than negated reasoning. We further leverage this finding by adapting the TM architecture to mainly perform negated reasoning using the specificity parameter $s$. As a result, the AND-rules become robust to spurious correlations and can also correctly predict counterfactual data. Our empirical investigation of the model's robustness uses the specificity $s$ to control the degree of negated reasoning. Experiments on publicly available Counterfactually-Augmented Data demonstrate that the negated clauses are robust to spurious correlations and outperform Naive Bayes, SVM, and Bi-LSTM by up to 20%, and ELMo by almost 6% on counterfactual test data.
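
The clause mechanics are easy to state: a TM clause is a conjunction (AND) over included literals, where each literal is a feature or its negation, and the specificity parameter $s$ governs how readily literals are included during learning. A minimal sketch of clause evaluation (illustrative only, not the TM learning procedure):

```python
import numpy as np

def clause_output(x, include, include_neg):
    """Evaluate one AND-clause: fires only if every included feature is 1
    and every included negated feature is 0."""
    x = np.asarray(x, dtype=bool)
    return bool(np.all(x[include]) and np.all(~x[include_neg]))

# Example: the clause (f0 AND NOT f2) on input [1, 1, 0]
print(clause_output([1, 1, 0], include=[0], include_neg=[2]))  # True
```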

Computational Analysis of Holocaust Biographical data (Isuri Nanomi Arachchige)

Memories talk to history, and people walk to the future with memories. More than anyone in history, witnesses of the Holocaust have a good understanding of the influence of the war and its conflicts on an individual's life. Survivors were encouraged to share their memorable life experiences during the Shoah in the form of testimonies. Testimonies of the Holocaust are co-constituted through distinctive archival approaches working in dialogue with the individual witness and are not simply captured as 'raw' accounts [1]. In Holocaust testimonies, survivors attempt to explain what happened in their life within that specific period of time, combining it with biographical data associated with the historic event, such as family members (persons) who were victims, locations where survivors were imprisoned, etc. As a result, they convert their scattered memories and experiences into language, and then into a narrative, by specifying particular events. This unstructured nature of testimonial data makes it difficult to identify the biographical factors related to the Holocaust manually [2]. Moreover, when recognising biographical factors it is difficult to 'cherry-pick' all the information from a single testimony and mention it without any methodological implications. The exponential growth of data has led to research based on machine learning (ML), defined as the study of computer algorithms that improve automatically through experience [3]. Natural language processing (NLP) is a field of ML concerned with enabling machines to understand human language [4]. Information extraction (IE), an NLP application, is the process of extracting structured information from unstructured text. In IE, data can be represented in the form of entities, semantic relations (connections) or events, and entities that participate in events. IE from textual data can be achieved by leveraging ML and NLP techniques such as named entity recognition [5], relation extraction [6], coreference resolution [7], etc. Therefore, IE techniques can be utilised to extract biographical information from historic documents. Although it is a challenge to distinguish relationships in unstructured texts, we propose a model to extract biographical data from Holocaust testimonies, build a biographical network graph, and recognise the relationships between individual testimonies.

In the process of building a biographical network graph, as a first step, the Holocaust testimonial data were subjected to preprocessing steps such as removing unnecessary characters, HTML tags, etc., without affecting the granularity of the original textual data. The preprocessed data were then passed into an information extraction pipeline designed by combining different NLP techniques, namely named entity recognition, relation extraction and coreference resolution; the models performing these techniques were developed using machine learning algorithms. After extracting textual information with the IE pipeline, the importance of individual representations was measured with several network centrality measures, namely degree centrality, eigenvector centrality and closeness centrality. A limitation of our study is that the models were tested on preprocessed data of comparatively small size. In future research, we aim to utilise deep learning based computational models and more Holocaust testimonial data to improve the results and to identify the individual biographical factors bound up with the Holocaust.
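
A skeleton of such a pipeline, using off-the-shelf components for illustration; the study's actual pipeline additionally performs relation extraction and coreference resolution, and the example sentence is invented.

```python
# Requires: pip install spacy networkx && python -m spacy download en_core_web_sm
import spacy
import networkx as nx

nlp = spacy.load("en_core_web_sm")

def testimony_graph(texts):
    """Build a co-occurrence graph of persons and places named together."""
    g = nx.Graph()
    for doc in nlp.pipe(texts):
        ents = [e.text for e in doc.ents if e.label_ in ("PERSON", "GPE", "LOC")]
        for i, a in enumerate(ents):
            for b in ents[i + 1:]:
                g.add_edge(a, b)
    return g

g = testimony_graph(["The survivor met Anna in Krakow before the deportation."])
print(nx.degree_centrality(g))  # importance of individuals/places in the graph
```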

When Bias Becomes Discrimination: Formalizing Fairness in Large Language Models (Tatiana Botskina)

This paper presents interdisciplinary research on understanding the algorithmic language bias found in large language models. The goal of this paper is to provide actionable feedback on how to ensure that large language models generate unbiased outcomes. Explaining language biases through the lens of law can help formalise the definition of bias and prevent misconceptions and arbitrary interpretations of the harm arising out of, or in connection with, language models transposing structural societal injustice.

A Deep Reinforcement Learning Approach to Supply Chain Inventory Management (Francesco Stranieri)

This paper leverages recent developments in reinforcement learning and deep learning to solve the supply chain inventory management (SCIM) problem, a complex sequential decision-making problem consisting of determining the optimal quantity of products to produce and ship to different warehouses over a given time horizon. A mathematical formulation of the stochastic two-echelon supply chain environment is given, which allows an arbitrary number of warehouses and product types to be managed. Additionally, an open-source library that interfaces with deep reinforcement learning (DRL) algorithms is developed and made publicly available for solving the SCIM problem. Performances achieved by state-of-the-art DRL algorithms are compared through a rich set of numerical experiments on synthetically generated data. The experimental plan is designed and performed, including different structures, topologies, demands, capacities, and costs of the supply chain. Results show that DRL performs consistently better than standard reorder policies, such as the static (s, Q)-policy. Thus, it can be considered a practical and effective option for solving real-world instances of the stochastic two-echelon SCIM problem.
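
The static reorder baseline mentioned in the results is simple to state, which is what makes it a useful yardstick for DRL policies (a minimal sketch):

```python
def s_q_policy(inventory_position, s, Q):
    """Static (s, Q)-policy: when the inventory position falls to or below
    the reorder point s, order a fixed quantity Q; otherwise order nothing."""
    return Q if inventory_position <= s else 0

# Example: reorder point of 20 units and batch size of 50
print(s_q_policy(18, s=20, Q=50))  # 50
```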

SigMaNet: One Laplacian to Rule Them All (Stefano Fiorini)

This paper introduces SigMaNet, a generalized Graph Convolutional Network (GCN) capable of handling both undirected and directed graphs with weights not restricted in sign and magnitude. The cornerstone of SigMaNet is the introduction of a generalized Laplacian matrix: the Sign-Magnetic Laplacian (Lσ). The adoption of such a matrix allows us to bridge a gap in the current literature by extending the theory of spectral GCNs to directed graphs with both positive and negative weights. Lσ exhibits several desirable properties not enjoyed by the traditional Laplacian matrices on which several state-of-the-art architectures are based. In particular, Lσ is completely parameter-free, which is not the case for Laplacian operators such as the Magnetic Laplacian L(q), where the calibration of the parameter q is an essential yet problematic component of the operator. Lσ simplifies the approach, while also allowing for a natural interpretation of the signs of the edges in terms of their directions. The versatility of the proposed approach is amply demonstrated experimentally; the proposed network SigMaNet turns out to be competitive in all the tasks we considered, regardless of the graph structure.
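
For contrast with the parameter-free Lσ, the Magnetic Laplacian mentioned above is commonly defined as

\[
L^{(q)} \;=\; D_s - A_s \odot \exp\!\big(i\,2\pi q\,(A - A^\top)\big),
\qquad A_s = \tfrac{1}{2}\,(A + A^\top),
\]

where \(D_s\) is the degree matrix of the symmetrized adjacency \(A_s\), \(\odot\) is the elementwise product, and the charge parameter \(q\) encodes edge direction in the complex phase; it is precisely this \(q\) that Lσ dispenses with.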