Shreya Venugopal
In binary classification, the standard threshold used to obtain deterministic predictions from stochastic ones is 0.5. We show that this thresholding step introduces additional unfairness. To reduce this bias, we propose a method whose objective is to maintain fairness while making deterministic predictions. The method assigns a threshold to each group, applied as a post-processing step to the stochastic prediction, yielding a deterministic binary classification with the same level of fairness as the stochastic prediction. It builds on an existing post-processing method and inherits its asymptotic properties. Our experimental results are competitive with standard baselines.
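As an illustration of per-group thresholding as a post-processing step, here is a minimal numpy sketch; the group-wise quantile rule and the common target rate are illustrative assumptions, not the method's actual fairness criterion.

```python
import numpy as np

def group_thresholds(scores, groups, target_rate=0.5):
    """Pick one threshold per group so that each group's positive-prediction
    rate roughly matches a common target rate (an illustrative fairness criterion)."""
    thresholds = {}
    for g in np.unique(groups):
        s = scores[groups == g]
        # threshold at the (1 - target_rate) quantile of the group's scores
        thresholds[g] = np.quantile(s, 1.0 - target_rate)
    return thresholds

def deterministic_predictions(scores, groups, thresholds):
    """Post-process stochastic scores into 0/1 predictions with per-group cutoffs."""
    return np.array([int(s >= thresholds[g]) for s, g in zip(scores, groups)])

# toy usage
rng = np.random.default_rng(0)
scores = rng.uniform(size=1000)           # stochastic predictions in [0, 1]
groups = rng.integers(0, 2, size=1000)    # two protected groups
th = group_thresholds(scores, groups, target_rate=0.3)
preds = deterministic_predictions(scores, groups, th)
```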
Andrea Lampis
Polygenic Risk Scores (PRSs) rely on GWAS summary statistics, which exclude non-linear SNP-SNP interactions. Machine Learning (ML) can model these effects, but requires large raw datasets, often inaccessible due to privacy concerns. We used a generative AI model (Latent Diffusion model) to create realistic, privacy-preserving synthetic data, overcoming data access barriers.
Trained on simulated and real UK Biobank data (e.g., CAD, BC), our synthetic data preserved SNP correlations and GWAS beta concordance (>0.75) while ensuring privacy. ML models trained on this synthetic data achieved predictive performance comparable to standard PRSs trained on real data.
This method enables secure data integration, facilitating powerful ML-based risk models that can surpass current PRS limitations.
Dejan Stancevic
We present a time scheduler that selects sampling points based on entropy rather than uniform time spacing, ensuring each point contributes an equal amount of information to the final generation. Our approach is model-agnostic, empirically grounded, and invariant to time reparametrizations.
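As a sketch of the idea (not the scheduler's actual entropy estimator), the snippet below picks sampling times at equal increments of a given entropy curve; the entropy profile itself is a placeholder.

```python
import numpy as np

def entropy_schedule(t_grid, entropy, n_steps):
    """Choose n_steps sampling times separated by equal increments of
    (cumulative) entropy rather than equal time spacing.
    `entropy` is assumed to be a monotone curve evaluated on `t_grid`."""
    # normalise entropy to [0, 1] so the targets are equally spaced levels
    h = (entropy - entropy.min()) / (entropy.max() - entropy.min())
    targets = np.linspace(0.0, 1.0, n_steps)
    # invert h(t) by interpolation: find t such that h(t) == target
    return np.interp(targets, h, t_grid)

# toy usage with a placeholder, saturating entropy curve
t = np.linspace(0.0, 1.0, 1000)
H = 1.0 - np.exp(-5.0 * t)                 # stand-in for a measured entropy profile
ts = entropy_schedule(t, H, n_steps=10)    # points cluster where entropy changes fast
```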
Adam Pardyl
The real world is messy and unstructured. Uncovering critical information often requires active, goal-driven exploration. It remains to be seen whether Vision-Language Models (VLMs), which recently emerged as a popular zero-shot tool in many difficult tasks, can operate effectively in such conditions. In this paper, we answer this question by introducing FlySearch, a 3D, outdoor, photorealistic environment for searching and navigating to objects in complex scenes. Across three difficulty levels, we find that state-of-the-art VLMs struggle even with simple tasks. We identify a set of central causes, ranging from vision hallucination, through context misunderstanding, to task planning failures, and we show that some of them can be addressed by finetuning.
Christos Ziakas
We propose a test-time adaptation method that enables a progress estimation model to adapt online to the visual and temporal context of test trajectories by optimizing a learned self-supervised objective. To this end, we introduce a gradient-based meta-learning strategy to train the model on expert visual trajectories and their natural language task descriptions, such that test-time adaptation improves progress estimation by relying on semantic content over temporal order. Our test-time adaptation method generalizes from a single training environment to diverse out-of-distribution tasks, environments, and embodiments, outperforming the state-of-the-art in-context learning approach using autoregressive vision-language models.
Max Weltevrede
In the zero-shot policy transfer setting in reinforcement learning, the goal is to train an agent on a fixed set of training environments so that it can generalise to similar, but unseen, testing environments. Previous work has shown that policy distillation after training can produce a policy that outperforms the original in the testing environments. However, it is not yet entirely clear why that is, or what data should be used to distil the policy. In this paper, we prove a generalisation bound for policy distillation after training. The theory provides two practical insights: you should 1) train an ensemble of distilled policies, and 2) distil it on as much data from the training environments as possible.
Nikita Gushchin
Diffusion bridge models (DBMs) are a promising extension of diffusion models for applications in image-to-image translation. However, like many modern diffusion and flow models, DBMs suffer from the problem of slow inference. To address it, we propose a novel distillation technique based on the inverse bridge matching formulation and derive a tractable objective to solve it in practice. The proposed method can distill models into a one-step generator and uses only corrupted images for training. We evaluate our approach on a wide set of setups, and we show that our distillation technique allows us to accelerate the inference of DBMs from 4x to 100x and even provide better generation quality than the teacher model, depending on the particular setup.
Ekaterina Sysoykova
Intelligent, adaptive AI can modernize healthcare workflows while meeting rising patient demands and strict privacy regulations. We present a federated few-shot learning (FFSL) approach for epileptic seizure detection, enabling personalized and privacy-preserving deployment. An established centrally trained model (bal. acc: 0.43; Cohen’s κ: 0.42; F1: 0.69) was adapted to a federated setting, then applied in FFSL across four simulated hospital sites with realistic patient and seizure-type distributions, achieving 0.77 bal. acc., 0.62 κ, and 0.73 F1. These results demonstrate FFSL’s promise as a foundation for practical, privacy-preserving AI adoption in real-world healthcare systems.
Julius Pesonen
Lightweight segmentation models can be taught to distinguish safety-critical targets, such as wildfire smoke clouds, from up to ten kilometres away from drones with on-board computation. However, without methods for geo-localising the target, the system is of little use in areas with poor telecommunication networks. This makes on-board 3D target positioning essential. Perspective-based methods, such as particle filters, can be used to solve for the target position, but they require costly task-specific engineering. Learning-based alternatives could simplify the software, but they suffer from a lack of data. Potential paths to robust learning-based solutions are offered by monocular depth estimators and end-to-end models, with possible data from simulations augmented by generative video models.
Arina Belova
Modeling stochastic processes with fractional diffusion instead of purely Brownian-driven dynamics may better account for real-world memory effects, long-range dependencies, and anomalous diffusion phenomena that standard Brownian motion fails to capture. We incorporate fractional Brownian motion into aligned diffusion bridges for conformational changes in proteins, utilizing a Markov approximation of fractional Brownian motion to study the effect of this generalized prior reference process on predicting future states of the protein conformations from aligned data. We observe that our generalized dynamics yield a lower root mean-squared deviation of atomic positions in the predicted future state from the ground truth. Moreover, we show preliminary results on the image deraining task.
Gloria Desideri
In recent years, lifestyle choices such as a healthy diet and limited smoking and alcohol have been shown to lower cancer mortality. Yet digital interventions that foster such behaviors must be personalized to sustain engagement, as users differ in ability, learning, and fatigue. Progress is hampered by scarce longitudinal, individual-level datasets, limiting classical RL for personalization. We pose intervention recommendation as meta-RL, using MAML with actor–critic policies for rapid adaptation from minimal data. To address data scarcity, we build a synthetic user simulator to generate diverse profiles. Experiments on cross-task adaptation show our method learns from few interactions while matching the performance of an oracle trained per user.
Ivona Martinovic
The redundancy of the genetic code, where multiple codons encode the same amino acid, creates a vast design space for messenger RNA (mRNA). Synonymous codon choices affect stability, structure, translation, and immunogenicity, key for mRNA therapeutics. We present Prot2RNA, a diffusion language model that generates coding sequences conditioned on a target protein. Prot2RNA is pretrained with masked diffusion on separate human protein and mRNA datasets, then finetuned to generate codons from protein prompts. Trained on human data and tested on unseen high-expression transcripts, Prot2RNA outperforms existing methods in codon accuracy, biological realism, and reproducing wild-type usage patterns, implicitly learning preferences beyond simple codon frequency.
Anna Berdichevskaia
Enzymes are proteins that catalyze reactions via a small subset of amino acids, the catalytic site. Predicting these sites is crucial for drug discovery and assessing mutations but remains difficult due to protein complexity and the rarity of catalytic residues. We present ScanNet-Catalytic, a geometric deep learning model for detecting catalytic sites from protein structures. Using a large dataset of annotated structures and transfer learning, our model generalizes to unseen enzyme classes, identifying catalytic residues in ~60% of cases, versus ~13% and ~27% for baseline methods. While false positives remain high, some confident predictions may correspond to overlooked genuine catalytic sites.
Georgios Manos
The high GPU demand of ML training makes it hard to allocate large clusters of high-end GPUs in a single availability zone. Leveraging heterogeneous GPUs available within and across zones can improve throughput at a reasonable cost. Current systems lack support for efficiently training models on heterogeneous resources, as it introduces significant challenges and a large search space of possible job configurations. Sailor is a system that automates distributed training over heterogeneous, geo-distributed, and dynamically available resources, combining an efficient search space exploration algorithm, accurate runtime and memory footprint simulation, and a distributed training framework that supports different types of heterogeneity to optimize training throughput and cost.
Luis Massny
Correlation between tokens in training data sequences can cause inadvertent privacy leakage about information that a user does not want to share. As a solution, we consider local redaction mechanisms, which remove sensitive tokens from the sequence. Modeling the data correlation by a stationary Markov chain, we first show theoretically that pre-existing approaches have suboptimal utility. Then, we find a novel family of data-dependent mechanisms which, in contrast to data-independent pre-existing approaches, improve utility by leveraging a data-dependent leakage measure.
Styrbjorn Kall
Chemical pollution poses significant threats to ecosystems, biodiversity, and human health. Environmental risk assessment currently relies on toxicity data from a handful of eukaryote species to set environmentally safe boundaries. This is resource-intensive to generate and critically fails to account for the full diversity of ecosystems and species. Here, we demonstrate that transformers, the backbone of large language models, can accurately capture the intricate relationship between chemical structure, species, and toxicity. We conclude that transformers have the potential to markedly advance computational prediction of chemical toxicity and fill data gaps inhibiting effective risk management.
Svetlana Pavlova
Explicit Flow Matching (ExFM) is a novel method for training and analyzing flow-based generative models. ExFM leverages a theoretically grounded loss function, the ExFM loss (a tractable form of the Flow Matching (FM) loss), to demonstrably reduce variance during training, leading to faster convergence and more stable learning. Based on this theoretical analysis, we derived exact expressions for the vector field for model examples. In addition, we investigated simple cases of diffusion generative models by adding a stochastic term and obtained an explicit expression for the score. While the paper emphasizes the theoretical underpinnings of ExFM, it also showcases its effectiveness through numerical experiments on various datasets.
Simone Piaggesi
Post-hoc explainability is crucial for understanding black-box ML models. Surrogate methods are common but limited: local surrogates capture non-linearities yet are costly and sensitive, while global ones are efficient but miss local complexity. We introduce ILLUME, a flexible, interpretable framework leveraging representation learning to unify local and global explanations. ILLUME integrates globally trained surrogates with instance-specific linear transformations via a meta-encoder, producing accurate, robust, and efficient feature attributions and decision rules. Extensive experiments show ILLUME overcomes traditional surrogate limitations, offering a scalable, unified solution for black-box explainability.
Stjepan Pozgaj
Neural algorithmic reasoning (NAR) is a growing field that aims to embed algorithmic logic into neural networks by imitating classical algorithms. In this work, we detail our attempt to build a neural algorithmic reasoner that can solve Knapsack, a pseudo-polynomial problem bridging classical algorithms and combinatorial optimisation, but omitted in standard NAR benchmarks. Our neural algorithmic reasoner is designed to closely follow the two-phase pipeline for the Knapsack problem, which involves first constructing the dynamic programming table and then reconstructing the solution from it.
Juraj Vladika
Fact verification (FV) is the task of assessing the veracity of a factual claim based on credible evidence. The traditional approach uses a three-part pipeline (document retrieval, evidence extraction, veracity prediction) and usually relies on short evidence snippets and encoder-only transformers. In this work, we introduce an automated FV system that utilizes the multi-turn power of LLMs. The system generates questions that ask for additional context and knowledge, and answers them with retrieved evidence until there is enough information to make a verdict. This iterative method makes the FV process more robust and explainable to end users, since the reasoning chain can be traced. We apply the system to three medical datasets, showing improvements in final performance over standard approaches.
Dora Omanovic
Simulating the behavior of composite materials requires resolving fine-scale heterogeneities, leading to large systems of equations that are computationally infeasible. We investigate a neural network-based derivative-free loss method to estimate the effective homogeneous coefficient of a heterogeneous composite material. This approach inherently yields an averaged solution, corresponding to macroscopic material behavior. We first apply the framework to a prototypical composite problem, then use a physics-informed neural network to extract the homogenized coefficient. Comparing with standard numerical homogenization shows that our method accurately predicts effective material parameters in the periodic setting, highlighting its potential for data-driven composite analysis.
Adeela Islam
Reassembly is a key challenge in fields like archaeology, genomics, and molecular docking, requiring precise alignment of elements. We introduce ReassembleNet, addressing limitations of existing Deep Learning methods in scalability, multimodality, and real-world applicability. It represents pieces via contour keypoints and uses Graph Neural Network-based pooling to select the most informative ones, reducing complexity and integrating geometric and texture data. With semi-synthetic pretraining and diffusion-based pose estimation, ReassembleNet achieves 57% and 87% improvements in rotation and translation RMSE, respectively.
Harindu Jayarathne
Deep learning has shown great potential for the physical layer of wireless communication systems in achieving the goals of 6G and beyond wireless standards. Recently, data-driven techniques have shown the potential to improve existing transceiver designs. In this research, an end-to-end learning method for joint optimization of the symbol probability distribution and the constellation geometry is investigated under AWGN and Rayleigh channel conditions. Among several existing approaches, a shaping-encoder-assisted iterative constellation shaping scheme is considered. End-to-end joint training of the transmitter and the iterative receiver is enabled by unfolding the receiver iterations. Resulting constellations are evaluated using bit error rate and channel capacity.
Sameeksha Sriram
Rotary Positional Embeddings (RoPE) have demonstrated exceptional performance, consistently outperforming their baselines. While recent work has sought to extend RoPE to higher-dimensional inputs, many such extensions are non-commutative, thereby forfeiting RoPE's shift-equivariance property. In this work, we explore a quaternion-based approach, Quaternion Rotary Embeddings (QuatRo), in place of Euler angles, leveraging quaternions' ability to represent 3D rotations to parameterize the axes of rotation. We show Mixed RoPE and Spherical RoPE to be special cases of QuatRo. Further, we propose a generalization of QuatRo to Clifford Algebraic Rotary Embeddings (CARE) using geometric algebra.
Luka Nedimovic
Geometric deep learning (GDL) provides a powerful framework for modeling complex relational and hierarchical structures, which is particularly relevant in biomedical image analysis. In this work, we propose and evaluate three architectures -- OsteoGNN, OSPNet, and OEHNet -- that leverage graph-based, manifold-based, and hyperbolic embedding representations to encode histopathological features. We systematically study the effect of hyperparameters, patch-level embeddings, and data imbalance strategies, including weighted losses and weighted sampling, on model performance. Our results demonstrate the effectiveness of incorporating geometric priors and manifold representations in improving classification accuracy and robustness.
Hannah Sterz
As they become increasingly multilingual, Large Language Models exhibit more language confusion, i.e., they tend to generate answers in a language different from the language of the prompt or the answer language explicitly requested by the user. We propose ReCoVeR, a novel lightweight approach for reducing language confusion based on language-specific steering vectors. We first isolate language vectors using a multi-parallel corpus and then leverage those vectors for effective LLM steering via both fixed and trainable steering functions. Our extensive evaluation shows that ReCoVeR effectively mitigates language confusion in both monolingual and cross-lingual setups while, in contrast to prior language steering methods, retaining task performance.
Edoardo Cecchinato
We conduct a systematic audit of three widely used reasoning benchmarks, SocialIQa, FauxPas-EAI, and ToMi, and uncover pervasive flaws in both benchmark items and evaluation methodology. Using five LLMs as diagnostic tools, we identify structural, semantic, and pragmatic issues in benchmark design, as well as in scoring procedures. Through systematic human annotation and re-evaluation on cleaned subsets, we find that model scores often improve due to erratic surface wording variations rather than improved reasoning. In fact, further analyses show that model performance is highly sensitive to minor input variations, such as context availability and phrasing, revealing that high scores may reflect alignment with format-specific cues rather than consistent inference based on the input.
Luka Benić
We have developed an electronic free-energy machine learning model for monolayer NbSe2 that allows us to control the electronic temperature as a parameter of the model. The ionic temperature is modeled via the stochastic self-consistent harmonic approximation. Our approach relies on a machine learning model of the electronic density of states and a zero-temperature interatomic potential. This allows us to explore the CDW phase diagram of monolayer NbSe2 under both thermal and laser-induced nonthermal conditions. Our study provides an accurate estimate of the CDW transition temperature at low cost and can disentangle the role of hot electrons and phonons in the nonthermal ultrafast melting process of the CDW phase in NbSe2.
Yuval Rom
DeepDynamics is a deep learning framework that integrates sparse single-cell atlases with large-scale bulk RNA-seq to infer cell-type-specific and clinicopathological dynamics across disease trajectories. Applied to 1,092 Alzheimer’s disease cortical samples, it uncovered accelerated pathology progression and glial/neuronal shifts in APOE4 carriers—especially in females—alongside coordinated molecular changes. DeepDynamics enables scalable mapping of genetic risk and cellular dynamics, offering a powerful tool to decode complex neurodegenerative processes from heterogeneous data.
Stefana Janicijevic
The task of generating grammatically correct and contextually meaningful sentences from lemma definitions is an essential challenge in NLP, especially for low-resource languages. In this paper, we explore methods for generating sentences from lexical definitions of lemmas and extracting valuable linguistic knowledge from these definitions to enhance language models. We propose a framework that uses machine learning techniques to transform the definitions of the lemmas into coherent sentences. We also introduce a strategy to extract structured knowledge from these definitions, focusing on syntactic and semantic characteristics. The proposed approach can generate meaningful sentences, contributing to the development of NLP tools for under-resourced languages.
Piotr Wyrwinski
Symbolic regression (SR) seeks explicit, interpretable formulas that fit observed data, without assuming a predefined model structure. This makes SR a form of program synthesis, but with a highly discontinuous and combinatorial search space. To address these challenges, we propose a neurosymbolic algorithm that frames SR as an iterative, graph-based process. A graph neural network (GNN) guides the expansion of symbolic expressions, using learned embeddings that capture both syntactic structure and semantic behavior. These embeddings are enriched via attention over input–output data to enable task-aware reasoning. Experiments on synthetic benchmarks and the AI Feynman suite show that our method outperforms unguided baselines, highlighting the power of structure-aware, iterative synthesis.
Jelena Lazic
Emotional speech plays a key role in human-computer interaction and affective computing but remains challenging due to its variability and multimodality. Most approaches rely on multimodal data, which is often scarce for low-resource languages. This work explores emotional speech from an information-theoretic perspective, focusing on the link between word-level surprisal and spoken word duration. Using emotional audio recordings, surprisal values estimated by pre-trained language models were used to predict word timing. Results showed that surprisal correlates with duration across emotions and speakers, with variation in model performance. Adjusting surprisal power further modulated predictive strength, suggesting that emotional states shape how information is realized in speech.
Rajit Rajpal
We present a novel autoregressive diffusion world model with a two-stage generative process. In the first stage, a diffusion model synthesizes optical flows and depth maps. In the second stage, a separate diffusion model generates photorealistic frames conditioned on these intermediate representations, together with camera parameters obtained directly from the game engine. This design enables explicit incorporation of scene geometry and motion into the generative process, improving temporal consistency and spatial coherence. Our approach combines the structural advantages of physically informed conditioning with the expressive capacity of diffusion models, offering a scalable pathway toward high-fidelity, temporally stable world simulation.
Matteo Gallo
We introduce a machine learning model that predicts numerical equations while keeping both scalability and interpretability in focus. Unlike existing methods, which are either transparent but limited or scalable but opaque, our approach targets a structured subclass of equations, combining functions through explicit multiplication and summation steps. This allows us to recover symbolic expressions directly from the trained model. Tests on algebraic systems show the method scales well, maintaining high R^2 scores with only gradual loss increases, even without heavy tuning. For small systems, we recover the exact underlying equations; for larger ones, the structure remains clear. Next, we test the model on noisy and chaotic systems to probe its limits and extend the framework to ODEs.
Md Abu Sufian
Background
CardioFlowFormer is a Transformer-based model for detecting ageing-related cardiomyocyte dysfunction from in vitro microscopy, combining optical flow, morphology, and temporal metrics for early functional decline detection.
Methodology
One million novel iPSC-derived cardiomyocytes were imaged over eight days at high resolution. Cells were labelled Healthy, Aged, or Damaged. Motion signatures, motion entropy, and morphological/temporal metrics were processed through a multi-head self-attention Transformer with explainable AI outputs.
Results & Conclusion
CardioFlowFormer outperformed CNN-LSTM and ResNet, with motion features as the strongest predictors. The ageing classes were balanced. It offers interpretable biomarkers and a scalable approach for preclinical ageing research.
Nazreen Shah Ambalath
Autonomous vehicles, IoT, and smart grids generate enormous amounts of edge data. Federated Learning enables privacy-preserving machine learning without transferring this edge data to a central server. Federated unsupervised learning is essential since labeling data at the edge is often impractical. We propose FLIP, a novel framework that uses information projection for probabilistic and federated representation learning. FLIP derives the global latent space distribution as the intersection of local distributions by applying the Pythagorean identity for I-projection. Empirically, FLIP outperforms existing methods in federated unsupervised learning, showing robustness to data heterogeneity, variations in local epochs, and scalability across clients.
Eliza Olariu
The main goal of the project is to use artificial intelligence (AI) techniques to enhance the battery industry's testing phase.
The battery is monitored by numerous sensors in the actual battery industrial environment. All these sensors gather a large amount of data during battery testing, which can be analyzed with modern methods to create new AI-based solutions. The project explains how to acquire data directly from the sensors, synchronize it, and then load it into NI SystemLink™ Enterprise so that analysis can start.
Since collecting real data under optimal conditions is challenging, an alternative is to simulate it. In this study, the behavior of the batteries is simulated using an equivalent circuit model. Once the data is generated, AI models can be applied.
Fabian Leon
Early and accurate diagnosis is critical for oral cancer. While deep learning offers potential, current methods struggle with time-consuming manual annotation and limited context from patch-based analysis. This work proposes a deep learning method for automatically segmenting ten tissue types in oral cancer whole slide images (WSIs). To overcome annotation challenges and improve precision, we introduce a super-pixel-based approach and a refinement step. The method was evaluated on two independent WSIs (over 70,000 patches) from different patients, achieving an average accuracy of 92.1% and an IoU of 85.7%. Despite class imbalance and the limitations of local patch analysis, the model performed well, even on rare and morphologically varied tissues.
Rafael Josip Penic
Deep learning (DL) models for RNA structural and functional tasks often struggle because annotated data are scarce. For example, secondary structure datasets have fewer than 10K samples, and there are only about 6.5K determined tertiary structures. However, the many unlabeled RNAs sequenced over the years can be seen as a gold mine for DL methods. The vast amount of unannotated RNA sequences potentially hides the RNA “grammar” and unexplored knowledge. Motivated by protein language models (LMs), we developed an LM that extracts hidden knowledge and captures the underlying structural information embedded within RNA sequences.
Laleh Varghaei
Despite growing concern about antibiotic resistance genes (ARGs), the factors driving the transmission and mobilization of ARGs between different bacterial species remain poorly understood. It is also unclear which environments are most favorable to ARG transfer and what triggers this process. To address these questions, we analyze large-scale genomic and metagenomic data using bioinformatics tools to track the mobilization of ARGs in pathogens and across various environments, such as the human gut and wastewater.
Subhosri Basu
The work extends the research presented in Robust DOA Estimation Using Deep Complex-Valued Convolutional Networks with Sparse Prior, which introduced a complex-valued (CV) convolutional neural network (CNN) model for 1D direction-of-arrival (DOA) estimation. That work is extended here to perform 2D DOA estimation and target velocity estimation using a complex-valued residual network (CVResNet). The signal modeling used for generating training data has been improved by incorporating effects similar to real-world measurements, enhancing performance in challenging conditions such as array imperfections and low signal-to-noise ratio (SNR). The models are tested on a real dataset.
Lorenzo Cirillo
Modern generative models produce high-quality fake images. Hence, deepfake detectors must be accurate and have strong generalization performance. We propose a framework that leverages explainability to assess the adversarial robustness of deepfake detectors. Specifically, we generate explainability heatmaps and perform explainability-driven attacks. We measure the accuracy drop and the attack success rate under the perturbed scenarios and test our methodology on state-of-the-art models with strong generalization abilities, providing a comprehensive and explainability-driven evaluation of their robustness.
Giuseppe Carrino
Minimizing loss functions is at the core of Machine Learning training. While first-order methods dominate in practice, higher-order approaches such as Newton’s method offer higher accuracy and faster convergence, but are often avoided due to their high computational cost. In our work, we analyze finite-precision error propagation in Newton’s step, proposing a convergence theorem for mixed-precision Newton optimizers, including “Quasi” and “Inexact” variants. The theorem not only guarantees convergence for problems that are not too ill-conditioned, but also allows predicting in advance the accuracy achievable by the optimizer. Experiments on logistic regression show that our method outperforms Adam on the Australian dataset, while being less computationally expensive than classical Newton.
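For illustration, a minimal numpy sketch of one mixed-precision Newton step on L2-regularised logistic regression, with the gradient and Hessian assembled in low precision and the linear solve done in high precision; the precision split and the problem are assumptions, not the paper's exact setup.

```python
import numpy as np

def newton_step_mixed(X, y, w, lam=1e-2, low=np.float32, high=np.float64):
    """One Newton step for L2-regularised logistic regression.
    Gradient and Hessian are formed in low precision; the linear system
    is solved in high precision (an illustrative mixed-precision split)."""
    Xl, yl, wl = X.astype(low), y.astype(low), w.astype(low)
    p = 1.0 / (1.0 + np.exp(-Xl @ wl))                   # predictions, low precision
    g = Xl.T @ (p - yl) + lam * wl                       # gradient
    S = p * (1.0 - p)
    H = (Xl * S[:, None]).T @ Xl + lam * np.eye(X.shape[1], dtype=low)
    # promote to high precision for the solve, where conditioning matters most
    step = np.linalg.solve(H.astype(high), g.astype(high))
    return w - step

# toy usage
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 10))
y = (X @ rng.normal(size=10) + 0.1 * rng.normal(size=500) > 0).astype(float)
w = np.zeros(10)
for _ in range(10):
    w = newton_step_mixed(X, y, w)
```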
Filip Mirkovic
Medical data is of vast importance to researchers worldwide; however, sharing it among institutes and researchers poses severe privacy concerns. Valid though they are, these privacy concerns make such medical data rare and hard to obtain. Here we propose a solution to both the scarcity and privacy issues, based on synthetic data. We provide a working proof of concept by utilizing Copula and Adversarial Random Forest models to generate clinical and peptidomic data in several patient cohorts.
Eleonora Poeta
Artificial Intelligence (AI) is increasingly applied in critical domains, where adoption requires not only accuracy but also transparency, fairness, and robustness. This poster presents two complementary contributions. First, I show insights from our survey on Concept-based Explainable AI, which grounds explanations in human-understandable concepts and enables richer interpretability beyond feature attribution. Second, I showcase our work on Trustworthy Medical AI, focusing on fairness and robustness. Together, these efforts advance the development of AI systems that are explainable, fair, and trustworthy in practice.
Sofija Markovic
We evaluate the 2021 Global Health Security Index (GHSI) as a predictor of SARS-CoV-2 Omicron transmissibility using machine learning. Principal component analysis was applied to reduce correlated indicators, followed by random forest regression on the effective reproduction number. Alongside GHSI, we controlled for immunity, age, epidemic onset, mobility, comorbidities, and HDI. After removing covariates, low GHSI and high mobility emerged as strong drivers of transmission—forming a “perfect storm” exemplified by the rapid Omicron surge in Africa. These results showcase the power of ML to disentangle complex interactions between preparedness scores and epidemic dynamics.
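A minimal scikit-learn sketch of this kind of pipeline (PCA over correlated indicators, then random forest regression on the effective reproduction number); the feature matrix and target below are synthetic placeholders, not the study's data.

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_score

# placeholder data: rows = countries, columns = GHSI indicators + covariates
rng = np.random.default_rng(0)
X = rng.normal(size=(120, 30))                  # stand-in for indicators, mobility, HDI, ...
y = rng.normal(loc=1.2, scale=0.3, size=120)    # stand-in for effective reproduction number

model = make_pipeline(
    StandardScaler(),
    PCA(n_components=10),                       # reduce correlated indicators
    RandomForestRegressor(n_estimators=500, random_state=0),
)
scores = cross_val_score(model, X, y, cv=5, scoring="r2")
```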
Alon Hacohen
Many diseases are driven not by single cells but by the micro-environments in which they live, such as tumor–immune niches or autoimmune lesions. Spatial transcriptomics enable us to study these contexts by combining gene expression with cell locations, yet many clustering methods often fall short: scRNAseq ignores spatial adjacency, GNNs oversmooth, and type-only approaches lose local context. We explore SpaCT-Graph, a graph neural auto-encoder that introduces cell-type super-nodes, linking each cell to its spatial neighbors and annotated type. In a POC study on human skin tissue, SpaCT-Graph improves reconstruction accuracy and reveals clusters extending beyond cell-type annotations. While exploratory, this suggests explicit type-aware graph design may help uncover tissue micro-contexts.
Hajrudin Jupic
The paper presents an analysis of finding the optimal configuration of a radial distribution network using the Ant Colony Optimization (ACO) algorithm. The objective is to minimize power losses while maintaining voltage profiles within permissible limits and preserving radial topology. The proposed algorithm adapts the ACO principles by simulating pheromone intensity on network branches explored by individual ant agents. The calculations and result analysis were performed on the IEEE 33-bus test system and a real 54-bus radial system at the 35 kV voltage level. The MATPOWER library was used for power flow simulations. The obtained results were compared with other classical, metaheuristic, and machine learning algorithms addressing the same problem.
Angela Cortese
Human Activity Recognition (HAR) has emerged as a key research area. While machine and deep learning techniques have significantly advanced HAR, most systems assume a closed set of activities, limiting real-world use where unseen actions may arise. Furthermore, common sensing modalities often raise concerns regarding privacy, stability, or usability. Head-mounted inertial sensors provide a promising alternative, offering stable signals and capturing both broad movements and subtle actions. We propose a lightweight framework, specifically designed to classify head-related activities while addressing the open-set recognition challenge. This approach represents a step toward robust, adaptive, and real-world deployable HAR solutions.
Ayush Paliwal
Holography enables representative measurements in challenging environments, from detecting water on extraterrestrial bodies to advancing high-resolution climate models and analyzing aerosol-mediated disease transmission. However, the vast scale and inherently low signal-to-noise ratio (SNR) of holographic data have rendered both AI-based and conventional processing methods insufficient - often too slow or unreliable for large-scale deployment. Consequently, millions of holograms remain unprocessed, limiting their scientific and practical potential. We present a real-time holography system developed from first principles - integrating theory, embedded AI and hardware - addressing all bottlenecks.
Rashin Gholijani Farahani
Humans express emotions through voice, face, and language, yet most AI systems miss this layer, limiting natural interaction. We present a lightweight multimodal framework combining speech (MFCC+GRU), facial expression recognition (CNN), and text sentiment analysis (DistilBERT). An attention-based fusion layer boosts accuracy to 88%, compared to 74–81% for single modalities. This shows that multimodal AI can create more empathetic and reliable agents, with applications in mental health, education, customer service, and elder care. Our work points toward AI that not only processes information but also understands human emotions for natural collaboration.
Andrey Goncharov
Quantifying uncertainty in large language models (LLMs) remains a key barrier to adoption across industries. To meaningfully attribute and estimate uncertainty, we need to examine the problem from multiple angles. The standard approach is to split uncertainty into aleatoric (nature of the process) and epistemic (lack of knowledge) components. While effective in some applications, this dichotomy does not provide a holistic understanding of the underlying process. We propose a procedure that decomposes uncertainty into lexical and conceptual components. In multilingual LLMs, we identify a single principal direction in representation space that maximizes cross-lingual variance and use it as a projection to attribute uncertainty as primarily lexical or conceptual. https://git.new/c5esWOq
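A minimal numpy sketch of the attribution idea under stated assumptions: hidden representations of semantically parallel prompts across languages are given, the top principal direction of their cross-lingual variation is extracted, and the projection magnitude labels uncertainty as lexical or conceptual (the threshold is an illustrative choice).

```python
import numpy as np

def lexical_direction(reps):
    """reps: array of shape (n_prompts, n_languages, d) with hidden states of
    semantically parallel prompts. Returns the unit direction that maximises
    cross-lingual variance (variation with meaning held fixed)."""
    centered = reps - reps.mean(axis=1, keepdims=True)   # remove per-prompt mean
    flat = centered.reshape(-1, reps.shape[-1])
    _, _, vt = np.linalg.svd(flat, full_matrices=False)
    return vt[0]                                          # top right-singular vector

def attribute_uncertainty(hidden, direction, threshold=0.5):
    """Project a representation onto the cross-lingual direction; a large
    component is read as primarily lexical, a small one as conceptual."""
    d = direction / np.linalg.norm(direction)
    frac = abs(hidden @ d) / (np.linalg.norm(hidden) + 1e-12)
    return "lexical" if frac > threshold else "conceptual"

# toy usage
rng = np.random.default_rng(0)
reps = rng.normal(size=(200, 4, 64))        # 200 prompts, 4 languages, 64-dim states
v = lexical_direction(reps)
label = attribute_uncertainty(reps[0, 0], v)
```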
Michele Calabro
The recent success of artificial intelligence has sparked growing interest in developing virtual cells—computational models capable of simulating the behavior of biological cells in silico. Yet, real cells do not operate in isolation: their behavior is shaped by dynamic temporal processes and constant interaction with the surrounding microenvironment. To explore this, we developed an agent-based model trained on serial spatial transcriptomics data from healthy and cancerous colon organoids. The model integrates internal cell states together with neighboring cells to simulate the temporal evolution of tissue morphology and gene expression. This framework holds promise as a generative model for synthetic data and as a virtual testbed for in vitro perturbation experiments.
Pablo Bernabeu Perez
As AI models are deployed with increasing autonomy, it is important to ensure they do not take harmful actions unnoticed. As a potential mitigation, we study AI monitoring protocols where a less powerful but trusted model oversees a more advanced but misaligned AI. We compare monitors that oversee reasoning steps (CoT) versus only final outputs, finding CoT access greatly improves detection of subtle sabotage but can be misleading. We propose a hybrid method scoring both reasoning and actions that achieves the best performance in all tested setups, with up to 4× higher detection rates than output-only monitoring for subtle subversion scenarios.
Abdulkader Ghandoura
Eye tracking offers a promising, noninvasive tool for early autism spectrum disorder (ASD) screening in children. Traditional methods often reduce gaze scanpaths to aggregate features, discarding diagnostic signals encoded in the sequence and timing of fixations and saccades. We present a scanpath modeling approach that preserves the temporal pattern and captures fine-grained variability in eye movements for efficient discriminative classification of ASD versus neurotypically developing peers.
Davide Traini
Vision Transformers (ViTs) are powerful models for computer vision but their lack of transparency raises interpretability concerns. Attention maps are fast to compute but not causally faithful, while mask-based methods are reliable but computationally expensive. We introduce SIGRATE, a novel explainability framework that generates binary masks from similarity-based graphs built on ViT patch embeddings. Through guided walks, we create faithful explanations preserving both causal evidence and semantic consistency. Evaluations on ImageNet demonstrate significant improvements in faithfulness and localization metrics compared to existing methods.
Claudio Savelli
Neural models often encode multiple features in overlapping representations, a phenomenon known as polysemanticity. While this improves capacity, it also causes interference across tasks, domains, or languages. Machine unlearning, beyond its role in privacy and compliance, offers the possibility of reducing such interference and reallocating capacity, enabling models to become more specialized without requiring retraining from scratch. This perspective could be interesting to explore for LLMs, where overlapping representations are a common phenomenon. As a first step, we provide preliminary evidence using a multilingual RoBERTa model: selectively unlearning other languages improves performance in the retained language (Italian) while reducing it in the languages that are forgotten.
Hrvoje Belani
Health systems in Europe today are significantly challenged by a growing healthcare workforce shortage, long waiting times for patients and an intensifying demand for services driven by an ageing population. As one of the fastest growing industries in the global healthcare market, digital healthcare serves as an enabler to overcome some of these challenges, along with the use of artificial intelligence (AI) as one of the key tools for smart health solutions development. AI4Health.Cro is a non-profit public-private consortium of multidisciplinary experts, gathered in 2021 as the European Digital Innovation Hub (EDIH) in the field of AI, healthcare and startups, that provides services to small and medium-sized enterprises, startups, and innovators developing AI for healthcare and medicine.
Filip Szatkowski
Continual learning is crucial for applying machine learning in challenging, dynamic, and often resource-constrained environments. In this work, we explore the fact that the intermediate representations in neural network layers are less prone to forgetting and leverage their potential to accelerate computation. We propose to use auxiliary classifiers (ACs) to enhance performance and demonstrate that integrating ACs into various continual learning methods consistently improves accuracy across diverse evaluation settings, yielding an average 10% relative gain. We also leverage the ACs to reduce the average cost of the inference by 10-60% without compromising accuracy. The work was presented as a poster at ICML 2025.
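An illustrative PyTorch sketch of confidence-based early exit with auxiliary classifiers; the toy backbone, exit points, and confidence threshold are assumptions, not the paper's configuration.

```python
import torch
import torch.nn as nn

class BackboneWithACs(nn.Module):
    """Toy backbone with auxiliary classifiers (ACs) attached to intermediate
    blocks; at inference we exit early once an AC is confident enough."""
    def __init__(self, dim=128, n_classes=10, n_blocks=4):
        super().__init__()
        self.blocks = nn.ModuleList(
            [nn.Sequential(nn.Linear(dim, dim), nn.ReLU()) for _ in range(n_blocks)]
        )
        self.acs = nn.ModuleList([nn.Linear(dim, n_classes) for _ in range(n_blocks)])

    def forward(self, x, exit_threshold=0.9):
        for block, ac in zip(self.blocks, self.acs):
            x = block(x)
            logits = ac(x)
            conf = torch.softmax(logits, dim=-1).max(dim=-1).values
            if bool((conf > exit_threshold).all()):   # batch-level early exit for simplicity
                return logits
        return logits                                  # fall through to the last AC

model = BackboneWithACs()
out = model(torch.randn(8, 128))
```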
Mandana Samiei
Humans are remarkably good at learning new tasks by building on what they already know. This ability relies on schemas, structured mental representations that help us understand the most likely relationships between objects and sequences of events in an environment. In this work, we investigate whether we can learn schemas using RL.
Shreya Kapoor
A hallmark of human vision is the ability to recognize objects in complex naturalistic scenes. However, the exact mechanism behind the representation of a three-dimensional scene remains obscure. This study proposes a tool to investigate human perception using a computer graphics approach. We use three-dimensional object meshes to render synthetic scenes and study how these scenes are represented in the brain. We render a collection of datasets with different appearance and pose variations by changing exactly one property at a time. A model is trained on each of these datasets for a classification task and is then evaluated using alignment metrics; deviations in metrics such as Centered Kernel Alignment (CKA) and Representational Similarity Analysis (RSA) indicate the importance of a particular brain region in representing a particular property. In conclusion, we propose a promising method to study the brain using computer graphics, providing valuable insights into human vision.
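Since linear CKA has a standard closed form, a minimal implementation of the alignment metric used here, comparing two representation matrices with matched stimuli (e.g., model activations vs. brain responses), looks like this; it is a generic sketch, not the study's evaluation code.

```python
import numpy as np

def linear_cka(X, Y):
    """Linear CKA between two representation matrices with matched rows
    (n_stimuli x features). Returns a similarity in [0, 1]."""
    X = X - X.mean(axis=0, keepdims=True)   # center each feature
    Y = Y - Y.mean(axis=0, keepdims=True)
    hsic = np.linalg.norm(Y.T @ X, "fro") ** 2
    norm_x = np.linalg.norm(X.T @ X, "fro")
    norm_y = np.linalg.norm(Y.T @ Y, "fro")
    return hsic / (norm_x * norm_y)

# toy usage: compare model activations with (placeholder) brain responses
rng = np.random.default_rng(0)
acts = rng.normal(size=(100, 512))          # 100 stimuli, 512 model features
brain = acts @ rng.normal(size=(512, 50))   # correlated stand-in for voxel responses
print(linear_cka(acts, brain))
```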
Angela Lopez Cardona
Advancements in Natural Language Processing (NLP) have led to the emergence of Large Language Models (LLMs) such as GPT, Llama, and Gemini, which excel across tasks but require fine-tuning to align outputs with human expectations. A widely used method for this is Reinforcement Learning from Human Feedback (RLHF), which, despite its success, struggles to accurately model human preferences. We introduce GazeReward, a novel framework that integrates implicit feedback—specifically eye-tracking (ET) data—into the Reward Model (RM). Through ablation studies with different integration methods, LLMs, and ET generators, we show that our approach significantly improves RM accuracy. This work highlights the promise of cognitive data for aligning AI with human values.
Costanza Catalano
Banking supervision is a pillar of economic stability, where analyzing property structures is essential for regulatory compliance. However, the completion of such tasks poses significant challenges due to high-volume data sources and the complexity of regulatory formalization, requiring the adoption of powerful, scalable automated procedures. We present our experience at the Central Bank of Italy in solving these issues via a rule-based AI approach. We provide a formalization of the problems and classify their computational complexity. We then present a rule-based framework built on the Datalog+/- family of languages and Knowledge Graphs, where each problem is expressed by deductive rules, allowing for efficient and parallel computation while maintaining high explainability.
Mustafa Kadhim
Precise patient positioning and daily anatomical verification in radiotherapy ensure accurate dose delivery and spare healthy tissue. However, current image-guided methods trade volumetric detail for speed and low dose. Reconstructing volumetric images from ultra-sparse X-ray projections can reduce exposure and enable real-time verification. We propose a deep learning framework that generates synthetic cone-beam CT volumes in real time for prostate cancer using two orthogonal projections and a previous CT. The model learns 2D-to-3D mapping, generalizes across patients without retraining, and works with existing hardware. Our approach delivers high-fidelity 3D reconstructions with lower imaging dose and faster throughput for safer, streamlined radiotherapy.
Sebastian Ibarra
Problem: Gadolinium-based DCE-MRI contrast agents raise various concerns, including health risks, bioaccumulation, invasiveness, and economic and workflow limitations.
Solution: Generative models that create synthetic DCE-MRI offer a faster, gadolinium-free, cost-effective, motion-artifact-free, and non-invasive alternative to conventional DCE-MRI.
Objective: To develop and benchmark conditional diffusion models for synthesizing post-contrast breast MRI from pre-contrast images, focusing on:
1. Diffusion-based Post-Contrast Synthesis
2. Tumor-Aware Conditioning and Evaluation
Ofek Aloni
Synthetic data is valuable for augmentation in AI and data-driven applications. This is especially true for data-hungry models such as foundation models, or in fields where obtaining large numbers of samples is costly, as in environmental science.
We present a method, based on the discrete Fourier transform (DFT), for generating synthetic time series from a single original sample. The method provably preserves the input signal’s mean, variance, and autocorrelation function. A unique feature is a tunable parameter controlling the level of similarity between the original and synthetic signals.
The method is evaluated on a variety of environmental signals from different domains, and compared to existing methods for time series generation.
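For illustration, a classical DFT phase-randomisation surrogate with a tunable similarity knob reproduces the stated preservation properties (mean, variance, autocorrelation); this sketch is not necessarily the poster's exact construction.

```python
import numpy as np

def dft_surrogate(x, similarity=0.5, rng=None):
    """Generate a surrogate time series by perturbing the DFT phases of `x`
    while keeping its amplitude spectrum, which preserves mean, variance and
    autocorrelation. `similarity` in [0, 1] scales the phase perturbation
    (1.0 returns the original signal, 0.0 a fully phase-randomised surrogate)."""
    rng = np.random.default_rng(rng)
    n = len(x)
    spec = np.fft.rfft(x)
    random_phase = rng.uniform(-np.pi, np.pi, size=len(spec))
    random_phase[0] = 0.0                    # keep the mean (DC component) intact
    if n % 2 == 0:
        random_phase[-1] = 0.0               # Nyquist bin must stay real
    new_phase = np.angle(spec) + (1.0 - similarity) * random_phase
    return np.fft.irfft(np.abs(spec) * np.exp(1j * new_phase), n=n)

# usage: many surrogates from one observed environmental signal
x = np.sin(np.linspace(0, 20 * np.pi, 1024)) + 0.3 * np.random.default_rng(0).normal(size=1024)
samples = [dft_surrogate(x, similarity=0.7, rng=i) for i in range(10)]
```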
Teodora Sreckovic
Adam is known to perform significantly better than SGD in language models, a phenomenon for which a number of explanations have been proposed. In this work, we revisit this "optimizer gap" through a series of comprehensively tuned baseline training runs for language modeling with Transformers. We exhaustively study how momentum, gradient clipping, and batch size affect the gap between SGD and Adam. Our empirical findings show that SGD with momentum can actually perform similarly to Adam in small-batch settings, if tuned correctly. We revisit existing explanations for Adam's advantage, including heavy-tailed class imbalance, directional sharpness, and Hessian heterogeneity, which struggle to directly explain this phenomenon, bringing new insights about the gap.
Uxia Veleiro
Drug–target interaction (DTI) prediction is a challenging albeit essential task in drug repurposing. Learning on graphs has drawn special attention, as such models can substantially reduce drug repurposing costs and time commitment. However, structural differences in learning architectures hinder fair benchmarking. Here, we perform an in-depth evaluation of current DTI datasets and prediction models through robust benchmarking and show that transductive models lack generalization and lead to inflated performance when traditionally evaluated, making them unsuitable for drug repurposing. We propose a biologically driven strategy for negative-edge subsampling and uncover unknown interactions via in vitro validation. Finally, we provide a toolbox crucial for fair benchmarking.
Bianca Madalina Zgreaban
Recently, generalization benchmarks have shown that language models lack robustness in natural language inference (NLI). However, manually creating new benchmarks is costly, while creating high-quality ones automatically is difficult, even by modifying existing ones. We propose MERGE (Minimal Expression-Replacements GEneralization), a methodology for evaluating models on high-quality variants of original NLI problems that preserve the original underlying reasoning, obtained by automatically replacing open-class words. Our results show NLI models' lack of generalization through their performance being significantly lowered on the generated variants. We also present analyses of how the word class of the replacements, word probability, and plausibility influence NLI models' performance.
Yudou Tian
State-space models (SSMs) are emerging as effective sequence models with in-context learning ability similar to transformers, yet a complete understanding of how this arises is missing. We present a direct construction showing that SSMs can perform gradient-based learning for in-context tasks. Specifically, a single structured SSM layer with multiplicative input and output gating reproduces the output of an implicit linear model with least-squares loss after one gradient step. Stacking layers yields multi-step GD, while a small MLP extends to nonlinear regression and classification. Experiments on linear and nonlinear regression confirm that trained parameters match analytic predictions.
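A small numpy check of the flavour of this construction (not the paper's exact parameterisation): a recurrent state with multiplicative input and output gating reproduces the prediction of an implicit linear model after one gradient step on the least-squares loss.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n_ctx, eta = 5, 32, 0.1

# in-context regression task: context pairs (x_i, y_i) and a query x_q
W_true = rng.normal(size=(1, d))
X = rng.normal(size=(n_ctx, d))
y = X @ W_true.T                      # (n_ctx, 1)
x_q = rng.normal(size=(d,))

# (a) explicit one-step gradient descent on the implicit linear model W (init 0):
# loss = 0.5 * sum_i ||W x_i - y_i||^2  =>  after one step W = eta * sum_i y_i x_i^T
W_gd = eta * (y.T @ X)                # (1, d)
pred_gd = W_gd @ x_q

# (b) a recurrent state with multiplicative (gated) input accumulates the outer
# products y_i x_i^T token by token; an output gate contracts the final state
# with the query -- the flavour of the gated-SSM construction described above
S = np.zeros((1, d))
for i in range(n_ctx):
    S = S + y[i:i+1].T @ X[i:i+1]     # input gating: multiply y_i with x_i
pred_ssm = eta * (S @ x_q)            # output gating: readout against the query

assert np.allclose(pred_gd, pred_ssm)
```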
Claudio Schiavella
The attention module has a quadratic cost with respect to the processed tokens. In dense tasks like Monocular Depth Estimation, this poses challenges, especially in onboard applications. We use efficient attention to balance model quality and speed. Optimizations target each network module. The Pareto Frontier is used to find the best trade-off between modified models and baselines. Optimized networks often outperform the baselines while also improving inference speed.
Federica Parlapiano
Natural Language Processing has been revolutionized by Large Language Models (LLMs). However, their adoption in education carries inherent risks, such as students who naively employ LLMs for their homework. Furthermore, the over-reliance on AI-generated content fosters superficial learning. Conventional countermeasures, such as oral exams and intensive proctoring, either overburden instructors or fail against the usage of LLMs. To overcome these challenges, we propose GradQuiz, a novel framework for generating distractors in multiple-choice quizzes. GradQuiz exploits the target model’s gradients to perturb key entities in each question, generating misleading answers that challenge the model while ensuring assessment rigor, and encouraging students’ critical engagement with the resources.
Daniel Ohayon
Large language models (LLMs) have transformed applications, yet their use in cybersecurity—where tasks are domain-specific and demanding specialized expertise—remains limited. We present CyberPal 2.0, a family of cybersecurity-expert small language models (SLMs) with an accompanying training recipe. We build on a large, high-quality cybersecurity dataset, enriching it with structured, expert-driven reasoning traces and grounded web search to adapt inference to task complexity. Evaluated across diverse cybersecurity tasks, CyberPal 2.0 surpasses its baselines and is competitive with or better than closed-source state-of-the-art general models, despite being a fraction of their size. On threat-investigation tasks, our best SLM exceeds GPT-4o, o1, and o3-mini, ranking second to Sec-Gemini V1.
Ida Maruotto
Biosignals such as EEG, EMG, and Center of Pressure can capture complementary aspects of motor and postural control and are increasingly explored for early Parkinson’s detection. We propose a Deep Learning framework that adapts established architectures into modality-specific encoders, preserving temporal dynamics instead of collapsing them into static vectors. Sequence-level representations are then fused, enhancing robustness over unimodal models. The approach was validated on the BioVRSea protocol, comprising 300 subjects, using subject-wise nested cross-validation. Multimodal fusion after data augmentation achieved a median balanced accuracy of 81.76%, highlighting the benefits of deep learning methods in neurodegenerative disease detection.
Abdel Rahman Jaber
Reliable knee alignment matters for diagnosis and planning. We study inter-patient CT registration of the lower limbs with a focus on the patella, comparing three learning-based models (VoxelMorph, VoxelMorph–DeepAli, TransMorph) and a classical B-spline baseline. We also test two input strategies: low-resolution full-body (FB-LR) versus a cropped, high-resolution patella region (CR-HR). On ~150 de-identified CTs, we evaluate image error, overlap, surface distance, and deformation regularity. CR-HR clearly outperforms FB-LR (e.g., ASSD 2.33 vs 12.04). On the patella ROI, TransMorph gives the lowest image error and boundary distance, while VoxelMorph–DeepAli best preserves regularity.
Bisera Nikoloska
Functional MRI (fMRI) provides rich insights into brain activity but generates complex, high-dimensional data that is difficult to interpret. To address this issue, brain networks can be built from fMRI scans and analyzed with network representation learning, transforming complex networks into low-dimensional embeddings. These embeddings are then used as features for machine learning classifiers, including Random Forests, Logistic Regression, and SVMs. Among the approaches tested, node2vec combined with Random Forest achieved the highest accuracy at 90%, enabling effective differentiation between patients with neurological disorders and typically developing individuals.
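As a rough sketch of such a pipeline, the snippet below uses uniform random walks with Word2Vec (a DeepWalk-style stand-in for node2vec with p = q = 1), mean-pools node embeddings into graph-level features, and fits a Random Forest; the toy graphs and labels are placeholders, not the fMRI data.

```python
import numpy as np
import networkx as nx
from gensim.models import Word2Vec
from sklearn.ensemble import RandomForestClassifier

def random_walks(g, walk_length=20, walks_per_node=10, seed=0):
    """Uniform random walks (node2vec with p = q = 1, i.e. DeepWalk)."""
    rng = np.random.default_rng(seed)
    walks = []
    for _ in range(walks_per_node):
        for start in g.nodes():
            walk = [start]
            for _ in range(walk_length - 1):
                nbrs = list(g.neighbors(walk[-1]))
                if not nbrs:
                    break
                walk.append(nbrs[rng.integers(len(nbrs))])
            walks.append([str(n) for n in walk])
    return walks

def graph_embedding(g, dim=32):
    """Embed nodes with Word2Vec over random walks, then mean-pool to get one
    feature vector per brain network (a simple graph-level readout)."""
    model = Word2Vec(random_walks(g), vector_size=dim, window=5, min_count=1, sg=1, seed=0)
    return np.mean([model.wv[str(n)] for n in g.nodes()], axis=0)

# toy usage: random graphs standing in for subject-level functional brain networks
graphs = [nx.erdos_renyi_graph(60, 0.1, seed=i) for i in range(40)]
labels = np.array([i % 2 for i in range(40)])     # placeholder patient / control labels
X = np.stack([graph_embedding(g) for g in graphs])
clf = RandomForestClassifier(n_estimators=300, random_state=0).fit(X, labels)
```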
Branimir Kolarek
Creating an AI to propose fresco restorations with 90% certainty is not a single task, but a chain of fundamental challenges. First, the AI needs more than just images; it requires a deep understanding of art history—connecting the artist, their other works, and the era's style. Next, the AI must act like a detective, searching this web of knowledge to find the most relevant historical clues for the specific damaged area. Then, the challenge is to ensure the AI uses only these clues to fill in the gaps, preventing it from inventing historically inaccurate details. Finally, since the perfect original is lost, we face the ultimate problem: how to prove the AI's suggestion is a historically sound proposal and not just a convincing fake.
Arush Sachdeva
NLG systems, including LLMs, often encounter challenges such as semantic ambiguity and hallucination. This research investigates the connection between semantic uncertainty and hallucinations in NLG, demonstrating that higher semantic uncertainty corresponds to higher hallucination rates. By employing semantic entropy as a novel metric for uncertainty estimation, we refine existing methods and establish meaningful insights into model behavior. Using question-answering datasets (SQuAD, CoQA) and robust models (RoBERTa, Muwa-1.3B), we integrate state-of-the-art tools like bi-directional entailment clustering and SelfcheckBERT to assess correctness. Our findings emphasize semantic entropy's utility in predicting hallucinations and evaluating model reliability, particularly in larger models.
Mariona Coll Ardanuy
Cultural heritage institutions have made great efforts to digitise their handwritten documentary collections, which hold a unique source of untapped knowledge about our past. AI has shown immense potential to transform the way documentary collections can be accessed, by enabling (1) automatic transcription via handwritten text recognition (HTR) and (2) LLM-based semantic analysis at scale. However, several challenges remain. I will present work centred on the automatic transcription of Catalan manuscripts from the Late Middle Ages, focusing on notarial documents from the 13th to 15th centuries. Our long-term goal is to contribute new methodological advances to the application of artificial intelligence and data science techniques to interrogate large amounts of historical data at scale.
Claudia Rabaioli
This poster presents my ongoing PhD project proposal. The project explores how computational models can support positive mood through multisensory stimuli. Starting from the distinction between emotion and mood, the aim is to design a system capable of detecting and supporting users’ mood states in everyday life. The project involves several challenges: multimodal mood profiling models; longitudinal data analysis; learning for dataset production; and a user-centric adaptive model.
Federico Alvetreti
This paper introduces Attention-based Double Compression (ADC), a communication-efficient Split Learning (SL) framework for Vision Transformers. ADC applies two parallel compression strategies: (i) merging similar sample activations using average attention scores from the client’s last layer—class-agnostic to preserve generalization—and (ii) discarding less informative tokens to further cut transmission cost. Experiments show ADC substantially reduces communication overhead while maintaining high accuracy, outperforming state-of-the-art SL methods.
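The sketch below illustrates only the second of the two strategies, attention-guided token discarding: tokens that receive little attention in the client's last layer are dropped before transmission. The keep ratio and the averaging over heads and query positions are illustrative assumptions, not the exact ADC procedure.

```python
# Minimal sketch: prune low-attention tokens before sending activations to the server.
import torch

def prune_tokens(activations, attn, keep_ratio=0.5):
    """
    activations: (batch, tokens, dim) outputs of the client's last layer
    attn:        (batch, heads, tokens, tokens) attention weights of that layer
    """
    # importance of each key token = attention it receives, averaged over heads and queries
    importance = attn.mean(dim=1).mean(dim=1)                  # (batch, tokens)
    k = max(1, int(keep_ratio * activations.shape[1]))
    idx = importance.topk(k, dim=1).indices                    # most-attended tokens
    idx = idx.sort(dim=1).values                               # keep original token order
    batch_idx = torch.arange(activations.shape[0]).unsqueeze(1)
    return activations[batch_idx, idx]                         # (batch, k, dim)
```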
Elsun Nabatov
A hybrid framework has been developed for reconstructing 3D brain volumes from sparse 2D radiographs. A 3D GAN generates a coarse prior, which is subsequently refined by a diffusion model with physics-guided projection-consistency. Interpretability is enhanced through a Vision–Language Model that provides captions and guidance. The method achieves SSIM 0.93, Dice 0.90, and PSNR 35.9, operating three times faster than diffusion-only approaches, with uncertainty maps further improving trust.
Sukanya Singh
We address parameter estimation in stochastic differential equations (SDEs) with Lévy noise, focusing on the Ornstein–Uhlenbeck process, the genetic toggle switch, and the Duffing oscillator. While traditional methods like MLE and PENN are effective, their accuracy is limited for nonlinear systems and small datasets. We propose a hybrid deep learning framework that combines BiLSTMs for temporal encoding with transformers for long-range dependency modeling. Scaling the training data to 50k samples (with a 10k-sample test set), our model achieves consistently lower MAE across all parameters compared to PENN and MLE.
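A minimal sketch of the hybrid encoder, assuming a single observed channel, a BiLSTM followed by a Transformer encoder, and mean pooling before the parameter regression head; layer sizes and the number of estimated parameters are placeholders.

```python
# Minimal sketch: BiLSTM for local temporal structure + Transformer for long-range
# dependencies, regressing the SDE parameters. Dimensions are hypothetical.
import torch.nn as nn

class HybridParamEstimator(nn.Module):
    def __init__(self, n_params=3, hidden=64, n_heads=4, n_layers=2):
        super().__init__()
        self.bilstm = nn.LSTM(1, hidden, batch_first=True, bidirectional=True)
        enc_layer = nn.TransformerEncoderLayer(d_model=2 * hidden, nhead=n_heads,
                                               batch_first=True)
        self.transformer = nn.TransformerEncoder(enc_layer, num_layers=n_layers)
        self.head = nn.Linear(2 * hidden, n_params)

    def forward(self, x):                # x: (batch, time, 1) observed trajectory
        h, _ = self.bilstm(x)            # (batch, time, 2*hidden)
        h = self.transformer(h)          # long-range mixing across time steps
        return self.head(h.mean(dim=1))  # pooled features -> parameter estimates
```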
Elena Masserini
Task-based chatbots are increasingly used to deliver services, yet testing methodologies to assess their reliability, robustness, and security remain underexplored. To support systematic assessment, we present two contributions. First, we built TOFU-R, a large-scale snapshot of Rasa chatbots from GitHub, and BRASATO, a curated selection of complex and useful subjects, providing valuable benchmarks for chatbot assessment. Second, we developed MutaBot, a mutation approach for Rasa and Dialogflow that generates faulty chatbot variants by injecting conversational defects to challenge and evaluate testing techniques. Together, these resources pave the way for advancing research on chatbot testing and quality assessment.
Tomas Garriga
We present a novel application of time series counterfactual estimation to the pharmaceutical-industry problem of estimating the impact of generic drug entry on the market. We introduce a method for time series counterfactual estimation based on SCMs and autoencoder architectures. We show that sparse regularization is an effective mechanism for autoencoder-based counterfactual estimation, and prove that it can outperform the commonly used VAE. We validate our model using synthetic and semi-synthetic data, as well as a company’s real-world sales data.
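As a rough sketch of the sparse-regularization idea, the autoencoder below is trained with an L1 penalty on its latent code; the window length, layer sizes, and the choice of penalizing the latent code directly are illustrative assumptions rather than the paper's exact architecture.

```python
# Minimal sketch: sparse autoencoder fitted to pre-entry sales windows; once trained,
# it can be used to extrapolate the counterfactual (no-generic-entry) trajectory.
import torch.nn as nn

class SparseAE(nn.Module):
    def __init__(self, window=52, latent=8):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(window, 64), nn.ReLU(), nn.Linear(64, latent))
        self.dec = nn.Sequential(nn.Linear(latent, 64), nn.ReLU(), nn.Linear(64, window))

    def forward(self, x):
        z = self.enc(x)
        return self.dec(z), z

def loss_fn(x, model, l1_weight=1e-3):
    x_hat, z = model(x)
    recon = nn.functional.mse_loss(x_hat, x)
    return recon + l1_weight * z.abs().mean()   # sparsity keeps the latent factors parsimonious
```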
Josipa Lipovac
Metagenomic classification aims to identify organisms in environmental samples by analyzing DNA sequences. At the lowest taxonomic level (the strain level), this is challenging due to high genome similarity and the lack of clear thresholds to define distinct genomes. Traditional metrics like Average Nucleotide Identity (ANI) often fail in such cases. We frame genome clustering as a representation learning problem based on mapping profiles and explore whether aggregated embeddings learned from genomic foundation models, such as EVO 2, can capture strain-level taxonomic relationships better than traditional methods or binary read mapping profiles, highlighting their potential for fine-grained genome comparison in metagenomics.
Riccardo Gibello
Hierarchical Text Classification (HTC) and its multi-label variant (MLHTC) assign one or more labels from a taxonomy to input texts, with applications in news, biomedical indexing, and medical device coding. We present ShT5-HierGen, an adaptation of T5 for hierarchical text generation. It replaces the BPE tokenizer with a character-level one and uses a shallow decoder to match the simpler label sequences. This design reduces parameters by 40% and training time by 64%, while lowering performance by only 5.5%. Unlike vanilla T5, ShT5-HierGen keeps the Inconsistency Rate (IR) near zero, ensuring stable label generation. It also improves interpretability through per-level attention and supports interactive online correction, making it a promising direction for further study.
Benedetta Donato
This PhD research rethinks software development by focusing on effective collaboration between developers and AI assistants. Current tools often provide passive, prompt-based assistance, lacking adaptability, authorship tracking, and workflow integration. Misunderstood suggestions may cause inefficiency or errors, highlighting the need for transparent and reliable collaboration mechanisms. The study addresses three challenges: (1) understanding developer–AI interaction, (2) rethinking development practices by designing adaptive workflows, and (3) enabling multi-AI/user collaboration with coordination models. To explore these, we are developing MultiMind, a VS Code plug-in that orchestrates AI-assisted tasks and serves as a research platform for studying hybrid human–AI teamwork.
Simone Arreghini
How and when do people decide to interact with a robot? Can proactive robots improve attitudes toward autonomous agents? We predict intention from body posture and facial cues, deploying our system in the wild. By adapting its behavior, the robot increased interactions by 22% compared to staying idle, showing that reading non-verbal signals makes robots feel more natural, acceptable, and engaging in daily life.
Marcos Alfaro Perez
Vision sensors are an effective solution for place recognition due to their versatility and low cost, but images are sensitive to changes in environmental conditions. Multi-modal approaches can overcome this limitation, but integrating different sensors often increases cost and complexity. This paper proposes combining omnidirectional views and depth maps, generated with Depth Anything v2, through a late fusion approach that weights each modality according to the confidence of the deep models' predictions. Experiments conducted across different indoor environments and lighting conditions reveal that the proposed method consistently improves performance in every tested scenario, making it a precise, cost-effective solution for place recognition.
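A minimal sketch of confidence-weighted late fusion, assuming each modality scores database places by descriptor distance and that confidence is derived from the margin between its two best matches; the margin heuristic is an illustrative stand-in for the confidence measure used in the paper.

```python
# Minimal sketch: fuse RGB and depth retrieval scores, weighting each modality
# by how decisively it separates its best match from the runner-up.
import numpy as np

def fuse(distances_rgb, distances_depth):
    """distances_*: 1D arrays of descriptor distances to each database place."""
    def confidence(d):
        best, second = np.partition(d, 1)[:2]
        return (second - best) / (second + 1e-8)   # larger margin -> more confident

    w_rgb, w_depth = confidence(distances_rgb), confidence(distances_depth)
    scores = w_rgb * (-distances_rgb) + w_depth * (-distances_depth)
    return int(np.argmax(scores))                  # index of the predicted place
```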
Anne Hoekman
Typing clinical notes is time-consuming and reduces interaction with patients. We conducted a phased study of three ambient-listening providers, in which conversations between patients and physicians were transcribed from speech to text and converted into AI-generated summaries. Phase 1 addressed technical implementation. Phase 2 tested simulated consultations across ten scenarios, including interruptions, evasive answers, and dialects. Summaries were rated by five physicians and three medical students for completeness, correctness, and conciseness (1–5 scale), averaging 4.03, with completeness as the main challenge. Phase 3 is underway, testing these systems with real patients across ten departments. This evaluation shows both the promise and the current limits of generative AI in reducing documentation workload.
Rohan Walia
Imitation learning is central to robot skill acquisition, yet methods like kinesthetic teaching or teleoperation are often impractical in constrained spaces due to hardware needs and workflow disruptions. We introduce a lightweight framework for scalable, passive, robot-free data collection using only a consumer XR headset and a stationary workplace camera. Hand tracking, AR robot overlays, and depth sensing provide feedback on safety and reachability, ensuring collision-aware, feasible demonstrations. A unified pipeline integrates human and virtual robot data, enabling generalizable policies across embodiments and environments while lowering barriers to large-scale, in-the-wild robot learning.
Entropy-Lens: the Information Signature of Transformer Computations
Riccardo Ali
Transformer models map input token sequences to output token distributions, layer by layer. While most interpretability work focuses on internal latent representations, we study the evolution of these token-level distributions directly in vocabulary space. However, such distributions are high-dimensional and defined on an unordered support, making common descriptors like moments or cumulants ill-suited. We address this by computing the Shannon entropy of each intermediate predicted distribution, yielding one interpretable scalar per layer. The resulting sequence—the entropy profile—serves as a compact, information-theoretic signature of the model’s computation.
We introduce Entropy-Lens, a model-agnostic framework that extracts entropy profiles from frozen, off-the-shelf transformers. We show that these profiles (i) reveal family-specific computation patterns invariant under depth rescaling, (ii) are predictive of prompt type and task format, and (iii) correlate with output correctness. We further show that Rényi entropies yield similar results within a broad range of α values, justifying the use of Shannon entropy as a stable and principled summary. Our results hold across different transformers, without requiring gradients, fine-tuning, or access to model internals.
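As a rough illustration, an entropy profile can be computed logit-lens style by projecting each intermediate hidden state through the model's unembedding and recording the Shannon entropy of the resulting vocabulary distribution. Using GPT-2 and reusing the final layer norm and unembedding at every layer are simplifying assumptions, not necessarily the exact Entropy-Lens procedure.

```python
# Minimal sketch: per-layer Shannon entropy of the last-token vocabulary distribution.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

@torch.no_grad()
def entropy_profile(prompt):
    ids = tok(prompt, return_tensors="pt").input_ids
    out = model(ids, output_hidden_states=True)
    profile = []
    for h in out.hidden_states:                                   # one tensor per layer
        logits = model.lm_head(model.transformer.ln_f(h[:, -1]))  # last-token distribution
        p = torch.softmax(logits, dim=-1)
        profile.append(-(p * torch.log(p + 1e-12)).sum().item())  # Shannon entropy (nats)
    return profile
```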
Alexandre Galashov
Diffusion models generate high-quality synthetic data. They operate by defining a continuous-time forward process which gradually adds Gaussian noise to data until fully corrupted. The corresponding reverse process progressively "denoises" a Gaussian sample into a sample from the data distribution. However, generating high-quality outputs requires many discretization steps to obtain a faithful approximation of the reverse process. This is expensive and has motivated the development of many acceleration methods. We propose to accomplish sample generation by learning the posterior distribution of clean data samples given their noisy versions, instead of only the mean of this distribution.
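In the usual variance-preserving notation (an assumption here, not a claim about the authors' exact setup), the distinction reads as follows: a standard denoiser regresses only the posterior mean of the clean sample, whereas the proposed approach models the full posterior.

```latex
% Forward corruption of a clean sample x_0 into x_t:
\[
  x_t = \sqrt{\bar\alpha_t}\, x_0 + \sqrt{1-\bar\alpha_t}\,\varepsilon,
  \qquad \varepsilon \sim \mathcal{N}(0, I)
\]
% Standard denoising vs. the posterior-learning approach:
\[
  \text{standard: } \hat{x}_0(x_t, t) \approx \mathbb{E}[x_0 \mid x_t]
  \qquad \text{vs.} \qquad
  \text{proposed: learn } p_\theta(x_0 \mid x_t)
\]
```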
Deep Generative Approaches to Network Science for Social System Simulations
Mariia Sinkevich
We propose a scalable framework for synthetic social network generation combining Score-Based Diffusion and Iterative Local Expansion. Our approach produces graphs that preserve global structure and local dependencies, while enabling network fusion of sociocentric and egocentric data to integrate demographic and behavioural features. This allows large-scale, privacy-conscious simulations of social systems, supporting applications in epidemiology, policy modelling, and AI-driven decision-making.
Artificial Market Expectations: Asset Pricing in the Age of AI
Matteo Muntoni
AI tools, and in particular Large Language Models, are increasingly applied in finance to guide portfolio strategies and generate forecasts, often outperforming traditional methods. Yet the general equilibrium and welfare effects of widespread adoption remain understudied. We adapt the experimental design of Hommes et al. (2008), replacing human forecasters with LLM traders in simulated asset markets. Whereas humans tend to generate bubbles, LLM traders in stable market conditions often coordinate more closely on fundamentals, pointing to a stabilizing potential. After bullish market shocks, however, stabilization frequently fails as some models amplify deviations from fundamental values. Early results suggest that prompting design and lower temperature settings may foster faster market stabilization. Ongoing work evaluates the relative “animal spirits” of LLMs compared with humans and seeks to infer learning rules that enable structural counterfactuals. This study represents a first step toward a new research agenda in economics that combines structural modelling, causal econometrics, and experimental lab settings to build welfare benchmarks and analyze the general equilibrium effects of large-scale AI adoption in economic environments.