IJCAI-ECAI 2026 · Half-day Tutorial

Data Centric AI: Addressing Missing Data Imputation in Image and Tabular Contexts

A practical, hands-on tutorial on the theory and methods for handling missing data in tabular and imaging settings, from statistical baselines to autoencoders and generative adversarial networks.

Format Half-Day · Two 1h45 Slots
Language English · Python
Venue IJCAI-ECAI 2026 · Bremen


Overview

Missing data is a common problem in both tabular and imaging datasets. One strategy is to remove records containing missing values, but this is often impractical: it shrinks already small datasets and can bias results when missingness is not completely random. Imputation, that is, replacing missing values with estimated ones, is the more common alternative, though different methods suit different scenarios. This tutorial presents the advantages and disadvantages of these strategies and helps participants identify when to use each one.
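As a rough illustration of the two strategies described above, the following sketch applies record removal and mean imputation to a small toy table. The data and variable names are made up for illustration and are not part of the tutorial materials; only NumPy is assumed.

```python
import numpy as np

# Toy table (hypothetical values): rows are records, columns are features.
# np.nan marks a missing value.
X = np.array([
    [1.0, 10.0],
    [2.0, np.nan],
    [np.nan, 30.0],
    [4.0, 40.0],
])

# Strategy 1: removal (listwise deletion) keeps only fully observed rows.
complete_rows = ~np.isnan(X).any(axis=1)
X_deleted = X[complete_rows]

# Strategy 2: mean imputation fills each gap with the mean of the
# observed values in the same column.
col_means = np.nanmean(X, axis=0)
X_imputed = np.where(np.isnan(X), col_means, X)

print("rows kept after deletion:", X_deleted.shape[0], "of", X.shape[0])
print("imputed table:\n", X_imputed)
```

Here removal discards half of the records, while mean imputation keeps all of them at the cost of filling gaps with values that ignore relationships between features.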

The tutorial is organized into two parts. The first addresses missing data in tabular datasets: a lecture covering missingness mechanisms (MCAR, MAR, MNAR) and imputation methods, from statistical baselines to autoencoders and generative adversarial networks (GANs), followed by a hands-on session on a real-world tabular problem. The second part covers missing data in imaging datasets, including convolutional autoencoders and GANs for image reconstruction, with a practical session using healthcare imaging data. All exercises are provided in Python, with programming requirements kept accessible.

Theory

Missingness mechanisms (MCAR, MAR, MNAR), evaluation strategies, and the trade-offs of removal vs. imputation.

Methods

From statistical baselines through autoencoders to GANs, both for tabular records and image reconstruction.

Practice

Two hands-on Python sessions on real-world tabular and healthcare imaging data, comparing methods end-to-end.

Tutorial Outline

Two parts, each combining a theoretical lecture with a hands-on session.

Part 1

Missing Data in Tabular Datasets

Theoretical Foundations and Tools for Tabular Missing Data

Concepts and challenges of missing data in tabular settings, including patterns and mechanisms of missingness (MCAR, MAR, MNAR) and the principal statistical and machine learning approaches used to handle them. Covers Python libraries commonly used for tabular imputation.
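To make the three mechanisms more concrete, here is a minimal simulation sketch, assuming a synthetic two-column dataset. The variables `age` and `income`, the missingness probabilities, and the helper `observed_mean` are hypothetical choices for illustration and are not taken from the tutorial.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1000

# Synthetic data (assumption): income is correlated with age.
age = rng.normal(50, 10, size=n)
income = rng.normal(30, 5, size=n) + 0.2 * age

# MCAR: every income value has the same probability of being missing.
mcar_mask = rng.random(n) < 0.3

# MAR: the probability depends only on an observed variable (age).
mar_mask = rng.random(n) < np.where(age > np.median(age), 0.5, 0.1)

# MNAR: the probability depends on the value that goes missing (income).
mnar_mask = rng.random(n) < np.where(income > np.median(income), 0.5, 0.1)


def observed_mean(values, mask):
    """Mean of the values that remain observed (mask is True where missing)."""
    return float(values[~mask].mean())


print("full mean:", float(income.mean()))
for name, mask in [("MCAR", mcar_mask), ("MAR", mar_mask), ("MNAR", mnar_mask)]:
    print(name, "missing rate:", float(mask.mean()),
          "observed mean:", observed_mean(income, mask))
```

In this setup, high incomes are more likely to be missing under MNAR, so the mean of the observed incomes tends to fall below the full mean, which is one reason simple removal can mislead when missingness is not completely random.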

Hands-On Session: Missing Data in Tabular Datasets

Detect, analyze, and impute missing values in a real-world tabular dataset. The focus is on implementing autoencoders and conditional tabular GANs for imputation and comparing their performance against simpler baseline methods.
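One common way to compare imputation methods is to hide values that are actually known, impute them, and measure the error on the hidden entries. The sketch below illustrates that protocol with two simple baselines (mean and linear regression imputation) on synthetic data. It is not the tutorial's code; the data, the 20% missing rate, and the helper `rmse` are assumptions, and an autoencoder or GAN imputer would slot into the same evaluation loop.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 500

# Synthetic complete data (assumption): x2 depends linearly on x1 plus noise.
x1 = rng.normal(size=n)
x2 = 2.0 * x1 + rng.normal(scale=0.5, size=n)
X_true = np.column_stack([x1, x2])

# Hide about 20% of the second column completely at random.
mask = rng.random(n) < 0.2
X_miss = X_true.copy()
X_miss[mask, 1] = np.nan


def rmse(imputed_values, true_values):
    """Root mean squared error over the artificially hidden entries."""
    return float(np.sqrt(np.mean((imputed_values - true_values) ** 2)))


# Baseline 1: fill with the mean of the observed values.
mean_fill = np.nanmean(X_miss[:, 1])
rmse_mean = rmse(np.full(mask.sum(), mean_fill), X_true[mask, 1])

# Baseline 2: least-squares regression of x2 on x1, fitted on observed rows only.
A_obs = np.column_stack([np.ones((~mask).sum()), X_miss[~mask, 0]])
coef, *_ = np.linalg.lstsq(A_obs, X_miss[~mask, 1], rcond=None)
A_mis = np.column_stack([np.ones(mask.sum()), X_miss[mask, 0]])
rmse_reg = rmse(A_mis @ coef, X_true[mask, 1])

print("RMSE, mean imputation:      ", rmse_mean)
print("RMSE, regression imputation:", rmse_reg)
```

Because the second feature is strongly related to the first in this synthetic setup, the regression baseline recovers the hidden values more accurately than the column mean.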

Part 2

Missing Data in Imaging Datasets

Theoretical Foundations and Tools for Image Missing Data

Specific challenges of missing or corrupted data in imaging datasets, such as missing pixels and corrupted patches. Covers the architectures used to handle these problems, particularly convolutional autoencoders and GANs for image reconstruction, and introduces the relevant Python tools.
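The two kinds of image corruption mentioned above can be simulated with a binary mask, which is also a typical way to build training pairs for a reconstruction model. The following sketch is an illustration only: the synthetic gradient image, the 10% pixel drop rate, the patch location, and the mean-fill baseline are all assumptions rather than the tutorial's setup.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic 32x32 grayscale image (assumption): a horizontal gradient in [0, 1].
image = np.tile(np.linspace(0.0, 1.0, 32), (32, 1))

# Boolean mask, True where the pixel is missing.
mask = rng.random(image.shape) < 0.1   # scattered missing pixels
mask[8:16, 8:16] = True                # one contiguous corrupted patch

# Corrupted input: missing pixels are zeroed out.
corrupted = image.copy()
corrupted[mask] = 0.0

# Naive baseline: fill every missing pixel with the mean of the observed ones.
baseline = corrupted.copy()
baseline[mask] = image[~mask].mean()

# Reconstruction error is measured only on the pixels that were hidden.
mse_on_missing = float(np.mean((baseline[mask] - image[mask]) ** 2))
print("missing fraction:", float(mask.mean()))
print("MSE of mean-fill baseline on missing pixels:", mse_on_missing)
```

A convolutional autoencoder or GAN would take `corrupted` (optionally together with `mask`) as input and be trained to reproduce `image`, with the same masked error serving as a point of comparison against the baseline.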

Hands-On Session: Missing Data in Imaging Datasets

Work with imaging data from a real-world healthcare domain. Apply convolutional autoencoders and GANs to reconstruct missing pixels and image patches, and compare the predictive results obtained with and without this reconstruction step.

Audience & Prerequisites

Who should attend

Data science and machine learning practitioners who regularly work with incomplete datasets, as well as computer science students (BSc, MSc, or PhD) dealing with missing data in their research or thesis work. Since the content covers both tabular and imaging contexts, it is relevant to a range of application domains.

The practical examples progress in complexity, allowing participants with different levels of experience to follow along at their own pace.

Prerequisites

Basic understanding of Python programming.
Familiarity with core machine learning concepts.
No prior knowledge of missing data theory or imputation techniques is required.

Presenters

Joana Cristo dos Santos

Assistant Professor · LASIGE, Faculty of Sciences, University of Lisbon

PhD in Informatics Engineering from the University of Coimbra with a background in Biomedical Engineering. Research focuses on machine learning for medical imaging, particularly deep learning and generative models, with emphasis on imputation and inpainting methods for oncology. Best Application Paper Award at an A-ranked conference.

Ricardo Cardoso Pereira

Assistant Professor · Department of Informatics Engineering, CISUC/LASI, University of Coimbra

Research and teaching in artificial intelligence, focusing on missing data, machine learning fairness, synthetic data generation, and deep learning. Editor of the International Journal of Data Science and Analytics. Has co-organized 5 conferences and served on 15+ program committees, with 20+ publications in leading journals and conferences.

Pedro Henriques Abreu

Associate Professor with Habilitation · Department of Informatics Engineering, CISUC/LASI, University of Coimbra

Coordinator of the MSc in Informatics Engineering at the University of Coimbra and the Data Centric AI group. Area editor of Information Fusion, editor of the Data Science and Analytics journal and ACM AI Letters. Author of 100+ refereed papers on missing and imbalanced data and data fairness.

Contact

For questions about the tutorial, materials, or logistics, please reach out to any of the presenters.

Joana Cristo dos Santos jcsantos@ciencias.ulisboa.pt
Ricardo Cardoso Pereira rdpereira@dei.uc.pt
Pedro Henriques Abreu pha@dei.uc.pt