The program consists of talks on the accepted papers and two keynotes from academia.
We plan to have a poster session to spark more discussion and networking.
September 1st (all times are in London Time)
08:45 - 09:00
Opening Remarks
09:00 - 10:00
Session 1: Morning Keynote - Chair: Hazar Harmouch
Keynote Title: Model Lakes
Keynote Speaker: Renée J. Miller (University of Waterloo, Canada)
Keynote Abstract: Given a set of learning (AI) models, it can be hard to find models appropriate to a task, understand the models, and characterize how models differ from one another. Currently, practitioners rely on manually written documentation (model cards) to understand and choose models. However, not all models have complete and reliable documentation. As the number of models increases, the challenges of finding, differentiating, and understanding models become increasingly crucial. Inspired by research on data lakes, we introduce the concept of model lakes. We explore the question: why should we care about data quality in model lakes?
Keynote Speaker Bio: Renée J. Miller is the Canada Excellence Research Chair in Data Intelligence at the University of Waterloo. She is a Fellow of the Royal Society of Canada, Canada’s National Academy of Science, Engineering and the Humanities. She received the US Presidential Early Career Award for Scientists and Engineers (PECASE), the highest honor bestowed by the United States government on outstanding scientists and engineers beginning their careers. She received an NSF CAREER Award, the Ontario Premier’s Research Excellence Award, and an IBM Faculty Award. She formerly held the Bell Canada Chair of Information Systems at the University of Toronto and a University Distinguished Professorship at Northeastern University. She is a Fellow of the ACM and the AAAS. Her work has focused on the long-standing open problem of data integration and has achieved the goal of building practical data integration systems. She and her colleagues received the ICDT Test-of-Time Award and the 2[…]. She has received the CS Canada Lifetime Achievement Award in Computer Science. Professor Miller was an Editor-in-Chief of the VLDB Journal and is a former president of the non-profit Very Large Data Base (VLDB) Foundation. She received her PhD in Computer Science from the University of Wisconsin, Madison and bachelor’s degrees in Mathematics and Cognitive Science from MIT.
10:00 - 10:30
Break
10:30 - 12:00
Session 2: Morning Research Session - Chair: Lisa Ehrlinger
Out in the Wild: Investigating the Impact of Imperfect Data on a Tabular Foundation Model
Vasileios Papastergios and Anastasios Gounaris
Exploring Privacy-Preserving Record Linkage: A Holistic Framework for Dataset Generation and Detailed Result Analysis
Florens Rohde, Victor Christen, and Erhard Rahm
Dynamic Knowledge Graph-based Measurement of Data Quality
Johannes Schrott, Rainer Meindl, Christian Lettner, Stefan Hammer, and Magdalena Leitner
Evolving Gracefully: Building Robust and Self-Adaptive Data Cleaning Pipelines for Schema Evolution and Uncertainty
Kevin Kramer, Valerie Restat, and Uta Störl
12:00 - 13:30
Lunch
13:30 - 15:00
Session 3: Afternoon Keynote - Chair: Hazar Harmouch
Keynote Title: From XAI to XEE through Influence and Provenance, and optimising models for fairness when data drifts over time: some work in progress on connecting data and models to ensure quality and trust in both.
Keynote Speaker: Paolo Missier (University of Birmingham, UK)
Keynote Abstract: In the “Data-to-AI” value chain (leading to decisions, new knowledge, insights, etc.), interventions aimed at improving the quality of the underpinning data are increasingly driven by model optimisation goals. Embracing this view, over the past few years the popular “Data-Centric AI” (DCAI) paradigm has been producing many interesting examples of model-driven quality improvement methods, including model-driven data cleaning, model-driven dataset pruning, and many more. These are all “engineering” tasks, focused on improving the effectiveness and efficiency of the entire value chain. Complementary to this, one can view “trust” in the models as a user-centred manifestation of quality, which translates into transparency requirements, and thus into the need to provide effective explanations that encompass both models and the data used to train them.
Within this setting, in this talk we present two strands of ongoing work. Firstly, we aim to generalise DCAI both on the data side, controlling the impact of data drift on the model over time, and on the model side, adding fairness as a quality metric complementary to accuracy. Combining these two elements raises interesting new challenges.
Secondly, we aim to provide “eXplainability End-to-End” (XEE) by combining established XAI techniques, namely Influence Functions, with our own work on tracking the provenance of training datasets through data processing pipelines.
As this is mostly work in progress, expect fewer results and more preliminary ideas, hopefully leading to stimulating interactions through the talk.
Keynote Speaker Bio: Paolo Missier is Chair in Computer and Data Science at the University of Birmingham, UK, where he also serves as Director of the Data and AI Institute. He previously held academic positions at Newcastle University (2011–2023) and was a Fellow of the Alan Turing Institute (2018–2023). His career spans both academia and industry, including applied research at Bellcore in the USA (1994–2001) and consultancy for the Italian Government and private sector on data quality (2001–2004). He holds a PhD in Computer Science from the University of Manchester, where he focused on data quality in scientific workflows. His research interests lie in improving data science to improve science and health data science at scale. Since 2016, he has been Senior Associate Editor of the ACM Journal of Data and Information Quality (JDIQ).
14:30
Poster session (continues during the coffee break)
15:00 - 15:30
Break
15:30 - 16:50
Session 4: Afternoon Research Talks - Chair: Lorena Etcheverry
Label Flipping For Group Fairness
Shashank Thandri and Romila Pradhan
PBE Meets LLM: When Few Examples Aren’t Few-Shot Enough
Shuning Zhang and Yongjoo Park
Towards an SLM-based Auditing of Relational Schemas and Data Quality for Practical Data Governance
Antony de Medeiros
16:50 - 17:00
Closing Remarks