Nederlandse Testdag 2025
31st of October 2025.
The 28th edition of the Dutch Testing Day (Nederlandse Testdag) will take place in Restaurant Zuiver in Utrecht. For years, the Nederlandse Testdag has been the main event where science, education and the business world share new ideas and insights in the field of testing.

Organisation
The 28th Nederlandse Testdag organization members:

Dr. Machiel van der Bijl
CEO and founder Axini

Bart Knaak
IT Consultant at ABN AMRO Bank N.V. via Professional Testing
The Nederlandse Testdag board members:

Prof. dr. Tanja Vos
Professor of Software Engineering at the Open Universiteit

Dr. Petra van den Bos
Assistant Professor at Universiteit Twente
Program
08:45 – 09:15 Walk-in
09:15 – 09:30 Welcome
09:30 – 10:15 Keynote – Michael Bolton
10:15 – 10:45 Coffee break and marketplace
10:45 – 12:25 Talks
10:45 – 11:10 Burcu Külahçıoğlu Özkan (TU Delft) – Concurrency Testing as a Sampling Problem
11:10 – 11:35 Natalia Silvis-Cividjian (VU) – VU-BugZoo: Bug Hunting Games to Educate Gen Z Software Testers
11:35 – 12:00 Freddy de Weerd (Polteq) – From Design to Disaster: why great UX Designs still fail in practice
12:00 – 12:25 Linda van de Vooren (Bartosz) – Known and unknown unknowns
12:25 – 13:30 Lunch, networking, marketplace with stands from tool providers, universities, service providers, etc.
13:30 – 14:15 Keynote – Andy Zaidman – The Inconvenient Truth of Software Testing: What is Our Environmental Impact?
14:15 – 14:40 Leon Schrijvers en Petra Heck (Fontys) – Automated Validation of RAG-pipelines
14:40 – 15:05 Ronald van Doorn (Bilihome) – Model-based testing for medical software
15:05 – 15:45 Coffee break and marketplace
15:45 – 16:35 Talks
15:45 – 16:10 Kyra Hameleers (Foreside) – AI Based Testing versus Model-Based Testing
16:10 – 16:35 Thomas Rooijakkers (TNO) – REST API bug detection with WuppieFuzz
16:35 – 17:00 Closing
17:00 Drinks, snacks, networking, marketplace with stands from tool providers, universities, service providers, etc.
KEYNOTE SPEAKER
Michael Bolton
Test Consultant / Quality Coach / Speaker at Rapid Software Testing
Michael Bolton is a consulting software tester and testing teacher who helps people to solve testing problems that they didn’t realize they could solve. In 2006, he became co-creator (with creator James Bach) of Rapid Software Testing (RST), a methodology and mindset for testing software expertly and credibly in uncertain conditions and under extreme time pressure. Since then, he has flown over a million miles to teach RST in 35 countries on six continents.
Michael has over 30 years of experience testing, developing, managing, and writing about software. For over 20 years, he has led DevelopSense, a Toronto-based testing and development consultancy. Prior to that, he was with Quarterdeck Corporation for eight years, during which he managed the company’s flagship products and directed project and testing teams in-house and around the world.
Known and unknown unknowns
When I’m learning something new I always struggle with my unknown unknowns: the things I don’t know that I don’t know. This is also a big topic in software testing. Because when you find no bugs, is it because they aren’t there or is it because you don’t know that you’re looking at it the wrong way?
In this talk I will explain my troubled past with unknown unknowns, and how I deal with them. One spoiler: Ask a lot of questions. Questions about things everyone already seems to know. Also: ask things you already think you know.
You, the audience, also get to participate! We’ll do a round of seemingly easy tasks together and find that they don’t work out at all! For a few brave souls I also have an assignment to do on stage with me.
After the talk you’ll be prepared to challenge things you think you know, and to face your unknown unknowns without fear.
Key learnings:
- Unknown unknowns always exist, and you’ll never know you have them.
- Looking at learning with an open mind uncovers new unknowns.
- Assume there is something to learn about a topic, even if you are well versed in it.
Linda van de Vooren
Test Consultant / Quality Coach / Speaker at Bartosz ICT
In daily life I am an amateur (baritone!) saxophonist and an experienced software tester. I live in the center of the Netherlands, where you can find me exploring nature or visiting a concert or the theater. I enjoy working in complex environments and do not shy away from a challenge, whether the complexity comes from technical difficulties or from a political environment. In any free time that is left, I am an avid gamer (Nintendo!) and enjoy reading (mostly fantasy, currently).
Even though investment in UX design is increasing, there is still little attention to the usability of the systems actually delivered. This gap between design and end result can have a disastrous impact on business results, despite the good intentions behind well-crafted UX designs. In this presentation we’ll examine whether the reasons for an increased focus on UX/usability hold up, and question the outcome of the associated investments, using both statistical data and our combined experience, spanning several business segments and various software systems.
Key takeaways
- The importance of UX/Usability in relation to other key factors like price, functionality and performance
- Quality measures that can contribute to good UX/Usability
- Bridging the gap between UX Designs and the actual system
Freddy de Weerd
Test Architect at Polteq
Concurrent and distributed systems are prone to concurrency bugs due to the nondeterminism in the interleavings of concurrent events. Detecting and diagnosing concurrency bugs in such systems is critical since unforeseen interleavings of concurrent events can result in unexpected, erroneous system behavior. However, these bugs are hard to detect as they are triggered only in some subtle interleavings of the events.
In this talk, I’ll give an outline of my research on testing concurrent and distributed systems from a perspective of probabilistic sampling. While naïve random stress testing is unlikely to discover difficult bugs, our recent testing algorithms offer effective testing methods by carefully sampling test executions.
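To make the sampling view concrete, here is a minimal sketch of the naive baseline the abstract mentions: drawing interleavings uniformly at random and checking each one. It is not one of the speaker's algorithms. The two-step "read, then write" increment, the function names, and the sample count are all illustrative assumptions.

```python
import random

# Hypothetical toy model: each thread increments a shared counter in two
# separate steps (read into a local, then write local + 1 back).


def run_schedule(schedule):
    """Execute one interleaving; `schedule` is a sequence of thread ids."""
    counter = 0
    local = {}
    step = {tid: 0 for tid in set(schedule)}
    for tid in schedule:
        if step[tid] == 0:
            local[tid] = counter          # step 1: read shared counter
        else:
            counter = local[tid] + 1      # step 2: write back stale value + 1
        step[tid] += 1
    return counter


def sample_schedule(rng, threads=(0, 1), steps_per_thread=2):
    """Draw one interleaving uniformly at random by shuffling thread ids."""
    events = [tid for tid in threads for _ in range(steps_per_thread)]
    rng.shuffle(events)
    return tuple(events)


def find_lost_updates(samples=200, seed=0):
    """Sample interleavings and collect those that lose an increment."""
    rng = random.Random(seed)
    failures = []
    for _ in range(samples):
        schedule = sample_schedule(rng)
        if run_schedule(schedule) != 2:   # two increments should give 2
            failures.append(schedule)
    return failures


if __name__ == "__main__":
    failing = find_lost_updates()
    print(f"{len(failing)} of 200 sampled interleavings lost an update")
```

In this toy model most interleavings are buggy, so uniform sampling finds the problem quickly. In real systems the failing interleavings are a tiny fraction of the space, which is why the choice of sampling distribution matters.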

Burcu Külahçıoğlu Özkan
Assistant Professor at TU Delft
Applying Behaviour-Driven Development for Game Creation
How to test computer games? In computer games, a player interacts with advanced (AI) agents and deals with extensive game worlds. While computer games can be immensely complex, and bugs show up in well-known games, testing has not been picked up as much in the game software engineering community as it has in traditional software engineering. In this talk I will show how Behaviour-Driven Development (BDD), a popular technique for specification and testing in traditional software engineering, can be applied in game software engineering as well. Specifically, I will present the highlights of (i) a framework to help express game behaviors as BDD scenarios, (ii) a method to apply BDD in game development, and (iii) tooling to apply BDD in Unity 3D, a major game development platform.
Petra van den Bos
Assistant Professor at Universiteit Twente
I am an assistant professor in the Formal Methods and Tools group of the University of Twente. My current research focuses on software correctness and software quality in general, and on model-based testing specifically. I like working on theory (formal methods) that can be applied in practice as well. Previously, I had a postdoc position in the Formal Methods and Tools group of the University of Twente. Before that, I had a PhD position in the Software Science group of the Radboud University, where I completed my thesis “Coverage and Games in Model-Based Testing”.
In these disruptive times, when code and tests can be created in a few minutes by generative AI, students are less motivated to learn than ever. In an attempt to alleviate the situation, we developed VU-BugZoo, an innovative teaching platform that, instead of asking students to create test plans on paper for hypothetical code, engages them in exciting bug detective games in executable standalone and embedded code. While grading, we value an elegant test strategy above just finding the bugs. The results show that the approach succeeds in increasing excitement and engagement in today’s software testing classrooms.
We push the button and let our tests run. But what really happens? What machinery is put into motion when we test, and what are the environmental consequences of our quality assurance process? In this talk, I will take you through some of the insights we gained while closely examining the potential environmental impact of building and testing 204 Java open source projects. The outlook is that testing is perhaps not as green as we might have thought.

Andy Zaidman
Full Professor of Software Engineering at TU Delft
His mission is to make software easier to evolve, and easier to test. For his work on software testing, he was the laureate of a Vidi mid-career grant in 2013, while in 2019 he received the most prestigious Vici career grant from the Dutch Science Foundation NWO. In 2024, Andy Zaidman became the head of department for the Department of Software Technology, which brings together around 200 researchers, educators and support staff in the area of the design, engineering, and analysis of complex, distributed, and data-intensive software and computer systems.
The use of model-based testing in the test process for medical software
TBD

Ronald van Doorn
Project & program manager | Quality manager at Bilihome
Ronald van Doorn is the Chief Program and Quality Officer at Bilihome, with over 25 years of experience in industrial automation, software engineering, and the development of medical devices. Throughout his career, he has specialized in bringing complex, safety-critical technologies to market, with a strong focus on innovative solutions, functional safety and regulatory compliance. Within Bilihome, Ronald van Doorn leads both the technical innovations and the quality management strategy behind the development of a wearable medical device designed to treat newborns diagnosed with jaundice. The device enables newborns to be treated safely and effectively with phototherapy at home, combining clinical reliability with user-centered design. Throughout our design verification and validation, we used an automated testing philosophy while applying model-based testing (MBT) techniques from Axini.
Today’s world depends on many digital services and the communication between them. To facilitate this communication between applications, standardised and well-specified application programming interfaces (APIs) are often used.
In particular, the use of well-defined representational state transfer (REST) architectural constraints for APIs is popular. As an entry point to many applications, these APIs provide an interesting attack surface for malicious actors. Furthermore, since APIs often control access to business logic, a security lapse can have high-impact undesirable consequences.
Thorough testing of these APIs is therefore essential to ensure business continuity. Manual testing cannot keep up, so automated solutions are needed. In this talk, we introduce and demonstrate WuppieFuzz, an open-source, automated testing tool that makes use of fuzzing techniques and code coverage measurements to find bugs, errors and/or vulnerabilities in REST APIs.
Thomas Rooijakkers
Cyber security researcher at TNO
Thomas Rooijakkers is a dedicated cyber security researcher at TNO, deeply committed to enhancing software quality and software security. He believes that high-quality software is the foundation of secure systems and has a strong drive to bring innovative solutions to industry.
In his current role as Lead Scientist for the Early Research Programme on Cyber-secure Systems by Design, Thomas is transforming product and system development by integrating cybersecurity at every stage of the engineering process. This ensures secure, reliable, and resilient cyber-physical systems, enabling the digital transition to advance further.
Additionally, Thomas is serving as the Interim Portfolio Manager for cybersecure products and systems, where he oversees the strategic direction and development of TNO’s cybersecurity portfolio.
Automated Validation of RAG-pipelines
Take away
This presentation explores how automated white-box validation following the LLM-as-a-Judge paradigm can be used to assess and improve RAG pipelines, offering practical insights into maintaining the trustworthiness and effectiveness of RAG-based applications over time.
Abstract
With the advent of foundation models and generative AI, especially the rapid rise of Large Language Models (LLMs), we see our students and companies around us build a popular type of AI-enabled system: private document chatbots based on Retrieval-Augmented Generation (RAG). From these systems, we learned that it is not hard to build something that works from a technical point of view, but that the true challenge lies in building solutions that are trustworthy and capable of delivering consistently high-quality results.
Validating RAG pipelines in practice is challenging for several reasons: 1) the document retrieval step can be configured in many ways and is sensitive to the structure and quality of input documents; 2) inference of high-quality answers depends on the relevance of the retrieved documents; 3) assessing answer quality is subjective and often requires domain expertise; 4) improving RAG pipelines involves experimentation with new techniques and configurations, which demands structured and consistent evaluation methods.
Most existing validation frameworks treat RAG systems as black boxes, evaluating only the final output without insight into how that output was produced. In contrast, our approach takes a white-box perspective: by leveraging knowledge about the inner components of the RAG architecture and evaluating the outputs of intermediate stages, we gain a deeper understanding of system behavior and failure points.
To support this, we are developing an automated validation framework that combines traditional, calculable metrics with more nuanced, LLM-based evaluation techniques following the LLM-as-a-Judge paradigm. This provides more granular insights, supports targeted improvements, and allows for more reliable quality assurance.
In this presentation, we share key insights from our work and show how automated white-box validation can be applied in practice to assess and improve RAG pipelines. We explore the benefits and challenges of this approach and illustrate how it enables ongoing quality monitoring to help RAG-based applications remain trustworthy and effective over time.
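As an illustration of the white-box idea described in this abstract, the sketch below scores an intermediate stage (retrieval) with a calculable metric and scores the final answer through a pluggable judge. It is a hypothetical example, not the authors' framework: `RagTrace`, `evaluate`, and `stub_judge` are invented names, and the stub stands in for a real LLM-as-a-Judge call.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Sequence, Tuple


@dataclass
class RagTrace:
    """One recorded pipeline run, including the intermediate retrieval step."""
    question: str
    retrieved_ids: Sequence[str]   # documents the retriever returned
    relevant_ids: Sequence[str]    # documents a human labelled as relevant
    answer: str                    # final generated answer


def retrieval_precision_recall(trace: RagTrace) -> Tuple[float, float]:
    """Calculable metric for the retrieval stage (set-based precision/recall)."""
    retrieved = set(trace.retrieved_ids)
    relevant = set(trace.relevant_ids)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


def evaluate(trace: RagTrace,
             judge: Callable[[str, str], float]) -> Dict[str, float]:
    """Combine the stage-level metric with a judge score for the answer."""
    precision, recall = retrieval_precision_recall(trace)
    return {
        "retrieval_precision": precision,
        "retrieval_recall": recall,
        "answer_score": judge(trace.question, trace.answer),
    }


def stub_judge(question: str, answer: str) -> float:
    """Placeholder for an LLM judge: only checks that an answer was given."""
    return 1.0 if answer.strip() else 0.0


if __name__ == "__main__":
    trace = RagTrace(
        question="What is the return policy?",
        retrieved_ids=["d1", "d2", "d3", "d4"],
        relevant_ids=["d1", "d4", "d9"],
        answer="Items can be returned within 30 days.",
    )
    print(evaluate(trace, stub_judge))
```

Keeping the judge as a plain callable means the calculable metrics can run in every build, while a slower model-based judge can be swapped in where it is needed.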
Leon Schrijvers
Lecturer in Software Engineering and Artificial Intelligence at Fontys ICT
Leon Schrijvers has been active in the IT industry since 2005, starting as a software engineer before moving into roles as senior software architect and technical director. Across all these positions, he maintained a strong focus on software quality assurance and security.
In 2018, he joined Fontys ICT as a Lecturer in Software Engineering and Artificial Intelligence, to educate the next generation of software engineers. He also contributes to educational development initiatives aimed at shaping the future of the field.
Since 2023, he has been researching LLM engineering, with a focus on building trustworthy LLM systems. In his research, he enjoys combining theory with a hands-on approach, delivering practical results that educate and inspire others. Leon holds a Master’s degree in Computer Science from Eindhoven University of Technology.

Petra Heck
Associate Professor Software & AI Engineering at Fontys ICT
Petra Heck is an accomplished academic and researcher with extensive experience in software engineering, AI development, and educational innovation. She began her career in 2012 as a Lecturer in Software Engineering at Fontys University of Applied Sciences, where she later advanced to Senior Researcher in the AI & Big Data research group. Her postdoctoral research (2019–2021) focused on production-ready machine learning systems, contributing to the emerging field of AI engineering.
Currently serving as Associate Lector AI Engineering, Petra leads applied research projects, mentors students and colleagues, and facilitates knowledge transfer between academia and industry. She is also active in educational development and innovation.
Her academic journey includes a Master’s in Computer Science from Eindhoven University of Technology and a PhD from TU Delft, with a dissertation on agile requirements quality. Petra’s work is characterized by a transdisciplinary approach, bridging technical expertise with pedagogical innovation.
AI Based Testing versus Model-Based Testing
What is AI based testing?
And what is Model Based Testing?
Both techniques have been around for a while, but have been very hype-sensitive.
With the current AI hype and the history of MBT, I see much confusion about what is what, and would like to share my vision on what they mean and how they differ but also complement each other. It is always hard to predict what the future holds, but these two techniques will help the test field evolve.
Kyra Hameleers
Test Engineer at Foreside
Kyra Hameleers is a senior Test Engineer with a background in software development. She specializes in Model-Based Testing (MBT), but any test-related technology interests her. Kyra has been working with multiple parties to investigate AI and how it can advance the test field further.
In her free time Kyra is a stereotypical nerd: playing games, reading books, writing stories. She also loves technology and spends time researching it.
Speakers

Michael Bolton
Test Consultant & Coach at Rapid Software Testing

Andy Zaidman
Full professor Software Engineering at TU Delft

Linda van de Vooren
Test Consultant & Coach at Bartosz ICT

Natalia Silvis-Cividjian
Assistant professor at the Vrije Universiteit

Petra Heck
Associate Professor Software & AI Engineering at Fontys ICT

Burcu Külahçıoğlu Özkan
Assistant Professor at TU Delft

Freddy de Weerd
Test architect, trainer and coach at Polteq

Leon Schrijvers
Lecturer in Software Engineering and Artificial Intelligence at Fontys ICT

Ronald van Doorn
Project & program manager | Quality manager at Bilihome