Reproducibility in AI, and What Computing Professionals Should Know for Supporting Researchers


With AI revolutionizing our world and the computational methods available to scientists ever expanding, much attention is focused on the use of AI, with little given to the reproducibility crisis and the role of systems. We take for granted that if the infrastructure and software stack are functioning and results come out, the environment is in working order. Yet running the same machine learning process on different clusters, or even on different flavors of Linux, can affect any AI process that depends on randomness, e.g., random forests. Reproducibility, a concept central to the scientific method, is a commonly misunderstood term; reproducibility, repeatability, and replicability are often confused for one another. This session will discuss the significance of each concept and expose you to leading theories on reproducibility in AI.
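As a minimal illustration of the point about randomness-dependent methods, the sketch below (not taken from the session; the function and names are hypothetical) shows how seeding and recording a random number generator makes a bootstrap-sampling step, of the kind a random forest performs for each tree, repeatable on the same system. Note that an identical seed does not guarantee identical results across different operating systems, library versions, or hardware, which is exactly the variance this session concerns.

```python
import random

def bootstrap_sample(data, seed):
    """Draw a bootstrap sample (sampling with replacement), as a
    random forest does when building each tree.

    Illustrative sketch: seeding an isolated RNG and recording the
    seed makes this step repeatable on the same system."""
    rng = random.Random(seed)  # isolated, explicitly seeded RNG
    return [rng.choice(data) for _ in data]

data = list(range(10))
run1 = bootstrap_sample(data, seed=42)
run2 = bootstrap_sample(data, seed=42)
assert run1 == run2  # same seed, same system -> same sample
```

Recording the seed alongside the results is necessary but not sufficient: the same seed can still yield different outcomes on a different software stack.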

A description of the experiment, its code, and/or its data is required to reproduce an AI experiment. A detail often overlooked in papers is a description of the computing environment, the laboratory in which the experiment was performed, and it is critical to thorough reproduction of the experiment. Research computing professionals can play a crucial role by working closely with researchers to ensure this is well documented, so that later research that quantifies variance can be appropriately applied to results. This session will discuss strategies for capturing and managing this information, as well as for introducing this concept to the researchers you partner with.
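One lightweight way to capture such a description is to record environment metadata alongside an experiment's outputs. The sketch below uses only the Python standard library; the field names are illustrative, not a standard, and a real workflow would likely also record GPU model, drivers, and package versions (e.g., via `pip freeze`).

```python
import json
import platform
import sys

def capture_environment():
    """Collect basic computing-environment metadata to store next to
    experimental results. Field names here are illustrative only."""
    return {
        "os": platform.system(),           # e.g. "Linux"
        "os_release": platform.release(),  # kernel / OS version
        "machine": platform.machine(),     # e.g. "x86_64"
        "python_version": sys.version.split()[0],
        "processor": platform.processor(), # may be empty on some systems
    }

if __name__ == "__main__":
    # Serialize for archiving with the experiment's outputs.
    print(json.dumps(capture_environment(), indent=2))
```

Saving this record with each run gives later reproduction attempts a concrete baseline to compare against.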

Quantifying how inter-laboratory differences between compute systems, such as operating system, ancillary software, software versions, and processing units, affect the performance of different AI methods will increase our understanding of which AI methods are sensitive to such differences and which are not.
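Given environment records like those described above, comparing them is the first step in attributing a result gap to inter-laboratory differences. The sketch below is a hypothetical example, not part of the session; the two records are invented for illustration.

```python
def environment_diff(env_a, env_b):
    """Return the fields on which two environment records disagree,
    surfacing candidate sources of variance between two runs."""
    keys = set(env_a) | set(env_b)
    return {k: (env_a.get(k), env_b.get(k))
            for k in keys
            if env_a.get(k) != env_b.get(k)}

# Invented example records from two hypothetical labs.
lab_a = {"os": "Linux", "os_release": "5.15", "blas": "OpenBLAS 0.3.21"}
lab_b = {"os": "Linux", "os_release": "6.1",  "blas": "MKL 2023.1"}

diff = environment_diff(lab_a, lab_b)
# `diff` contains only the disagreeing fields (here, os_release and
# blas), narrowing where an observed performance gap might originate.
```

A diff like this does not by itself quantify sensitivity, but it scopes which environmental factors a controlled comparison should vary.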

Understanding the effects of inter-laboratory differences, especially where they would influence the interpretation of results, can help determine whether an empirical result from a given type of AI method needs contextual adjustment for the computational environment it was produced in, or whether the experiment must be conducted in more environments before being fully trusted. A brief overview of related concepts, including the FAIR principles (Findable, Accessible, Interoperable, Reusable) in AI and how they can influence reproducibility, will also be discussed.