*Why pretrained machine learning models are often unusable and irreproducible — and what we can do about it.*

## Introduction

A useful approach to designing software is through *contracts*. For every function in your codebase, you start by writing its contract: clearly specifying what inputs are expected and valid for that function (the *precondition*), and what the function will do (the *postcondition*) when provided an appropriate input. This is often explicitly stated in the docstring of a function. Consider this example from the **math** module in Python (implemented in C):

```c
/* Approximate square root of a large 64-bit integer.

   Given `n` satisfying `2**62 <= n < 2**64`, return `a`
   satisfying `(a - 1)**2 < n < (a + 1)**2`. */

static uint64_t
_approximate_isqrt(uint64_t n)
{
    uint32_t u = 1U + (n >> 62);

    u = (u << 1) + (n >> 59) / u;
    u = (u << 3) + (n >> 53) / u;
    u = (u << 7) + (n >> 41) / u;
    return (u << 15) + (n >> 17) / u;
}
```

The contract in the docstring has two parts:

- **Precondition**: input should be an integer between 2⁶² (inclusive) and 2⁶⁴ (exclusive)
- **Postcondition**: output is an integer within 1 of the square root of the input

The contract is powerful because when the code is published, other developers **do not need to test the function** themselves, **nor consider its internal implementation**. They can read off the range of valid inputs for the function and start using it immediately. Conversely, they operate knowing that *if the precondition is not satisfied, then neither is the postcondition guaranteed*.
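To make the contract concrete, here is a minimal Python sketch that ports the C function above and checks both halves of the contract at runtime. The name `approximate_isqrt` and the choice to raise `ValueError` on invalid input are assumptions made for illustration; the original C code does not check its precondition and simply trusts the caller.

```python
def approximate_isqrt(n: int) -> int:
    """Approximate the square root of a large 64-bit integer.

    Precondition:  2**62 <= n < 2**64
    Postcondition: the result a satisfies (a - 1)**2 < n < (a + 1)**2
    """
    # Enforce the precondition explicitly (an illustrative choice;
    # the C original leaves this responsibility to the caller).
    if not (2**62 <= n < 2**64):
        raise ValueError("precondition violated: need 2**62 <= n < 2**64")

    # Same arithmetic steps as the C function, with integer division.
    u = 1 + (n >> 62)
    u = (u << 1) + (n >> 59) // u
    u = (u << 3) + (n >> 53) // u
    u = (u << 7) + (n >> 41) // u
    a = (u << 15) + (n >> 17) // u

    # Check the postcondition before handing the result back.
    assert (a - 1) ** 2 < n < (a + 1) ** 2, "postcondition violated"
    return a


print(approximate_isqrt(2**62))  # 2147483648, i.e. 2**31
```

Calling `approximate_isqrt(4)` raises `ValueError` rather than returning a meaningless number. That makes the "no precondition, no postcondition" rule visible to the caller instead of leaving it implicit.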

Nowadays, **pretrained machine learning models are increasingly being deployed as functions and APIs**. They are part of companies’ internal codebases [1], released externally for use through APIs [2], and, in research, pretrained models are published as part of the review and reproducibility processes [3].
