How To Build And Deploy A Reproducible Machine Learning Pipeline

Sole from Train in Data
12 min read · Jan 27, 2020

As companies and researchers rush to adopt machine learning practices in their organizations, they occasionally sacrifice an understanding of the underlying statistical methods in order to get results faster. People implement these methods without fully grasping their intricacies, or what they give up by rushing through the process without putting the right controls in place.

Consequently, the public’s wariness of manipulated statistics has increased, and reproducibility in any methodology has become extremely important. On the surface, reproducibility in machine learning pipelines might seem as simple as documenting the processes involved, but actually deploying these pipelines introduces many unforeseen, non-negligible challenges to achieving true reproducibility.

In this blog post, I will give a brief introduction to what deploying reproducible machine learning pipelines actually means, the main obstacles to it, and one proposed solution for overcoming them.

For details about the technical implementation, visit our online course, Deployment of Machine Learning Models.

What Is Model Deployment?

Sole from Train in Data

Data scientist, book author, online instructor (www.trainindata.com) and Python open-source developer. Get our regular updates: http://eepurl.com/hdzffv