Moving Your ML Proof of Concept to Production Part 3: Ensuring Results Are Reliable
On August 9, 2024 by Becks Simpson
Building a machine learning (ML) proof of concept (POC) might sound like a process that is deeply experimental and scrappy, with more focus on getting it to work than making it robust in the long run. However, in order to take an ML POC to the next stage, it is important to reserve some extra time at the beginning to ensure that any results are reproducible and trustworthy. So far, this series has set the stage for developing an ML POC by describing how to identify business objectives and then translate them to ML metrics, as well as how to build a relevant, project-specific dataset. You may think that model development and iteration are ready to begin once these two important steps are complete, but one piece that’s often overlooked is the experimentation environment. This setup is crucial because the reliability and reproducibility of the POC model hinge on understanding which elements helped create the model and which were discarded. In particular, the dataset used for development, the model architecture, and the artifacts from training, as well as the parameters of the successful experiments themselves, all need to be tracked. Fortunately, a wealth of tools and resources is available to ensure the development environment is set up correctly from the start.
This blog will cover requirements for such an environment, considerations for available platforms, and some examples of setups.
Why the Environment Matters
Robust, reproducible experimentation is the cornerstone of a successful ML project. Running an experiment again at a later date should produce the same results, and if a newer model is needed, it should be possible to extend the original model by varying any of the pieces used to make it. It should also be clear how the best model was made in terms of inputs, parameters, and configurations, so that if something changes, you understand why. You need to be able to identify at a glance what didn’t work and compare new lines of experimentation to pick the best one. Lastly, a well-tracked setup builds trust in the POC model’s performance, which makes the eventual transition to production much smoother.
However, prototyping and building a POC is meant to be quick and involve rapid iteration. In ML terms, “spinning up” a variety of experiments with different models, variations of the dataset, and different transformation and configuration parameters should be fast. Equally, if the POC involves large datasets or models that require heavy or complex processing, it should be fairly straightforward to orchestrate the data processing and model training for an experiment and to repeat that process in parallel across a number of experiments.
Although these two requirements seem at odds, both are possible. Often, the environment to facilitate both of these seemingly competing goals is overlooked in favor of building something rapidly. Since many pieces of the ML puzzle—code, datasets, configuration files, and model artifacts—need to be tracked, having everything connected in a self-contained experimentation platform or environment is crucial. Thus, leveraging the available tooling and open-source or hosted platforms for ML experimentation is helpful.
Resources for an Experimentation Environment
Much like baking a cake, various ingredients make a model work. In order to reproduce results and instill trust in the model’s performance, all of these pieces need to be tracked. At a bare minimum, a platform should be chosen that can facilitate this process easily, even if it doesn’t have other useful features like orchestration of multiple training jobs across different resources or on-the-fly data labeling capabilities. The open-source tools MLflow and ClearML are two popular options for tracking the elements of ML experiments. They are very similar, and depending on your use case, one or the other might be preferable. For example, ClearML has an enterprise edition with more features available.
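To give a concrete sense of how little code basic tracking requires, below is a minimal ClearML sketch. It assumes a reachable ClearML server already configured via `clearml-init`, and the project name, task name, hyperparameters, and metric value are placeholders for illustration rather than anything from this series.

```python
# Minimal ClearML tracking sketch (names and values are placeholders).
from clearml import Task

# Register this script run as an experiment; ClearML also auto-captures
# console output, installed packages, and the git commit when available.
task = Task.init(project_name="poc-experiments", task_name="baseline-run")

# Record the hyperparameters that define this run so it can be reproduced
# and compared against other runs in the ClearML UI.
params = {"learning_rate": 1e-3, "batch_size": 32, "epochs": 10}
task.connect(params)

# ... dataset preparation and model training would happen here ...

# Log a result metric for this run (placeholder value).
task.get_logger().report_scalar(
    title="validation", series="accuracy", value=0.87, iteration=params["epochs"]
)
```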
Both tools can be deployed as standalone microservices and come with software development kits (SDKs) that integrate with your experimentation code to track a variety of things across the ML life cycle. The storage backend is customizable for those developing with cloud or internal resources. Each also has a user interface for inspecting individual experiment results and comparing across runs to select the best model. Using one of these tools lets you keep a record of the initial data source, model artifacts such as the architecture configuration and weights files, and any variable experiment parameters such as learning rate and batch size. The architecture configuration describes how to build the model in terms of layers, while the weights file is the output of a training run, which combines the input data, the experiment parameters, and the model configuration. You can also link experiment runs to the specific versions of code used to develop the model through version control platforms like GitHub. Moreover, ClearML has extensions that make it useful for experimenting with large language models (LLMs) as well.
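With MLflow, the equivalent tracking calls look roughly like the sketch below. The experiment name, parameter values, commit hash, and artifact path are hypothetical, and the snippet assumes a tracking server (or the default local `mlruns` directory) is already in place.

```python
# Minimal MLflow tracking sketch (names, values, and paths are placeholders).
import mlflow

mlflow.set_experiment("poc-experiments")

with mlflow.start_run(run_name="baseline-run"):
    # Record the experiment parameters that shaped this run.
    mlflow.log_params({"learning_rate": 1e-3, "batch_size": 32, "epochs": 10})

    # Tag the run with the code version so it can be traced back to GitHub.
    mlflow.set_tag("git_commit", "abc1234")  # placeholder commit hash

    # ... dataset preparation and model training would happen here ...

    # Record the resulting metric and the trained weights as an artifact.
    mlflow.log_metric("val_accuracy", 0.87)   # placeholder value
    mlflow.log_artifact("model_weights.pt")   # path to weights saved by the training step
```

Because everything is logged against the run, the UI can later show exactly which data, parameters, and code version produced the best model.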
In addition to these two options, you may also want to connect more tools for the other aspects of ML experimentation, from data transformation and provenance to ongoing labeling efforts and process orchestration. DagsHub is one platform that facilitates this, aiming to combine all elements of the ML life cycle in a single place. Through its integrations, industry-preferred tools such as Label Studio, Dagster, MLflow, and Data Version Control can work together. For LLM-based POCs, other features like human-in-the-loop labeling, evaluation, and prompt tracking are available too.
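To route the same MLflow calls through a platform such as DagsHub, you typically only need to point the tracking URI at the hosted server. The URL shape, username, and token below are assumptions for illustration; consult the platform’s documentation for your repository’s exact tracking endpoint and authentication setup.

```python
# Sketch: reuse the MLflow code above against a remote tracking server
# (e.g., one hosted by DagsHub). URI and credentials are placeholders.
import os
import mlflow

os.environ["MLFLOW_TRACKING_USERNAME"] = "your-username"
os.environ["MLFLOW_TRACKING_PASSWORD"] = "your-access-token"
mlflow.set_tracking_uri("https://dagshub.com/your-username/your-repo.mlflow")

# From here on, logging works exactly as in a local setup.
with mlflow.start_run(run_name="remote-baseline"):
    mlflow.log_param("learning_rate", 1e-3)
    mlflow.log_metric("val_accuracy", 0.85)  # placeholder value
```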
Conclusion
Once the ML development portion of POC-building is ready to begin, with the metrics established and dataset built, spending some time appropriately structuring an experimentation environment is the next step before starting any coding. Doing so not only helps facilitate rapid experimentation once that process starts, but more importantly, is conducive to generating results that are reproducible, robust, and trustworthy. Fortunately, with the maturity of the ML ecosystem, several high-quality software tools exist that can be deployed and connected easily to cover all aspects of the ML development life cycle—even for newer models like LLMs.
So far, this series has covered three of the important first steps to success with an ML project—establishing goals and translating them to metrics, getting the dataset ready, and, now, structuring the environment that makes ML experimentation robust yet quick. Keep following along to learn the remaining equally critical stages involved in developing an ML POC in an agile but reproducible fashion and in putting the resulting outputs into production. The remaining articles will cover resources and approaches for developing the POC, including open-source models, guidelines, and focus points when extending to a production-ready version, as well as advice about what to anticipate and monitor post-deployment.