Machine Proliferation

How “Machine Learning-in-a-box” may be a blessing and a curse


Building complex machine learning models has never been simpler. Thanks to software offered by companies like Amazon Web Services (SageMaker), Microsoft Azure (Machine Learning Studio), DataRobot and RapidMiner, users just need to point to their dataset, specify their objective, and the wizard-style software takes care of the rest. No coding required.

At the click of a button, data gets churned through a cluster of processors in the cloud, and models are trained in a matter of minutes if not seconds. Once the process is done, users are presented with numerous models that can include neural networks, gradient boosted trees and regularized regressions, hyper-parameter optimized and supported by associated test results, charts and in some cases model documentation. This process is facilitated by a slick and dynamic interface that any business analyst can use.

This recent “ML-in-a-box” or “ML democratization” trend aims to bring machine learning capabilities to the masses, including those with no background in statistics or programming.


In the right hands, ML-in-a-box can be a powerful tool that boosts productivity. Data scientists and quantitative teams can develop and deploy models more efficiently and can focus on value-add tasks such as testing hypotheses, investigating input data and interpreting results instead of spending time debugging code or manually documenting results. However, companies should be wary of putting such powerful tools in the hands of employees who do not fully understand what is going on “inside the box”.

The idea of giving anyone the power to extract competitive insights from massive datasets and to make accurate predictions using machine learning is enticing. With these tools, the average analyst can build complex models for a host of business applications such as segmenting and targeting customers, recommending products, forecasting financials, estimating losses, and picking investments.


Some large institutions have been making such software widely accessible to their employees and encouraging them to experiment with very limited or no oversight. According to DataRobot, customers have already built over 800 million models using its software. Although experimenting with new technology can have its benefits, an unchecked proliferation of complex machines can also create significant business risk, as an increasing number of critical decisions that impact employees and customers are made based on models that are not well understood or governed.


“Explainability” of outputs is a well-known challenge in the machine learning field. Even seasoned data scientists and experts can have a hard time explaining why and how a machine learning model produced a certain output and how that output would change if the inputs changed. This challenge has been one of the biggest concerns associated with the rapid spread of machine learning applications. Failing to fully understand the mechanics behind complex models has resulted in inaccurate, biased or unethical outcomes that have made the headlines. This concern is naturally amplified when people without statistics or programming backgrounds are given tools to build and use highly complex models in business decisions.


Overcoming these challenges and effectively developing and deploying machine learning models requires not only an understanding of and appreciation for the statistical and technical intricacies of these tools, but also a grasp of the business impact of the predictions being created and of the intuition behind the data being used to train the model.

We recommend a controlled approach to rolling out ML-in-a-box. Start by putting the appropriate governance and approval frameworks in place: anyone who wants to download these tools and use them for business purposes should go through training not only on the tool, but also on best practices for using it. Any model that is put into production needs to be subject to a robust review, challenge, and approval process. At any given time, you should know which businesses and functions are using these tools and for what purpose. Ensure ML-in-a-box models are independently and appropriately reviewed and tested before use. Once a model is deployed, continue monitoring its results to confirm that it still performs as it did when it was approved, especially if the model is implemented in a way that lets it keep learning after deployment (for example, leveraging reinforcement learning algorithms).
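
For teams that want to see what these controls could look like in practice, the sketch below shows one possible shape for a model inventory with an approval gate and a simple monitoring check. It is an illustration under our own assumptions, not a feature of any of the products named above: the names ModelRecord, ModelInventory and needs_review, the fields they carry, and the 10% tolerance are all hypothetical.

```python
from dataclasses import dataclass
from enum import Enum
from statistics import mean
from typing import Dict, List, Optional


class Status(Enum):
    DRAFT = "draft"
    APPROVED = "approved"


@dataclass
class ModelRecord:
    # All fields are illustrative assumptions about what an inventory might track.
    name: str
    business_unit: str            # which business or function uses the model
    purpose: str                  # what the model is used for
    builder: str                  # who built the model
    builder_trained: bool = False  # has the builder completed tool training?
    status: Status = Status.DRAFT
    reviewer: Optional[str] = None


class ModelInventory:
    """Hypothetical central register of models built with ML-in-a-box tools."""

    def __init__(self) -> None:
        self._records: Dict[str, ModelRecord] = {}

    def register(self, record: ModelRecord) -> None:
        if record.name in self._records:
            raise ValueError(f"model {record.name!r} is already registered")
        self._records[record.name] = record

    def approve(self, name: str, reviewer: str) -> None:
        record = self._records[name]
        if not record.builder_trained:
            raise PermissionError("builder has not completed training")
        if reviewer == record.builder:
            raise PermissionError("review must be independent of the builder")
        record.status = Status.APPROVED
        record.reviewer = reviewer

    def can_deploy(self, name: str) -> bool:
        # Only approved models may go into production.
        return self._records[name].status is Status.APPROVED

    def usage_by_business_unit(self) -> Dict[str, List[str]]:
        # Answers "who is using these tools, and for what purpose?"
        usage: Dict[str, List[str]] = {}
        for record in self._records.values():
            usage.setdefault(record.business_unit, []).append(record.purpose)
        return usage


def needs_review(baseline_error: float, recent_errors: List[float],
                 tolerance: float = 0.10) -> bool:
    """Flag a deployed model whose average recent error exceeds the
    error measured at approval by more than the given tolerance."""
    return mean(recent_errors) > baseline_error * (1 + tolerance)


inventory = ModelInventory()
inventory.register(ModelRecord(
    name="churn-v1", business_unit="Retail", purpose="customer targeting",
    builder="analyst_a", builder_trained=True,
))
inventory.approve("churn-v1", reviewer="validator_b")
print(inventory.can_deploy("churn-v1"))
print(inventory.usage_by_business_unit())
print(needs_review(0.10, [0.15, 0.13]))
```

The point of the sketch is the ordering of the controls rather than the specific code: a model cannot be approved unless its builder has been trained and someone else has reviewed it, it cannot be deployed unless it has been approved, and a deployed model is sent back for review when its recent error drifts beyond an agreed threshold.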

With the right governance framework and controls in place, ML-in-a-box can deliver tremendous value to organizations. However, left unchecked, the outcomes can be dire for the organization and for the individuals using these tools.
