Learning Design Patterns from Big Machine Learning Libraries


Starting a machine learning project is difficult because there is so much to think about. Most of the time there is little brain power left to think about code architecture. However, the machine learning pipeline does not differ much from one project to the next, so proven structures can be reused.

The goal of this post is to provide some insight into how we can structure code by looking at well-known machine learning libraries.

The Machine Learning Pipeline

https://medium.com/microsoftazure/how-to-accelerate-devops-with-machine-learning-lifecycle-management-2ca4c86387a0

Scikit-Learn

Scikit-Learn is probably the best-known machine learning library out there. It is built on core Python tools such as NumPy, SciPy and Matplotlib.

Key Design Points

Scikit-Learn has three fundamental APIs: Estimator, Predictor, and Transformer.

Estimators:

All learning algorithms implement the estimator interface and expose a fit method.
Instantiating an Estimator (setting hyperparameters) is decoupled from the learning process (fitting to training data).

Predictors:

Extends the Estimator and implements a predict method.

Transformers:

Extends the Estimator and implements a transform method.
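The three interfaces can be sketched in plain Python. This is a minimal illustration of the pattern, not Scikit-Learn's own code: MeanRegressor and RangeScaler are hypothetical toy classes invented for this example. The trailing underscore on learned attributes follows the Scikit-Learn naming convention.

```python
class MeanRegressor:
    """Toy predictor (hypothetical): predicts the training mean plus an offset."""

    def __init__(self, offset=0.0):
        # Hyperparameters are set at construction time, before any data is seen.
        self.offset = offset

    def fit(self, X, y):
        # State learned from data is stored with a trailing underscore.
        self.mean_ = sum(y) / len(y)
        return self  # returning self allows call chaining

    def predict(self, X):
        return [self.mean_ + self.offset for _ in X]


class RangeScaler:
    """Toy transformer (hypothetical): rescales numbers to the range [0, 1]."""

    def fit(self, X, y=None):
        self.min_ = min(X)
        self.max_ = max(X)
        return self

    def transform(self, X):
        span = (self.max_ - self.min_) or 1.0  # avoid dividing by zero
        return [(x - self.min_) / span for x in X]


# Estimator + predictor: construct with hyperparameters, then fit, then predict.
model = MeanRegressor(offset=1.0).fit([[1], [2], [3]], [10.0, 20.0, 30.0])
predictions = model.predict([[4], [5]])

# Estimator + transformer: fit learns the range, transform applies it.
scaler = RangeScaler().fit([2.0, 4.0, 6.0])
scaled = scaler.transform([2.0, 4.0, 6.0])
```

Because both classes share the same fit signature, code that trains them does not need to know which kind of object it is holding.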

All hyper-parameters for estimators/transformers are public attributes.

This simplifies the API by removing the need for per-parameter getter and setter methods.
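One benefit of this convention is that generic tooling can read hyperparameters without any per-class code. The sketch below is an illustration under assumptions: RidgeLike is a hypothetical class, and the get_params and clone helpers are simplified stand-ins written for this example, not Scikit-Learn's implementations.

```python
import inspect


class RidgeLike:
    """Hypothetical estimator used only to illustrate the convention."""

    def __init__(self, alpha=1.0, fit_intercept=True):
        # Store each constructor argument unchanged under the same name.
        self.alpha = alpha
        self.fit_intercept = fit_intercept


def get_params(estimator):
    """Collect hyperparameters by reading the attributes named in __init__."""
    names = inspect.signature(type(estimator).__init__).parameters
    return {name: getattr(estimator, name) for name in names if name != "self"}


def clone(estimator):
    """Build a fresh, unfitted copy with the same hyperparameters."""
    return type(estimator)(**get_params(estimator))


model = RidgeLike(alpha=0.1)
model.alpha = 10.0  # plain attribute assignment, no setter needed
fresh = clone(model)
```

The helpers work for any class that follows the same rule, which is what makes features such as cloning and parameter search easy to write once.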

Core data representations are based on NumPy multi-dimensional arrays.

Reduces the barrier to entry because there is no need to learn a new data class.
Ensures good performance, since NumPy's core operations are implemented in C.
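As a small illustration of the data representation, the sketch below stores a dataset as a plain two-dimensional NumPy array and standardizes it with vectorized operations. The numbers are made up for the example.

```python
import numpy as np

# Rows are samples, columns are features: shape (n_samples, n_features).
X = np.array([[1.0, 200.0],
              [2.0, 400.0],
              [3.0, 600.0]])

# Column statistics are computed in compiled code, without a Python loop.
column_means = X.mean(axis=0)
column_stds = X.std(axis=0)

# Broadcasting standardizes every column in one expression.
X_standardized = (X - column_means) / column_stds
```

No custom dataset class is involved: anything that can produce an array of this shape can feed a model.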

Easy Composition through pipelines

Uniform interfaces across core components allow chaining. This allows for code like the following:

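from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Fill missing values with the median, then standardize each feature.
my_pipeline = Pipeline([('imputer', SimpleImputer(strategy='median')),
                        ('std_scaler', StandardScaler())
                       ])

transformed_X_train = my_pipeline.fit_transform(X_train)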
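To see why a uniform interface makes chaining cheap to implement, here is a minimal plain-Python sketch of the idea. It is not how Scikit-Learn's Pipeline is written: SimplePipeline, FillMissing and Center are hypothetical names for this example, and the data is a flat list rather than an array.

```python
class FillMissing:
    """Toy transformer: replaces None with the median of the observed values."""

    def fit(self, X, y=None):
        observed = sorted(x for x in X if x is not None)
        mid = len(observed) // 2
        if len(observed) % 2:
            self.median_ = observed[mid]
        else:
            self.median_ = (observed[mid - 1] + observed[mid]) / 2
        return self

    def transform(self, X):
        return [self.median_ if x is None else x for x in X]


class Center:
    """Toy transformer: subtracts the training mean."""

    def fit(self, X, y=None):
        self.mean_ = sum(X) / len(X)
        return self

    def transform(self, X):
        return [x - self.mean_ for x in X]


class SimplePipeline:
    """Chains (name, step) pairs; each step must offer fit and transform."""

    def __init__(self, steps):
        self.steps = steps

    def fit_transform(self, X, y=None):
        for _name, step in self.steps:
            # Fit each step on the output of the previous one.
            X = step.fit(X, y).transform(X)
        return X

    def transform(self, X):
        # Reuse the statistics learned during fit_transform.
        for _name, step in self.steps:
            X = step.transform(X)
        return X


pipeline = SimplePipeline([("fill", FillMissing()), ("center", Center())])
train_out = pipeline.fit_transform([1.0, None, 3.0, 8.0])
test_out = pipeline.transform([None, 5.0])
```

The pipeline never inspects the type of a step. It only relies on fit and transform being present, which is the whole point of the shared interface.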


Further reading: Scikit-Learn Design Principles (towardsdatascience.com)

Skorch

The goal of skorch is to make it possible to use PyTorch with sklearn. skorch abstracts away the training loop, making a lot of boilerplate code obsolete. A simple net.fit(X, y) is enough.

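from skorch import NeuralNetClassifier

# Pass the PyTorch module and training hyperparameters here.
net = NeuralNetClassifier(...)
net.fit(X_train, y_train)    # runs the whole training loop
net.predict(X_test)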
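The underlying pattern, a wrapper that hides a training loop behind fit and predict, can be sketched without PyTorch. The example below is a toy under stated assumptions and does not reflect skorch's internals: LinearModule and LoopHidingRegressor are hypothetical names, and the loop is plain full-batch gradient descent on a one-variable linear model.

```python
class LinearModule:
    """Stand-in for a model definition: y = w * x + b."""

    def __init__(self):
        self.w = 0.0
        self.b = 0.0

    def forward(self, x):
        return self.w * x + self.b


class LoopHidingRegressor:
    """Hypothetical wrapper: fit/predict around a hand-written training loop."""

    def __init__(self, module, lr=0.05, max_epochs=2000):
        # Hyperparameters of the loop live on the wrapper, not the module.
        self.module = module
        self.lr = lr
        self.max_epochs = max_epochs

    def fit(self, X, y):
        n = len(X)
        for _ in range(self.max_epochs):
            # Gradients of the mean squared error with respect to w and b.
            errors = [self.module.forward(x) - t for x, t in zip(X, y)]
            grad_w = 2 * sum(e * x for e, x in zip(errors, X)) / n
            grad_b = 2 * sum(errors) / n
            self.module.w -= self.lr * grad_w
            self.module.b -= self.lr * grad_b
        return self

    def predict(self, X):
        return [self.module.forward(x) for x in X]


# The caller never writes the loop; the data follows y = 2x + 1.
net = LoopHidingRegressor(LinearModule())
net.fit([0.0, 1.0, 2.0, 3.0], [1.0, 3.0, 5.0, 7.0])
prediction = net.predict([4.0])[0]
```

Because the wrapper exposes the same fit and predict methods as any other estimator, it can be dropped into code that was written for Scikit-Learn-style objects.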

Further reading: skorch documentation (skorch 0.10.1dev), skorch.readthedocs.io. It describes skorch as a scikit-learn compatible neural network library that wraps PyTorch, and as the spiritual successor to nolearn that uses PyTorch instead of Lasagne and Theano.