Starting a machine learning project is difficult because there is so much to think about. Most of the time there is little brain power left to think about code architecture. However, the machine learning pipeline does not differ much from one project to the next.
The goal of this post is to provide some insight into how we can structure code by looking at well-known machine learning libraries.
The Machine Learning Pipeline
Scikit-Learn is probably the best-known machine learning library out there. It is built on core Python tools such as NumPy, SciPy and Matplotlib.
Key Design Points
- Scikit-Learn has three fundamental APIs: Estimator, Predictor, and Transformer.
  - All learning algorithms implement the Estimator interface and expose a fit method.
  - The instantiation of an Estimator (hyperparameters) is decoupled from the learning process (training data).
  - A Predictor extends the Estimator and implements a predict method.
- All hyperparameters for estimators and transformers are public attributes.
  - This simplifies the API by getting rid of get/set methods.
- Core data representations are based on NumPy multi-dimensional arrays.
  - This lowers the barrier to entry because there is no need to learn a new data class.
  - It keeps performance high, since NumPy's core routines are implemented in C.
- Easy composition through pipelines.
  - Uniform interfaces across core components allow chaining. This allows for code like the following:
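    from sklearn.impute import SimpleImputer
    from sklearn.pipeline import Pipeline
    from sklearn.preprocessing import StandardScaler

    my_pipeline = Pipeline([
        ('imputer', SimpleImputer(strategy='median')),
        ('std_scaler', StandardScaler()),
    ])
    transformed_X_train = my_pipeline.fit_transform(X_train)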
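To make the design points above more concrete, here is a minimal sketch of the same ideas in plain Python. It is not Scikit-Learn's code: the class names (MedianImputer, Standardizer, NearestCentroidClassifier, SimplePipeline) are hypothetical, and plain lists stand in for NumPy arrays so the example has no dependencies. It tries to show hyperparameters as public attributes set at construction, learning deferred to fit, learned state stored in attributes with a trailing underscore (the naming convention Scikit-Learn uses), and chaining through a shared interface.

```python
from statistics import mean, median, pstdev


class MedianImputer:
    """Transformer sketch: learns per-column fill values, replaces None."""

    def __init__(self, fill_strategy="median"):
        # Hyperparameter stored as a plain public attribute, no getter/setter.
        self.fill_strategy = fill_strategy

    def fit(self, X, y=None):
        pick = median if self.fill_strategy == "median" else mean
        self.fill_values_ = [
            pick([v for v in column if v is not None]) for column in zip(*X)
        ]
        return self

    def transform(self, X):
        return [
            [self.fill_values_[j] if v is None else v for j, v in enumerate(row)]
            for row in X
        ]


class Standardizer:
    """Transformer sketch: centers and scales each column."""

    def fit(self, X, y=None):
        columns = list(zip(*X))
        self.means_ = [mean(c) for c in columns]
        self.scales_ = [pstdev(c) or 1.0 for c in columns]  # avoid dividing by 0
        return self

    def transform(self, X):
        return [
            [(v - m) / s for v, m, s in zip(row, self.means_, self.scales_)]
            for row in X
        ]


class NearestCentroidClassifier:
    """Predictor sketch: an estimator that also exposes predict."""

    def fit(self, X, y):
        groups = {}
        for row, label in zip(X, y):
            groups.setdefault(label, []).append(row)
        self.centroids_ = {
            label: [mean(c) for c in zip(*rows)] for label, rows in groups.items()
        }
        return self

    def predict(self, X):
        def squared_distance(a, b):
            return sum((p - q) ** 2 for p, q in zip(a, b))

        return [
            min(self.centroids_, key=lambda lab: squared_distance(row, self.centroids_[lab]))
            for row in X
        ]


class SimplePipeline:
    """Chains transformers and a final predictor through fit/transform/predict."""

    def __init__(self, steps):
        self.steps = steps  # list of (name, object) pairs

    def fit(self, X, y=None):
        for _, step in self.steps[:-1]:
            X = step.fit(X, y).transform(X)
        self.steps[-1][1].fit(X, y)
        return self

    def predict(self, X):
        for _, step in self.steps[:-1]:
            X = step.transform(X)
        return self.steps[-1][1].predict(X)


X_train = [[1.0, 10.0], [2.0, None], [8.0, 30.0], [9.0, 32.0]]
y_train = ["low", "low", "high", "high"]

model = SimplePipeline([
    ("imputer", MedianImputer(fill_strategy="median")),
    ("scaler", Standardizer()),
    ("classifier", NearestCentroidClassifier()),
])
model.fit(X_train, y_train)
predictions = model.predict([[1.5, 11.0], [8.5, None]])
print(predictions)
```

Because every step follows the same small contract, SimplePipeline never needs to know which concrete classes it is chaining. That is the property that makes composition cheap.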
Further reading: "Scikit-Learn Design Principles", a blog post that briefly reflects on the elegance of the design principles of the Scikit-Learn library. It is not meant to be a tutorial in using Scikit-Learn.
The goal of skorch is to make it possible to use PyTorch with sklearn. skorch abstracts away the training loop, making a lot of boilerplate code obsolete. A simple net.fit(X, y) is enough.
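    from skorch import NeuralNetClassifier

    # The arguments are elided here, as in the original snippet.
    net = NeuralNetClassifier(...)
    net.fit(X_train, y_train)
    net.predict(X_test)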
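As a rough illustration of what "abstracting away the training loop" means, the sketch below hides a hand-written gradient descent loop behind fit and predict. This is not how skorch is implemented and it does not use PyTorch. TinyLinearRegressor and its lr and max_epochs parameters are hypothetical names chosen for this example, and the model is a one-feature linear regression so that it stays dependency-free.

```python
class TinyLinearRegressor:
    """Hypothetical wrapper: the epoch loop lives inside fit()."""

    def __init__(self, lr=0.05, max_epochs=1000):
        # Training settings are hyperparameters, fixed before any data is seen.
        self.lr = lr
        self.max_epochs = max_epochs

    def fit(self, X, y):
        self.weight_, self.bias_ = 0.0, 0.0
        n = len(X)
        for _ in range(self.max_epochs):
            grad_w = grad_b = 0.0
            for x, target in zip(X, y):
                error = (self.weight_ * x + self.bias_) - target
                # Gradient of the mean squared error for this sample.
                grad_w += 2 * error * x / n
                grad_b += 2 * error / n
            self.weight_ -= self.lr * grad_w
            self.bias_ -= self.lr * grad_b
        return self

    def predict(self, X):
        return [self.weight_ * x + self.bias_ for x in X]


# The caller never writes a loop: construct, fit, predict.
X_train = [0.0, 1.0, 2.0, 3.0]
y_train = [1.0, 3.0, 5.0, 7.0]  # generated from y = 2x + 1

net = TinyLinearRegressor(lr=0.05, max_epochs=1000)
net.fit(X_train, y_train)
y_pred = net.predict([4.0])
```

The calling code has the same shape as the skorch snippet, which is the point: once the loop sits behind fit, the model can be dropped into anything that expects the sklearn interface.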
Further reading: skorch documentation (skorch 0.10.1dev). skorch is a scikit-learn compatible neural network library that wraps PyTorch, which it does by providing a wrapper around PyTorch that has an sklearn interface. In that sense, skorch is the spiritual successor to nolearn, but instead of using Lasagne and Theano, it uses PyTorch.