2 min read · 3 years ago

Follow the translation of the article below in the images.

Cross-Validation in Machine Learning: How to Do It Right

In machine learning (ML), generalization usually refers to the ability of an algorithm to be effective across various inputs. It means that an ML model does not suffer performance degradation on new inputs drawn from the same distribution as the training data.

For human beings, generalization is the most natural thing possible. We can classify on the fly: for example, we would recognize a dog even if we had never seen that breed before. For an ML model, however, it can be quite a challenge. That’s why checking an algorithm’s ability to generalize is an important task that deserves a lot of attention when building the model.

To do that, we use Cross-Validation (CV).

In this article we will cover:

What is Cross-Validation: definition, purpose of use and techniques
Different CV techniques: hold-out, k-folds, Leave-one-out, Leave-p-out, Stratified k-folds, Repeated k-folds, Nested k-folds, Complete CV
How to use these techniques: sklearn
Cross-Validation in Machine Learning: sklearn, CatBoost
Cross-Validation in Deep Learning: Keras, PyTorch, MxNet
Best practices and tips: time series, medical and financial data, images

What is Cross-Validation

Cross-validation is a technique for evaluating a machine learning model and testing its performance. CV is commonly used in applied ML tasks. It helps to compare and select an appropriate model for a specific predictive modeling problem.

CV is easy to understand and easy to implement, and it tends to give a less biased estimate of a model’s performance than other evaluation methods. All this makes cross-validation a powerful tool for selecting the best model for a specific task.
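To make the model-selection idea concrete, here is a minimal pure-Python sketch (it does not use the sklearn or CatBoost APIs the article covers later). It scores two hypothetical candidate models, a majority-class baseline and a simple threshold classifier, with 5-fold cross-validation on made-up toy data, then keeps the one with the higher mean accuracy. The data generator, both models, and all function names are illustrative assumptions.

```python
import random
from statistics import mean

# Minimal sketch: toy data, models and function names are all hypothetical.


def make_data(n=200, noise=0.1, seed=0):
    """Toy data: y = 1 if x > 0.5, with a fraction `noise` of labels flipped."""
    rng = random.Random(seed)
    xs = [rng.random() for _ in range(n)]
    ys = []
    for x in xs:
        y = int(x > 0.5)
        if rng.random() < noise:
            y = 1 - y  # flip the label to simulate noisy data
        ys.append(y)
    return xs, ys


def fit_majority(xs, ys):
    """Baseline model: always predict the most common training label."""
    label = int(sum(ys) * 2 >= len(ys))
    return lambda x: label


def fit_threshold(xs, ys):
    """Predict 1 when x is above the midpoint of the two class means."""
    m0 = mean(x for x, y in zip(xs, ys) if y == 0)
    m1 = mean(x for x, y in zip(xs, ys) if y == 1)
    t = (m0 + m1) / 2
    return lambda x: int(x > t)


def kfold_score(fit, xs, ys, k=5, seed=0):
    """Mean accuracy over k folds; each fold is held out once while the rest trains."""
    idx = list(range(len(xs)))
    random.Random(seed).shuffle(idx)
    scores = []
    for i in range(k):
        test = set(idx[i::k])
        train_x = [xs[j] for j in idx if j not in test]
        train_y = [ys[j] for j in idx if j not in test]
        model = fit(train_x, train_y)
        correct = sum(model(xs[j]) == ys[j] for j in test)
        scores.append(correct / len(test))
    return mean(scores)


xs, ys = make_data()
candidates = {"majority": fit_majority, "threshold": fit_threshold}
# Every candidate is evaluated on the same folds (same seed), so scores are comparable.
cv_scores = {name: kfold_score(fit, xs, ys) for name, fit in candidates.items()}
best = max(cv_scores, key=cv_scores.get)
print(cv_scores, "->", best)
```

Because both candidates are scored on identical held-out folds, the comparison reflects how each model performs on data it was not fitted to, which is what the selection should be based on.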

There are many different techniques that can be used to cross-validate a model. Still, all of them follow a similar algorithm:

Divide the dataset into two parts: one for training, the other for testing
Train the model on the training set
Validate the model on the test set
Repeat steps 1-3 several times. The number of repetitions depends on the CV method you are using
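The four steps above can be sketched as one generic loop. In this minimal pure-Python sketch (an illustration, not a library API), the splitting strategy is passed in as a generator of (train, test) index lists, so the number of repetitions is set by the CV method: the hypothetical `holdout_splits` yields a single split, while `kfold_splits` yields k of them. The toy data, the through-the-origin line model, and all function names are assumptions made for the example.

```python
import random
from statistics import mean

# Minimal sketch: splitters, model, metric and data are all hypothetical.


def holdout_splits(n, test_ratio=0.2, seed=0):
    """Hold-out: one random train/test split, so steps 1-3 run once."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    cut = int(n * (1 - test_ratio))
    yield idx[:cut], idx[cut:]


def kfold_splits(n, k=5, seed=0):
    """k-fold: k splits, each fold serving as the test set exactly once."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    for i in range(k):
        test = idx[i::k]
        test_set = set(test)
        train = [j for j in idx if j not in test_set]
        yield train, test


def cross_validate(fit, score, xs, ys, splits):
    """Generic CV loop: returns one validation score per split."""
    results = []
    for train, test in splits:  # step 1: the splitter divides the data
        model = fit([xs[j] for j in train], [ys[j] for j in train])  # step 2: train
        results.append(score(model, [xs[j] for j in test], [ys[j] for j in test]))  # step 3: validate
    return results  # step 4: the loop ran once per split produced by the CV method


def fit_slope(xs, ys):
    """Toy model: least-squares line through the origin, y ~ a * x."""
    a = sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)
    return lambda x: a * x


def mse(model, xs, ys):
    """Mean squared error of the model's predictions on (xs, ys)."""
    return mean((model(x) - y) ** 2 for x, y in zip(xs, ys))


# Toy data: the line y = 3x with alternating +/-0.5 noise.
xs = [i / 10 for i in range(50)]
ys = [3 * x + (0.5 if i % 2 else -0.5) for i, x in enumerate(xs)]

holdout_scores = cross_validate(fit_slope, mse, xs, ys, holdout_splits(len(xs)))
kfold_scores = cross_validate(fit_slope, mse, xs, ys, kfold_splits(len(xs), k=5))
print(len(holdout_scores), len(kfold_scores), round(mean(kfold_scores), 3))
```

Swapping in another splitter (for example, one that repeats k-fold with different seeds) changes how many times steps 1-3 run without touching the training or validation code.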

As you may know, there are plenty of CV techniques. Some of them are commonly used, while others work only in theory. Let’s look at the cross-validation methods covered in this article.