مهدی نساجی
2 min read · 3 years ago

Follow the translation of the article below in the images.

Cross-Validation in Machine Learning: How to Do It Right

In machine learning (ML), generalization usually refers to the ability of an algorithm to be effective across a variety of inputs. It means that the ML model's performance does not degrade on new inputs drawn from the same distribution as the training data.

For human beings, generalization is the most natural thing possible. We can classify on the fly. For example, we would definitely recognize a dog even if we had never seen its breed before. Nevertheless, this can be quite a challenge for an ML model. That's why checking an algorithm's ability to generalize is an important task that requires a lot of attention when building the model.

To do that, we use Cross-Validation (CV).

In this article we will cover:

  • What is Cross-Validation: definition, purpose of use and techniques
  • Different CV techniques: hold-out, k-folds, Leave-one-out, Leave-p-out, Stratified k-folds, Repeated k-folds, Nested k-folds, Complete CV
  • How to use these techniques: sklearn
  • Cross-Validation in Machine Learning: sklearn, CatBoost
  • Cross-Validation in Deep Learning: Keras, PyTorch, MxNet
  • Best practices and tips: time series, medical and financial data, images

What is Cross-Validation

Cross-validation is a technique for evaluating a machine learning model and testing its performance. CV is commonly used in applied ML tasks. It helps to compare and select an appropriate model for the specific predictive modeling problem.

CV is easy to understand, easy to implement, and it tends to have a lower bias than other methods used to estimate the model's efficiency scores. All this makes cross-validation a powerful tool for selecting the best model for a specific task.
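As a quick illustration of this model-selection use, scikit-learn's `cross_val_score` wraps the whole procedure in a single call. This is a minimal sketch, not code from the original article; the Iris dataset and the logistic-regression model are arbitrary placeholder choices:

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = load_iris(return_X_y=True)
model = LogisticRegression(max_iter=1000)

# 5-fold cross-validation: returns one accuracy score per fold
scores = cross_val_score(model, X, y, cv=5)
print(scores.mean(), scores.std())
```

Comparing the mean score (and its spread) across candidate models is the typical way CV guides model selection.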

There are a lot of different techniques that may be used to cross-validate a model. Still, all of them have a similar algorithm:

  1. Divide the dataset into two parts: one for training, the other for testing
  2. Train the model on the training set
  3. Validate the model on the test set
  4. Repeat steps 1-3 several times; the number of repetitions depends on the CV method you are using
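The four steps above can be sketched as an explicit loop over folds using scikit-learn's `KFold` splitter. This is a minimal illustration under assumed choices (Iris data, a logistic-regression model, 5 folds), not the article's own code:

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold

X, y = load_iris(return_X_y=True)
kf = KFold(n_splits=5, shuffle=True, random_state=42)

fold_scores = []
for train_idx, test_idx in kf.split(X):   # step 1: split into train/test
    model = LogisticRegression(max_iter=1000)
    model.fit(X[train_idx], y[train_idx])  # step 2: train on the training set
    fold_scores.append(
        model.score(X[test_idx], y[test_idx])  # step 3: validate on the test set
    )                                          # step 4: the loop repeats per fold

print(np.mean(fold_scores))
```

Writing the loop out by hand makes it clear that each fold trains a fresh model, so no test sample ever influences the model that scores it.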

As you may know, there are plenty of CV techniques. Some of them are commonly used; others work only in theory. Let's look at the cross-validation methods that will be covered in this article:

  • Hold-out
  • K-folds
  • Leave-one-out
  • Leave-p-out
  • Stratified K-folds
  • Repeated K-folds
  • Nested K-folds
  • Complete
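Most of these techniques differ only in how they generate the train/test splits. As a rough sketch of how a few of them behave, scikit-learn exposes each as a splitter class; the tiny 10-sample dataset below is invented purely for illustration:

```python
import numpy as np
from sklearn.model_selection import KFold, LeaveOneOut, LeavePOut, StratifiedKFold

X = np.arange(20).reshape(10, 2)       # 10 toy samples, 2 features
y = np.array([0] * 5 + [1] * 5)        # two balanced classes

print(KFold(n_splits=5).get_n_splits(X))    # 5 folds
print(LeaveOneOut().get_n_splits(X))        # 10: one fold per sample
print(LeavePOut(p=2).get_n_splits(X))       # 45: every pair of samples, C(10, 2)

# StratifiedKFold preserves the class ratio in every test fold
for train_idx, test_idx in StratifiedKFold(n_splits=5).split(X, y):
    assert (y[test_idx] == 0).sum() == (y[test_idx] == 1).sum()
```

The number of splits grows quickly for the exhaustive variants (Leave-p-out here already needs 45 fits on 10 samples), which is why k-folds is the usual practical default.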

The rest of this article, in English, is available at:

https://neptune.ai/blog/cross-validation-in-machine-learning-how-to-do-it-right

The translation continues in the images below.

