Ignore some weak “Folds” for Cross-validation.
Calculate and compare accuracy before using each “Fold”.
Cross-validation is a smart and widely used technique that allows us to make better use of our data, usually to avoid “Overfitting” and get better results. The number of folds is chosen carefully based on the number of samples and the available computational power, and, for example, “StratifiedKFold” may be used instead of “KFold”. But the parameters, as well as the hyper-parameters, are set the same for all folds, and the calculations for all folds start and end in a single loop.
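For reference, here is a minimal sketch of such a loop, assuming scikit-learn, a synthetic dataset, and a simple classifier (all chosen only for illustration): every fold shares exactly the same model settings and runs inside one loop.

```python
# A minimal sketch of a typical cross-validation loop: the model
# parameters are identical for every fold and all folds run in one loop.
# The dataset and the classifier here are only illustrative.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import StratifiedKFold

X, y = make_classification(n_samples=1000, random_state=42)

skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
fold_scores = []

for fold, (train_idx, valid_idx) in enumerate(skf.split(X, y)):
    model = LogisticRegression(max_iter=1000)  # same hyper-parameters for every fold
    model.fit(X[train_idx], y[train_idx])
    preds = model.predict(X[valid_idx])
    score = accuracy_score(y[valid_idx], preds)
    fold_scores.append(score)
    print(f"Fold {fold}: accuracy = {score:.4f}")

print(f"Mean accuracy: {np.mean(fold_scores):.4f}")
```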
If you carefully examine the results of the different folds during training, you will sometimes find that the accuracy of some folds is not as good as the others; that is, those folds are not trained as well as the rest. For example, a local maximum or a local minimum may have trapped our calculations, or there may be other reasons. But the question is:
Is it possible to ignore some folds in the cross-validation method? Could doing so increase our score? And what points should be kept in mind?
In this article, we try to answer the above questions. We also address a few other important points. This article was written in December 2021 by Somayeh Gholami and Mehran Kazeminia.
Ignore some weak Folds
Suppose you want to perform regression or classification using cross-validation and, for example, you specify five folds for your model. After training and validation, you check all the results and are surprised to find that the results of one or two folds are not as good as the others; the training in those folds did not go well. This means that these folds may have a negative effect on the final results. In such cases you can, for example, change the value of random_state (or the seed) and try again, or you can try the approach described in this article. Do not forget that only the final result matters; the division into folds is just a method and a tool.
If you ignore one or two weak folds out of five and use only the results of the remaining three folds, the results of the validation phase may look better, but the risk of “Overfitting” increases. This means that the result of the testing phase will not improve in practice. So, if computing power is not a problem, increase the number of folds to ten, for example, and ignore three, four, or five of the folds (those with relatively poor results) when calculating. This simple step will probably improve the final result, as in the sketch below.
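The sketch below illustrates the idea under a few assumptions (a synthetic dataset, a RandomForest model, ten folds, and dropping the three weakest folds before averaging the test predictions); the concrete numbers are placeholders, not a rule.

```python
# A sketch of the idea: train ten folds, then ignore the weakest ones
# when ensembling the test predictions. Dropping exactly 3 folds is an
# illustrative choice, not a fixed rule.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import StratifiedKFold

X, y = make_classification(n_samples=2000, random_state=0)
X_test = X[:200]  # stand-in for the real test set

skf = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
fold_scores, fold_test_preds = [], []

for train_idx, valid_idx in skf.split(X, y):
    model = RandomForestClassifier(n_estimators=200, random_state=0)
    model.fit(X[train_idx], y[train_idx])
    fold_scores.append(accuracy_score(y[valid_idx], model.predict(X[valid_idx])))
    fold_test_preds.append(model.predict_proba(X_test)[:, 1])

# Keep only the 7 strongest folds (drop the 3 weakest) before averaging.
n_keep = 7
keep = np.argsort(fold_scores)[-n_keep:]
final_test_pred = np.mean([fold_test_preds[i] for i in keep], axis=0)
print("Kept folds:", sorted(keep.tolist()))
```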
Another evaluation for selecting weak Folds
Increasing the number of folds can reduce the risk of “Overfitting”, and only in that case is it reasonable to ignore a few weak folds. Of course, if we plan for this from the beginning, we can also achieve another goal that is sometimes very important: we can use a new kind of evaluation to select the weak folds. This new evaluation can be different from the evaluation of the challenge (or project) and, in a way, complement it.
As you know, it is not possible to run code on the Medium platform. We have therefore written a notebook, linked below, that contains the details and the code for the topics of this article, and you can run the code there as well. The notebook was written for a Kaggle challenge called “Tabular Playground Series”, which was held in October 2021.
[1] TPS Oct 21 — LGBM & AUC Evaluation
Ignore some folds in the cross-validation method & MORE
In this challenge, the evaluation was based on the “area under the ROC curve” (AUC), and we conducted the training with the same evaluation metric. But we used “accuracy_score” to select the weak folds.
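The snippet below is a simplified, hedged sketch of that setup, assuming a recent LightGBM version (3.3 or later) and synthetic data in place of the competition data: the model is trained and early-stopped on AUC, while accuracy_score is recorded separately and used only to decide which folds to ignore.

```python
# Sketch of using a second metric to select weak folds: LightGBM is
# trained and early-stopped on AUC (the competition metric), while
# accuracy_score is recorded per fold and used only for fold selection.
import numpy as np
import lightgbm as lgb
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score, roc_auc_score
from sklearn.model_selection import StratifiedKFold

X, y = make_classification(n_samples=5000, random_state=2021)

skf = StratifiedKFold(n_splits=10, shuffle=True, random_state=2021)
fold_auc, fold_acc = [], []

for train_idx, valid_idx in skf.split(X, y):
    train_X, train_y = X[train_idx], y[train_idx]
    valid_X, valid_y = X[valid_idx], y[valid_idx]

    model = lgb.LGBMClassifier(n_estimators=2000, learning_rate=0.05)
    model.fit(
        train_X, train_y,
        eval_set=[(valid_X, valid_y)],
        eval_metric="auc",
        callbacks=[lgb.early_stopping(100, verbose=False)],
    )

    proba = model.predict_proba(valid_X)[:, 1]
    fold_auc.append(roc_auc_score(valid_y, proba))
    fold_acc.append(accuracy_score(valid_y, (proba > 0.5).astype(int)))

# Folds whose accuracy is clearly below the average are candidates to ignore.
print("AUC per fold:     ", np.round(fold_auc, 4))
print("Accuracy per fold:", np.round(fold_acc, 4))
```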
Training once again with the whole dataset
In the above notebook, we also trained the model once more on the whole dataset, ensembled its result with the previous results, and in the end the final score was a little better. Of course, the score may not always improve.
Please note that at this stage you must set some parameters manually, because “valid_X” and “valid_y” no longer exist. For example, if you are using LightGBM, you can no longer rely on early stopping and will need to set the value of n_estimators yourself.
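A minimal sketch of this final step, assuming LightGBM and a placeholder value for n_estimators (in practice you might pick something close to the average best iteration seen during cross-validation):

```python
# Training on everything: with no valid_X / valid_y there is no early
# stopping, so n_estimators has to be fixed explicitly. The value 800
# here is only a placeholder.
import lightgbm as lgb
from sklearn.datasets import make_classification

X, y = make_classification(n_samples=5000, random_state=2021)
X_test = X[:500]  # stand-in for the real test set

full_model = lgb.LGBMClassifier(n_estimators=800, learning_rate=0.05)
full_model.fit(X, y)  # no eval_set, no early stopping

full_pred = full_model.predict_proba(X_test)[:, 1]
# This prediction can then be ensembled (e.g. averaged) with the
# fold-based predictions from the previous steps.
```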
Can changing the value of “random_state” improve our score?
Yes, there is a chance that the score will improve. You can even go one step further: use a fixed model, change the value of “random_state” several times, and finally ensemble the results of these runs together. This simple trick may improve your score a little; we have seen clever participants use it many times in various Kaggle challenges. If you want more information, you can look, for example, at the public notebooks below.
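As a simple illustration of this trick (separate from the notebooks mentioned above), here is a minimal sketch of seed ensembling; the model, the seed values, and the stand-in test set are all placeholder assumptions.

```python
# Seed ensembling: train the same model several times with different
# random_state values and average the predictions.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=2000, random_state=0)
X_test = X[:200]  # stand-in for the real test set

seed_preds = []
for seed in [0, 1, 2, 3, 4]:
    model = RandomForestClassifier(n_estimators=300, random_state=seed)
    model.fit(X, y)
    seed_preds.append(model.predict_proba(X_test)[:, 1])

final_pred = np.mean(seed_preds, axis=0)  # ensemble of the five seeds
```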