NEO Share


Sharing The Latest Tech News


Stratified-K-Fold in Machine Learning

March 6, 2021 by systems

Rahul Mishra

Importance of #StratifiedKfold in #machinelearningmodels

When we train an ML model, we split the entire dataset into a training_set and a test_set using the train_test_split() function in sklearn. The problem is that different random_state values produce different splits, so we get a different accuracy each time and can't point to a single accuracy for our model.

The train_test_split() function splits the dataset into training_set and test_set by #randomsampling.
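To see how much the score can move with the split, here is a minimal sketch. The synthetic dataset from make_classification and the LogisticRegression model are assumptions chosen only for illustration; they are not part of the original post.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic data, used only for illustration (an assumption, not the author's data).
X, y = make_classification(n_samples=200, n_features=10, random_state=0)

scores = {}
for seed in range(5):
    # Same data and same model; only the random split changes.
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=seed
    )
    model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
    scores[seed] = model.score(X_test, y_test)

for seed, acc in scores.items():
    print(f"random_state={seed}: accuracy={acc:.3f}")
```

The printed accuracies will usually differ from seed to seed, which is the first problem described above.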

What is random sampling and stratified sampling?

Suppose you want to run a survey and decide to call 1000 people from a particular state. If you pick them purely at random, nothing controls the mix: you might end up with 1000 men, or 1000 women, or 900 women and 100 men. Based on those 1000 opinions, you can't judge the opinion of the entire state on your product. This is random sampling.

In stratified sampling, suppose the population of that state is 51.3% male and 48.7% female. To choose 1000 people from that state, you pick 513 men (51.3% of 1000) and 487 women (48.7% of 1000), i.e. 513 men + 487 women (total = 1000 people), and ask their opinion. This group of people then represents the entire state. This is called stratified sampling.
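The quota arithmetic for the survey can be sketched in a few lines of Python. The helper name stratified_quota is hypothetical and introduced only for this illustration.

```python
def stratified_quota(shares, sample_size):
    """Return how many people to sample from each group.

    `shares` maps a group name to its fraction of the population.
    For other inputs, rounding can make the quotas miss `sample_size`
    by a small amount; this sketch does not correct for that.
    """
    return {group: round(share * sample_size) for group, share in shares.items()}


quota = stratified_quota({"male": 0.513, "female": 0.487}, sample_size=1000)
print(quota)  # {'male': 513, 'female': 487}
```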

Why is random sampling not preferred in machine learning?

Let's consider a binary classification problem. Suppose our dataset consists of 100 samples, of which 80 belong to the negative class {0} and 20 to the positive class {1}.

Random sampling:

If we use random sampling to split the dataset into training_set and test_set in an 8:2 ratio, then in the worst case we might get all 80 negative-class {0} samples in training_set and all 20 positive-class {1} samples in test_set. If we then train our model on training_set and test it on test_set, we will obviously get a bad accuracy score, because the model has never seen a positive example.
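The all-or-nothing split is an extreme case, but even ordinary random splits let the class mix drift. A minimal sketch in plain Python, assuming the 80/20 label counts from the example (the random_split helper is hypothetical):

```python
import random
from collections import Counter

# 80 negative (0) and 20 positive (1) labels, as in the example.
labels = [0] * 80 + [1] * 20


def random_split(labels, test_size, seed):
    """Shuffle the labels and cut off the last `test_size` as the test set."""
    shuffled = labels[:]
    random.Random(seed).shuffle(shuffled)
    return shuffled[:-test_size], shuffled[-test_size:]


test_positives = []
for seed in range(5):
    train, test = random_split(labels, test_size=20, seed=seed)
    test_positives.append(Counter(test)[1])
    print(f"seed={seed}: positives in train={Counter(train)[1]}, in test={Counter(test)[1]}")
```

Nothing forces the test set to hold exactly 4 positives, so the count can change with every seed.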

Stratified sampling:

In stratified sampling, the training_set consists of 64 negative-class {0} samples (80% of 80) and 16 positive-class {1} samples (80% of 20), i.e. 64{0} + 16{1} = 80 samples, which keeps the same class proportions as the original dataset. Similarly, the test_set consists of 16 negative-class {0} samples (20% of 80) and 4 positive-class {1} samples (20% of 20), i.e. 16{0} + 4{1} = 20 samples, which also keeps the same proportions as the entire dataset. This type of train-test split gives a more trustworthy accuracy score, because both sets look like the data as a whole.
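In sklearn, this kind of split can be requested by passing the labels to the stratify parameter of train_test_split(). A minimal sketch, assuming placeholder features and an arbitrary random_state:

```python
from collections import Counter
from sklearn.model_selection import train_test_split

# 80 negative (0) and 20 positive (1) labels, as in the example.
y = [0] * 80 + [1] * 20
X = [[i] for i in range(len(y))]  # placeholder features, for illustration only

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

train_counts = Counter(y_train)
test_counts = Counter(y_test)
print(sorted(train_counts.items()))  # [(0, 64), (1, 16)]
print(sorted(test_counts.items()))   # [(0, 16), (1, 4)]
```

The rows that land in each set still depend on random_state, but the class counts stay fixed at 64/16 and 16/4.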

What is the solution for the mentioned problems?

The solution to the first problem, where we get a different accuracy score for each value of the random_state parameter, is to use K-Fold Cross-Validation, which trains and tests the model on k different splits and averages the scores. But K-Fold Cross-Validation also suffers from the second problem, i.e. random sampling.
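A quick way to see this is to count the positive samples that plain KFold puts in each test fold. This is a sketch on the 80/20 labels from the earlier example, with placeholder features and shuffle=True so that the folds are drawn at random:

```python
import numpy as np
from sklearn.model_selection import KFold

y = np.array([0] * 80 + [1] * 20)
X = np.arange(len(y)).reshape(-1, 1)  # placeholder features, for illustration only

# KFold never looks at y, so the class mix per fold is left to chance.
kf = KFold(n_splits=5, shuffle=True, random_state=0)

kfold_test_positives = []
for fold, (train_idx, test_idx) in enumerate(kf.split(X)):
    positives = int(y[test_idx].sum())
    kfold_test_positives.append(positives)
    print(f"fold {fold}: test size={len(test_idx)}, positives in test={positives}")
```

Every fold has 20 test samples, but the number of positives per fold is not guaranteed to be 4.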

The solution to both the first and the second problem is to use Stratified K-Fold Cross-Validation.

What is Stratified K-Fold Cross-Validation?

Stratified k-fold cross-validation is the same as plain k-fold cross-validation, except that it builds the folds with stratified sampling instead of random sampling, so every fold keeps roughly the same class proportions as the full dataset.
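In sklearn this is available as StratifiedKFold. A minimal sketch on the 80/20 labels from the earlier example (placeholder features; n_splits=5 and random_state=0 are arbitrary choices):

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold

y = np.array([0] * 80 + [1] * 20)
X = np.arange(len(y)).reshape(-1, 1)  # placeholder features, for illustration only

# Unlike KFold, split() takes y so that each fold can mirror the class ratio.
skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

fold_counts = []
for fold, (train_idx, test_idx) in enumerate(skf.split(X, y)):
    counts = np.bincount(y[test_idx], minlength=2).tolist()
    fold_counts.append(counts)
    print(f"fold {fold}: test negatives={counts[0]}, test positives={counts[1]}")
```

Here every test fold holds 16 negatives and 4 positives. To score a model this way, the same skf object can be passed as the cv argument of cross_val_score().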

Filed Under: Artificial Intelligence

Copyright © 2025 NEO Share
