NEO Share

Elo Ranks for Visual Task Adaptation Benchmark

February 19, 2021 by systems

To address these problems, we propose the Elo-based Predictive Power (EPP) ranking method.

Elo measure

The Elo rating system is used to calculate the relative skill levels of players in games such as chess or soccer, and it is also popular in MOBA (multiplayer online battle arena) games.

The difference between the Elo scores of two players is a predictor of their match result. Elo is calculated from each player's historical wins and losses. After each match, the winner gains Elo points and the loser loses them. The number of points gained or lost depends on the strength of the opponent: winning against a better player earns more Elo points. The most important property of Elo is that the difference between two scores can be transformed into the probability that one player wins against the other.
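The update rule described above can be sketched with the classic chess formulation of Elo. This is only an illustration of standard Elo, not of EPP: the base-10 logistic with a scale of 400 is the conventional chess choice, and the K-factor of 32 and the example ratings are assumptions picked for the demonstration.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Probability that player A beats player B under classic chess Elo."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def update_ratings(rating_a: float, rating_b: float, score_a: float, k: float = 32.0):
    """Return both ratings after one match.

    score_a is 1.0 if A won, 0.5 for a draw, 0.0 if A lost.
    k (the K-factor) is an assumed value; it controls how fast ratings move.
    """
    expected_a = expected_score(rating_a, rating_b)
    delta = k * (score_a - expected_a)
    # Points gained by one player are lost by the other.
    return rating_a + delta, rating_b - delta


# An underdog (1400) beating a favorite (1600) gains more points
# than the favorite would gain by beating the underdog.
underdog_win = update_ratings(1400, 1600, score_a=1.0)
favorite_win = update_ratings(1600, 1400, score_a=1.0)
print(underdog_win[0] - 1400, favorite_win[0] - 1600)
```

The larger gain for the underdog follows directly from the update: the less expected the win, the bigger the difference between the actual and the expected score.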

The idea of the Elo rating can be transferred to the machine learning world. EPP, an adaptation of Elo for ranking ML models, is illustrated in the diagram below. Colors represent machine learning algorithms, gradients represent sets of hyperparameters, and border styles represent data sets.

One can think of model ratings as the ratings of players in tournaments run under the Elo system. Each data set is a tournament. Each algorithm can have different hyperparameter values, much as a country has players who represent it. Sets of hyperparameters (players) are compared on different data sets (tournaments), each divided into train/test splits (rounds). There might be only one split, as in VTAB. The model performance measures on the test splits (match results) are aggregated into Elo ratings. We call Elo for machine learning models EPP because the rankings are estimated differently.
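The tournament analogy can be made concrete with a small sketch that turns per-split scores into match results. It only covers the bookkeeping step (who beat whom, and how often); how those counts are then turned into EPP scores is not described here, so that step is left out. The data set names, model names, and accuracies are made up for illustration.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical accuracies: results[data_set][split][model] -> score
results = {
    "dataset_1": {"split_1": {"model_a": 0.81, "model_b": 0.78, "model_c": 0.80}},
    "dataset_2": {"split_1": {"model_a": 0.62, "model_b": 0.65, "model_c": 0.60}},
}


def count_wins(results):
    """Count, for each ordered pair (winner, loser), how many rounds the winner took."""
    wins = defaultdict(int)
    for splits in results.values():          # each data set is a tournament
        for scores in splits.values():       # each train/test split is a round
            for a, b in combinations(sorted(scores), 2):  # each pair is a match
                if scores[a] > scores[b]:
                    wins[(a, b)] += 1
                elif scores[b] > scores[a]:
                    wins[(b, a)] += 1
                # ties are ignored in this sketch
    return dict(wins)


wins = count_wins(results)
for (winner, loser), n in sorted(wins.items()):
    print(f"{winner} beat {loser} in {n} round(s)")
```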

EPP Ranks for VTAB

Below, we compare the mean score and EPP for the models included in the Visual Task Adaptation Benchmark (VTAB). Each black dot represents one model. The overall trend for the mean score and EPP is similar, but there are some differences in the rankings. For example, Semi-Rotation-10% has a higher mean than Rotation but a lower EPP. This happens because EPP only takes into account whether a model was better or worse than another, while the mean depends on the size of the difference in results.
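A toy example shows how the two views can disagree. The scores below are invented and do not come from VTAB: model X wins one data set by a wide margin and narrowly loses the other four, so it has the higher mean but the worse win/loss record.

```python
from statistics import mean

# Hypothetical scores of two models on the same five data sets
scores_x = [0.95, 0.60, 0.60, 0.60, 0.60]
scores_y = [0.55, 0.62, 0.62, 0.62, 0.62]

mean_x, mean_y = mean(scores_x), mean(scores_y)

# A win/loss view only records which model was better on each data set.
wins_x = sum(x > y for x, y in zip(scores_x, scores_y))
wins_y = sum(y > x for x, y in zip(scores_x, scores_y))

print(f"mean: X={mean_x:.3f}, Y={mean_y:.3f}")  # X has the higher mean
print(f"wins: X={wins_x}, Y={wins_y}")          # Y wins more often
```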

As we can see, the mean for the top two models is almost the same, but with the EPP scores we can calculate the probability that Sup-Rotation-100% will perform better than Sup-Exemplar-100% on a new data set. The probability of winning is the inverse logit of the difference in scores. Therefore, Sup-Rotation-100% (EPP = 3.41) will obtain higher performance than Sup-Exemplar-100% (EPP = 3.16) with probability exp(3.41 - 3.16) / (1 + exp(3.41 - 3.16)) ≈ 0.56.
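The same calculation can be written as a short function. The name `win_probability` is a hypothetical helper introduced here; the two EPP values are the ones quoted above.

```python
import math


def win_probability(epp_a: float, epp_b: float) -> float:
    """Inverse logit of the EPP difference: chance that model A outperforms model B."""
    diff = epp_a - epp_b
    return 1.0 / (1.0 + math.exp(-diff))  # equal to exp(diff) / (1 + exp(diff))


p = win_probability(3.41, 3.16)  # Sup-Rotation-100% vs Sup-Exemplar-100%
print(round(p, 2))  # 0.56
```

Swapping the arguments gives the complementary probability, and equal scores give exactly 0.5.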

Filed Under: Machine Learning
