Data Representation Techniques to Supercharge your ML model — Part I

March 6, 2021

How to do feature engineering beyond scaling and one-hot encoding

Rezatama Fathullah
Photo by Annie Spratt on Unsplash

Being a data scientist is like being a craftsman. You are equipped with a set of tools and asked to create something beautiful yet functional out of simple material. Some of your work might be done by automatic machinery, but you know that your true power lies in creativity and the intricate skill of working by hand.

In this series, we will hone your skillset by exploring several approaches to representing data as features. These approaches can improve the learnability of your model, especially if you have tons of data in hand.

Imagine that you have data with the following pattern, where the horizontal axis represents a feature X₁, the vertical axis represents another feature X₂, and each instance (a point in the plot) belongs to either the -1 or the +1 group (represented by red and green).

[Figure: two-class scatter plot over the X₁–X₂ plane. Source: jp.mathwork.com]

Now let me challenge you to draw a linear boundary that separates the two classes. I bet you can't; indeed, this is an example of data that is not linearly separable.

There are several ways to handle this kind of data, such as using inherently non-linear classification models like a decision tree or a complex neural network. However, there is a simple technique that makes a plain linear classifier work very well on this kind of data.

Here is the trick. First, let's discretize each of our continuous features into two buckets based on its sign. For X₁, let A denote its positive values and B its negative counterpart. Similarly, let C denote the positive values of X₂ and D its negative values.
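Here is a minimal sketch of this bucketing step in Python (the article itself shows no code, so the function and variable names below are my own):

```python
import numpy as np

def bucketize(x1, x2):
    """Discretize each feature into two sign-based buckets.

    A/B are the positive/negative buckets of X1;
    C/D are the positive/negative buckets of X2.
    How to treat values exactly at 0 is a choice; here they fall into B/D.
    """
    b1 = np.where(x1 > 0, "A", "B")
    b2 = np.where(x2 > 0, "C", "D")
    return b1, b2
```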

Then, we can create a new categorical feature by enumerating all possible pairings of our newly created buckets (a code sketch follows the list):

  • AD = {X₁ > 0 and X₂ < 0}
  • AC = {X₁ > 0 and X₂ > 0}
  • BC = {X₁ < 0 and X₂ > 0}
  • BD = {X₁ < 0 and X₂ < 0}
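Continuing the same sketch, the crossing itself is just element-wise concatenation of the two bucket labels:

```python
def cross(b1, b2):
    # Each instance lands in exactly one of {"AC", "AD", "BC", "BD"}.
    return np.char.add(b1, b2)
```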

With this brand-new feature, we can now easily classify an instance using simple binary classification:

ŷ = w₀·AD + w₁·AC + w₂·BC + w₃·BD

The variables here are indicator functions whose value is either 0 or 1.

If you look carefully, you can immediately see that the appropriate set of weights is the labels themselves (w₀ = -1, w₁ = 1, w₂ = 1, w₃ = -1).
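To make this concrete, here is a small sketch of that linear model, reusing the bucketize and cross helpers from above; the category order AD, AC, BC, BD matches the list, and the weights are the ones quoted:

```python
categories = ["AD", "AC", "BC", "BD"]
weights = np.array([-1.0, 1.0, 1.0, -1.0])  # w0..w3: the labels themselves

def predict(crossed):
    # Build the 0/1 indicator matrix: exactly one 1 per row.
    indicators = np.array([[c == cat for cat in categories] for c in crossed],
                          dtype=float)
    return indicators @ weights  # a purely linear model, no bias term needed

# One sample point per quadrant:
x1 = np.array([0.5, -1.2, 2.0, -0.3])
x2 = np.array([-0.7, 0.4, 1.1, -2.2])
print(predict(cross(*bucketize(x1, x2))))  # -> [-1.  1.  1. -1.]
```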

The transformation we just did is an example of a feature cross, where we combine multiple features into a new one to help our model learn better.
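In practice you would rarely hand-roll these steps. As one end-to-end sketch (the quadrant-based synthetic labelling below is my own stand-in for the article's plot, not its actual data), scikit-learn can express the whole recipe: KBinsDiscretizer buckets each feature, PolynomialFeatures with interaction_only=True takes pairwise products of the indicator columns (those products are exactly the crossed features), and a plain LogisticRegression sits on top:

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import KBinsDiscretizer, PolynomialFeatures
from sklearn.linear_model import LogisticRegression

# Synthetic stand-in for the plot: the class depends on the quadrant,
# so no single line through the raw (X1, X2) plane separates it.
rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(1000, 2))
y = np.where(X[:, 0] * X[:, 1] > 0, 1, -1)

model = make_pipeline(
    # Two uniform bins per feature, approximating the sign buckets A/B and C/D.
    KBinsDiscretizer(n_bins=2, encode="onehot-dense", strategy="uniform"),
    # Pairwise products of the indicator columns = the crossed features.
    PolynomialFeatures(degree=2, interaction_only=True, include_bias=False),
    LogisticRegression(),
)
model.fit(X, y)
print(model.score(X, y))  # near-perfect accuracy on this synthetic data
```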

Filed Under: Machine Learning
