NEO Share

Machine Learning Applied to Mammogram Classification

January 21, 2021 by systems

Step 1 — Data exploration

The data contains 961 instances of masses detected in mammograms, each described by the following attributes:

  1. BI-RADS assessment (ordinal) — Assessment of how confident the severity classification is, ranked from 1 to 5.
  2. Age (integer) — Patient’s age in years.
  3. Mass shape (nominal) — round=1; oval=2; lobular=3; irregular=4
  4. Mass margin (nominal) — circumscribed=1; micro-lobulated=2; obscured=3; ill-defined=4; spiculated=5
  5. Mass density (ordinal) — high=1; iso=2; low=3; fat-containing=4
  6. Severity (binomial) — benign=0 or malignant=1

Here are some statistics of each feature:

Fig. 1 — Features information
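As a rough sketch of how statistics like these could be produced, the snippet below loads the data with pandas and summarizes each column. The column names, the use of "?" as the missing-value marker, and the sample rows are assumptions made for illustration; the rows are invented and are not taken from the dataset.

```python
import io

import pandas as pd

# Column names chosen for this sketch (assumption, not from the article).
COLUMNS = ["BI-RADS", "age", "shape", "margin", "density", "severity"]

# Invented sample rows for illustration only; "?" is assumed to mark a missing value.
SAMPLE_CSV = """4,52,2,1,3,0
5,71,4,5,?,1
4,?,1,1,3,0
5,63,4,4,3,1
3,39,1,1,2,0
"""


def load_masses(source):
    """Read a headerless CSV, name the columns and turn '?' into NaN."""
    return pd.read_csv(source, names=COLUMNS, na_values=["?"])


df = load_masses(io.StringIO(SAMPLE_CSV))
summary = df.describe()    # count, mean, std, min, quartiles and max per column
missing = df.isna().sum()  # number of missing values per column

print(summary)
print(missing)
```

With the real file, the path would be passed to the hypothetical helper load_masses in place of the in-memory sample.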

Step 2 — Handling missing values

In Figure 1, we can observe there are quite a few missing values in the dataset (2 for “BI-RADS”, 5 for “age”, 31 for “shape”, 48 for “margin” and 76 for “density”).

Before dropping every row that’s missing data, it is important to make sure we don’t bias our data by doing so.

Let’s look at how missing values are distributed (Fig. 2 shows the missing values distribution for “age”). If missingness appears to be correlated with the values of other fields, we would have to impute the missing data with a suitable method (e.g. KNN, MICE) rather than drop it.

Fig. 2 — “Age” missing values distribution
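One simple way to check for such a pattern, sketched here under assumptions, is to compare the averages of the other columns between rows where a field is missing and rows where it is present. The helper name compare_missing and the toy values are hypothetical; the article's own check is visual (Fig. 2), so this is only a numerical counterpart.

```python
import numpy as np
import pandas as pd

# Invented toy data for illustration only.
df = pd.DataFrame({
    "age":      [67, 43, np.nan, 28, 74, 65, np.nan, 42],
    "shape":    [3, 1, 4, 1, 1, 2, 4, 2],
    "severity": [1, 1, 1, 0, 1, 0, 0, 0],
})


def compare_missing(frame, column):
    """Mean of every other column, split by whether `column` is missing."""
    status = (
        frame[column]
        .isna()
        .map({True: "missing", False: "present"})
        .rename(f"{column}_status")
    )
    others = frame.drop(columns=[column])
    return others.groupby(status).mean()


report = compare_missing(df, "age")
print(report)
```

Large gaps between the "missing" and "present" rows would suggest the values are not missing at random; on a sample this small, any difference is of course only illustrative.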

In our case, missing data seems randomly distributed. We can therefore move on and drop rows containing missing values:

Fig. 3 — Features information (missing values dropped)
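Dropping the incomplete rows could look like the following minimal sketch, assuming the data is held in a pandas DataFrame with missing entries stored as NaN. The toy values are invented for illustration.

```python
import numpy as np
import pandas as pd

# Invented toy data for illustration only.
df = pd.DataFrame({
    "age":      [67, 43, np.nan, 28, 74, 65],
    "shape":    [3, 1, 4, 1, np.nan, 2],
    "density":  [3, np.nan, 3, 3, 3, 3],
    "severity": [1, 1, 1, 0, 1, 0],
})

# Keep only rows where every column has a value, then renumber the index.
clean = df.dropna().reset_index(drop=True)

print(f"{len(df)} rows before, {len(clean)} rows after dropping missing values")
```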

Step 3 — Feature selection

Now, data must be split into two arrays:

  1. A multi-dimensional input array (X) containing the values of the features relevant for predicting the output. In our case, the relevant features are age, shape, margin and density. The attribute BI-RADS (assessment of how confident the severity classification is) is dropped because it is not a “predictive” attribute.
  2. A 1D array (Y) containing classification data (values of the feature ‘severity’).

Fig. 4 — Input data matrix (X) and classification data matrix (Y)
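The split described above can be sketched as follows, assuming a cleaned pandas DataFrame with the column names used here (the names and the toy rows are illustrative assumptions, not values from the dataset).

```python
import pandas as pd

# Invented toy data for illustration only; assumed to contain no missing values.
clean = pd.DataFrame({
    "BI-RADS":  [4, 5, 4, 3],
    "age":      [52, 71, 63, 39],
    "shape":    [2, 4, 4, 1],
    "margin":   [1, 5, 4, 1],
    "density":  [3, 3, 3, 2],
    "severity": [0, 1, 1, 0],
})

# Features kept as model inputs; BI-RADS is deliberately left out.
FEATURES = ["age", "shape", "margin", "density"]

X = clean[FEATURES].to_numpy(dtype=float)     # shape: (n_samples, 4)
y = clean["severity"].to_numpy(dtype=int)     # shape: (n_samples,)

print(X.shape, y.shape)
```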

Step 4 — Normalization

Finally, some models require input data to be normalized, so let’s normalize our matrix X:

Fig. 5 — Normalized input data matrix (X)
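The article does not say which normalization was applied. As one common option, the sketch below assumes z-score standardization (zero mean and unit variance per column, the same transformation scikit-learn's StandardScaler performs), written with NumPy. The helper name standardize and the toy matrix are hypothetical.

```python
import numpy as np

# Invented toy input matrix (columns: age, shape, margin, density).
X = np.array([
    [52, 2, 1, 3],
    [71, 4, 5, 3],
    [63, 4, 4, 3],
    [39, 1, 1, 2],
], dtype=float)


def standardize(matrix):
    """Scale each column to zero mean and unit variance (z-score)."""
    mean = matrix.mean(axis=0)
    std = matrix.std(axis=0)
    std[std == 0] = 1.0  # leave constant columns unscaled instead of dividing by zero
    return (matrix - mean) / std


X_scaled = standardize(X)
print(X_scaled.round(3))
```

When a separate test set is used, the mean and standard deviation would normally be computed on the training rows only and then reused on the test rows.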

Filed Under: Machine Learning
