Borrowed from statistics, linear regression is a handy model that is increasingly popular in machine learning. It is particularly useful for predictive analytics, where the goal is to make the most accurate predictions possible based on historical data. Linear regression models the relationship between one or more independent variables and a dependent variable.

When a single independent variable is used to predict the dependent variable, the process is termed **simple linear regression**; when more than one independent variable is considered, the process is called **multiple linear regression**. Thankfully, both can be run in R on an imported dataset.
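In R's formula notation, the difference comes down to how many predictors appear on the right-hand side of `~`. The sketch below uses small made-up vectors (`x1`, `x2`, `y` and their values are illustrative only, not the fisherman data) purely to show the two call shapes.

```r
# Made-up illustrative data; not the fisherman dataset.
toy <- data.frame(
  x1 = c(1, 2, 3, 4, 5, 6),
  x2 = c(2, 1, 4, 3, 6, 5),
  y  = c(2.1, 3.9, 6.2, 7.8, 10.1, 12.2)
)

# Simple linear regression: one independent variable.
simple_fit <- lm(y ~ x1, data = toy)

# Multiple linear regression: two or more independent variables.
multiple_fit <- lm(y ~ x1 + x2, data = toy)

# Number of estimated coefficients in each model.
length(coef(simple_fit))    # intercept plus one slope
length(coef(multiple_fit))  # intercept plus two slopes
```

The rest of this article only needs the first form, since the case study has a single predictor.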

In this case study, I wanted to see whether there was a significant linear relationship between the number of fish meals consumed per week and the total mercury levels found among fishermen. The dataset used in this analysis is attached as an appendix item at the end of the article. Since the data contain only two variables, I applied a simple linear regression model.

This article focuses on the practical steps for conducting linear regression in R, so it assumes prior knowledge of linear regression, hypothesis testing, ANOVA tables, and confidence intervals. If you need additional background on these topics, I recommend the tutorials listed at the end of this article.

**Step 1: Save the data to a file (Excel or CSV) and read it into R for analysis**

Complete this step as follows.

1. Save the CSV file locally, for example on your desktop.

2. In RStudio, navigate to “Session” -> “Set Working Directory” -> “Choose Directory” and select the folder where you saved the file in item 1.

3. In RStudio, run the commands:

*data <- read.csv("fisherman_mercury_levels.csv")*

*attach(data)*
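The read-and-check sequence can be sketched as below. Because the real dataset lives in the appendix, this sketch writes a tiny stand-in CSV to a temporary file so that it runs anywhere; the column names `fish_meals_per_week` and `total_mercury` and all values are hypothetical placeholders, not the actual contents of `fisherman_mercury_levels.csv`.

```r
# Stand-in file so the sketch runs anywhere. In practice, csv_path would be
# "fisherman_mercury_levels.csv" in your working directory.
# Column names and values below are hypothetical placeholders.
csv_path <- tempfile(fileext = ".csv")
writeLines(c(
  "fish_meals_per_week,total_mercury",
  "2,1.5",
  "7,4.8",
  "14,9.1"
), csv_path)

# Fail early if the file is not where we expect it.
stopifnot(file.exists(csv_path))

data <- read.csv(csv_path)

# Quick checks before any modelling.
str(data)             # column names and types
head(data)            # first few rows
colSums(is.na(data))  # count of missing values per column
```

`attach(data)` lets you refer to columns by their bare names. A common alternative is to write `data$column_name`, or to pass `data = data` to modelling functions, which avoids clashes with other objects of the same name in your workspace.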

**Step 2: To get a sense of the data, generate a scatterplot. Consciously decide which variable should be on the x-axis and which should be on the y-axis. Using the scatterplot, evaluate the form, direction, and strength of the association between the variables.**
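As a rough sketch of this step, the code below plots one variable against the other and computes a correlation coefficient. It assumes fish meals per week is treated as the explanatory variable (x-axis) and total mercury as the response (y-axis); the data frame `fish`, its column names, and its values are made-up placeholders to be replaced with the real dataset.

```r
# Made-up placeholder values and column names; substitute the real data.
fish <- data.frame(
  fish_meals_per_week = c(0, 2, 4, 7, 7, 10, 14, 21),
  total_mercury       = c(0.4, 1.1, 2.0, 3.2, 4.1, 4.9, 7.5, 10.3)
)

# Explanatory variable on the x-axis, response on the y-axis.
plot(
  fish$fish_meals_per_week, fish$total_mercury,
  xlab = "Fish meals per week",
  ylab = "Total mercury",
  main = "Mercury level vs. fish meals"
)

# Pearson correlation: the sign indicates direction and the magnitude
# indicates the strength of the linear association.
r <- cor(fish$fish_meals_per_week, fish$total_mercury)
r
```

The plot is the main tool for judging form (linear or curved), while the correlation coefficient only summarises direction and strength under the assumption that the form is linear.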