Skip to Main Content

Data & Digital Scholarship Tutorials

Introduction

This tutorial will guide researchers through creating a scatter plot using Microsoft Excel.

We will cover the following material:

  • Create a Basic Scatter Plot
  • Direct vs. Inverse Relationship
  • Regression (Trend) Line
  • Dynamic Dropdown Menus for X & Y Axes (time and interest permitting)

 

Tutorial Procedures

Download the sample spreadsheet for this tutorial.

This sample data contains the following columns:

  1. County [every county in the United States]
  2. State [state where the county resides]
  3. White [% White]
  4. Black [% Black/African American]
  5. AIAN [% American Indian or Alaskan Native]
  6. Asian [% Asian]
  7. NHOPI [% Native Hawaiian or Pacific Islander]
  8. Hispanic [% Hispanic]
  9. Married [% over 15 population that is married (excludes separated)]
  10. Bachelor-Degree [% over 25 population with a Bachelor degree or higher]
  11. Income [median household income]
  12. Property-Value [median property value of occupied residential properties]

5-Year 2016 ACS Data downloaded from Social Explorer.

Step #1:

Open provided sample data in Excel

Step #2:

Hold CTRL and click J and K along the top of the spreadsheet to select the Bachelor-Degree and Income columns.

Step #3:

Select the Insert Ribbon at the top of Excel and in the Charts category, click the Scatter Plot image.

Step #4:

Move to clear portion of spreadsheet or Cut/Paste to new worksheet.

 

 

Step #1:

Single click the chart on a whitespace. Then in the Design Ribbon, select from the available chart styles.

Step #2:

Single click the chart on a whitespace. Then in the Design Ribbon, click Quick Layout and select Layout 1.

Step #3:

Adjust the title, x-axis label, and y-axis label by single-clicking and typing.

Step #4:

Adjust the legend label by right-clicking on empty space on the chart and click Select Data.

 

Step #5:

Under Legend Entries (Series) on the left, click Edit.

Change the Series Name to ="County"

OK

OK

Step #6:

Adjust axes units by formatting the cells of the respective columns.

Right-click K (above the Income field) and select Format Cells. Change to Currency with 0 Decimal Places.

 

Direct Relationship: As the X-Axis increases, the Y-Axis also increases. This is evidenced by an upward trending positive distribution of points.

Inverse Relationship: As the X-Axis increases, the Y-Axis decreases. This is evidenced by an downward trending negativer distribution of points.

Test!!

Is the relationship between income and college education direct or inverse?

 

 

 

 

What is linear regression?

From http://www.stat.yale.edu/Courses/1997-98/101/linreg.htm

Linear regression attempts to model the relationship between two variables by fitting a linear equation to observed data. One variable is considered to be an explanatory variable, and the other is considered to be a dependent variable.

A linear regression line has an equation of the form Y = a + bX, where X is the explanatory variable and Y is the dependent variable. The slope of the line is b, and a is the intercept (the value of y when x = 0).

What is R-Squared?

From http://statisticsbyjim.com/regression/interpret-r-squared-regression/

R-squared is a goodness-of-fit measure for linear regression models. This statistic indicates the percentage of the variance in the dependent variable that the independent variables explain collectively. R-squared measures the strength of the relationship between your model and the dependent variable on a convenient 0 – 100% scale.

Step #1:

Single-click the chart and click the + and check Trendline.

Then click the triangle to the right of the checkbox.

Step #2:

Right-click the Trendline and select Format Trendline.

  • Under Trendline Name, select Custom and name it Regression Line.
  • Check Display Equation on Chart.
  • Check Display R-Squared Value on Chart.

 

Step #3:

Move the linear equation and R-Squared to another section of the chart.

 

Step #1:

Move the chart so there are at least two empty rows above and two empty columns to the left.

Step #2:
Select a cell above the chart and then in the Data Ribbon, click the Data Validation button.

Step #3:

In the Data Validation menu, change Allow to List.

Under Source click the arrow-up button and select the headers from White to Property-Value and hit Enter.

OK

You now have a dropdown menu!

Step #4:

Copy and Paste the resulting dropdown 2 or 3 cells to the right.

Type X to the left of the first dropdown and Y to the left of the second dropdown.

Step #5:

To the right of Property-Value (should be M-column) type X. To the right of that type Y.

The =SUMIF() function https://support.office.com/en-us/article/sumif-function-169b8c99-c05c-4483-a712-1697a653039b

Step #6:

Select the cell beneath the newly created X column, and then in the formula bar above paste the following but do not hit enter afterwards!

=SUMIF($C$1:$L$1,

  • Then click the dropdown menu representing X
  • Press F4 to add dollar signs so the values will be static (not imcrement)
  • Then paste the following:

,$C2:$L2)

Now hit Enter

Copy the formula down through the entire column

___________

Repeat the above steps for the Y column. The only difference is to click the dropdown representing Y.

Voila!

Step #7:

Delete and recreate chart using the new X and Y columns.

University Libraries

One Bear Place #97148
Waco, TX 76798-7148

(254) 710-6702