Scatter diagrams and bivariate data
Bivariate data: Data with two variables (e.g., height and weight). Plotted as points (x,y) on a scatter diagram.
Reading scatter diagrams: Each point is one observation. Position shows relationship between variables.
| Pattern | Meaning |
|---|---|
| Points trend upward | Positive correlation |
| Points trend downward | Negative correlation |
| Points scattered randomly | No or weak correlation |
Always plot bivariate data: Plotting reveals patterns that raw numbers hide (e.g., Anscombe's quartet).
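To see why plotting matters, the sketch below computes r for the four datasets of Anscombe's quartet. The helper name `pearson_r` and the dictionary layout are illustrative choices, not part of these notes. All four datasets give r of roughly 0.82, yet their scatter diagrams look completely different (a line, a curve, a line with one outlier, and a vertical stack with one extreme point).

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Anscombe's quartet: datasets I to III share the same x values.
x123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
x4 = [8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8]
quartet = {
    "I": (x123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (x123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (x123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": (x4, [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}

# Same summary statistic, very different shapes once plotted.
r_values = {name: pearson_r(xs, ys) for name, (xs, ys) in quartet.items()}
for name, r in r_values.items():
    print(f"Dataset {name}: r = {r:.3f}")
```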
Types of correlation
Correlation strength: Strong: points lie close to a straight line. Weak: points are widely scattered around the trend. None: no pattern.
| Type | Approximate r value | Scatter pattern |
|---|---|---|
| Strong positive | 0.7 to 1.0 | Points close to upward line |
| Moderate positive | 0.4 to 0.7 | Scattered but trend up |
| Weak positive | 0.0 to 0.4 | Very scattered, slight up |
| No correlation | Near 0 | Random scatter |
| Weak negative | -0.4 to 0.0 | Very scattered, slight down |
| Moderate negative | -0.7 to -0.4 | Scattered but trend down |
| Strong negative | -1.0 to -0.7 | Points close to downward line |
Pearson correlation coefficient r: Measures strength and direction. Always between -1 and +1.
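For reference, the standard definition of r can be written as below. The symbols are notation assumed here rather than taken from these notes: n is the number of observations, (x_i, y_i) is the i-th point, and x-bar and y-bar are the means of the two variables.

```latex
r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}
         {\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2 \, \sum_{i=1}^{n} (y_i - \bar{y})^2}}
```

The numerator is positive when x and y tend to lie on the same side of their means together, which gives the direction. Dividing by the square root scales the result so that it stays between -1 and +1.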
Interpreting correlation
Worked example
Dataset: height (cm) vs weight (kg) for 10 students. Scatter shows clear upward trend with r=0.85. What does this mean?
Interpretation
- r=0.85 is positive (as height increases, weight tends to increase)
- 0.85 is close to 1 (strong correlation)
- Points cluster near a line - predictable relationship
- But this is not evidence of causation: the correlation alone does not show that being taller causes a student to weigh more
Final answer
Strong positive correlation exists, but we cannot conclude causation.
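The interpretation steps above can be sketched as a small function that turns an r value into a verbal description. The function name `describe_correlation` is hypothetical, and so are two choices the notes do not specify: treating |r| below 0.1 as "near 0", and mirroring the positive bands for negative values. Boundary values such as 0.7 are assigned to the stronger band here.

```python
def describe_correlation(r):
    """Describe r in words using rule-of-thumb bands (0.4 and 0.7)."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must lie between -1 and +1")
    size = abs(r)
    if size < 0.1:  # assumed cutoff for "near 0", not from the notes
        return "no correlation"
    if size >= 0.7:
        strength = "strong"
    elif size >= 0.4:
        strength = "moderate"
    else:
        strength = "weak"
    # The sign of r gives the direction of the relationship.
    direction = "positive" if r > 0 else "negative"
    return f"{strength} {direction}"

print(describe_correlation(0.85))  # the height vs weight example
```

For the worked example, r=0.85 falls in the strong band and is positive. Such bands are a convention only, so a description like this does not replace looking at the scatter diagram.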
Correlation vs causation: Strong correlation does NOT mean one variable causes the other. Both may depend on a third variable.
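A small simulation can illustrate the third-variable point. Everything in it is made up for illustration: a hidden variable `z` drives both `x` and `y`, while `x` and `y` have no direct effect on each other. The noise level 0.3, the sample size and the seed are arbitrary choices.

```python
import random
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

rng = random.Random(0)  # fixed seed so the run is repeatable
n = 2000

# Hypothetical third variable z; x and y each depend on z plus independent noise.
z = [rng.gauss(0, 1) for _ in range(n)]
x = [zi + 0.3 * rng.gauss(0, 1) for zi in z]
y = [zi + 0.3 * rng.gauss(0, 1) for zi in z]

# x and y are strongly correlated although neither causes the other.
r_xy = pearson_r(x, y)

# Subtracting z leaves only the independent noise, and the correlation vanishes.
x_rest = [xi - zi for xi, zi in zip(x, z)]
y_rest = [yi - zi for yi, zi in zip(y, z)]
r_rest = pearson_r(x_rest, y_rest)

print(f"r(x, y) = {r_xy:.2f}")
print(f"r after removing z = {r_rest:.2f}")
```

In real data the third variable is usually not measured, which is why a strong r on its own cannot settle the question of cause.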
Outliers and their influence
Effect of outliers: One extreme point can dramatically change the correlation coefficient and the line of best fit.
Worked example
Dataset 1: r=0.90 (clean). Dataset 2: same data plus one outlier point far from trend. New r=0.50. Why did r drop so much?
Explanation
- Outlier is extreme value far from main pattern
- r measures how well points fit overall trend
- One outlier increases scatter, weakening correlation
- Always identify outliers and investigate them; remove one only with a justified reason (e.g., a recording error)
Final answer
Outliers can mask or exaggerate true relationships. Report correlation with and without outliers.
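Reporting r with and without an outlier might look like the sketch below. The numbers are invented for illustration and are not the datasets from the worked example, so the values of r will differ from 0.90 and 0.50. The helper `pearson_r` is again an illustrative name.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Made-up clean data: y rises by about 2 for each unit of x.
x_clean = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
y_clean = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2, 13.8, 16.1, 18.0, 19.9]

# The same data plus one made-up point far from the trend.
outlier = (15, 1.0)
x_all = x_clean + [outlier[0]]
y_all = y_clean + [outlier[1]]

r_clean = pearson_r(x_clean, y_clean)
r_all = pearson_r(x_all, y_all)

# Report both values so the reader can judge the outlier's influence.
print(f"r without outlier: {r_clean:.2f}")
print(f"r with outlier:    {r_all:.2f}")
```

A single point out of eleven is enough to pull r from almost 1 down to a weak value here, because that point lies far from the means of both variables and so carries a large weight in the sums.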