

Scatter graphs are a good way of displaying two sets of data to see if there is a correlation, or connection.

The number of umbrellas sold and the amount of rainfall on 9 days are shown on the scatter graph and in the table. The graph shows a positive correlation between the number of umbrellas sold and the amount of rainfall: on days with higher rainfall, more umbrellas were sold.

However, it is important to remember that correlation does not imply causation. If data plotted on a scatter graph show correlation, we cannot assume that the increase in one set of data caused the increase or decrease in the other set. It might be coincidence, or there may be some other cause that both sets of data are related to.

Graphs can have positive correlation, negative correlation or no correlation. Positive correlation means that as one variable increases, so does the other variable. Negative correlation means that as one variable increases, the other variable decreases. No correlation means there is no connection between the two variables.

Lines of best fit

A line of best fit is a sensible straight line that goes as centrally as possible through the plotted coordinates. It should also follow the same steepness as the crosses. The line of best fit for the scatter graph would look like this:

Interpolation and extrapolation

From the diagram above, we can estimate how many umbrellas would be sold for different amounts of rainfall. For example, how many umbrellas would be sold if there was 3 mm of rainfall? What if there was 10 mm of rainfall?

To estimate the number sold for 3 mm of rainfall, we use a process called interpolation. The value of 3 mm is within the range of data values that were used to draw the scatter graph. Find where 3 mm of rainfall is on the graph. Draw a line by going across from 3 mm and then down. An estimated 19 umbrellas would be sold if there was 3 mm of rainfall.

If there was 10 mm of rainfall, we could extend the graph and the line of best fit to read off the number of umbrellas sold. This gives a value of approximately 64 umbrellas sold. This process is called extrapolation, because the value we are using is outside the range of data used to draw the scatter graph. Since 10 mm is much higher than the highest rainfall recorded, we cannot assume that the line of best fit would still follow the pattern when the rainfall is 10 mm, so the value of 64 umbrellas is not a reliable estimate.

The sample correlation coefficient (r) is a measure of the closeness of association of the points in a scatter plot to a linear regression line based on those points, as in the example above for accumulated saving over time. Possible values of the correlation coefficient range from -1 to +1, with -1 indicating a perfectly linear negative, i.e., inverse, correlation (sloping downward) and +1 indicating a perfectly linear positive correlation (sloping upward). A correlation coefficient close to 0 suggests little, if any, correlation.

The scatter plot suggests that measurements of IQ do not change with increasing age, i.e., there is no evidence that IQ is associated with age.

Calculation of the Correlation Coefficient

The equations below show the calculations used to compute "r".

r = Cov(X,Y) / sqrt(s_x^2 * s_y^2)

Here Cov(X,Y) is the covariance, i.e., how far each observed (X,Y) pair is from the mean of X and the mean of Y, simultaneously, and s_x^2 and s_y^2 are the sample variances for X and Y. You don't have to memorize these equations or use them for hand calculations, but they give a sense of how "r" is computed. Instead, we will use R to calculate correlation coefficients. For example, computing the correlation coefficient for AGE and TOTCHOL in a subset of the Framingham Heart Study returns:

0.2917043

Describing Correlation Coefficients

The table below provides some guidelines for how to describe the strength of correlation coefficients, but these are just guidelines for description. Also, keep in mind that even weak correlations can be statistically significant, as you will learn shortly.

The four images below give an idea of how some correlation coefficients might look on a scatter plot.

The scatter plot below illustrates the relationship between systolic blood pressure and age in a large number of subjects. It suggests a weak (r=0.36) but statistically significant (p<0.0001) positive association between age and systolic blood pressure. There is quite a bit of scatter, but there are many observations, and there is a clear linear trend.
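
To make the arithmetic behind "r" concrete, here is a minimal sketch of the calculation: the sample covariance of X and Y divided by the square root of the product of the two sample variances. The text relies on R for this step; the sketch uses Python only to spell out the steps. The function name `pearson_r` and the data values are hypothetical, chosen for illustration, and are not taken from the Framingham data.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Sample correlation coefficient: Cov(X, Y) / sqrt(s_x^2 * s_y^2)."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length samples with at least 2 points")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sample covariance: how far each (x, y) pair is from both means at once.
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)
    # Sample variances of X and Y (n - 1 in the denominator).
    var_x = sum((x - mean_x) ** 2 for x in xs) / (n - 1)
    var_y = sum((y - mean_y) ** 2 for y in ys) / (n - 1)
    if var_x == 0 or var_y == 0:
        raise ValueError("r is undefined when a variable has zero variance")
    return cov_xy / sqrt(var_x * var_y)


# Hypothetical values, for illustration only.
ages = [30, 40, 50, 60, 70]
perfect_up = [2 * a + 1 for a in ages]   # points lie exactly on a rising line
perfect_down = [-a for a in ages]        # points lie exactly on a falling line

print(round(pearson_r(ages, perfect_up), 6))    # 1.0
print(round(pearson_r(ages, perfect_down), 6))  # -1.0
```

The two perfectly linear samples reproduce the extremes of the range, +1 and -1. Real data with scatter around the line would give a value strictly between them.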

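
The difference between interpolation and extrapolation in the umbrella example can also be sketched in code. The text draws the line of best fit by eye; the sketch below swaps in an ordinary least-squares fit as one common way to get such a line. It is an illustration under that assumption, not the method the text uses. The names `fit_line` and `estimate` and all data values are hypothetical; the real table of 9 days is not reproduced here.

```python
def fit_line(xs, ys):
    """Least-squares straight line through the points; returns (slope, intercept)."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length samples with at least 2 points")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical; no line can be fitted")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def estimate(x_new, xs, ys):
    """Read a value off the fitted line and say how it was obtained."""
    slope, intercept = fit_line(xs, ys)
    # Inside the observed range: interpolation. Outside it: extrapolation,
    # which is less reliable because the pattern may not continue.
    inside = min(xs) <= x_new <= max(xs)
    kind = "interpolation" if inside else "extrapolation"
    return slope * x_new + intercept, kind


# Hypothetical rainfall (mm) and umbrella sales, for illustration only.
rainfall_mm = [0, 1, 2, 3, 4, 5]
umbrellas = [1, 7, 12, 20, 25, 32]

for mm in (3, 10):
    value, kind = estimate(mm, rainfall_mm, umbrellas)
    print(f"{mm} mm -> about {value:.0f} umbrellas ({kind})")
```

Labelling each estimate with how it was obtained keeps the caveat attached to the number: an estimate tagged as extrapolation should be treated as unreliable, as with the 10 mm reading above.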