Pearson’s correlation coefficient: what it is and how it is used
In psychological research, descriptive statistics are frequently used, offering ways to present and evaluate the main characteristics of the data through tables, graphs and summary measures.
In this article we look at Pearson’s correlation coefficient, a measure used in descriptive statistics. It measures the linear relationship between two quantitative random variables and tells us the intensity and direction of the relationship between them.
Descriptive statistics
The Pearson correlation coefficient is a coefficient used in descriptive statistics. Specifically, it is used in descriptive statistics applied to the study of two variables.
Descriptive statistics (also called exploratory data analysis) brings together a set of mathematical techniques designed to obtain, organize, present and describe a set of data, in order to facilitate their use. In general, it uses tables, numerical measurements or graphs as support.
Pearson’s correlation coefficient: what is it for?
Pearson’s correlation coefficient is used to study the relationship (or correlation) between two quantitative random variables (measured on at least an interval scale); for example, the relationship between weight and height.
It is a measure that gives us information about the intensity and direction of the relationship. In other words, it is an index that measures the degree of covariation between two linearly related variables.
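As an illustration of the idea of covariation, the coefficient can be sketched as the sum of cross-products of deviations divided by the square root of the product of the two sums of squared deviations. The function name pearson_r and the height and weight figures below are hypothetical, made up only for this example.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation between two equally long lists of numbers."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products of deviations from the means (joint variation)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    # Sums of squared deviations of each variable (separate variation)
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

# Hypothetical data, invented for illustration only
height_cm = [150, 160, 170, 180, 190]
weight_kg = [52, 58, 69, 75, 88]

r = pearson_r(height_cm, weight_kg)
print(round(r, 3))  # close to +1: a strong positive linear relationship
```

The sign of the result gives the direction of the relationship and its absolute value gives the intensity.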
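We must be clear about the difference between a relationship, correlation or covariation between two variables (that is, joint variation) and causality, since they are different concepts: a correlation by itself does not show that one variable causes the other. Prognosis, prediction and regression, in turn, use the relationship to estimate one variable from the other, which does not require a causal link either.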
How do you interpret it?
Pearson’s correlation coefficient takes values between -1 and +1. Its meaning depends on its value.
If Pearson’s correlation coefficient is equal to 1 or -1, we can consider that the correlation between the studied variables is perfect.
If the coefficient is greater than 0, the correlation is positive (“the more of one, the more of the other; the less, the less”). If it is less than 0, the correlation is negative (“the more of one, the less of the other; the less, the more”). Finally, if the coefficient is equal to 0, we can only state that there is no linear relationship between the variables; there may still be some other type of relationship.
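The three cases above can be checked with a small sketch. The helper pearson_r and the data are hypothetical; the last case uses Y = X squared, a perfect but non-linear relationship, to show that a coefficient of 0 only rules out a linear relationship.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation between two equally long lists of numbers."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

# Hypothetical values of X, symmetric around 0
x = [-2, -1, 0, 1, 2]

r_positive = pearson_r(x, [2 * v + 1 for v in x])  # Y rises with X on a straight line
r_negative = pearson_r(x, [-3 * v for v in x])     # Y falls as X rises on a straight line
r_curved = pearson_r(x, [v ** 2 for v in x])       # Y depends on X, but not linearly

print(r_positive, r_negative, r_curved)
```

In the third case Y is completely determined by X, yet the linear coefficient is 0.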
Considerations
Pearson’s correlation coefficient tends to increase if the variability of X and/or Y (the variables) increases, and to decrease in the opposite case, for example when the sample covers only a narrow range of values. On the other hand, to judge whether a value is high or low, we must compare it with the results of other studies that use the same variables under similar circumstances.
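The effect of variability can be sketched with made-up data: the same relationship is measured once over the full range of X and once over a narrow slice of it. The data (Y equal to X plus an alternating error of +2 and -2) and the cut-off values 5 to 8 are assumptions chosen only for illustration.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation between two equally long lists of numbers."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

# Hypothetical data: Y follows X with an alternating error of +2 / -2
x = list(range(1, 13))
y = [v + (2 if i % 2 == 0 else -2) for i, v in enumerate(x)]

# Same data, keeping only the cases whose X lies in a narrow range (5 to 8)
pairs = [(a, b) for a, b in zip(x, y) if 5 <= a <= 8]
x_narrow = [a for a, _ in pairs]
y_narrow = [b for _, b in pairs]

r_full = pearson_r(x, y)
r_narrow = pearson_r(x_narrow, y_narrow)
print(round(r_full, 2), round(r_narrow, 2))  # the restricted sample gives a much lower value
```

This is why a coefficient should be compared only with values obtained under similar conditions.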
To represent the relationships among several variables that combine linearly, we can use the so-called variance-covariance matrix or the correlation matrix. The diagonal of the first contains the variance of each variable, and the diagonal of the second contains ones (the correlation of a variable with itself is perfect, = 1).
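A minimal sketch of both matrices, built by hand for three hypothetical variables (the names and values are invented for the example). It assumes the sample covariance with n - 1 in the denominator; the correlation matrix is the same either way, since the denominator cancels.

```python
from math import sqrt

def covariance(x, y):
    """Sample covariance (n - 1 in the denominator); covariance(x, x) is the variance."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    return sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / (n - 1)

# Hypothetical data, invented for illustration only
data = {
    "height_cm": [150, 160, 170, 180, 190],
    "weight_kg": [52, 58, 69, 75, 88],
    "sleep_h": [8, 7, 7, 6, 5],
}
names = list(data)
k = len(names)

# Variance-covariance matrix: variances on the diagonal, covariances elsewhere
cov_matrix = [[covariance(data[a], data[b]) for b in names] for a in names]

# Correlation matrix: each covariance divided by the two standard deviations
corr_matrix = [
    [cov_matrix[i][j] / sqrt(cov_matrix[i][i] * cov_matrix[j][j]) for j in range(k)]
    for i in range(k)
]

for name, row in zip(names, corr_matrix):
    print(name, [round(value, 2) for value in row])
```

Both matrices are symmetric, because the relationship of X with Y is the same as that of Y with X.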
Coefficient squared
When we square the Pearson correlation coefficient, its meaning changes, and we interpret its value in terms of prediction (it still does not show that the relationship is causal). In this case, it can have four interpretations or meanings:
1. Associated Variance
It indicates the proportion of the variance of Y (one variable) that is associated with the variation of X (the other variable). Therefore, “1 - squared Pearson coefficient” = “the proportion of the variance of Y that is not associated with the variation of X”.
2. Individual differences
If we multiply the squared Pearson correlation coefficient by 100, it indicates the percentage of the individual differences in Y that are associated with (or explained by) the individual differences in X. Therefore, “(1 - squared Pearson coefficient) x 100” = the percentage of the individual differences in Y that are not associated with (or explained by) the individual differences in X.
3. Error reduction rate
The squared Pearson correlation coefficient can also be interpreted as an index of the reduction of error in forecasts; that is, it is the proportion of the mean square error eliminated by using Y’ (the regression line, fitted to the data) instead of the mean of Y as the forecast. Here again, the coefficient can be multiplied by 100 to express it as a percentage.
Therefore, “1 - squared Pearson coefficient” = the proportion of error that is still made when using the regression line instead of the mean (multiplied by 100, it gives the percentage).
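The error-reduction reading can be verified numerically. The sketch below fits a least-squares line to hypothetical height and weight data (invented for the example), compares the squared error of forecasting with the mean of Y against forecasting with the line, and checks that the proportion of error removed equals the squared coefficient.

```python
from math import sqrt

# Hypothetical data, invented for illustration only
x = [150, 160, 170, 180, 190]   # height in cm
y = [52, 58, 69, 75, 88]        # weight in kg

n = len(x)
mean_x = sum(x) / n
mean_y = sum(y) / n
sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
sxx = sum((a - mean_x) ** 2 for a in x)
# Squared error made when the mean of Y is used as the forecast for everyone
error_with_mean = sum((b - mean_y) ** 2 for b in y)

r = sxy / sqrt(sxx * error_with_mean)

# Least-squares regression line Y' = intercept + slope * X
slope = sxy / sxx
intercept = mean_y - slope * mean_x
# Squared error made when the regression line is used as the forecast
error_with_line = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))

error_reduction = 1 - error_with_line / error_with_mean
print(round(r ** 2, 4), round(error_reduction, 4))  # the two values coincide
```

The remaining fraction, error_with_line / error_with_mean, is the “1 - squared Pearson coefficient” part: the error still made with the regression line.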
4. Index of approximation of points
Finally, the last interpretation of the squared Pearson correlation coefficient is as an indicator of how closely the points lie to the regression line mentioned above. The higher the value of the squared coefficient (the closer to 1), the closer the points are to Y’ (the line).