In statistics, there are various tests to analyze the relationship between variables. Nominal variables are those whose values can only be compared for equality or inequality (same category or different category), such as gender.

In this article we will look at one of the tests used to analyze independence between variables measured at a nominal or higher level: the chi-square test, which works through hypothesis testing (goodness-of-fit tests).

What is the chi-square test?

The chi-square test (χ²) is usually presented among the tests of descriptive statistics, specifically the descriptive statistics applied to the study of two variables. Descriptive statistics focus on extracting information about the sample, whereas inferential statistics extract information about the population. Strictly speaking, because the chi-square test is a hypothesis test, its conclusions are inferential.

The test takes its name from the chi-square probability distribution on which it is based. It was developed in 1900 by Karl Pearson.

The chi-square test is one of the best known and most widely used tests for analyzing nominal or qualitative variables, i.e. for determining whether or not two variables are independent. Two variables are independent when they are unrelated, so that neither depends on the other.

The study of independence therefore also provides a method for checking whether the frequencies observed in each category are compatible with independence between the two variables.

How is independence between variables evaluated?

To evaluate the independence between the variables, we calculate the values that would be obtained under absolute independence, called “expected frequencies”, and compare them with the frequencies observed in the sample.
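
As a minimal sketch of that calculation, the expected frequency of each cell under independence can be taken as (row total × column total) / grand total. The function name and the counts below are hypothetical and serve only as illustration.

```python
def expected_frequencies(observed):
    """Return the frequencies expected if the two variables were independent."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    # Each expected cell is (row total * column total) / grand total.
    return [[r * c / n for c in col_totals] for r in row_totals]


# Hypothetical counts for two nominal variables X (rows) and Y (columns).
observed = [[20, 30], [30, 20]]
expected = expected_frequencies(observed)
print(expected)
```

With these made-up counts every row and column total is 50, so each expected frequency is 25.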

As usual, the null hypothesis (H0) indicates that both variables are independent, while the alternative hypothesis (H1) indicates that the variables have some degree of association or relationship.

Correlation between variables

Thus, like other tests with the same purpose, the chi-square test is used to find out whether there is an association between two variables measured at a nominal or higher level (for example, we can apply it if we want to know whether there is a relationship between sex [being male or female] and the presence of anxiety [yes or no]).

To determine this type of relationship, the data are arranged in a frequency table (a contingency table), which is also the starting point for other measures such as Yule’s Q.

If the empirical frequencies and the theoretical or expected frequencies coincide, then there is no relationship between the variables, i.e. they are independent. If, on the other hand, they do not coincide, the variables are not independent (there is a relationship between them, for example between X and Y).
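
The comparison can be sketched as follows, assuming the usual Pearson form of the statistic (the sum over cells of the squared difference divided by the expected frequency). The function name and both sets of counts are hypothetical.

```python
def chi_square_from_tables(observed, expected):
    """Sum (observed - expected)^2 / expected over every cell of the table."""
    return sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(observed, expected)
        for o, e in zip(obs_row, exp_row)
    )


# Hypothetical expected frequencies under independence (sex x anxiety).
expected = [[25.0, 25.0], [25.0, 25.0]]

matching = [[25, 25], [25, 25]]    # empirical frequencies coincide with expected
differing = [[20, 30], [30, 20]]   # empirical frequencies depart from expected

chi2_matching = chi_square_from_tables(matching, expected)
chi2_differing = chi_square_from_tables(differing, expected)
print(chi2_matching, chi2_differing)
```

When the two tables coincide the statistic is 0; the further apart they are, the larger it becomes.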

Considerations

The chi-square test, unlike other tests, places no restriction on the number of categories per variable, and does not require the number of rows and the number of columns in the table to match.

However, it must be applied to studies based on independent samples, and all expected values should be greater than 5.

In addition, to use the chi-square test, the measurement level must be nominal or higher. The statistic has no upper limit: it takes values between 0 and infinity, so its size alone does not tell us the intensity of the association.

Moreover, increasing the sample size increases the chi-square value even when the proportions stay the same, so we must be cautious in interpreting it: a larger value does not mean a stronger association.
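
A small sketch of this effect: the second table below has exactly the same proportions as the first, only with ten times as many cases. The function name and the counts are hypothetical.

```python
def chi_square(observed):
    """Pearson chi-square statistic for a contingency table given as a list of rows."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    return sum(
        (o - r * c / n) ** 2 / (r * c / n)
        for row, r in zip(observed, row_totals)
        for o, c in zip(row, col_totals)
    )


small = [[20, 30], [30, 20]]                      # hypothetical counts, n = 100
large = [[10 * o for o in row] for row in small]  # same proportions, n = 1000

chi2_small = chi_square(small)
chi2_large = chi_square(large)
print(chi2_small, chi2_large)
```

The statistic grows tenfold along with the sample although the pattern in the table is unchanged, which is why its raw size should not be read as strength of association.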

Chi-square distribution

The chi-square test uses the chi-square distribution as an approximation to assess the probability of a discrepancy equal to or greater than the one observed between the data and the expected frequencies under the null hypothesis.

The accuracy of this assessment depends on the expected values not being too small and, to a lesser extent, on the differences among them not being too large.
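
As an illustration, for a 2×2 table the reference distribution has one degree of freedom, and in that special case the upper-tail probability can be written with the complementary error function. This sketch is valid only for one degree of freedom; the function name and the example value 4.0 are assumptions chosen for illustration.

```python
import math


def p_value_df1(chi2):
    """Probability of a value >= chi2 under a chi-square distribution with 1 df.

    A chi-square variable with 1 df is the square of a standard normal
    variable, so P(X >= chi2) = P(|Z| >= sqrt(chi2)) = erfc(sqrt(chi2 / 2)).
    """
    return math.erfc(math.sqrt(chi2 / 2.0))


p = p_value_df1(4.0)  # hypothetical statistic from a 2x2 table
print(round(p, 4))
```

A small probability means that a discrepancy this large would rarely arise if the null hypothesis were true.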

Yates’ correction

Yates’ correction is a formula applied to 2×2 tables with small theoretical frequencies (less than 10) to correct the possible errors of the chi-square test.

Generally, Yates’ correction, also called the “continuity correction”, is applied when a discrete distribution is approximated by a continuous one.
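
A minimal sketch of the correction, assuming its usual form: 0.5 is subtracted from each absolute difference between observed and expected frequency before squaring. The function name and the counts are hypothetical, and flooring the corrected difference at zero is a choice made here to avoid overcorrecting.

```python
def chi_square_2x2(table, yates=False):
    """Chi-square statistic for a 2x2 table, optionally with Yates' correction."""
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            diff = abs(observed - expected)
            if yates:
                # Continuity correction: shrink each difference by 0.5 (not below 0).
                diff = max(0.0, diff - 0.5)
            chi2 += diff ** 2 / expected
    return chi2


table = [[3, 7], [8, 2]]  # hypothetical small sample, expected frequencies below 10
uncorrected = chi_square_2x2(table)
corrected = chi_square_2x2(table, yates=True)
print(round(uncorrected, 4), round(corrected, 4))
```

The corrected statistic is always smaller than or equal to the uncorrected one, which makes the test more conservative with small samples.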

Hypothesis testing

In addition, the chi-square test belongs to the so-called goodness-of-fit tests, which aim to decide whether to accept the hypothesis that a given sample comes from a population whose probability distribution is fully specified in the null hypothesis.

These tests are based on comparing the frequencies observed in the sample (empirical frequencies) with those that would be expected (theoretical or expected frequencies) if the null hypothesis were true. Thus, the null hypothesis is rejected if there is a significant difference between the observed and expected frequencies.
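
In symbols, the comparison is usually summarised by Pearson's statistic. The notation here is an assumption introduced for illustration: O_i is the observed frequency of category i, E_i the frequency expected under the null hypothesis, and k the number of categories.

```latex
\chi^2 = \sum_{i=1}^{k} \frac{(O_i - E_i)^2}{E_i}
```

Each term measures how far one observed frequency lies from its expected value, relative to the size of that expected value.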

Operation

As we have seen, the chi-square test is used with data measured on a nominal scale or higher. To apply it, a null hypothesis is established that postulates a specified probability distribution as the mathematical model of the population that generated the sample.

Once we have the hypothesis, we must carry out the test, and to do so we arrange the data in a frequency table. The absolute observed or empirical frequency is recorded for each value or range of values. Then, assuming that the null hypothesis is true, the expected absolute frequency is calculated for each value or range of values.
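
A short sketch of that last step, assuming the null hypothesis assigns a probability to each category: the expected absolute frequency is the sample size multiplied by that probability. The function name, the probabilities and the counts are all hypothetical.

```python
def expected_counts(n, probabilities):
    """Expected absolute frequencies for a sample of size n under the null hypothesis."""
    if abs(sum(probabilities) - 1.0) > 1e-9:
        raise ValueError("the probabilities under the null hypothesis must sum to 1")
    return [n * p for p in probabilities]


# Hypothetical null hypothesis for three categories, and hypothetical observed counts.
null_probabilities = [0.5, 0.25, 0.25]
observed = [50, 20, 10]

n = sum(observed)
expected = expected_counts(n, null_probabilities)
print(expected)
```

With 80 cases in total, the hypothesised distribution predicts 40, 20 and 20 cases in the three categories.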

Interpretation

The chi-square statistic takes the value 0 if there is perfect agreement between the observed and expected frequencies; on the contrary, it takes a large value if there is a large discrepancy between these frequencies, in which case the null hypothesis should be rejected.
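
Both situations can be sketched with three categories, which gives two degrees of freedom when no parameters are estimated from the data. In that special case the critical value at significance level alpha has the closed form -2·ln(alpha); the function name, the counts and the choice alpha = 0.05 are assumptions made for illustration.

```python
import math


def goodness_of_fit(observed, expected):
    """Chi-square goodness-of-fit statistic for paired lists of frequencies."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


expected = [40.0, 20.0, 20.0]  # hypothetical frequencies under the null hypothesis

perfect = goodness_of_fit([40, 20, 20], expected)     # observed equals expected
discrepant = goodness_of_fit([50, 20, 10], expected)  # observed departs from expected

alpha = 0.05
# Closed form that holds only for 2 degrees of freedom (3 categories - 1).
critical_value = -2.0 * math.log(alpha)

reject_null = discrepant > critical_value
print(perfect, discrepant, round(critical_value, 3), reject_null)
```

Perfect agreement gives a statistic of 0, while the discrepant sample exceeds the critical value, so the null hypothesis would be rejected at this level.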
