If you’ve studied psychology or a related field, you may be familiar with the concept of reliability. But what exactly is it? In psychometrics, reliability is a quality or property of measuring instruments (for example, tests) that tells us whether their measurements are precise, consistent and stable.

In this article we explain what this property consists of, give some examples to clarify the concept, and describe the different ways of calculating the reliability coefficient in psychometrics.

What is reliability in psychometrics?

Reliability is a concept from psychometrics, the discipline responsible for measuring people’s psychological variables through different techniques, methods and tools. Within it, reliability is a psychometric property of an instrument (for example, a test) that refers to the absence of measurement error or, more precisely, to the degree to which the instrument’s scores are free from random measurement error.

It is also defined as the degree of consistency and stability of the scores obtained across different measurements with the same instrument or test. Another synonym for reliability in psychometrics is “precision”. Thus, we say that a test is reliable when it is precise, is free of measurement error, and yields stable and consistent scores across repeated measurements.

Beyond psychology, the concept of reliability is also used in other fields, such as social research and education.

Examples

To better illustrate this psychometric concept, consider the following example: we use a thermometer to measure the temperature in a classroom, taking the reading at ten o’clock every morning for a week.

We will say that the thermometer is reliable (that it has high reliability) if, when the temperature is roughly the same every day, the thermometer shows this; that is, its readings are close to one another, without big jumps or large differences.

On the other hand, if the readings differ greatly from one another (even though the temperature is roughly the same every day), the instrument does not have good reliability, because its measurements are not stable or consistent over time.

Another example that helps to understand reliability in psychometrics: imagine that we weigh a basket with three apples every day for several days and write down the results. If these results vary a lot across successive measurements, the scale is not reliable, since its measurements are inconsistent and unstable (the opposite of reliability).

Thus, a reliable instrument is one that shows consistent and stable results in repeated measurement processes of a given variable.

The variability of measurements

How do we know if an instrument is reliable? One way is to look at the variability of its measurements. If the scores we obtain with that instrument when measuring the same thing repeatedly vary greatly from one another, we consider its values imprecise, and therefore the instrument unreliable (not trustworthy).
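To make the idea of variability concrete, the Python sketch below compares repeated readings from two hypothetical thermometers taken under similar conditions. The readings are invented for illustration, and using the sample standard deviation as the index of variability is an assumption of this sketch, not a procedure prescribed by the article.

```python
from statistics import mean, stdev


def spread(readings):
    """Return the mean and the sample standard deviation of repeated readings."""
    return mean(readings), stdev(readings)


# Hypothetical readings (degrees Celsius) taken at 10:00 on five days
# when the actual classroom temperature was roughly the same.
stable_thermometer = [21.0, 21.2, 20.9, 21.1, 21.0]
erratic_thermometer = [21.0, 25.5, 17.8, 23.9, 19.2]

for name, readings in [("stable", stable_thermometer),
                       ("erratic", erratic_thermometer)]:
    m, sd = spread(readings)
    print(f"{name}: mean = {m:.2f}, SD = {sd:.2f}")
```

Both thermometers give similar average readings, but the much larger standard deviation of the second one is what signals its poor reliability.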

Extrapolating this to psychological tests and to a subject’s responses to them: if the subject answered the same test repeatedly under the same conditions, the variability of their scores would give us an indicator of the test’s reliability.

The calculation: reliability coefficient

How do we calculate reliability in psychometrics? Through the reliability coefficient, which can be estimated with procedures that require either two administrations of the test or just one. Let’s look at the different methods within these two main groups:
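As a hedged aside, assuming the framework of classical test theory (which this article does not name explicitly): each observed score X is the sum of a true score T and a random error E that is uncorrelated with T, so the observed variance splits into true-score variance and error variance. The reliability coefficient is then the proportion of observed variance attributable to true scores, which is why it ranges from 0 (all error) to 1 (no error):

```latex
X = T + E, \qquad \sigma^2_X = \sigma^2_T + \sigma^2_E, \qquad
r_{XX'} = \frac{\sigma^2_T}{\sigma^2_X} = 1 - \frac{\sigma^2_E}{\sigma^2_X}
```

Since true scores cannot be observed directly, the procedures below are different ways of estimating this coefficient from observed scores.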

1. Two applications

In the first group we find the procedures that calculate the reliability coefficient from two administrations of a test. Let’s look at each of them, along with its disadvantages:

1.1. Parallel or equivalent forms

This method yields a measure of reliability that, in this case, is also called “equivalence”. It consists of administering two tests to the same sample in a single session: X (the original test) and X’ (an equivalent test we have constructed from it). The reliability coefficient is the correlation between the scores obtained on the two forms. The procedure has two main disadvantages: examinee fatigue and the need to construct two tests.
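The following Python sketch computes a coefficient of equivalence as the Pearson correlation between scores on X and X’. The six examinees and their scores are hypothetical, and `pearson_r` is a helper written for this illustration rather than part of any particular psychometrics package.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equally long lists of scores."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Hypothetical total scores of six examinees on the original form X
# and on the equivalent form X' built from it.
form_x = [12, 18, 25, 30, 22, 15]
form_x_prime = [13, 17, 24, 31, 20, 16]

equivalence = pearson_r(form_x, form_x_prime)
print(f"coefficient of equivalence r(X, X') = {equivalence:.3f}")
```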

1.2. Test-retest

The second method for calculating the reliability coefficient from two administrations is test-retest, which gives us the stability of the test. It consists of applying a test X, letting a period of time pass, and applying the same test X again to the same sample. The coefficient of stability is the correlation between the scores from the two administrations.

Its disadvantages include the learning the examinee may have acquired during the interval and the changes the person may undergo over that time, both of which can alter the results.
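The Python sketch below, built on invented scores, shows one consequence of the learning problem just mentioned. If every examinee gains the same number of points between administrations, the test-retest correlation stays at 1 because Pearson’s r ignores a constant shift; checking the mean change alongside the coefficient can reveal such a gain. If the gains are uneven, the coefficient itself drops. The `pearson_r` helper is written for this illustration.

```python
from math import sqrt
from statistics import mean


def pearson_r(x, y):
    """Pearson correlation between two equally long lists of scores."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Hypothetical scores of six examinees on test X at time 1.
time_1 = [12, 18, 25, 30, 22, 15]

# Case 1: every examinee gains exactly 3 points through practice.
time_2 = [score + 3 for score in time_1]
stability = pearson_r(time_1, time_2)
mean_gain = mean(time_2) - mean(time_1)

# Case 2: examinees learn different amounts between administrations.
uneven_gains = [3, 0, 6, 1, 1, 5]
time_2_uneven = [s + g for s, g in zip(time_1, uneven_gains)]
uneven_stability = pearson_r(time_1, time_2_uneven)

print(f"uniform gain: r = {stability:.3f}, mean gain = {mean_gain:.2f}")
print(f"uneven gain:  r = {uneven_stability:.3f}")
```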

1.3. Test-retest with alternative forms

Finally, another way of calculating reliability in psychometrics is test-retest with alternate forms. It combines the two previous procedures, so although it may be useful in certain cases, it accumulates the disadvantages of both.

The procedure consists of administering test X, letting a period of time pass, and then administering test X’ (that is, the equivalent test created from the original X).

2. A single application

On the other hand, the procedures that calculate reliability in psychometrics (the reliability coefficient) from a single administration of the test or measuring instrument fall into two subgroups: split halves and covariance between items. Let’s look at each in more detail:

2.1. Two halves

In this case, the test is simply divided into two halves (for example, odd-numbered versus even-numbered items), and the scores on the two halves are compared. Within this section, we find three types of procedures, depending on how the two halves relate to each other:

  • Parallel halves: the Spearman-Brown formula is applied.
  • Equivalent halves: the Rulon or Guttman-Flanagan formula is applied.
  • Generic halves: Raju’s formula is applied.
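As a sketch of the first two procedures, the Python code below splits a hypothetical item-score matrix into odd and even items and computes the Spearman-Brown estimate, 2r / (1 + r) where r is the correlation between halves, and the Guttman-Flanagan estimate, 2(1 − (s²_odd + s²_even) / s²_total), which is algebraically equal to Rulon’s formula. The odd/even split, the data and the helper names are assumptions for illustration; Raju’s formula is not shown.

```python
from math import sqrt
from statistics import pvariance


def pearson_r(x, y):
    """Pearson correlation between two equally long lists of scores."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def split_half(item_scores):
    """Return (Spearman-Brown, Guttman-Flanagan) estimates for an odd/even split.

    item_scores has one row per examinee and one column per item.
    """
    odd = [sum(row[0::2]) for row in item_scores]   # items 1, 3, 5, ...
    even = [sum(row[1::2]) for row in item_scores]  # items 2, 4, 6, ...
    total = [o + e for o, e in zip(odd, even)]

    r_halves = pearson_r(odd, even)
    spearman_brown = 2 * r_halves / (1 + r_halves)
    guttman_flanagan = 2 * (1 - (pvariance(odd) + pvariance(even)) / pvariance(total))
    return spearman_brown, guttman_flanagan


# Hypothetical answers of five examinees to six items scored 1-5.
scores = [
    [4, 5, 3, 4, 5, 4],
    [2, 3, 2, 2, 3, 2],
    [5, 5, 4, 5, 4, 5],
    [3, 2, 3, 3, 2, 3],
    [1, 2, 1, 2, 2, 1],
]

spearman_brown, guttman_flanagan = split_half(scores)
print(f"Spearman-Brown = {spearman_brown:.3f}, Guttman-Flanagan = {guttman_flanagan:.3f}")
```

The two estimates coincide when both halves have the same variance; Guttman-Flanagan does not require that assumption.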

2.2. Covariance between items

The covariance between items involves analysing the relationship among all the items in the test. Within this approach, we also find three methods or formulas typical of psychometrics:

  • Cronbach’s alpha coefficient: its value normally lies between 0 and 1, with higher values indicating greater internal consistency.
  • Kuder-Richardson (KR20): it is applied when the items are dichotomous (i.e. when they take only two values, such as correct/incorrect).
  • Guttman.
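The Python sketch below implements the standard formulas for Cronbach’s alpha, k/(k − 1) · (1 − Σ item variances / total variance), and KR-20, which replaces each item variance with p·q (proportion correct times proportion incorrect). The 0/1 answer matrix is hypothetical. Population variances are used throughout so that, on dichotomous items, the two coefficients agree exactly, as they should.

```python
from statistics import pvariance


def cronbach_alpha(item_scores):
    """Cronbach's alpha; item_scores has one row per examinee, one column per item."""
    k = len(item_scores[0])
    items = list(zip(*item_scores))             # regroup scores by item
    totals = [sum(row) for row in item_scores]  # each examinee's total score
    sum_item_var = sum(pvariance(col) for col in items)
    return k / (k - 1) * (1 - sum_item_var / pvariance(totals))


def kr20(item_scores):
    """Kuder-Richardson 20 for dichotomous (0/1) items."""
    k = len(item_scores[0])
    n = len(item_scores)
    items = list(zip(*item_scores))
    totals = [sum(row) for row in item_scores]
    # p = proportion answering the item correctly, q = 1 - p
    sum_pq = sum((sum(col) / n) * (1 - sum(col) / n) for col in items)
    return k / (k - 1) * (1 - sum_pq / pvariance(totals))


# Hypothetical answers (1 = correct, 0 = incorrect) of six examinees to four items.
answers = [
    [1, 1, 1, 0],
    [1, 0, 1, 0],
    [1, 1, 1, 1],
    [0, 0, 1, 0],
    [1, 1, 0, 1],
    [0, 0, 0, 0],
]

print(f"alpha = {cronbach_alpha(answers):.3f}, KR-20 = {kr20(answers):.3f}")
```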

3. Other methods

Beyond the procedures that involve one or two administrations of the test, there are other methods for calculating the reliability coefficient, such as inter-judge (inter-rater) reliability, which measures the agreement between different judges scoring the same responses, and the Hoyt method, which is based on analysis of variance.
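As an illustration of the Hoyt method, the Python sketch below treats the item-score matrix as a persons × items analysis of variance and computes 1 − MS_residual / MS_persons, which is numerically equal to Cronbach’s alpha for the same data. The score matrix and the function name are hypothetical, chosen only for this example.

```python
from statistics import mean


def hoyt_reliability(item_scores):
    """Hoyt's ANOVA-based reliability: 1 - MS_residual / MS_persons."""
    n = len(item_scores)     # number of examinees (persons)
    k = len(item_scores[0])  # number of items
    grand = mean(x for row in item_scores for x in row)
    person_means = [mean(row) for row in item_scores]
    item_means = [mean(col) for col in zip(*item_scores)]

    # Partition the total sum of squares into persons, items and residual.
    ss_persons = k * sum((m - grand) ** 2 for m in person_means)
    ss_items = n * sum((m - grand) ** 2 for m in item_means)
    ss_total = sum((x - grand) ** 2 for row in item_scores for x in row)
    ss_residual = ss_total - ss_persons - ss_items

    ms_persons = ss_persons / (n - 1)
    ms_residual = ss_residual / ((n - 1) * (k - 1))
    return 1 - ms_residual / ms_persons


# Hypothetical answers of five examinees to six items scored 1-5.
scores = [
    [4, 5, 3, 4, 5, 4],
    [2, 3, 2, 2, 3, 2],
    [5, 5, 4, 5, 4, 5],
    [3, 2, 3, 3, 2, 3],
    [1, 2, 1, 2, 2, 1],
]

print(f"Hoyt reliability = {hoyt_reliability(scores):.3f}")
```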
