Because their everyday meanings are very similar, it is easy to confuse the terms reliability and validity when we talk about science and, specifically, psychometrics.

With this text we aim to clarify the main differences between reliability and validity. We hope you find it useful in resolving this very common doubt.

What is reliability?

In psychometrics, the concept of “reliability” refers to the precision of an instrument; specifically, reliability coefficients tell us how consistent and stable the measurements taken with that tool are.

The higher the reliability of an instrument, the fewer random, unpredictable errors will occur when it is used to measure a given attribute. Reliability concerns only this random error; predictable (systematic) errors, i.e. those under experimental control, fall outside it.

According to classical test theory, the observed (raw) score on a test is the sum of the true score and a random error. Reliability is then defined as the proportion of the variance in observed scores that is explained by true scores.
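To make the classical test theory model concrete, here is a minimal Python sketch that simulates it, assuming normally distributed true scores and errors (the mean, the standard deviations and the function name simulate_ctt are illustrative choices, not part of the theory). It builds observed scores as X = T + E and estimates reliability as var(T) / var(X); with a true-score SD of 10 and an error SD of 5, the theoretical value is 100 / 125 = 0.8. Adding the same constant bias to every score leaves the estimate unchanged, which mirrors the point that reliability concerns random rather than systematic error.

```python
import random
import statistics

def simulate_ctt(n=10_000, true_sd=10.0, error_sd=5.0, bias=0.0, seed=0):
    """Simulate observed scores X = T + E (+ an optional constant bias).

    Returns the empirical reliability var(T) / var(X).
    Hypothetical helper: distributions and parameters are assumptions.
    """
    rng = random.Random(seed)
    true_scores = [rng.gauss(50.0, true_sd) for _ in range(n)]
    # Random error: mean zero and independent of the true score.
    errors = [rng.gauss(0.0, error_sd) for _ in range(n)]
    # A constant (systematic) bias shifts every score by the same amount.
    observed = [t + e + bias for t, e in zip(true_scores, errors)]
    return statistics.pvariance(true_scores) / statistics.pvariance(observed)

# Both estimates should be close to the theoretical 0.8 and equal to each other.
print(round(simulate_ctt(), 2))
print(round(simulate_ctt(bias=7.0), 2))
```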

The two main components of reliability are temporal stability and internal consistency. The first indicates that scores change little when the test is administered on different occasions, while internal consistency refers to the degree to which the items that make up the test measure the same psychological construct.
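Internal consistency is often summarised with Cronbach's alpha, a widely used index that the text above does not name. The sketch below computes it from a hypothetical table of item responses (rows are respondents, columns are items); it is an illustration under those assumptions, not the only way to estimate internal consistency.

```python
import statistics

def cronbach_alpha(scores):
    """Cronbach's alpha for a list of respondents, each a list of k item scores.

    alpha = k / (k - 1) * (1 - sum of item variances / variance of total scores)
    """
    k = len(scores[0])
    if k < 2:
        raise ValueError("alpha needs at least two items")
    # Variance of each item across respondents.
    item_vars = [statistics.pvariance([row[i] for row in scores]) for i in range(k)]
    # Variance of the respondents' total scores.
    total_var = statistics.pvariance([sum(row) for row in scores])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)

# Hypothetical responses of five people to a three-item scale (1-5 ratings).
responses = [
    [4, 5, 4],
    [2, 3, 2],
    [3, 3, 4],
    [5, 4, 5],
    [1, 2, 1],
]
print(round(cronbach_alpha(responses), 2))  # 0.94
```

Items that rise and fall together across respondents push alpha towards 1; items that vary independently of one another push it towards 0.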

Therefore, a high reliability coefficient indicates that test scores fluctuate little, both across items and over time; in short, that the instrument is relatively free of random measurement error.

Definition of validity

When we talk about validity, we mean whether the test actually measures the construct it is intended to measure. This concept is defined as the relationship between the score obtained on the test and another related measure: the degree of linear correlation between the two determines the validity coefficient.

Similarly, in scientific research, validity refers to the degree to which the results obtained with a given instrument or in a study can be generalised.

There are different types of validity, depending on how it is calculated, so the term has quite different meanings. Broadly, we can distinguish between content validity, criterion (or empirical) validity and construct validity.

Content validity refers to the extent to which the items of a psychometric test are a representative sample of the elements that make up the construct being evaluated. The instrument must include all the fundamental aspects of the construct; for example, an adequate test of depression must include items that assess mood and decreased pleasure.

Criterion validity measures the ability of the instrument to predict aspects related to the trait or area of interest.

Finally, construct validity seeks to determine whether the test measures what it purports to measure, for example through its convergence with scores obtained on similar tests.

Differences between reliability and validity

Although these two psychometric properties are closely related, they refer to clearly different aspects. Let’s look at what these differences consist of.

1. The object of analysis

Reliability is a characteristic of the instrument, in the sense that it reflects the properties of the items that make it up. Validity, in contrast, does not refer strictly to the instrument but to the generalisations made from the results obtained with it.

2. The information they provide

Although this is a somewhat simplistic view, it is generally said that validity indicates whether a psychometric tool actually measures the construct it is intended to measure, while reliability refers to whether it measures it precisely, without error.

3. The way they are calculated

There are basically three procedures for estimating reliability: the split-half method, the parallel forms method and the test-retest method. The most widely used is the split-half procedure, in which the items are divided into two groups once the test has been answered and the correlation between the two halves is then analysed.

The parallel (or alternate) forms method consists of creating two equivalent versions of the test and measuring the extent to which scores on the two forms correlate. The test-retest method simply involves administering the same test twice, under conditions as similar as possible. Both procedures can be combined into the parallel forms test-retest method, in which a time interval is left between the administration of the first form and the second.

Validity is calculated in different ways depending on its type, but in general all methods are based on comparing the score on the target test with other data from the same people on similar traits; the aim is for the test to act as a predictor of the trait.

Among the methods used to assess validity are factor analysis and the multitrait-multimethod matrix technique. Content validity, in turn, is often determined by rational, non-statistical analysis; this includes, for example, face validity, which refers to the subjective judgement of experts about the validity of the test.

4. The relationship between both concepts

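The reliability of a psychometric instrument limits its validity: an unreliable test cannot be highly valid, although a reliable test is not necessarily valid. In classical test theory, the validity coefficient cannot exceed the square root of the reliability coefficient, so validity also informs us indirectly about reliability: a test that correlates strongly with a criterion must be at least reasonably reliable.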
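Under classical test theory this ceiling has a precise form: the correlation between a test and a criterion cannot exceed the square root of the product of their reliabilities, so with a perfectly reliable criterion the ceiling is the square root of the test's reliability. The hypothetical helpers below compute that ceiling and, conversely, the lower bound on test reliability implied by an observed validity coefficient. Because the ceiling is a square root, a validity coefficient can be numerically larger than the reliability coefficient (for example 0.75 against 0.64) while still respecting the bound.

```python
import math

def max_validity(test_reliability, criterion_reliability=1.0):
    """Upper bound on a validity coefficient under classical test theory."""
    return math.sqrt(test_reliability * criterion_reliability)

def min_reliability(validity):
    """Lower bound on test reliability implied by an observed validity coefficient.

    Uses the most lenient case (a perfectly reliable criterion); a less
    reliable criterion would imply a higher bound.
    """
    return validity ** 2

print(max_validity(0.64))                  # 0.8
print(round(max_validity(0.64, 0.81), 2))  # 0.72
print(min_reliability(0.5))                # 0.25
```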