Among behavioural procedures, operant or instrumental conditioning is probably the one with the most numerous and varied applications.
From the treatment of phobias to overcoming addictions such as smoking or alcoholism, the operant framework makes it possible to conceptualize and modify virtually any habit by intervening on a few elements.
But what exactly is operant conditioning? In this article we review the key concepts to understand this paradigm and detail its most frequent applications, both to increase behaviors and to reduce them.
History of operant conditioning
Operant conditioning as we know it was formulated and systematized by Burrhus Frederic Skinner based on ideas previously put forward by other authors.
Ivan Pavlov and John B. Watson had described classical conditioning, also known as simple or Pavlovian conditioning.
For his part, Edward Thorndike introduced the law of effect, the clearest antecedent of operant conditioning. The law of effect states that if a behavior has positive consequences for the person who performs it, it will be more likely to be repeated, while if it has negative consequences this probability will decrease. In the context of Thorndike’s work, operant conditioning is called “instrumental”.
Difference between classical and operant conditioning
The main difference between classical and operant conditioning is that the former refers to learning information about a stimulus, while the latter involves learning about the consequences of a response.
Skinner believed that behavior was much easier to modify by manipulating its consequences than by simply associating stimuli with it, as in classical conditioning. Classical conditioning is based on the acquisition of reflex responses, so it accounts for a smaller share of learning and has more limited uses than operant conditioning, which deals with behaviors that the subject can control at will.
Concepts of Operant Conditioning
We will now define the basic concepts of operant conditioning to better understand this procedure and its applications.
Many of these terms are shared by behavioural orientations in general, although they may have specific connotations within the operant paradigm.
Instrumental or Operant Response
This term designates any behaviour that entails a certain consequence and is liable to change as a result of it. Its name indicates that it serves to obtain something (instrumental) and that it acts on the environment (operant) instead of being provoked by it, as in the case of classical or respondent conditioning.
In behavioral theory the word “response” is basically equivalent to “behavior” and “action”, although “response” points more to the presence of antecedent stimuli.
Consequence
In behavioral and cognitive-behavioral psychology a consequence is the result of a response. The consequence may be positive (reinforcement) or negative (punishment) for the subject carrying out the behaviour; in the first case the probability of the response occurring will increase and in the second case it will decrease.
It is important to bear in mind that consequences affect the response; in operant conditioning, therefore, what is reinforced or punished is the behavior, not the person or animal that carries it out. The aim at all times is to influence the way in which stimuli and responses are related, since behavioural philosophy avoids an essentialist view of people and places more emphasis on what can change than on what seems always to remain the same.
Reinforcement
Reinforcement designates the consequences of a behaviour when they make it more likely to recur. Reinforcement can be positive, in which case we are talking about obtaining a reward or prize for the execution of a response, or negative, which consists of the disappearance of aversive stimuli.
Within negative reinforcement we can distinguish between avoidance and escape responses. Avoidance behaviours prevent the appearance of an aversive stimulus; for example, a person with agoraphobia who stays at home so as not to feel anxiety is avoiding this emotion. Escape responses, on the other hand, make the stimulus disappear when it is already present.
The difference from the word “reinforcer” is that the latter refers to the event that follows the behaviour rather than to the procedure of rewarding or punishing. “Reinforcer” is therefore closer in meaning to “reward” and “prize” than to “reinforcement”.
Punishment
A punishment is any consequence of a given behavior that decreases the probability of its repetition.
Like reinforcement, punishment can be positive or negative. Positive punishment corresponds to the presentation of an aversive stimulus after the response has occurred, while negative punishment is the withdrawal of an appetitive stimulus as a result of the behavior.
Positive punishment is close to the everyday use of the word “punishment”, while negative punishment is more like a sanction or fine. If a child does not stop shouting and his mother slaps him to make him be quiet, he is being punished positively; if she instead takes away the console he is playing with, he is being punished negatively.
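The four combinations described above (a stimulus is presented or withdrawn, and the behaviour becomes more or less likely) can be summarised as a small lookup table. The following is only an illustrative sketch; the function name `classify_consequence` and its string labels are assumptions made for this example, not standard terminology.

```python
def classify_consequence(stimulus_change: str, behaviour_change: str) -> str:
    """Name the operant procedure from two observations.

    stimulus_change: "added" if a stimulus is presented after the response,
                     "removed" if a stimulus is withdrawn.
    behaviour_change: "increases" or "decreases" (future probability
                      of the response).
    """
    table = {
        ("added", "increases"): "positive reinforcement",
        ("removed", "increases"): "negative reinforcement",
        ("added", "decreases"): "positive punishment",
        ("removed", "decreases"): "negative punishment",
    }
    key = (stimulus_change, behaviour_change)
    if key not in table:
        raise ValueError(f"unknown combination: {key}")
    return table[key]


# The two examples from the text: the slap and the confiscated console.
print(classify_consequence("added", "decreases"))    # positive punishment
print(classify_consequence("removed", "decreases"))  # negative punishment
```

Note that “positive” and “negative” describe whether a stimulus is added or removed, not whether the consequence is pleasant.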
Discriminative and delta stimuli
In psychology, the word “stimulus” is used to designate events that provoke a response from a person or animal. Within the operant paradigm, a discriminative stimulus is one whose presence indicates to the learner that carrying out a certain behaviour will be followed by a reinforcer or a punishment.
The expression “delta stimulus”, on the other hand, refers to signals whose presence indicates that performing the response will not bring any consequences.
What is operant conditioning?
Instrumental or operant conditioning is a learning procedure based on the fact that the probability of a given response occurring depends on its expected consequences. In operant conditioning, behaviour is controlled by discriminative stimuli present in the learning situation that convey information about the probable consequences of the response.
For example, an “Open” sign on a door tells us that if we try to turn the knob the door will most likely open. In this case the sign is the discriminative stimulus and the opening of the door acts as a positive reinforcer for the instrumental response of turning the knob.
B. F. Skinner’s applied behavior analysis
Skinner developed techniques of operant conditioning that are included in what we know as “applied behavior analysis”. This has proved to be particularly effective in the education of children, with a special emphasis on children with developmental difficulties.
The basic scheme of applied behaviour analysis is as follows. First, a behavioural goal is set, which consists of increasing or reducing certain behaviours. Based on this goal, the behaviours to be developed are reinforced and the existing incentives for the behaviours to be inhibited are reduced.
In general the removal of reinforcers is more desirable than positive punishment since it generates less rejection and hostility from the subject. However, punishment can be useful in cases where the problem behaviour is very disruptive and requires a rapid reduction, for example if violence occurs.
Throughout the process it is essential to monitor progress systematically so that we can objectively check whether the desired objectives are being achieved. This is done mainly through data recording.
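As a rough illustration of how recorded data can be used to check progress, the sketch below compares the mean daily frequency of a behaviour before and during an intervention. The function `percent_change` and all the counts are invented for this example; real programmes use more careful designs and measures.

```python
from statistics import mean


def percent_change(baseline, intervention):
    """Percentage change in mean daily frequency from baseline to intervention."""
    base = mean(baseline)
    if base == 0:
        raise ValueError("baseline mean is zero; percentage change is undefined")
    return (mean(intervention) - base) / base * 100


# Hypothetical daily counts of a problem behaviour (made-up numbers).
baseline_counts = [8, 10, 9, 9]       # before the programme, mean 9
intervention_counts = [7, 6, 5, 6]    # during the programme, mean 6

change = percent_change(baseline_counts, intervention_counts)
print(f"{change:.1f}%")  # -33.3%
```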
Operant techniques for developing behaviors
Given the importance and effectiveness of positive reinforcement, operant techniques for increasing behavior are of proven usefulness. Below we describe the most relevant of these procedures.
1. Prompting techniques
Prompting (or instigation) techniques are those that depend on the manipulation of discriminative stimuli to increase the probability of a behaviour occurring.
The term covers instructions that increase certain behaviors; physical guidance, which consists of moving or placing parts of the trained person’s body; and modeling, in which a model is observed performing a behavior so that it can be imitated and its consequences learned. These three procedures have in common that they focus on directly teaching the subject how to perform a given action, either verbally or physically.
2. Shaping
Shaping consists of gradually bringing a given behavior closer to the target behavior, starting with a relatively similar response that the subject can already make and modifying it little by little. It is carried out in steps (successive approximations), each of which is reinforced.
Shaping is considered especially useful for establishing behaviors in subjects who cannot communicate verbally, such as people with profound intellectual disabilities or animals.
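The idea of reinforcing successive approximations can be sketched as a criterion that is raised step by step. This is a simplified illustration under stated assumptions: the function `shape`, its parameters and the trial data are hypothetical, and in practice the criterion is adjusted by the trainer's judgement rather than by a fixed rule.

```python
def shape(responses, start_criterion, target, step):
    """Reinforce successive approximations to a target.

    Each response that meets the current criterion is reinforced, and the
    criterion is then raised by `step` (never beyond `target`).
    Returns a list of (response, criterion_in_force, reinforced) tuples.
    """
    criterion = start_criterion
    log = []
    for r in responses:
        reinforced = r >= criterion
        log.append((r, criterion, reinforced))
        if reinforced:
            criterion = min(criterion + step, target)
    return log


# Hypothetical data: seconds a pupil stays seated on successive trials.
trials = [5, 12, 9, 22, 31, 30]
for row in shape(trials, start_criterion=10, target=30, step=10):
    print(row)
```

In this run the criterion moves from 10 to 20 to 30 seconds, so a response of 9 seconds that would have been close to the first criterion is no longer reinforced once the criterion has risen.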
3. Fading
Fading refers to the gradual withdrawal of the aids or prompts that had been used to bring about a target behaviour. The aim is for the subject to consolidate a response and subsequently carry it out without the need for external help.
It is one of the key concepts of operant conditioning, since it allows the progress made in therapy or training to be generalized to many other areas of life.
This procedure basically consists of replacing one discriminative stimulus with a different one.
4. Chaining
A behavioural chain, i.e. a behaviour composed of several simple behaviours, is separated into different steps (links). The subject must then learn to execute the links one by one until the complete chain is achieved.
Chaining can be done forward or backward, and has the peculiarity that each link reinforces the previous one and works as a discriminative stimulus for the next one.
In some respects, many of the skills that are regarded as talents because of the high degree of expertise and specialization they show (such as playing a musical instrument or dancing very well) can be seen as the result of some form of chaining, since one progresses from basic skills to others that are far more elaborate.
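The difference between forward and backward chaining can be made concrete by listing which links the learner performs at each training stage. The sketch below is illustrative only; `chaining_stages` and the hand-washing chain are assumptions made for the example.

```python
def chaining_stages(links, direction="forward"):
    """List the links the learner performs at each training stage.

    forward:  start with the first link and add the following ones one at a time.
    backward: start with the last link and add the preceding ones one at a time.
    """
    if direction == "forward":
        return [links[:i] for i in range(1, len(links) + 1)]
    if direction == "backward":
        return [links[-i:] for i in range(1, len(links) + 1)]
    raise ValueError("direction must be 'forward' or 'backward'")


# Hypothetical behavioural chain broken into links.
hand_washing = ["turn on tap", "wet hands", "apply soap", "rinse", "dry hands"]
for stage in chaining_stages(hand_washing, "backward"):
    print(stage)
```

In the backward variant the learner always finishes the chain, so every training stage ends at the final link.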
5. Reinforcement schedules
In an operant learning procedure, reinforcement schedules are the rules that establish when a behavior will be rewarded and when it will not.
There are two basic types of reinforcement schedule: ratio schedules and interval schedules. In ratio schedules, the reinforcer is obtained after a specific number of responses; in interval schedules, it is obtained when a certain time has passed since the last reinforced response and the response occurs again.
Both types of schedule can be fixed or variable, meaning that the number of responses or the time interval required to obtain the reinforcer can be constant or can fluctuate around an average value. They can also be continuous or intermittent; that is, the reward can be given every time the subject carries out the target behaviour or only from time to time (though always as a result of the desired response).
Continuous reinforcement is more useful for establishing behaviours and intermittent reinforcement for maintaining them. Thus, in theory, a dog will learn to give its paw faster if we give it a prize every time it offers the paw, but once the behaviour has been learned it will be harder to extinguish if we give the reinforcer only once every three or five attempts.
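The rules behind ratio and interval schedules can be expressed as small decision functions that say, for each response, whether a reinforcer is delivered. The following is a minimal sketch; the function names are hypothetical, and the uniform distribution used for the variable-ratio case is just one simple assumption for "fluctuating around an average value".

```python
import random


def fixed_ratio(n):
    """Reinforce every n-th response (n = 1 is continuous reinforcement)."""
    count = 0

    def on_response(_time):
        nonlocal count
        count += 1
        if count == n:
            count = 0
            return True
        return False

    return on_response


def variable_ratio(mean_n, rng):
    """Reinforce after a number of responses that varies around mean_n."""
    required = rng.randint(1, 2 * mean_n - 1)  # uniform, average is mean_n
    count = 0

    def on_response(_time):
        nonlocal count, required
        count += 1
        if count >= required:
            count = 0
            required = rng.randint(1, 2 * mean_n - 1)
            return True
        return False

    return on_response


def fixed_interval(seconds):
    """Reinforce the first response made at least `seconds` after the last reinforcer."""
    last = 0.0

    def on_response(time):
        nonlocal last
        if time - last >= seconds:
            last = time
            return True
        return False

    return on_response


response_times = [1, 2, 3, 4, 5, 6, 7, 8, 9]  # one response per second

fr3 = fixed_ratio(3)
print([fr3(t) for t in response_times])
# [False, False, True, False, False, True, False, False, True]

fi4 = fixed_interval(4)
print([fi4(t) for t in response_times])
# [False, False, False, True, False, False, False, True, False]

vr3 = variable_ratio(3, random.Random(0))
print(sum(vr3(t) for t in range(300)))  # roughly one reinforcer per three responses
```

A variable-interval schedule would follow the same pattern as `fixed_interval`, drawing a new waiting time after each reinforcer.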
Operant techniques to reduce or eliminate behavior
When applying operant techniques to reduce behaviour, it is advisable to keep in mind that, since these procedures can be unpleasant for subjects, it is always preferable to use the least aversive ones when possible. Likewise, these techniques are preferable to positive punishment.
Below is a list of these techniques in order of lowest to highest potential for generating aversion.
1. Extinction
A behaviour that had previously been reinforced is no longer rewarded. This decreases the probability that the response will occur again. Formally, extinction is the opposite of positive reinforcement.
In the long term, extinction is more effective at eliminating responses than punishment and the other operant techniques for reducing behaviour, although it may be slower.
A basic example of extinction is getting a child to stop kicking by simply ignoring him until he realizes that his behavior does not have the desired consequences (e.g. parental anger, which would work as a reinforcer) and gives up.
2. Omission training
In this procedure, the subject’s behavior is followed by the absence of the reward; that is, if the response is given, the reinforcer will not be obtained. An example of omission training would be parents preventing their daughter from watching TV one night because she has spoken to them disrespectfully. Another would be not buying the toys the children ask for if they misbehave.
In educational settings, it also helps children place more value on the efforts other people make to please them, which they tend not to appreciate once they have become accustomed to such treatment.
3. Differential reinforcement schedules
These are a special subtype of reinforcement schedule used to reduce (not eliminate) target behaviors by increasing alternative responses. For example, a child could be rewarded for reading and exercising rather than for playing a game if the aim is for the latter behaviour to lose reinforcing value.
In differential reinforcement of low rates, the response is reinforced if a certain period of time has passed since the last time it occurred. In differential reinforcement of omission, the reinforcer is obtained if the response has not occurred during a certain period of time. Differential reinforcement of incompatible behaviours consists of reinforcing responses that are incompatible with the problem behaviour; this last procedure is applied to tics and onychophagia (nail biting), among other disorders.
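The timing rules for low-rate and omission schedules are easy to confuse, so a small sketch may help. It assumes a simple fixed-interval version of the omission schedule (other variants exist, for example ones that restart the clock after each response); the function names `drl` and `dro` and the response times are hypothetical.

```python
def drl(response_times, min_gap):
    """Differential reinforcement of low rates.

    A response is reinforced only if at least `min_gap` time units have
    passed since the previous response (the first response is measured
    from time 0). Returns the times of the reinforced responses.
    """
    reinforced = []
    previous = 0
    for t in response_times:
        if t - previous >= min_gap:
            reinforced.append(t)
        previous = t
    return reinforced


def dro(response_times, interval, session_length):
    """Differential reinforcement of omission.

    The session is split into consecutive intervals; a reinforcer is
    delivered at the end of each interval that contained no response.
    Returns the times at which reinforcers are delivered.
    """
    delivered = []
    start = 0
    while start + interval <= session_length:
        end = start + interval
        if not any(start < t <= end for t in response_times):
            delivered.append(end)
        start = end
    return delivered


times = [2, 3, 10, 12, 25]  # hypothetical response times in minutes
print(drl(times, min_gap=5))                      # [10, 25]
print(dro(times, interval=5, session_length=30))  # [20, 30]
```

In the low-rate schedule the reinforcer still follows the response itself, whereas in the omission schedule it follows the absence of the response.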
4. Response cost
A variant of negative punishment in which performing the problem behaviour causes the loss of a reinforcer. The points-based driving licence introduced in Spain a few years ago is a good example of a response cost programme.
5. Time out
Time out consists of isolating the subject, usually a child, in a non-stimulating environment when the problem behaviour occurs. It is also a variant of negative punishment, and differs from response cost in that what is lost is the possibility of accessing reinforcement, not the reinforcer itself.
6. Satiation
The reinforcement obtained by carrying out the behaviour is so intense or abundant that it loses the value it had for the subject. This can take place through response satiation or massed practice (repeating the behaviour until it is no longer appealing) or through stimulus satiation (the reinforcer loses its appeal through excess).
7. Overcorrection
Overcorrection consists of applying a positive punishment related to the problem behaviour. For example, it is widely used in cases of enuresis, in which the child is asked to wash the sheets after urinating on them during the night.
Contingency management techniques
Contingency management systems are complex procedures through which some behaviours can be reinforced and others punished.
The token economy is a well-known example of this type of technique. It consists of giving out tokens (or other equivalent generic reinforcers) as a reward for performing the target behaviors; subjects can then exchange their tokens for prizes of varying value. It is used in schools, prisons and psychiatric hospitals.
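The bookkeeping behind a token economy can be sketched in a few lines: tokens are credited for target behaviours and debited when they are exchanged for a prize. The class `TokenEconomy`, the behaviours, the prizes and all values below are invented for illustration and do not describe any particular programme.

```python
class TokenEconomy:
    """Minimal token economy: earn tokens for target behaviours, spend them on prizes."""

    def __init__(self, earnings, prices):
        self.earnings = earnings  # behaviour -> tokens awarded
        self.prices = prices      # prize -> tokens required
        self.balance = 0

    def record(self, behaviour):
        """Award tokens if the behaviour is a target; return the new balance."""
        self.balance += self.earnings.get(behaviour, 0)
        return self.balance

    def exchange(self, prize):
        """Spend tokens on a prize; return True if the balance was sufficient."""
        cost = self.prices[prize]
        if cost > self.balance:
            return False
        self.balance -= cost
        return True


# Hypothetical classroom programme (behaviours, prizes and values are made up).
economy = TokenEconomy(
    earnings={"homework handed in": 2, "raised hand": 1},
    prices={"sticker": 3, "extra break": 5},
)
economy.record("homework handed in")
economy.record("raised hand")
print(economy.exchange("extra break"))  # False: only 3 tokens so far
print(economy.exchange("sticker"))      # True
print(economy.balance)                  # 0
```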
Behavioral or contingency contracts are agreements between several people, usually two, by which they agree to perform (or not perform) certain behaviors. The contracts detail the consequences if the agreed conditions are met or not met.