SHAPING - Learning in Baby Steps
"Training is about rewarding every good try" - Andrew McLean, 2007, the Shaping Principle
ANTONIA J.Z. HENDERSON, EQUINE PSYCHOLOGIST
Canadian Horse Annual 2016
A grey Andalusian stallion gallops at liberty across the Cavalia stage, comes to a sliding stop in front of his handler and dances with her in a breathtaking choreographed sequence. A grand prix dressage horse seems to skip effortlessly through one-tempi changes in time to a Rachmaninov piano concerto. An unbroken semi-feral pony from the research lab at the New Bolton Centre willingly stands for an injection without halter or handler. Incredulous observers ponder, "How on earth were these feats accomplished?"
There is a great deal of learning theory jargon that is unnecessarily obscure - reinforcement contingencies, discriminative stimuli and negative reinforcement, to name a few. But learning theorists nailed it with the term shaping. Just as an artist shapes a colourless lump of clay into an exquisite vessel, complex behaviours are shaped in small increments, progressing ever closer to the eventual target behaviour. Shaping forms the foundation of all equine training, whether it be with positive reinforcement (+R), negative reinforcement (-R), or even when we had no intention of teaching our horses anything.
There are some things we want horses to do that they perform of their own accord (standing, moving, urinating, etc.). These behaviours can be "put on cue" by giving the behaviour a name, or an associated signal, and rewarding it when it naturally occurs (+R). This is known as "capturing." After surprisingly few repetitions, the horse will perform the behaviour with the command or signal alone. Similarly, we can reward a behaviour by contriving its occurrence (e.g. pinching a horse's tendon until he picks up a foot), and rewarding it with release (-R).
However, there are many more behaviours that we want from horses that will never occur spontaneously, nor through physical manipulation - jumping over 1.60m obstacles, full canter pirouettes, or racing around a barrel to name a few. This is where shaping comes in. In learning jargon, shaping is the rewarding of closer and closer approximations of the desired behaviour.
McLean stresses that all training, regardless of discipline, is built upon breaking down the task into single trainable units, repeating until the horse offers the correct response to the aid each time, and then building each additional unit until the final outcome is reached. Training units, or "criteria," increase the difficulty, or the duration, or the expected quality of the exercise, and so on. Shaping needs to be consistent, occur in a particular and systematic order, and each component needs to be firmly established before progressing in order to give the horse the opportunity to get the right answer. By making rewards contingent on desired behaviours, and ignoring unwanted responses, increasingly complex behaviours can be molded or shaped.
Violating these principles exacts a cost, not only in extending the time for the horse to learn the desired response, but also in encouraging conflict behaviours (such as tail swishing, balking, shying, rearing, etc.). Chronic conflict states result in negative physiological consequences for horses such as ulcers, poor condition and self-mutilation. McLean reminds us that "When horses give wrong responses, you cannot expect them to know what is right - only you know that" (2007). A shaping program is always modifiable. If it isn't working, shift tactics or reduce criteria. There are many shaping avenues to reach the desired behaviour.
IN THE BOX
POSITIVE AND NEGATIVE REINFORCEMENT
Horses, like people and most other animals, learn through both positive and negative reinforcement because their actions become associated with consequences.
Positive reinforcement (+R) occurs when a behaviour is strengthened because it is followed by the presentation of a rewarding stimulus. If you feed a horse that is barging and banging at his stall door, you positively reinforce the behaviour and make it more likely that you will see it every time you bring out the grain cart. If you shout at him to be quiet, you may even be able to get him to do it on command, as the shouting will become a cue that stall banging is expected and a food reward will follow.
Negative reinforcement (-R) occurs when a behaviour is strengthened because it is followed by the removal of an aversive or unpleasant stimulus. Many equine scientists have lamented this terminology because of the confusion it fosters when positive and negative are conceived of in their moral sense of good and bad, rather than the intended arithmetical sense of adding and subtracting. Think of -R as providing relief from an aversive event. In the previous example, when we feed the stall-banging horse, the banging ceases, and we have been relieved of the obnoxious event. The horse has trained us, through -R, to feed him quickly. Andrew McLean, a leading equine scientist, provides a salient visual of the pressure/release principle behind negative reinforcement in the words of Tom Roberts. Roberts asks, "When you sit on a pin, why do you get off?" Most people would answer, "Because it hurts," but Roberts corrects them, saying "No, you get off because it STOPS hurting."
CLICKERS, CLIMATE CHANGE AND SECONDARY REINFORCERS
Primary reinforcers, such as food, water, sex and companionship, are intrinsically rewarding to living creatures because they are critical to survival; we do not need to learn that these events are desirable.
Secondary reinforcers are rewarding because they have been consistently presented together with a primary reinforcer and so come to have an associated value. Even though you cannot eat a $100 bill, it feels delicious because of the strong associations we have about what money can buy - like horses and Hermes accessories.
A clicker, a hand-held device that makes a distinctive sound for which the horse is unlikely to have formed previous associations, has no inherent value to a horse. However, if we consistently pair that sound with a food reward, a horse will come to associate the clicker with good events. The term "Clicker Training" is misleading because of its emphasis on the clicker. If I said "climate change" and always followed that with a food reward, the words "climate change" would become a reliable secondary reinforcer and eventually take on value for my horse.
The advantage of a secondary reinforcer is that we can mark the precise behaviour we want to reward, and eventually introduce delay between click and reward. Shawna Karrasch, a prominent equine trainer specializing in +R training, calls the clicker a bridge signal as it provides a bridge between the desired behaviour and the moment of reward. This is particularly useful when chaining a few behaviours together (such as the Cavalia horse's performance) or when using +R under saddle (I can click and mark the behaviour I like - such as a good jumping effort - and reward when we are no longer airborne).
IN THE BOX
SHAPING A PARTY TRICK WITH +R: THE SMILE
1. If you are looking for party tricks, teach your horse something for which he shows a natural proclivity. Pavan is an oral kind of guy. Smiling came easily to him. I begin by tickling the end of his nose hoping for any movement. You may try putting something odd smelling on your fingers to get him to move his muzzle.
2. As soon as I see any movement, I click and reward. I repeat this several times until I am sure he understands the behaviour I am looking for. I name it "smile," and include a hand gesture. Next, I will up the ante, asking for a little more movement before I click and reward.
3. Eventually he gives me the full smile with a hand gesture alone. End your sessions on a high note, but end them mindfully. Since horses enjoy positive reinforcement training, ending can be construed as a punishment.
IN THE BOX
SHAPING WITH +R: TEACHING YOUR HORSE TO STAND
1. I begin by holding Pavan in the large ring. With a novice horse, I would begin in a stall or small pen.
2. Next, I move away one step and give him the verbal command "stand" and a hand signal.
3. I immediately return, click and reward him for standing still. Initially, I make both the duration and the distance away short, so that I have many opportunities to reward the behaviour I want, and few instances of needing to correct him. If he does move, I was probably shaping too quickly. I simply return, reposition him, back up to where he can reliably succeed, and reshape. I build one criterion at a time - either distance away or duration. I want each step solidified before increasing a criterion. There is no downside to building in more rewards than I might have actually needed.
4. As I increase the distance, Pavan is motionless, but very focused on me. He knows that his reward will be coming soon. When introducing a new criterion, I temporarily relax the old ones. Pavan may have learned to stand in the sand ring unattended for five minutes. At a horse show, where the atmosphere raises the criteria, I may begin in kindergarten, and build again.
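For readers who like to see a procedure written out, the stand exercise can be sketched as a short Python program. This is only an illustrative sketch, not a training protocol from McLean or Karrasch: the function name `shape`, the step labels, the two-in-a-row threshold and the scripted horse are all assumptions made up for the example. It tries to show three rules from the steps above: reward each correct response, raise a criterion only once the current step is solid, and back up a step when the horse makes a mistake.

```python
from typing import Callable, List


def shape(steps: List[str],
          attempt: Callable[[str], bool],
          successes_needed: int = 2,
          max_trials: int = 100) -> List[str]:
    """Work through `steps` in order and return a log of what was asked on each trial.

    `attempt(step)` is a hypothetical stand-in for the horse: it returns True
    when he gives the wanted response at that step, False when he does not.
    """
    log: List[str] = []
    index = 0    # position of the step currently being trained
    streak = 0   # consecutive correct responses at the current step

    while index < len(steps) and len(log) < max_trials:
        step = steps[index]
        log.append(step)
        if attempt(step):
            # Correct response: this is where the click and reward would go.
            streak += 1
            if streak >= successes_needed:
                index += 1   # step is solid, so raise the criterion
                streak = 0
        else:
            # A mistake suggests we shaped too quickly: back up one step
            # (never below the first) and rebuild from there.
            index = max(index - 1, 0)
            streak = 0
    return log


# Hypothetical session: distance is the only criterion being raised.
steps = ["1 step away", "3 steps away", "5 steps away"]
already_failed = set()


def scripted_horse(step: str) -> bool:
    """Made-up horse that moves the first time he is asked to stand at 3 steps."""
    if step == "3 steps away" and step not in already_failed:
        already_failed.add(step)
        return False
    return True


log = shape(steps, scripted_horse)
for trial, step in enumerate(log, start=1):
    print(trial, step)
```

In this scripted run the horse moves the first time he is asked to stand at three steps, so the sketch drops back to one step, rebuilds, and only then moves on. A real session would track duration and distraction as separate criteria, raised one at a time.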
IN THE BOX
SHAPING WITH -R AND +R SIMULTANEOUSLY: CLIPPING
1. Pavan is comfortable with the clippers, so I begin with his chin, which he enjoys. With a horse that had a previously learned and established fear of clippers, I would create a schedule of a few minutes a day to reprogram that reinforcement history, backing up the sequence as far away as necessary so as to elicit minimal or no reaction. Since this non-reaction is the target behaviour I want, I remove the clippers (or even turn them off) as soon as I see it, ending the aversive event, and thus rewarding the horse through -R for his non-reactive composure.
When I remove the clippers, I can also click and reward the behaviour that I like (+R). (Note the treat bag on my hip.) The click marks the exact behaviour that I want, so I wait until his head position and attitude are perfectly relaxed. I may need to back up the sequence in order to get the relaxation I am looking for.
2. When I start on his ears, he raises his head and pulls away. Since I only want to reward non-reactive behaviours, and not reward flight responses - even the understated ones - I turn off the clippers, or remove them, only when I see the calm behaviour that I am looking for.
I want to keep the clippers in contact with his ears so that I am not rewarding him, with -R, for this behaviour. Instead, I keep the connection and wait until he lowers his head. (If my horse became more fussed, I would slide my hand further down his neck while still maintaining contact until his fussing subsided, and slowly rebuild.) Once I get the lowered head I want, I reward him by removing the clippers (-R), and click and reward (+R). He learns that raising his head does not get rid of the clippers, but that lowering his head does. And, in the process, he discovers the clippers are actually not so bad. Gradually, I can build up the clipping time, always removing the clippers and rewarding only when I see the non-reactive behaviour I am after.
3. Now he is leaning into the clippers, so I immediately reward this behaviour - by removing the clippers (-R), clicking and giving him a treat (+R). There is a strong temptation to keep on clipping now that the horse is being compliant, but you will be much more successful if you stop often, reinforce the behaviour you like, and firmly establish that his lowered head and quiet disposition are the behaviours you want and will reward. Frequent pauses in the beginning, and slowly building his clipping tolerance, will eventually give you a horse that stands motionless for an entire body clip.
4. I stop clipping and reward. He is a little close to me here and I don't want to reward his pushy behaviour. So, I wait until he moves away. (You may have to step into him or gently push him away.)
5. Now I can reward him.
SHAPING TO REDUCE FEAR RESPONSES
The horse is evolutionarily programmed to take scary things seriously, learn them quickly and not forget about them. And flee them. McLean comments, "Fear is quickly learned, not easily forgotten, and is strongly associated with the movement of the horse's legs" (2007). Fearful stimuli receive special attention in the amygdala region of the brain, directly triggering the flight response, which activates the horse's entire body. Unlike almost anything else we teach horses to do, which requires numerous repetitions, the flight response can be learned in just one experience, and is extraordinarily resistant to extinction (i.e. it is difficult, probably impossible, to completely eradicate a fear response). Trial and error learning is not evolutionarily adaptive on the savannah where you could well be the next meal for a hungry predator.
Flight responses are not always volatile; they may also present as more subtle behaviours such as raising the head or stepping sideways. These milder reactive behaviours quickly escalate to dangerous ones, however, because they are often rewarded through the principles of -R. When any of these movements results in creating distance between the horse and the feared object, we are likely to see more of that behaviour in the future. Ending the aversive event is the reward.
Ingrained phobias are often learned when reactive behaviours made the feared stimulus go away, and greater reactivity made it go away more quickly.
An established history that particular procedures mean bad things will follow can, however, also offer many shaping opportunities to relearn that this procedure is now associated with good events. The more we are able to reinforce and entrench this new agreeable history with both -R and +R, the less likely it will be for that fear response, still lurking in the ancient regions of the brain's amygdala, to resurface. For most procedures, even unpleasant ones, the panic reaction and the subsequent forcible control are much worse than the procedure itself. Once the horse understands that the panic and the forcible control can be eliminated, the clipping, or the injection, or the farrier's hammering becomes quite tolerable.
A last caveat here: because fear responses are so deeply embedded in the horse's memory, in spite of our best efforts, they often reappear in what learning theorists call "spontaneous recovery." Do not despair. Rewind and reshape the calm behaviour you are seeking. The shaping proceeds more quickly with each re-shaping sequence, and further solidifies the new reinforcement history.
Critics often argue that you don't need a lot of scientific mumbo-jumbo to train horses; just use good old common sense. Unfortunately, our good old common sense is coloured by our human-centred viewpoint, and does not do the best job of seeing the world from the horse's perspective. Science can help us here.
When our horses are difficult, and even when they are fabulous, it is tempting to attribute their behaviour to their deficient or exemplary disposition. But we are always better served by learning theory explanations. Horses are always learning, whether we are intentionally training them or not. Shawna Karrasch reminds us of a simple truth about training horses: "If any behaviour, either desired or undesired is increasing in frequency, there is something in the environment that is reinforcing it." It behooves us to do some sleuthing to discover where we might be reinforcing behaviours that we wish would go away, and to systematically shape and reward the behaviours we would like to see more often.