# Using the Chain Rule

Calculus provides many rules for taking derivatives by hand. We just saw an example of the power rule. This rule states that given the function:
$$f(x)=x^n$$
the derivative of $f(x)$ will be as follows:
$$f^{\prime}(x)=n x^{n-1}$$
This allows you to quickly take the derivative of any power. There are many other derivative rules, and they are very useful to know. However, if you do not wish to learn manual differentiation, you can generally get by without it by using a program such as R.
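As a quick sanity check, the power rule can be compared against a numerical estimate of the slope. The following is a minimal sketch, not part of the original text: the class name `PowerRuleDemo`, its method names, and the step size `h` are all illustrative assumptions.

```java
// Minimal sketch (illustrative names): compare the power rule with a
// finite-difference estimate of the derivative of f(x) = x^n.
class PowerRuleDemo {
    // f(x) = x^n
    static double f(double x, int n) {
        return Math.pow(x, n);
    }

    // Power rule: f'(x) = n * x^(n-1)
    static double powerRule(double x, int n) {
        return n * Math.pow(x, n - 1);
    }

    // Central difference: (f(x + h) - f(x - h)) / (2h)
    static double numericalDerivative(double x, int n, double h) {
        return (f(x + h, n) - f(x - h, n)) / (2.0 * h);
    }

    public static void main(String[] args) {
        double x = 2.0;
        int n = 3;
        System.out.println("power rule: " + powerRule(x, n));
        System.out.println("numerical:  " + numericalDerivative(x, n, 1e-5));
    }
}
```

The two printed values should agree to several decimal places. The numerical estimate is only an approximation, so a small difference is expected.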

However, there is one more rule that is very useful to know: the chain rule. The chain rule deals with composite functions. A composite function is simply one function that takes the result of a second function as its input. This may sound complex, but programmers make use of composite functions all the time. Here is an example of a composite function call in Java.

`System.out.println(Math.pow(3, 2));`

This is a composite function because we take the result of the function `pow` and feed it to `println`.
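For reference, the chain rule says that if $h(x)=f(g(x))$, then $h^{\prime}(x)=f^{\prime}(g(x))\,g^{\prime}(x)$. The sketch below illustrates this with two functions chosen purely for illustration, $g(x)=3x+1$ and $f(u)=u^2$. The class and method names are assumptions, not something defined elsewhere in this book.

```java
// Minimal sketch (illustrative names): the chain rule applied to
// h(x) = f(g(x)) with g(x) = 3x + 1 and f(u) = u^2.
class ChainRuleDemo {
    // Inner function g(x) and its derivative g'(x)
    static double inner(double x) { return 3.0 * x + 1.0; }
    static double innerDerivative(double x) { return 3.0; }

    // Outer function f(u) and its derivative f'(u)
    static double outer(double u) { return u * u; }
    static double outerDerivative(double u) { return 2.0 * u; }

    // Composite function h(x) = f(g(x))
    static double composite(double x) { return outer(inner(x)); }

    // Chain rule: h'(x) = f'(g(x)) * g'(x)
    static double chainRule(double x) {
        return outerDerivative(inner(x)) * innerDerivative(x);
    }

    // Central-difference estimate of h'(x), used only as a cross-check
    static double numericalDerivative(double x, double h) {
        return (composite(x + h) - composite(x - h)) / (2.0 * h);
    }

    public static void main(String[] args) {
        double x = 2.0;
        System.out.println("chain rule: " + chainRule(x));
        System.out.println("numerical:  " + numericalDerivative(x, 1e-5));
    }
}
```

The derivative of the outer function is evaluated at the output of the inner function, in the same way that `println` receives the output of `pow`.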

The first step is to calculate the gradients of the neural network. A gradient is the slope of the error function with respect to a particular weight. A weight is the strength of a connection between two neurons. The gradient of the error function tells the training method whether it should increase or decrease the weight. A number of different training methods make use of gradients. Collectively, these methods are called propagation training. This book will discuss the following propagation training methods:

• Backpropagation
• Resilient Propagation
• Quick Propagation

This chapter will focus on using the gradients to train the neural network using backpropagation. The next few chapters will cover the other propagation methods.

First of all, let’s look at what a gradient is. Basically, training is a search. You are searching for the set of weights that will cause the neural network to have the lowest global error for a training set. If we had an infinite amount of computation resources, we would simply try every possible combination of weights and see which one provided the absolute best global error.

Because we do not have unlimited computing resources, we have to use some sort of shortcut. Essentially, all neural network training methods are really a kind of shortcut. Each training method is a clever way of finding an optimal set of weights without doing an impossibly exhaustive search.

Consider a chart that shows the global error of a neural network for each possible weight. This graph might look something like Figure 4.1.
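To make the contrast between exhaustive search and the gradient shortcut concrete, here is a minimal sketch for a single weight. The error curve $E(w)=(w-0.5)^2+0.1$ is entirely made up for illustration and is not the error function of any real network. The class name, method names, and search range are likewise assumptions.

```java
// Minimal sketch (hypothetical error curve): brute-force search over one
// weight, compared with the direction suggested by the gradient's sign.
class WeightSearchDemo {
    // Made-up global error as a function of a single weight
    static double error(double w) {
        return (w - 0.5) * (w - 0.5) + 0.1;
    }

    // Slope of the made-up error curve: dE/dw = 2 * (w - 0.5)
    static double gradient(double w) {
        return 2.0 * (w - 0.5);
    }

    // Exhaustive search: try every weight on a grid from -1 to 1 in steps
    // of 0.01 and keep the one with the lowest error.
    static double bruteForceBestWeight() {
        double bestWeight = -1.0;
        double bestError = error(bestWeight);
        for (int i = 1; i <= 200; i++) {
            double w = -1.0 + i * 0.01;
            double e = error(w);
            if (e < bestError) {
                bestError = e;
                bestWeight = w;
            }
        }
        return bestWeight;
    }

    // Gradient shortcut: a negative slope means increasing the weight lowers
    // the error (+1); a positive slope means decreasing it does (-1).
    static int suggestedDirection(double w) {
        double g = gradient(w);
        if (g < 0) return 1;
        if (g > 0) return -1;
        return 0;
    }

    public static void main(String[] args) {
        System.out.println("brute-force best weight: " + bruteForceBestWeight());
        System.out.println("direction at w = -0.2:   " + suggestedDirection(-0.2));
        System.out.println("direction at w = 0.9:    " + suggestedDirection(0.9));
    }
}
```

With one weight, the grid search needs only 201 evaluations. The number of grid points grows exponentially with the number of weights, which is why the sign of the gradient is the more practical guide.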

