Sigmoid Change
Step functions enforce a threshold on a variable, for example by discarding anything below a certain value. Given u(x), any input equal to or greater than 0 triggers a positive output (1 in this case); everything else yields zero. With 3u(x−8), any input equal to or greater than 8 triggers a non-zero output (3 in this case).
$$u(x) = \begin{cases} 0 & \text{if } x < 0 \\ 1 & \text{if } x \geq 0 \end{cases}$$
Step functions are like switches. We could utilise them to trigger a signal whenever a threshold is crossed. Imagine opening the floodgates whenever the water level crosses a certain point, or turning off the electric lighting when the ambient brightness exceeds a given measure.
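As a minimal sketch, the two step functions described above could be written in Python as follows. The names `scaled_shifted_step`, `height` and `threshold` are made up for this illustration and are not part of any library.

```python
def u(x):
    """Unit step: 1 for x >= 0, otherwise 0."""
    return 1 if x >= 0 else 0


def scaled_shifted_step(x, height=3, threshold=8):
    """height * u(x - threshold); the defaults give 3u(x - 8).

    The function and parameter names are illustrative assumptions.
    """
    return height * u(x - threshold)


# Probe a few inputs on either side of both thresholds.
for x in (-2, 0, 7.9, 8, 12):
    print(x, u(x), scaled_shifted_step(x))
```

Both functions switch exactly at their threshold: the output changes at x = 0 for u(x) and at x = 8 for 3u(x−8).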
Perceptrons, binary classifiers and the neurons in artificial neural networks require an activation function in order to produce any output. The step function is a valid option for the activation function but poses a challenge in analysis because of the jump discontinuity at x=0.
The sigmoid function σ(x), because of its differentiability, is
sometimes used as an alternative to the step function u(x).
Some may choose to introduce a weight β to the input variables in order
to obtain a sigmoid curve more reminiscent of the step we’re trying to mimic.
By tweaking the β term, one can obtain a result that approaches the step
function, yet remains differentiable throughout its entire domain. The
graph below demonstrates the output of a sigmoid function with a β of
1, 2, 10 and 100, where a plot of β=100 approximates the step
function quite closely in comparison to the other plots.
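To get a feel for the effect of β without a plot, here is a small sketch that evaluates the weighted sigmoid close to the threshold. It assumes the weight is applied as σ(βx) = 1/(1+e^(−βx)), which is one common reading of "a weight on the input"; the `beta` parameter name is an assumption of this sketch.

```python
import math


def sigmoid(x, beta=1.0):
    """Weighted sigmoid 1 / (1 + e^(-beta * x)); beta=1 is the unweighted case.

    Assumption: the weight multiplies the input, i.e. sigma(beta * x).
    """
    z = beta * x
    # Branch on the sign so exp() never receives a large positive argument.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


# Evaluate just left and right of the threshold for each weight.
for beta in (1, 2, 10, 100):
    row = [round(sigmoid(x, beta), 4) for x in (-1.0, -0.1, 0.1, 1.0)]
    print(f"beta={beta:>3}: {row}")
```

As β grows, the values left of zero are pushed towards 0 and those right of zero towards 1. One difference remains: at x = 0 the sigmoid equals 0.5 for every β, whereas u(0) = 1.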
Enticing as that is, in this post we’ll keep our eyes set on the unweighted sigmoid.
Derivation of the Sigmoid
Let’s find the derivative of the unweighted sigmoid.
$$\frac{d}{dx}\sigma(x) = \frac{d}{dx}\,\frac{1}{1+e^{-x}}$$
Let’s stick to the Lagrange notation for a bit, where we denote the derivative of $f(x)$ as $f'(x)$. Given $f(x) = \frac{a(x)}{b(x)}$, the quotient rule states that:
$$f'(x) = \frac{a'(x)\cdot b(x) - a(x)\cdot b'(x)}{b(x)^2}$$
Now that we know how to work out the derivative of a quotient, we can work out $a'(x)$ and $b'(x)$ in order to fill in the blanks later.
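$$\begin{aligned} a(x) &= 1 \\ a'(x) &= \frac{d}{dx}1 = 0 \\ b(x) &= 1+e^{-x} \\ b'(x) &= \frac{d}{dx}1 + \frac{d}{dx}e^{-x} = -e^{-x} \end{aligned}$$
Working out the math for the derivative of $\frac{1}{1+e^{-x}}$ leads to the following result, given the quotient rule: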
$$f'(x) = \frac{0\cdot(1+e^{-x}) - 1\cdot(-e^{-x})}{(1+e^{-x})^2} = \frac{e^{-x}}{(1+e^{-x})^2}$$
It’s pretty clear that $e^{-x}$ is equal to $(1+e^{-x})-1$. That part is trivial to understand; what is pretty brilliant is having the insight to arrange the terms this way, so that some of them can be eliminated in the next step through this substitution.
$$\frac{(1+e^{-x})-1}{(1+e^{-x})^2}$$
Go the extra mile to separate the expression into two separate terms:
$$\frac{1+e^{-x}}{(1+e^{-x})^2} - \frac{1}{(1+e^{-x})^2}$$
We can simplify the first term since $(1+e^{-x})$ occurs in both the numerator and denominator. The numerator of the second term can be written as a square, which is simply $1 = 1^2$; this paves the way to squaring the entire term, since $\frac{a^n}{b^n} = \left(\frac{a}{b}\right)^n$.
$$\frac{1}{1+e^{-x}} - \left(\frac{1}{1+e^{-x}}\right)^2$$
Since $\sigma(x)$ equals $\frac{1}{1+e^{-x}}$, we can simplify our result to:
$$\sigma(x) - \sigma(x)^2$$
which gives us the beautiful derivative of a sigmoid.
They don’t make it any simpler than this.
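If you’d like to convince yourself numerically, the sketch below compares the closed form σ(x) − σ(x)² against a central finite difference of the unweighted sigmoid. The helper names `sigmoid_derivative` and `central_difference`, as well as the step size `h`, are choices made for this illustration only.

```python
import math


def sigmoid(x):
    """Unweighted sigmoid 1 / (1 + e^(-x))."""
    return 1.0 / (1.0 + math.exp(-x))


def sigmoid_derivative(x):
    """Closed form from the derivation: sigma(x) - sigma(x)^2."""
    s = sigmoid(x)
    return s - s * s


def central_difference(f, x, h=1e-5):
    """Numerical derivative estimate (f(x + h) - f(x - h)) / (2h)."""
    return (f(x + h) - f(x - h)) / (2 * h)


# Compare the analytic and the numerical derivative at a few points.
for x in (-3.0, -0.5, 0.0, 0.5, 3.0):
    analytic = sigmoid_derivative(x)
    numeric = central_difference(sigmoid, x)
    print(f"x={x:+.1f}  analytic={analytic:.6f}  numeric={numeric:.6f}")
```

The two columns should agree to within the accuracy of the finite difference. At x = 0 the closed form gives 0.5 − 0.25 = 0.25, which is where the sigmoid is at its steepest.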