Why GANs?
Artificial intelligence has seen huge advances in recent years, with notable achievements like computers competing with humans at the notoriously difficult ancient game of go, self-driving cars, and voice recognition in your pocket. Much of that recent progress has been enabled by the ability to train large neural networks as computing power has become cheaper. The training of neural networks with many layers has become known as deep learning, although that term also covers other many-layered learning models.
The key benefit of deep, or many-layered, neural networks is that they can learn which elements of the data are useful features. These features can then be used to make higher-level decisions. For example, a face recognition system might learn features such as eyes and mouths. Previously we had to work out, or guess, what the right low-level features should be.
Neural networks are typically used to distill a lot of data into a smaller piece of information, like a yes/no decision or a classification. But they can also be used to generate data - which can include images.
Even more recently, a new architecture emerged that led to spectacular results for generated images. The following faces are not real; they were created by a generative network (source).
In October 2018, the world-leading art auction house Christie's sold the Portrait of Edmond Belamy for $432,500.
That portrait was not painted by a person, but created using a generative neural network.
The neural network architecture that generates these compelling results is known as a generative adversarial network, or GAN.
The name describes the unique adversarial way in which the networks learn.
Generative Adversarial Learning
Before we look at this unique adversarial way of learning, let's first look at the typical approach to machine learning. A model, often a neural network, is fed training data, and the output of that model is compared to what the right output should be. The difference, the error, guides how the internal parameters of the model are updated in an attempt to reduce the error.
For a neural network, the error is used to update the link weights that connect nodes in the network, using a method known as back propagation of the error.
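To make that concrete, here is a minimal sketch of this standard error-driven approach, using a single adjustable parameter rather than a full neural network. The data pairs and learning rate are invented for illustration:
# a minimal sketch of standard error-driven training: a one-parameter
# model learns the slope of the data (here, target = 2 * x)
model_parameter = 0.5
learning_rate = 0.1
for x, target in [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]:
    output = model_parameter * x
    # the error guides the parameter update
    error = target - output
    model_parameter += learning_rate * error * x
print(model_parameter)  # approaches 2.0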
This typical approach has been pervasive across many forms of machine learning.
Although he wasn't the first to explore the idea, Ian Goodfellow's 2014 paper (pdf) kicked off a period of intense interest in a new approach.
In this approach we still have a learning model that is fed examples to learn from. This time, the learning model is trained to distinguish between real and fake examples of data.
You can see from the picture above that the learning model is fed examples of real data and is trained to recognise them as real. You can also see that the same learning model is fed data from another source, and is trained to recognise that data as fake.
In the picture above, you can see we're not using a data set for the fake examples, but something that generates that data. It makes sense to call it a generator.
The learning model's job is to get good at telling the real examples from the fake ones - that's why it is called a discriminator.
So far that's very much like the standard approach to machine learning.
What's new is that while the discriminator is learning to get good at separating real data from fake data, the generator is learning to get better at creating data that can fool the discriminator!
As training progresses:
- the discriminator gets better and better at telling real and fake data apart
- the generator gets better and better at creating data that looks like real data
The discriminator and generator are pitted against each other - their aims are adversarial.
Ingenious!
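As a rough sketch of the structure, one round of adversarial training alternates between the two. The helper functions here are hypothetical stubs, made concrete in the rest of this article:
import random

# a skeleton of the adversarial training loop; these stubs are
# hypothetical placeholders, fleshed out later in this article
def get_real():
    return random.uniform(0.9, 1.1)   # a real data example

def generate():
    return 0.0                        # a generated (fake) example

def train_discriminator(real_example, fake_example):
    pass  # learn: real_example -> "real", fake_example -> "fake"

def train_generator():
    pass  # learn to make generated data fool the discriminator

for step in range(1000):
    train_discriminator(get_real(), generate())
    train_generator()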
Let's think a little more about how the generator is trained, as it is not often explained well.
Unlike the discriminator, we don't have examples of what the correct output of the generator should be. All we know is that if the generator does a good job, the output of the discriminator should be a "true" classification.
This sounds like a problem, but we can actually train the generator if we consider the combination of the generator and discriminator as a longer machine learning model.
Machine learning models have parameters that are adjusted during training. If the learning models are neural networks, these parameters are the link weights. In this example, we calculate the weight updates as if we were training a long neural network (generator + discriminator) but only update the generator's weights.
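To make this concrete, here is a hedged sketch using one-parameter nodes like those we build below; the names are invented for illustration. The error is measured at the discriminator's output, but the chain rule update is applied only to the generator's parameter, with the discriminator's parameter held fixed:
# sketch: training the generator through the combined
# (generator + discriminator) model; names invented for illustration
g_parameter = 0.1    # generator's parameter (also its output)
d_parameter = 1.0    # discriminator's parameter, held fixed here
learning_rate = 0.05
for step in range(300):
    generated = g_parameter                    # generator output
    classification = d_parameter * generated   # discriminator output
    error = 1.0 - classification               # we want "real", i.e. 1.0
    # chain rule: d(classification)/d(g_parameter) = d_parameter,
    # so only the generator's parameter is updated
    g_parameter += learning_rate * error * d_parameter
print(g_parameter)   # approaches 1.0, while d_parameter is untouched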
This neat idea solves our apparent problem, and avoids training the discriminator to say that generated data is real.
Again, ingenious!
In practice, this method of training the generator can work badly or very well. In the wider context, GANs are a relatively new method, and like all machine learning methods, much still needs to be learned about improving the performance and stability of training. When they work, the results can be impressive!
(Over?) Simplified Adversarial Learning
Let's see if we can build a generative adversarial learning system that is as simple as we can make it. The aim is to see the adversarial learning process in action, while avoiding the complexity of neural networks and of data that needs to be transformed and messed about with. Imagine a very simple discriminator node that has only one adjustable parameter.
The node takes an input x and multiplies it by a parameter p to give an output o. We can't get simpler than that!
Now imagine the inputs x are examples of real data. Let's say real data is around the value 1.0, so the training examples are in the range 0.9 to 1.1. The following code shows a very simple function that creates these real data examples:
import random

# function to generate real data
def generate_real():
    return random.uniform(0.9, 1.1)
As a really simple task, let's say the job of the learning node is to output 1 when the input is real. That means the adjustable parameter p must approach 1 as it learns. Let's set it to start at 0.1, so during training the parameter needs to increase towards 1.
Here's a simple class for the discriminator showing the initial parameter at 0.1, a very simple test() method which calculates the output, and a train() method that adjusts the parameter according to the error and a learning rate, which here is 0.05.
import pandas

# discriminator node with adjustable parameter
class Discriminator:

    def __init__(self):
        self.parameter = 0.1
        # accumulator for progress
        self.progress = []
        pass

    def test(self, x):
        return x * self.parameter

    def train(self, x, target):
        output = self.test(x)
        error = target - output
        # use error to adjust parameter, learning rate is 0.05
        self.parameter += 0.05 * error * x
        # accumulate progress
        self.progress.append([error, self.parameter])
        pass

    def plot_progress(self):
        df = pandas.DataFrame(self.progress, columns=['error', 'parameter'])
        df.plot(figsize=(16,8))
        pass

    pass
Just like the standard machine learning approach, if the output is close to the target, then the error is small and the parameter doesn't need to be adjusted by much.
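For example, if the parameter is 0.5 and a real example x = 1.0 arrives with target 1.0, the output is 0.5, the error is 1.0 - 0.5 = 0.5, and the update is 0.05 × 0.5 × 1.0 = 0.025, nudging the parameter a little towards 1.0.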
There is some extra code in there to accumulate the error and parameter into a list as they evolve, so we can plot them later.
The following simple code shows how we can create an instance of a discriminator and train it to output a target of 1.0. You can see we're training it 300 times.
# create Discriminator
D = Discriminator()

# train Discriminator
for i in range(300):
    # train discriminator on true
    D.train(generate_real(), 1.0)
    pass
Let's plot a graph of the error and parameter as they change over the training period.
D.plot_progress()
As expected, we can see the parameter starts at 0.1 and grows towards 1.0. We can also see the error starts at around 0.9 and falls towards zero.
So far we've not done anything particularly special. We have trained a very simple node in a very simple scenario.
Let's now think about a generator node, keeping it as simple as possible.
This node doesn't take any input. It has an adjustable parameter p, and the output o is simply that parameter p. We can use the difference between the output and a target value, the error, to adjust the parameter p, just like before.
The following shows the class for this simplified generator. The parameter is initially 0.1 which means the first generated value will be 0.1.
# generator node with adjustable parameter
class Generator:

    def __init__(self):
        self.parameter = 0.1
        # accumulator for progress
        self.progress = []
        pass

    def generate(self):
        return self.parameter

    def train(self, target):
        output = self.generate()
        error = target - output
        # use error to adjust parameter, learning rate is 0.05
        self.parameter += 0.05 * error
        # accumulate progress
        self.progress.append([error, self.parameter])
        pass

    def plot_progress(self):
        df = pandas.DataFrame(self.progress, columns=['error', 'parameter'])
        df.plot(figsize=(16,8))
        pass

    pass
The code is almost identical to the discriminator's because both have an adjustable parameter, and both update that parameter in a similar way.
Let's now train the discriminator on both the real data and on the fake data coming from the generator. The code below shows the target for the real data is 1.0 but for the fake data it is 0.0. The aim is to get the discriminator good at telling real and fake data apart.
# create Discriminator and Generator
D = Discriminator()
G = Generator()

# train Discriminator and Generator
for i in range(300):
    # train discriminator on true
    D.train(generate_real(), 1.0)
    # train discriminator on false
    D.train(G.generate(), 0.0)
    # train generator
    G.train(1.0)
    pass
You can also see we're training the generator at the same time. We're telling it that it should target 1.0 when generating data.
Let's see how the generator parameter and error change during training, using the plot_progress() method we defined above.
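G.plot_progress()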
We can see that over time, the parameter grows from the initial 0.1 towards 1.0. This means the generator is getting better at creating data that looks like real data - which was in the range 0.9 to 1.1. As expected, the error falls towards zero.
Great!
Let's look again at what's happening with the discriminator now that it is being trained on both the real data and data from the generator.
That's interesting. The parameter is no longer rising towards 1.0, and the error is not smoothly falling to zero. The reason is that as the generator gets better, the discriminator finds it harder to distinguish between the real and generated data. It is being told that the generated data, which is getting closer to 1.0, should have a target of 0.0 - hence the errors. Over time, the parameter might settle around 0.5, so the discriminator scores both kinds of data at about 0.5, reflecting the fact that it can't decide between the two data sources.
This is what happens with real GANs: the discriminator never learns to discriminate between the real data and the ever improving generated data.
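We can check this in our simplified example. This quick test isn't part of the original listing, but it uses only the objects defined above:
# after training, both kinds of data get a similar, undecided score
print(D.test(generate_real()))   # roughly 0.5
print(D.test(G.generate()))      # roughly 0.5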
Although this has been a very simple, perhaps oversimplified, example - we have seen the key elements: template code for the discriminator and generator nodes, and a training loop that pits them against each other.
You can find the code and graphs in a notebook on github.
Next Time - Neural Networks
In Part II we'll progress to develop a discriminator and generator that are neural networks, to see if we can generate more interesting data. We'll also see a key difference between GANs using neural networks and our simplified example - which is that the generator learns to create varied data, rather than converging on a single value.
More Reading
The following are useful additional resources:
- Ian Goodfellow giving a talk on GANs, with a lot of insight packed into 30 minutes: https://www.youtube.com/watch?v=9JpdAg6uMXs
- GAN overview and tutorial with PyTorch: https://medium.com/@devnag/generative-adversarial-networks-gans-in-50-lines-of-code-pytorch-e81b79659e3f
Extra: Some Algebra
You might be wondering why the training of the generator looks so simple here - almost too simple. Let's work through it.
First let's look at the discriminator being trained on real data x, without input from the generator.
$$
output_{D} = parameter_{D} \cdot x
$$
The error is the difference between the desired and actual output, squared:
$$
\begin{align}
error_{D} & = (target - output_{D})^2 \\
& = (target - parameter_{D} \cdot x)^2
\end{align}
$$
And this error changes with the discriminator parameter simply:
$$
\begin{align}
\frac{\partial}{\partial parameter_{D}} ( error_{D} ) & = \frac{\partial}{\partial parameter_{D}} ( target - parameter_{D} \cdot x )^2 \\
& = -2 \cdot ( target - parameter_{D} \cdot x) \cdot x \\
& = -2 \cdot (target - output_{D}) \cdot x
\end{align}
$$
The parameter is updated to follow that gradient downwards:
$$
\begin{align}
\Delta parameter_{D} & = - [-2 \cdot (target - output_{D}) \cdot x ]\\
& = 2 \cdot (target - output_{D}) \cdot x \\
& \sim (target - output_{D}) \cdot x
\end{align}
$$
This is why the weight update for the discriminator is as simple as:
# use error to adjust parameter, learning rate is 0.05
self.parameter += 0.05 * error * x
Note the error in the code is simply the difference between the target and output, not squared.
Now let's look at how we might update the generator parameter. We could work out how the overall error depends on this parameter, but we'll mirror the approach taken when back-propagating errors in a neural network. You can read a gentle introduction to back-propagation here [link].
In that approach, we split the overall error amongst the preceding nodes and use the same simple update rule we derived above. Here we only have one node, the generator, that feeds the discriminator. So we can use the same error.
The analogous update rule is:
$$
\Delta parameter_{G} \sim (target - output_{D})
$$
It makes sense if we think of this node as the same as the discriminator node but with a constant input of x=1.
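For completeness, the full chain-rule derivation, under the same squared-error assumption, gives the same form up to a constant factor that can be absorbed into the learning rate. Since the discriminator's output is its parameter multiplied by the generator's output, and the generator's output is just its own parameter:
$$
\begin{align}
\frac{\partial}{\partial parameter_{G}} ( error_{D} ) & = \frac{\partial}{\partial parameter_{G}} ( target - parameter_{D} \cdot parameter_{G} )^2 \\
& = -2 \cdot (target - output_{D}) \cdot parameter_{D}
\end{align}
$$
Following this gradient downwards gives an update proportional to (target - output_D) multiplied by the discriminator's parameter, which matches the simpler rule above once that extra factor is folded into the learning rate.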
It's now clear why the weight update for the generator is as simple as:
# use error to adjust parameter, learning rate is 0.05
self.parameter += 0.05 * error
Again, note the error in the code is simply the difference between the target and output, not squared.