In Part I we looked at the interesting architecture of adversarial learning with two learning models pitted against each other. We also built a very simple example of two nodes with adjustable parameters to get started with coding this adversarial architecture and visualising the learning as it progresses.
That example was so simple that the algebra collapsed to make the generator independent of the discriminator, but the exercise was still useful to develop the code and visualisation and avoid the additional complexity of neural networks.
We now progress to using neural networks as the learning models, but still keep both the learning task and the neural networks as simple as possible.
PyTorch
I firmly believe in learning how to build things from scratch if we really want to understand them. We've previously learned to make our own neural networks from scratch using Python, and you can read more on the blog that follows that journey. Once we've done that, it can make sense to use frameworks that make building and using neural networks easier. There are two leading choices - PyTorch and TensorFlow. Both allow easy use of GPU acceleration. Although TensorFlow is open source, its development is firmly led by Google. PyTorch has some advantages:
- it is much more open source in its development and community involvement
- it is much more pythonic, meaning code is easy to read and learn, and also to debug
- the computation graphs are dynamic allowing more interesting tasks to be done more simply
We'll be using PyTorch.
I previously wrote an intro to PyTorch. Although PyTorch itself has changed a little, the explanation of its ability to automatically calculate error gradients for back propagation is still valid.
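As a quick reminder of what that means in practice, here is a minimal sketch (not from that earlier post) of PyTorch calculating a gradient for us:

# autograd illustration
import torch

# a tensor that requires a gradient
x = torch.tensor(2.0, requires_grad=True)

# a simple function of x: y = x^2 + 3x
y = x * x + 3 * x

# ask PyTorch to work backwards and calculate dy/dx
y.backward()

# dy/dx = 2x + 3 = 7 when x = 2
print(x.grad)

This ability to track calculations and work backwards through them is exactly what we'll rely on when we train the generator through the discriminator later on.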
Task Overview - Learn To Imitate 1010 Patterns
The following diagram shows an overview of our task. The architecture is the same as we saw last time - the discriminator is being trained to classify data from the training data as real, and data from the generator as fake. This time the generator and discriminator are simple neural networks. Because neural networks need an input, the generator is fed data, which we'll discuss below.
The diagram also shows where learning happens. The discriminator learns as a result of the error in its output. Back-propagation of this error is used to calculate the weight changes in the discriminator neural network. The generator also learns from the classification error with the neural network weight changes back-propagated all the way back to itself via the discriminator.
Training Data
The training data are patterns of four numbers of the form 1010. What we'll do is make this a little fuzzy by generating four random numbers where the first and third are close to 1, and the second and fourth are close to 0. That means the training data could be something like [0.99, 0.01, 0.98, 0.02]. We can write a very simple function to create this fuzzy 1010 pattern:
# import the libraries we need
import torch
import random

# function to generate real data
def generate_real():
    t = torch.FloatTensor([random.uniform(0.8, 1.0),
                           random.uniform(0.0, 0.2),
                           random.uniform(0.8, 1.0),
                           random.uniform(0.0, 0.2)]).view(1, 4)
    return t
You can see we're not just returning a simple python list but turning it into a PyTorch tensor, which is like a numpy array but with the additional machinery needed to enable machine learning.
Feeding The Generator
The generator also creates data in the form of four numbers. If the training goes well, it will have learned to imitate the training data and create numbers that might be like [0.99, 0.02, 0.99, 0.01]. Unlike last time, this generator is a neural network and so needs an input to turn into an output. The most neutral input is four uniformly random numbers. Again, the code for generating uniformly random numbers is very simple:
# function to generate uniform random data
def generate_random():
    t = torch.FloatTensor([random.uniform(0.0, 1.0),
                           random.uniform(0.0, 1.0),
                           random.uniform(0.0, 1.0),
                           random.uniform(0.0, 1.0)]).view(1, 4)
    return t
We can test these functions to be sure they do create data that looks right:
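A quick check, not part of the original listing, is simply to print one sample from each; the exact values will differ every time because the data is random:

# quick check: one sample of fuzzy 1010 data and one of random noise
print(generate_real())
print(generate_random())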
We can see the generate_real() function does indeed create numbers that are high-low-high-low.
The Discriminator
We'll use PyTorch to build the discriminator as a simple neural network. It'll need 4 input nodes because the training data examples are 4 numbers (1010). Because the discriminator is a classifier, it only needs 1 output, which can have a value of 1 for "true" and 0 for "false". To keep things simple, we'll have just one hidden layer, and it can have 3 hidden nodes. I'm pretty sure an even smaller hidden layer would work, but that experimentation is a distraction from our task here. Neural networks are typically built by subclassing PyTorch's nn.Module. We describe the size of the layers and other design elements in the __init__() constructor. We also need to describe how the inputs work their way to become outputs, via network layers and activation functions. By convention this is described in a method called forward().
The following shows the code for a discriminator class:
# import the neural network and plotting libraries we need
import torch.nn as nn
import pandas

# discriminator class
class Discriminator(nn.Module):

    def __init__(self):
        # initialise parent pytorch class
        super().__init__()
        # define the layers and their sizes, turn off bias
        self.linear_ih = nn.Linear(4, 3, bias=False)
        self.linear_ho = nn.Linear(3, 1, bias=False)
        # define activation function
        self.activation = nn.Sigmoid()
        # create error function
        self.error_function = torch.nn.MSELoss()
        # create optimiser, using simple stochastic gradient descent
        self.optimiser = torch.optim.SGD(self.parameters(), lr=0.01)
        # accumulator for progress
        self.progress = []
        pass

    def forward(self, inputs):
        # combine input layer signals into hidden layer
        hidden_inputs = self.linear_ih(inputs)
        # apply sigmoid activation function
        hidden_outputs = self.activation(hidden_inputs)
        # combine hidden layer signals into output layer
        final_inputs = self.linear_ho(hidden_outputs)
        # apply sigmoid activation function
        final_outputs = self.activation(final_inputs)
        return final_outputs

    def train(self, inputs, targets):
        # calculate the output of the network
        output = self.forward(inputs)
        # calculate error
        loss = self.error_function(output, targets)
        # accumulate error
        self.progress.append(loss.item())
        # zero gradients, perform a backward pass, and update the weights
        self.optimiser.zero_grad()
        loss.backward()
        self.optimiser.step()
        pass

    def plot_progress(self):
        df = pandas.DataFrame(self.progress, columns=['loss'])
        df.plot(ylim=(0, 0.5), figsize=(16,8), alpha=0.1, marker='.', grid=True, yticks=(0, 0.25, 0.5))
        pass

    pass
You can see the hidden (middle) layer combines the inputs as a linear combination, and we use a simple sigmoid activation function. The same is done to take the outputs of the hidden layer to the final layer, which is just a single node. The error function is again a simple mean squared error. The optimiser, which decides how to change the neural network weights, is also a very simple stochastic gradient descent. We've deliberately chosen simple options as our focus here is on getting a basic GAN up and running, not on worrying about fine details.
The train() method is pretty self-explanatory too. The inputs are pushed through the network using the forward() function, and the output is compared to the target to give the error, conventionally called the loss. I've added code to append the loss to a list so that we can visualise how it changes over many training runs. The last three lines of code in the train() method are standard PyTorch - we need to zero the gradients from any previous runs, use the latest loss to back propagate and calculate new error gradients, and then change the weights.
Before we move onto the generator, let's make sure our discriminator works. The following code trains it, 10,000 times each, on examples of real and false data:
# test discriminator itself works
D = Discriminator()

for i in range(10000):
    D.train(generate_real(), torch.FloatTensor([1.0]))
    D.train(generate_false(), torch.FloatTensor([0.0]))
    pass
You can see we're giving the discriminator examples of fuzzy 1010 data from generate_real() and telling it the correct classification is 1.0. We're also giving the discriminator examples of false data and telling it the correct classification is 0.0. The generate_false() function simply provides a fuzzy 0101 pattern, as sketched below.
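The generate_false() function isn't shown in the original listing. A minimal sketch, assuming it mirrors generate_real() with the pattern flipped to 0101 and the same fuzziness, might look like this:

# function to generate false 0101 data (ranges assumed to mirror generate_real)
def generate_false():
    t = torch.FloatTensor([random.uniform(0.0, 0.2),
                           random.uniform(0.8, 1.0),
                           random.uniform(0.0, 0.2),
                           random.uniform(0.8, 1.0)]).view(1, 4)
    return t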
Let's visualise the discriminator loss over these 10,000 training sessions.
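The plot itself isn't reproduced here, but the call is just the plot_progress() method we gave the Discriminator class:

# plot the discriminator loss over the training sessions
D.plot_progress()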
That looks like the right shape. Over the training sessions the error falls, which means the discriminator is getting better at learning the training data. You might be wondering why the values start around 0.25 and not 0.5. That's because an untrained discriminator outputs values around 0.5 whatever it is fed, while the targets are 1.0 or 0.0, so the typical error is about 0.5 and the mean squared error is about 0.5² = 0.25. So "half right" shows up as 0.25 on these graphs, not 0.5.
The reason the plot of errors seems to have two modes at the start is that, in the early stages of learning, the network's average accuracy on real data is distinct from its average accuracy on false data.
Let's manually test the discriminator by feeding it data we know to be true and false:
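A manual check might look something like the following sketch; the exact test values here are illustrative, chosen to be clearly 0101 and clearly 1010:

# manual check: a clearly false 0101 pattern, then a clearly real 1010 pattern
print(D.forward(torch.FloatTensor([0.1, 0.9, 0.1, 0.9])))
print(D.forward(torch.FloatTensor([0.9, 0.1, 0.9, 0.1])))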
Fed a 0101 pattern, the output is a low 0.05 (false). Fed a 1010 pattern, the output is a high 0.94 (true). That confirms the discriminator is working correctly.
The Generator
Let's now build the generator. Let's remind ourselves what it is: a learning model that learns to get better at generating data that looks real. As we're using a neural network to do this learning, we need to think about its architecture. We can use 4 output nodes for the four positions of the 1010 pattern. The input and hidden layers have greater freedom, but for simplicity we'll go for 4 nodes in each. Any smaller and we risk limiting the expressive capacity of the network. The generator neural network needs an input. If we think about it, the output depends on the input. If we're tuning the network to learn to give a desired output, we want the inputs to, at minimum, not make that task difficult by being biased. This points to uniform randomness as the input to the network.
The code for the generator class is almost identical to the discriminator - they are both neural networks, passing signals from every node in one layer to every node in the next layer, using the same sigmoid activation function, and the same mean squared error function.
# import the libraries needed for visualising the generator output
import numpy
import matplotlib.pyplot as plt

# generator class
class Generator(nn.Module):

    def __init__(self):
        # initialise parent pytorch class
        super().__init__()
        # define the layers and their sizes, turn off bias
        self.linear_ih = nn.Linear(4, 4, bias=False)
        self.linear_ho = nn.Linear(4, 4, bias=False)
        # define activation function
        self.activation = nn.Sigmoid()
        # create error function
        self.error_function = torch.nn.MSELoss()
        # create optimiser, using simple stochastic gradient descent
        self.optimiser = torch.optim.SGD(self.parameters(), lr=0.01)
        # accumulator for progress
        self.progress = []
        # counter and list for outputting images
        self.counter = 0
        self.image_array_list = []
        pass

    def forward(self, inputs):
        # combine input layer signals into hidden layer
        hidden_inputs = self.linear_ih(inputs)
        # apply sigmoid activation function
        hidden_outputs = self.activation(hidden_inputs)
        # combine hidden layer signals into output layer
        final_inputs = self.linear_ho(hidden_outputs)
        # apply sigmoid activation function
        final_outputs = self.activation(final_inputs)
        return final_outputs

    def train(self, D, inputs, targets):
        # calculate the output of the network
        g_output = self.forward(inputs)
        # pass onto Discriminator
        d_output = D.forward(g_output)
        # calculate error
        loss = D.error_function(d_output, targets)
        # calculate how far wrong the generator is, purely for plotting
        # note we're using knowledge about the real data here
        g_loss = self.error_function(g_output, torch.FloatTensor([0.9, 0.0, 0.9, 0.0]))
        # accumulate error
        self.progress.append(g_loss.item())
        # zero gradients, perform a backward pass, and update the weights
        self.optimiser.zero_grad()
        loss.backward()
        self.optimiser.step()
        # increase counter and take a snapshot of the output every 1000 steps
        self.counter += 1
        if (self.counter % 1000 == 0):
            self.image_array_list.append(g_output.detach().numpy())
            pass
        pass

    def plot_progress(self):
        df = pandas.DataFrame(self.progress, columns=['loss'])
        df.plot(ylim=(0, 0.5), figsize=(16,8), alpha=0.1, marker='.', grid=True, yticks=(0, 0.25, 0.5))
        pass

    def plot_images(self):
        plt.figure(figsize=(16,8))
        plt.imshow(numpy.concatenate(self.image_array_list).T, interpolation='none', cmap='Blues')
        pass

    pass
Although most of the generator code is similar to that of the discriminator, the training is different. Here we pass the inputs through the generator as normal to give the outputs. However, we aren't learning by comparing these outputs with real data. Remember, the generator never sees the real data. It only learns by looking at how well it convinces the discriminator. So we push the generator outputs through the discriminator to get a classification, which we want to be "real", or 1.0.
The error function, which decides how we update the network weights, compares the classifier output with what it should be, 1.0. Because PyTorch keeps track of every calculation performed on its tensors, from the random inputs to the generator right through to the output of the discriminator, it can internally calculate the error gradients from the classification error all the way back through the discriminator weights to the generator weights.
However, we don't want to change the discriminator weights. We aren't training the discriminator to recognise the generator outputs as real; we're only training the generator. Luckily, the optimiser created in the generator's constructor was given only the generator's parameters, so the call to self.optimiser.step() updates only the generator. This doesn't require any extra coding.
We have the same extra code to keep a log of the generator errors just like before, but this time we are using knowledge of what real data should look like to make the comparison. Look at the code yourself to confirm that knowledge is not used to train the generator itself. It's only used to help us visualise progress, and can be removed at any time.
We also have additional code which takes a snapshot of the generator outputs at every 1000 training steps so we can visualise the patterns it creates.
Adversarial Training
The training of this adversarial architecture takes three distinct steps, repeated many times:
- showing the discriminator a real data example, and telling it the classification should be 1.0
- showing the discriminator the output of the generator and telling it the classification should be 0.0
- showing the discriminator the output of the generator and telling the generator the result should be 1.0
The first two steps train the discriminator to get good at separating real and false data. The third step trains the generator to create realistic-looking data that can get past the discriminator.
The code for this three-step training is simple:
# create Discriminator and Generator
D = Discriminator()
G = Generator()

# train Discriminator and Generator
for i in range(10000):

    # train discriminator on true
    D.train(generate_real(), torch.FloatTensor([1.0]))

    # train discriminator on false
    # use detach() so only D is updated, not G
    D.train(G.forward(generate_random()).detach(), torch.FloatTensor([0.0]))

    # train generator
    G.train(D, generate_random(), torch.FloatTensor([1.0]))

    pass
Let's see how the discriminator training progresses:
That's interesting!
Before, the error reduced towards zero as the discriminator got better and better at telling real data from fake data. Now the discriminator seems to be approaching a state where it isn't good at telling real data apart from the data from the generator, which itself is getting better and better at generating more realistic data. That's why the error is approaching an average of 0.25.
Let's see the error between the output of the generator compared to what we know real data should look like:
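Again the plot isn't reproduced here; it comes from the progress log the generator kept during training:

# plot the generator's error against the ideal 1010 pattern
G.plot_progress()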
That confirms the generator is getting better and better at creating data that looks like 1010.
Great - we've trained a generator that successfully learns to create realistic data that the discriminator finds hard to tell apart from actual real training data!
Images
Let's visualise the snapshots the generator took of its output every 1000 training steps, using the plot_images() method we wrote earlier. The generator output starts indistinct, but over time the output becomes distinctly 1010.
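The picture isn't reproduced here, but the call that produces it is simply:

# show the generator output snapshots, one column per 1000 training steps
G.plot_images()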
This visualisation is a forward look to Part III where we'll try to train a GAN to generate 2-dimensional images.
As a final check, let's manually run the generator to confirm the outputs do indeed look like 1010.
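The check is just a forward pass with fresh random noise, for example:

# run the trained generator on a new random input
print(G.forward(generate_random()))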
Yup - the outputs are very close to 1010.
Conclusion
We've succeeded in taking the basic adversarial architecture we discussed in Part I, developing it to use neural networks as learning units, and applying it to a more interesting learning task. We also used visualisation of the error and generator outputs to see, and better understand, the training process.
The key point here is that the generator never sees the real training data - yet it learns to create convincing imitations!
The code is available on github as a notebook:
More Reading
- Stanford's intro to PyTorch: https://cs230-stanford.github.io/pytorch-getting-started.html