Tuesday 16 April 2019

Generative Adversarial Networks - Part II

This is the second in a short series of posts introducing and building generative adversarial networks, known as GANs.

In Part I we looked at the interesting architecture of adversarial learning with two learning models pitted against each other. We also built a very simple example of two nodes with adjustable parameters to get started with coding this adversarial architecture and visualising the learning as it progresses.

That example was so simple that the algebra collapsed to make the generator independent of the discriminator, but the exercise was still useful to develop the code and visualisation and avoid the additional complexity of neural networks.

We now progress to using neural networks as the learning models, but still keep both the learning task and the neural networks as simple as possible.


PyTorch

I firmly believe in learning how to build things from scratch if we really want to understand them. We've previously learned to make our own neural networks from scratch using Python. You can read more on the blog that follows that journey.

Once we've done that it can make sense to use frameworks that make building and using neural networks easier. There are two leading choices - PyTorch and TensorFlow. Both allow easy use of GPU acceleration. Although TensorFlow is open source, its development is firmly led by Google. PyTorch has some advantages:

  • it is much more open source in its development and community involvement
  • it is much more pythonic, meaning code is easy to read and learn, and also to debug
  • the computation graphs are dynamic, allowing more interesting tasks to be done more simply

We'll be using PyTorch.

I previously wrote an intro to PyTorch. Although PyTorch itself has changed a little, the explanation of its ability to automatically calculate error gradients for back propagation is still valid:

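As a tiny illustration of that idea - a minimal sketch, not from the original post - PyTorch can track operations on a tensor and compute the gradient automatically:


# minimal autograd example

import torch

# create a tensor that tracks gradients
x = torch.tensor(2.0, requires_grad=True)

# a simple expression y = x^2 + 3x
y = x * x + 3 * x

# back-propagate to get dy/dx = 2x + 3, which is 7 at x = 2
y.backward()
print(x.grad)   # tensor(7.)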



Task Overview - Learn To Imitate 1010 Patterns

The following diagram shows an overview of our task. The architecture is the same as we saw last time - the discriminator is being trained to classify data from the training data as real, and data from the generator as fake.


This time the generator and discriminator are simple neural networks. Because neural networks need an input, the generator is fed data, which we'll discuss below.

The diagram also shows where learning happens. The discriminator learns as a result of the error in its output. Back-propagation of this error is used to calculate the weight changes in the discriminator neural network.  The generator also learns from the classification error with the neural network weight changes back-propagated all the way back to itself via the discriminator.


Training Data

The training data are patterns of four numbers of the form 1010. What we'll do is make this a little fuzzy by generating four random numbers where the first and third are close to 1, and the second and fourth are close to 0. That means the training data could be something like [0.99, 0.01, 0.98, 0.02].

We can write a very simple function to create this fuzzy 1010 pattern:


import random
import torch

# function to generate real data

def generate_real():
    t = torch.FloatTensor([random.uniform(0.8, 1.0),
                           random.uniform(0.0, 0.2),
                           random.uniform(0.8, 1.0),
                           random.uniform(0.0, 0.2)]).view(1, 4)
    return t


You can see we're not just returning a simple python list but turning it into a PyTorch tensor, which is like a numpy array but with additional machinery needed to enable machine learning.


Feeding The Generator

The generator also creates data in the form of four numbers. If the training goes well, it will have learned to imitate the training data and create numbers that might be like [0.99, 0.02, 0.99, 0.01]. Unlike last time, this generator is a neural network and so needs an input to turn into an output. The most neutral input is four uniformly random numbers.

Again, the code for generating uniformly random numbers is very simple:


# function to generate uniform random data

def generate_random():
    t = torch.FloatTensor([random.uniform(0.0, 1.0),
                           random.uniform(0.0, 1.0),
                           random.uniform(0.0, 1.0),
                           random.uniform(0.0, 1.0)]).view(1, 4)
    return t


We can test these functions to be sure they do create data that looks right:

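The exact notebook cell isn't reproduced here, but the check can be as simple as:


# quick check of the data functions (outputs will vary)

print(generate_real())
print(generate_random())
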
We can see the generate_real() function does indeed create numbers that are high-low-high-low.


The Discriminator

We'll use PyTorch to build the discriminator as a simple neural network. It'll need 4 input nodes because the training data examples are 4 numbers (1010). Because the discriminator is a classifier, it only needs 1 output, which can have a value of 1 for "true" and 0 for "false". To keep things simple, we'll have just one hidden layer, and it can have 3 hidden nodes. I'm pretty sure an even smaller hidden layer would work, but that experimentation is a distraction from our task here.

Neural networks are typically built by subclassing PyTorch's nn.Module. We describe the size and other design elements in the __init__() constructor. We also need to describe how the inputs work their way to become outputs, via network layers and activation functions. By convention this is described in a method called forward().

The following shows the code for a discriminator class:


import torch.nn as nn
import pandas

# discriminator class

class Discriminator(nn.Module):
    
    def __init__(self):
        # initialise parent pytorch class
        super().__init__()
        
        # define the layers and their sizes, turn off bias
        self.linear_ih = nn.Linear(4, 3, bias=False)
        self.linear_ho = nn.Linear(3, 1, bias=False)
        
        # define activation function
        self.activation = nn.Sigmoid()
        
        # create error function
        self.error_function = torch.nn.MSELoss()

        # create optimiser, using simple stochastic gradient descent
        self.optimiser = torch.optim.SGD(self.parameters(), lr=0.01)
        
        # accumulator for progress
        self.progress = []
        pass
    
    
    def forward(self, inputs):        
        # combine input layer signals into hidden layer
        hidden_inputs = self.linear_ih(inputs)
        # apply sigmoid activation function
        hidden_outputs = self.activation(hidden_inputs)
        
        # combine hidden layer signals into output layer
        final_inputs = self.linear_ho(hidden_outputs)
        # apply sigmoid activation function
        final_outputs = self.activation(final_inputs)
        
        return final_outputs
    
    
    def train(self, inputs, targets):
        # calculate the output of the network
        output = self.forward(inputs)
        
        # calculate error
        loss = self.error_function(output, targets)
        
        # accumulate error
        self.progress.append(loss.item())

        # zero gradients, perform a backward pass, and update the weights.
        self.optimiser.zero_grad()
        loss.backward()
        self.optimiser.step()
        pass
    
    
    def plot_progress(self):
        df = pandas.DataFrame(self.progress, columns=['loss'])
        df.plot(ylim=(0, 0.5), figsize=(16,8), alpha=0.1, marker='.', grid=True, yticks=(0, 0.25, 0.5))
        pass
    
    pass


You can see the hidden (middle) layer combines the inputs as a linear combination, and we use a simple sigmoid activation function. The same is done to take the outputs of the hidden layer to the final layer, which is just a single node. The error function is again a simple mean squared error. The optimiser, which decides how to change the neural network weights, is also a very simple stochastic gradient descent. We've deliberately chosen simple options as our focus here is on getting a basic GAN up and running, not worrying about fine details.

The train() method is pretty self-explanatory too. The inputs are pushed through the network using the forward() function, and the output is compared to the target to give the error, conventionally called the loss. I've added code to append the loss to a list so that we can visualise how it changes over many training runs. The last three lines of code in the train() method are standard PyTorch - we need to zero the gradients from any previous runs, use the latest loss to back propagate and calculate new error gradients, and then change the weights.

Before we move on to the generator, let's make sure our discriminator works. The training loop below needs a source of fake data as well as real data, and for that it calls a generate_false() function which simply provides a fuzzy 0101 pattern. It isn't listed at this point in the post, so here is a minimal sketch mirroring generate_real():

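# function to generate fake 0101 data (a sketch, mirroring generate_real)

def generate_false():
    t = torch.FloatTensor([random.uniform(0.0, 0.2),
                           random.uniform(0.8, 1.0),
                           random.uniform(0.0, 0.2),
                           random.uniform(0.8, 1.0)]).view(1, 4)
    return t


With that in place, the following code trains the discriminator: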
    
# test discriminator itself works

D = Discriminator()

for i in range(10000):
    D.train(generate_real(), torch.FloatTensor([1.0]))
    D.train(generate_false(), torch.FloatTensor([0.0]))
    pass


You can see we're giving the discriminator examples of fuzzy 1010 data from generate_real() and telling it the correct classification is 1.0. We're also giving it the fuzzy 0101 patterns from generate_false() and telling it the correct classification is 0.0.

Let's visualise the discriminator loss over these 10,000 training sessions.


That looks like the right shape. Over the training sessions the error is falling, which means the discriminator is getting better at learning the training data. You might be wondering why the values start around 0.25 and not 0.5. That's because an undecided discriminator outputs values around 0.5, so the difference from the target of 1.0 or 0.0 is about 0.5 on average, and the mean squared error squares that difference to give 0.25. So "half right" appears as 0.25 on these graphs, not 0.5.
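In symbols, using the mean squared error defined in the discriminator class:

$$
loss = (target - output)^2 \approx (1.0 - 0.5)^2 = 0.25
$$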

The reason that plot of errors seems to have two modes at the start is that, in the early stages of learning, the network's average accuracy when classifying real data is distinct from its average accuracy against random data.

Let's manually test the discriminator by feeding it data we know to be true and false:

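The notebook cell isn't reproduced here, but the check amounts to something like this sketch (exact values will vary between runs):


# manually test the discriminator

print(D.forward(torch.FloatTensor([0.0, 1.0, 0.0, 1.0])))   # 0101, expect a low value
print(D.forward(torch.FloatTensor([1.0, 0.0, 1.0, 0.0])))   # 1010, expect a high value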

Fed a 0101 pattern, the output is a low 0.05 (false). Fed a 1010 pattern, the output is a high 0.94 (true). That confirms the discriminator is working correctly.


The Generator

Let's now build the generator. Let's remind ourselves what it is: a learning model that learns to get better at generating data that looks real. As we're using a neural network to do this learning, we need to think about its architecture. We can use 4 output nodes for the four positions of the 1010 pattern. The input and hidden layers have greater freedom, but for simplicity we'll go for 4 nodes in each of these. Any smaller and we risk limiting the expressive capacity of the network.

The generator neural network needs an input. If we think about it, the output depends on the input. If we're tuning the network to learn to give a desired output, we want the inputs to, at minimum, not make that task difficult by being biased. This points to uniformly random numbers as the input to the network.

The code for the generator class is almost identical to the discriminator - they are both neural networks, passing signals from every node in one layer to every node in the next layer, using the same sigmoid activation function, and the same mean squared error function.

    
import numpy
import matplotlib.pyplot as plt

# generator class

class Generator(nn.Module):
    
    def __init__(self):
        # initialise parent pytorch class
        super().__init__()
        
        # define the layers and their sizes, turn off bias
        self.linear_ih = nn.Linear(4, 4, bias=False)
        self.linear_ho = nn.Linear(4, 4, bias=False)
        
        # define activation function
        self.activation = nn.Sigmoid()
        
        # create error function
        self.error_function = torch.nn.MSELoss()

        # create optimiser, using simple stochastic gradient descent
        self.optimiser = torch.optim.SGD(self.parameters(), lr=0.01)
        
        # accumulator for progress
        self.progress = []
        
        # counter and array for outputting images
        self.counter = 0
        self.image_array_list = []
        pass
    
    
    def forward(self, inputs):        
        # combine input layer signals into hidden layer
        hidden_inputs = self.linear_ih(inputs)
        # apply sigmoid activation function
        hidden_outputs = self.activation(hidden_inputs)
        
        # combine hidden layer signals into output layer
        final_inputs = self.linear_ho(hidden_outputs)
        # apply sigmoid activation function
        final_outputs = self.activation(final_inputs)
        
        return final_outputs
    
    
    def train(self, D, inputs, targets):
        # calculate the output of the network
        g_output = self.forward(inputs)
        
        # pass onto Discriminator
        d_output = D.forward(g_output)
        
        # calculate error
        loss = D.error_function(d_output, targets)
        
        # calculate how far wrong the generator is, for plotting purposes
        # note we're using knowledge about real data here
        g_loss = self.error_function(g_output, torch.FloatTensor([0.9, 0.0, 0.9, 0.0]))
        
        # accumulate error
        self.progress.append(g_loss.item())

        # zero gradients, perform a backward pass, and update the weights.
        self.optimiser.zero_grad()
        loss.backward()
        self.optimiser.step()
        
        # increase counter and add row to image
        self.counter += 1
        if self.counter % 1000 == 0:
            self.image_array_list.append(g_output.detach().numpy())
            pass
        
        pass
    
    
    def plot_progress(self):
        df = pandas.DataFrame(self.progress, columns=['loss'])
        df.plot(ylim=(0, 0.5), figsize=(16,8), alpha=0.1, marker='.', grid=True, yticks=(0, 0.25, 0.5))
        pass
    
    
    def plot_images(self):
        plt.figure(figsize = (16,8))
        plt.imshow(numpy.concatenate(self.image_array_list).T, interpolation='none', cmap='Blues')
        pass
    
    pass


Although most of the generator code is similar to that of the discriminator, the training is different. Here we pass the inputs through the generator as normal to give the outputs. However, we aren't learning by comparing these outputs with real data. Remember, the generator doesn't see the real data. It only learns by looking at how well it convinces the discriminator. So we push the generator outputs through the discriminator to get a classification. We want that to be real, or 1.0.

The error function, which decides how we update the network weights, compares the classifier output with what it should be: 1.0. Because all the calculations, from the random inputs to the generator right through to the output of the discriminator, are performed on PyTorch tensors, PyTorch can internally calculate the error gradients from the classification error all the way back through the discriminator weights to the generator weights.

However - we don't want to change the discriminator weights. We aren't training the discriminator to recognise the generator outputs as real. We're only training the generator. Luckily, the call to self.optimiser.step() refers only to the generator's parameters, so this is easy to do and doesn't require extra coding.

We have the same extra code to keep a log of the generator errors just like before, but this time we are using knowledge of what real data should look like to make the comparison. Look at the code yourself to confirm that knowledge is not used to train the generator itself. It's only used to help us visualise progress, and can be removed at any time.

We also have additional code which takes a snapshot of the generator outputs at every 1000 training steps so we can visualise the patterns it creates.


Adversarial Training

The training of this adversarial architecture takes three distinct steps, repeated many times:

  • showing the discriminator a real data example, and telling it the classification should be 1.0
  • showing the discriminator the output of the generator and telling it the classification should be 0.0
  • showing the discriminator the output of the generator and telling the generator the result should be 1.0

The first two steps train the discriminator to get good at separating real and false data. The third step trains the generator to create realistic-looking data that can get past the discriminator.

The code for this three step training is simple:

    
# create Discriminator and Generator

D = Discriminator()
G = Generator()


# train Discriminator and Generator

for i in range(10000):
    
    # train discriminator on true
    D.train(generate_real(), torch.FloatTensor([1.0]))
    
    # train discriminator on false
    # use detach() so only D is updated, not G
    D.train(G.forward(generate_random()).detach(), torch.FloatTensor([0.0]))
    
    # train generator
    G.train(D, generate_random(), torch.FloatTensor([1.0]))
    
    pass


Let's see how the discriminator training progresses:

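In the notebook this is just a call to the plotting helper we built into the class:


D.plot_progress()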

That's interesting!

Before, the error reduced towards zero as the discriminator got better and better at telling real data from fake data. Now the discriminator seems to be approaching a state where it isn't good at telling real data apart from the data from the generator, which itself is getting better and better at generating more realistic data. That's why the error is approaching an average of 0.25.

Let's see the error between the generator's output and what we know real data should look like:


That confirms the generator is getting better and better at creating data that looks like 1010.

Great - we've trained a generator that successfully learns to create realistic data that the discriminator finds hard to tell apart from actual real training data!


Images

Let's visualise the snapshots the generator took of its output at every 1000 training steps.

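This uses the plot_images() helper we added to the generator class:


G.plot_images()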

The generator output starts indistinct, but over time the output becomes distinctly 1010.

This visualisation is a forward look to Part III where we'll try to train a GAN to generate 2-dimensional images.

As a final check, let's manually run the generator to confirm the outputs do indeed look like 1010.

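The exact cell isn't shown here, but it amounts to something like:


# run the trained generator on a fresh random input (sketch)

print(G.forward(generate_random()))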

Yup - the outputs are very close to 1010.


Conclusion

We've succeeded in taking the basic adversarial architecture we discussed in Part I, developing it to use neural networks as learning units, and applying it to a more interesting learning task.

We also used visualisation of the error and generator outputs to see, and better understand, the training process.

The key point here is that the generator never sees the real training data - yet it learns to create convincing imitations!

The code is available on github as a notebook:



More Reading

Friday 12 April 2019

Generative Adversarial Networks - Part I

This is the first of a short series of posts introducing and building generative adversarial networks, known as GANs.


Why GANs?

Artificial intelligence has seen huge advances in recent years, with notable achievements like computers being able to compete with humans at the notoriously difficult-to-master ancient game of Go, self-driving cars, and voice recognition in your pocket.

Much of that recent progress has been enabled by the ability to train large neural networks as computing power has become cheaper. The training of neural networks with many layers has become known as deep learning, although the term does cover other many-layered learning models too.

The key benefit of deep, or many-layered, neural networks is that they can learn which elements of the data are useful features. These features can usefully be reasoned about to make higher level decisions. For example, a face recognition system might learn features such as eyes and mouth. Previously we had to work out, or guess, what the right low-level features should be.

Neural networks are typically used to distill lots of data into a smaller piece of information, like a yes/no decision or a classification. But they can also be used to generate data - which can include images.

Even more recently, a new architecture emerged that led to spectacular results for generated images. The following faces are not real; they were created by a generative network (source).


In October 2018, the world-leading art auction house Christie's sold the Portrait of Edmond Belamy for $432,500.


That portrait was not painted by a person, but created using a generative neural network.

The neural network architecture that generates these compelling results is known as a generative adversarial network, or GAN.

The name describes the unique adversarial way in which the networks learn.


Generative Adversarial Learning

Before we look at this unique adversarial way of learning, let's first look at the typical approach to machine learning.


A model, often a neural network, is fed training data, and the output of that model is compared to what the right output should be. The difference, the error, guides how the internal parameters of that model are updated in an attempt to reduce the error.

For a neural network, the error is used to update the link weights that connect nodes in the network, using a method known as back propagation of the error.

This typical approach has been pervasive across many forms of machine learning.

Although he wasn't the first to explore the idea, Ian Goodfellow's 2014 paper (pdf) kicked off a period of intense interest in a new approach.

In this approach we still have a learning model that is fed examples to learn from. This time, the learning model is trained to distinguish between real and fake examples of data.


You can see from the picture above that the learning model is fed examples of real data and is trained to recognise them as real. You can also see that the same learning model is also fed data from another source, and is trained to recognise them as false.

In the picture above, you can see we're not using a data set for the fake examples, but something that generates that data. It makes sense to call it a generator.

The learning model's job, meanwhile, is to get good at telling the real examples apart from the fake ones - that's why it is called a discriminator.

So far that's very much like the standard approach to machine learning.

What's new is that while the discriminator is learning to get good at separating real data from fake data, the generator is learning to get better at creating data that can fool the discriminator!


As training progresses:
  • the discriminator gets better and better at telling real and fake data apart
  • the generator gets better and better at creating data that looks like real data

The discriminator and generator are pitted against each other - their aims are adversarial.

Ingenious!

Let's think a little bit more about how the generator is trained, as it is not often explained well.

Unlike the discriminator, we don't have examples of what the correct output of the generator should be. All we know is that if the generator does a good job, the output of the discriminator should be a "true" classification.

This sounds like a problem, but we can actually train the generator if we consider the combination of the generator and discriminator as a longer machine learning model.


Machine learning models have parameters that are adjusted during training. If the learning models are neural networks, these parameters are the link weights. In this example, we calculate the weight updates as if we were training a long neural network (generator + discriminator) but only update the generator's weights.

This neat idea solves our apparent problem, and avoids training the discriminator to say that generated data is real.

Again, ingenious!

In practice, this method of training the generator works either badly or very well. In the wider context, GANs are a new method, and like all machine learning methods there is still much to learn about improving the performance and stability of training. When they work, the results can be impressive!


(Over?) Simplified Adversarial Learning

Let's see if we can build a generative adversarial learning system that is as simple as we can make it. The aim is to see the adversarial learning process in action - but avoid the complexity of neural networks and data that needs to be transformed and messed about with.

Imagine a very simple discriminator node that has only one adjustable parameter.


The node takes an input x and multiplies it by a parameter p to give an output o. We can't get simpler than that!

Now imagine the inputs x are examples of real data. Let's say real data is around the value 1.0, so the training examples are in the range 0.9 to 1.1. The following code shows a very simple function that creates these real data examples:


import random

# function to generate real data

def generate_real():
    
    return random.uniform(0.9, 1.1)


As a really simple task, let's say the job of the learning node is to output 1 when the input is real. That means the adjustable parameter p must grow towards 1 during training. We'll set it to start at 0.1.

Here's a simple class for the discriminator showing the initial parameter at 0.1, a very simple test() method which calculates the output, and a train() method that adjusts the parameter according to the error and a learning rate, which here is 0.05.


import pandas

# discriminator node with adjustable parameter

class Discriminator:
    
    def __init__(self):
        self.parameter = 0.1
        
        # accumulator for progress
        self.progress = []
        pass
    
    def test(self, x):
        return x * self.parameter
    
    def train(self, x, target):
        output = self.test(x)
        error = target - output
        
        # use error to adjust parameter, learning rate is 0.05
        self.parameter += 0.05 * error * x
        
        # accumulate progress
        self.progress.append([error, self.parameter])        
        pass
    
    def plot_progress(self):
        df = pandas.DataFrame(self.progress, columns=['error', 'parameter'])
        df.plot(figsize=(16,8))
        pass

    pass


Just like the standard machine learning approach, if the output is close to the target, then the error is small and the parameter doesn't need to be adjusted by much.

There is some extra code in there to accumulate the error and parameter as they evolve in a list so we can plot them later.

The following simple code shows how we can create an instance of a discriminator and train it to output a target of 1.0. You can see we're training it 300 times.


# create Discriminator

D = Discriminator()


# train Discriminator

for i in range(300):
    
    # train discriminator on true
    D.train(generate_real(), 1.0)
    
    pass


Let's plot a graph of the error and parameter as they change over the training period.

D.plot_progress()



As expected, we can see the parameter starts at 0.1 and grows towards 1.0. We can also see the error starts at around 0.9 and falls towards zero.

So far we've not done anything particularly special. We have trained a very simple node in a very simple scenario.

Let's now think about a generator node, keeping it as simple as possible.


This node doesn't take any input. It has an adjustable parameter p, and the output o is simply that parameter p. We can use the difference between the output and a target value, the error, to adjust the parameter p, just like before.

The following shows the class for this simplified generator. The parameter is initially 0.1 which means the first generated value will be 0.1.


# generator node with adjustable parameter

class Generator:
    
    def __init__(self):
        self.parameter = 0.1
        
        # accumulator for progress
        self.progress = []
        
        pass
    
    def generate(self):
        return self.parameter
    
    def train(self, target):
        output = self.generate()
        error = target - output
        
        # use error to adjust parameter, learning rate is 0.05
        self.parameter += 0.05 * error
        
        # accumulate progress
        self.progress.append([error, self.parameter])        
        pass
    
    def plot_progress(self):
        df = pandas.DataFrame(self.progress, columns=['error', 'parameter'])
        df.plot(figsize=(16,8))
        pass
    
    pass


The code is almost identical to the discriminator's, because both have an adjustable parameter and both update it in a similar way.

Let's now train the discriminator on both the real data and on the fake data coming from the generator. The code below shows the target for the real data is 1.0 but for the fake data it is 0.0. The aim is to get the discriminator good at telling real and fake data apart.


# create Discriminator and Generator

D = Discriminator()
G = Generator()


# train Discriminator and Generator

for i in range(300):
    
    # train discriminator on true
    D.train(generate_real(), 1.0)
    
    # train discriminator on false
    D.train(G.generate(), 0.0)
    
    # train generator
    G.train(1.0)
    
    pass


You can also see we're training the generator, telling it that it should target 1.0 when generating data.

Let's see how the generator parameter and error change during training.

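As before, this is just the plotting helper:


G.plot_progress()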

We can see that over time, the parameter grows from the initial 0.1 towards 1.0. This means the generator is getting better at creating data that looks like real data - which was in the range 0.9 to 1.1. As expected, the error falls towards zero.

Great!

Let's look again at what's happening with the discriminator now that it is being trained against both the real data and data from the generator.


That's interesting. The parameter is no longer rising towards 1.0, and the error is not smoothly falling to zero. The reason is that as the generator gets better, the discriminator finds it harder to distinguish between the real and generated data. It is being told the target for the generated data, which is getting closer to 1.0, should be 0.0 - hence the errors. Over time, the error might approach 0.5, reflecting the fact that the discriminator can't decide between the two data sources.

This is what happens with real GANs: the discriminator ends up unable to discriminate between the real data and the ever-improving generated data.

Although this has been a very simple, perhaps oversimplified, example - we have seen the key elements: template code for the discriminator and generator, the adversarial training loop, and visualisation of the training progress.

You can find the code and graphs in a notebook on github:


Next Time - Neural Networks

In Part II we'll progress to develop a discriminator and generator that are neural networks to see if we can generate more interesting data.

We'll also see a key difference between GANs using neural networks and our simplified example - which is that the generator learns to create realistic data without ever seeing the training data, learning only through the discriminator's feedback.


More Reading

The following are useful additional resources:





Extra: Some Algebra

You might be wondering why the training of the generator looks so simple here - almost too simple.

Let's work through it.

First let's look at the discriminator being trained on real data x, without input from the generator.

$$
output_{D}  = parameter_{D} \cdot x
$$

The error is the difference between the desired and actual output, squared:

$$
\begin{align}

error_{D} & = (target - output_{D})^2 \\
& = (target - parameter_{D} \cdot x)^2

\end{align}
$$

And this error changes with the discriminator parameter simply:

$$
\begin{align}

\frac{\partial}{\partial parameter_{D}} ( error_{D} ) & = \frac{\partial}{\partial parameter_{D}}  ( target - parameter_{D} \cdot x )^2 \\
& = -2 \cdot ( target - parameter_{D} \cdot x) \cdot x \\
& = -2 \cdot (target - output_{D}) \cdot x

\end{align}
$$

The parameter is updated to follow that gradient downwards:

$$
\begin{align}

\Delta parameter_{D} & = - [-2 \cdot (target - output_{D}) \cdot x ]\\
& = 2 \cdot (target - output_{D}) \cdot x \\
& \sim (target - output_{D}) \cdot x

\end{align}
$$

This is why the weight update for the discriminator is as simple as:


# use error to adjust parameter, learning rate is 0.05
self.parameter += 0.05 * error * x


Note the error in the code is simply the difference between the target and output, not squared.

Now let's look at how we might update the generator parameter. We could work out how the overall error depends on this parameter, but we'll mirror the approach taken when back-propagating errors in a neural network. You can read a gentle introduction to back-propagation here [link].

In that approach, we split the overall error amongst the preceding nodes and use the same simple update rule we derived above. Here we only have one node, the generator, that feeds the discriminator. So we can use the same error.

The analogous update rule is:

$$

\Delta parameter_{G}  \sim (target - output_{D})

$$

It makes sense if we think of this node as the same as the discriminator node but with a constant input of x=1.

It's now clear why the weight update for the generator is as simple as:


# use error to adjust parameter, learning rate is 0.05
self.parameter += 0.05 * error


Again, note the error in the code is simply the difference between the target and output, not squared.