Neural network
The Santa Fe data set is obtained from a chaotic laser, which can be described as a nonlinear dynamical system. Given are 1000 training data points. The aim is to predict the next 100 points (it is forbidden to include these points in the training set!). The training data are stored in lasertrain.dat and are shown in Figure 2a. The test data are contained in laserpred.dat and shown in Figure 2b.
Figure 2: The Santa Fe laser data: (a) training set (1000 points) and (b) test set (100 points), plotted against discrete time k.
Exercise
Train an MLP with one hidden layer after standardizing the data set. The training is done in feedforward mode:
$$\hat{y}_k = w^T \tanh\left(V\,[y_{k-1};\, y_{k-2};\, \ldots;\, y_{k-p}] + \beta\right). \qquad (4)$$
In order to make predictions, the trained network is used in an iterative way as a recurrent network:
$$\hat{y}_k = w^T \tanh\left(V\,[\hat{y}_{k-1};\, \hat{y}_{k-2};\, \ldots;\, \hat{y}_{k-p}] + \beta\right). \qquad (5)$$
To format the data you can use the provided function getTimeSeriesTrainData. Make sure you understand what the function does by trying it out on a small self-made toy example. To predict the test set you will have to write a for loop that includes the predicted value from the previous time step in the input vector used to predict the next time step (a sketch is given below). Investigate the model performance for different lags and numbers of neurons. Explain clearly how you tune the parameters and what their influence is on the final prediction. Which combination of parameters gives the best performance (RMSE) on the test set?
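A minimal sketch of what such an iterative prediction loop could look like in MATLAB is given below. It assumes the provided getTimeSeriesTrainData returns lagged input columns with their targets; the lag p, the number of hidden neurons and all variable names are illustrative choices, not prescribed values.

```matlab
% Illustrative sketch: train an MLP on lagged inputs and predict the test
% set in free-run mode (feeding predictions back into the input window).
trainData = load('lasertrain.dat');
testData  = load('laserpred.dat');

mu = mean(trainData); sig = std(trainData);
zTrain = (trainData - mu) / sig;            % standardize with training statistics

p = 50;                                     % lag (tuning parameter, example value)
[X, Y] = getTimeSeriesTrainData(zTrain, p); % provided function: lagged inputs and targets

net = feedforwardnet(30);                   % one hidden layer, example size
net = train(net, X, Y);

% Free-run prediction: the window ordering must match the convention used
% by getTimeSeriesTrainData (here assumed oldest value first).
window = zTrain(end-p+1:end);
yPred = zeros(length(testData), 1);
for k = 1:length(testData)
    yPred(k) = net(window);                 % one-step-ahead prediction
    window = [window(2:end); yPred(k)];     % slide window, append the prediction
end
yPred = yPred * sig + mu;                   % undo standardization
rmse = sqrt(mean((yPred - testData).^2));
```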
Long short-term memory network
Long Short-Term Memory networks, usually just called "LSTMs", are a special kind of RNN capable of learning long-term dependencies [2]. LSTMs contain information outside the normal flow of the recurrent network in a gated cell. Information can be stored in, written to, or read from a cell, much like data in a computer's memory. The cell makes decisions about what to store, and when to allow reads, writes and erasures, via gates that open and close. Those gates act on the signals they receive and, similar to the neural network's nodes, they block or pass on information based on its strength and importance, which they filter with their own sets of weights. Those weights, like the weights that modulate the input and hidden states, are adjusted via the recurrent network's learning process. That is, the cells learn when to allow data to enter, leave or be deleted through the iterative process of making guesses, backpropagating the error, and adjusting the weights via gradient descent.
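For reference, one common formulation of such a gated cell (most modern implementations add a forget gate to the original architecture of [2]) reads, with σ the logistic sigmoid and ⊙ element-wise multiplication:

$$
\begin{aligned}
i_t &= \sigma(W_i x_t + U_i h_{t-1} + b_i) &&\text{(input gate)}\\
f_t &= \sigma(W_f x_t + U_f h_{t-1} + b_f) &&\text{(forget gate)}\\
o_t &= \sigma(W_o x_t + U_o h_{t-1} + b_o) &&\text{(output gate)}\\
\tilde{c}_t &= \tanh(W_c x_t + U_c h_{t-1} + b_c) &&\text{(candidate cell state)}\\
c_t &= f_t \odot c_{t-1} + i_t \odot \tilde{c}_t, \qquad h_t = o_t \odot \tanh(c_t)
\end{aligned}
$$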
Demo
Study the following example, where an LSTM is built to predict the monthly cases of chickenpox, by running
openExample('nnet/TimeSeriesForecastingUsingDeepLearningExample').
Exercise
Based on the previous demo, try to model the Santa Fe data set.
• Train the LSTM model and explain the design process. Discuss how the model looks, the parameters that you tune, ... What is the effect of changing the lag value for the LSTM network?
• Afterwards, try to predict the test set. Use the predictAndUpdateState function to predict time steps one at a time and update the network state at each prediction. For each prediction, use the previous prediction as input to the function (see the sketch after this list).
• Compare results of the recurrent neural network with the LSTM. Which model do you prefer and why?
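A minimal sketch of such a closed-loop prediction is given below; it assumes an LSTM net trained with trainNetwork on the standardized training sequence zTrain (row vector), with mu, sig and numPredictions defined as in the demo. These names and the warm-up pass over the training data are illustrative assumptions.

```matlab
% Illustrative sketch: closed-loop forecasting with predictAndUpdateState.
net = resetState(net);
net = predictAndUpdateState(net, zTrain);                % warm up the state on the training data

[net, yPred] = predictAndUpdateState(net, zTrain(end));  % first prediction from the last known value
for k = 2:numPredictions
    % feed the previous prediction back in as the next input
    [net, yPred(:, k)] = predictAndUpdateState(net, yPred(:, k-1));
end
yPred = yPred * sig + mu;                                % undo standardization
```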
[2] Hochreiter, S. and Schmidhuber, J. (1997). Long short-term memory. Neural Computation, 9(8), 1735-1780.
Write a report of maximum 3 pages (including text and figures) to discuss the exercises on the neural network and the long short-term memory network.
Convolutional Neural Networks
Convolutional Neural Networks (CNNs) are a deep learning technique that uses the concept of local connectivity. In a normal multilayer neural network all nodes from subsequent layers are connected; we call these models fully connected. The idea is that in a lot of datasets, points that are close to each other are likely to be much more strongly related than points that are further away. For example, in image datasets the data points represent pixels: pixels that are close are likely to represent the same part of the image, while pixels that are further away can represent different parts.
Exercise
Run the script CNNex.m and try to understand what is happening.
Take a look at the layers of the downloaded CNN and answer the following questions:
• Take a look at the first convolutional layer (layer 2) and at the dimension of the weights (size(convnet.Layers(2).Weights)). What do these weights represent?
• Inspect layers 1 to 5. If you know that a ReLU and a Cross Channel Normalization layer do not affect the dimension of the input, what is the dimension of the input at the start of layer 6 and why?
• What is the dimension of the inputs before the final classification part of the network (i.e. before the fully connected layers)? How does this compare with the initial dimension? Briefly discuss the advantage of CNNs over fully connected networks for image classification.
The script CNNDigits.m runs a small CNN on the handwritten digits dataset. Use this script to investigate some CNN architectures. Try out different numbers of layers, combinations of different kinds of layers, dimensions of the weights, etc. Briefly discuss your results. Be aware that some architectures will take a long time to train!
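As a starting point for such experiments, a small architecture could look as follows. This is only an illustrative sketch (the layer sizes, training options and the datastore names imdsTrain / imdsValidation are assumptions), not the architecture used in CNNDigits.m.

```matlab
% Illustrative sketch of a small CNN for 28x28 grayscale digit images.
layers = [
    imageInputLayer([28 28 1])
    convolution2dLayer(3, 8, 'Padding', 'same')    % 8 filters of size 3x3
    batchNormalizationLayer
    reluLayer
    maxPooling2dLayer(2, 'Stride', 2)              % halves the spatial dimensions
    convolution2dLayer(3, 16, 'Padding', 'same')
    batchNormalizationLayer
    reluLayer
    fullyConnectedLayer(10)                        % one output per digit class
    softmaxLayer
    classificationLayer];

options = trainingOptions('sgdm', ...
    'MaxEpochs', 4, ...
    'ValidationData', imdsValidation, ...
    'Plots', 'training-progress', ...
    'Verbose', false);

net = trainNetwork(imdsTrain, layers, options);
```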
Write a report of 1-2 pages (including text and figures) to discuss the CNN exercise.
Generative Adversarial Networks
Introduction
Generative adversarial networks (GANs) are a class of algorithms used in unsupervised machine learning, implemented by a system of two neural networks competing with each other in a zero-sum game framework. One neural network, called the generator, generates new data instances, while the other, the discriminator, evaluates them for authenticity; i.e. the discriminator decides whether each instance of data belongs to the actual training dataset or not.
To summarize, here are the steps a GAN takes for an image generation example:
1. The generator takes in random numbers and returns an image.
2. This generated image is fed into the discriminator together with a batch of images taken from the actual dataset.
3. The discriminator takes in both real and fake images and returns probabilities, a number between 0 and 1, with 1 representing a prediction of authenticity and 0 representing fake.
4. Update the weights of the competing neural networks.
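Formally, these steps correspond to the generator G and the discriminator D playing the standard two-player minimax game, where x denotes real data and z the random input noise:

$$\min_G \max_D \; \mathbb{E}_{x \sim p_{\mathrm{data}}}\big[\log D(x)\big] + \mathbb{E}_{z \sim p_z}\big[\log\big(1 - D(G(z))\big)\big].$$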
Exercises
Upload the file DCGAN.ipynb and go through the code. Afterwards, answer the following questions:
1. Select one class from the CIFAR dataset and train a deep convolutional generative adversarial network (DCGAN). Take into account the architecture guidelines from Radford et al. [5]. Make sure that you train the model long enough, such that it is able to generate "real" images. Monitor the loss and accuracy of the generator vs. the discriminator, and comment on the stability of the training. Explain this in the context of the GAN framework.
Optimal transport
Optimal transport (OT) [6] theory can be informally described using the words of Gaspard Monge (1746-1818): A worker with a shovel in hand has to move a large pile of sand lying on a construction site. The goal of the worker is to construct with all that sand a target pile with a prescribed shape (for example, that of a giant sand castle). Naturally, the worker wishes to minimize her total effort, quantified for instance as the total distance or time spent carrying shovels of sand. People interested in OT cast that problem as that of comparing two probability distributions: two different piles of sand of the same volume. They consider all of the many possible ways to morph, transport or reshape the first pile into the second, and associate a "global" cost to every such transport, using the "local" consideration of how much it costs to move a grain of sand from one place to another. In OT, one analyzes the properties of that least costly transport, as well as its efficient computation. An example of the computation of OT and displacement interpolation between two 1-D measures is shown in Figure 3.
A common problem that is solved by OT is the assignment problem. Suppose that we have a collection of n factories, and a collection of n stores which use the goods that the factories produce. Suppose that we have a cost function c, so that c(x, y) is the cost of transporting one shipment of goods from factory x to store y. For simplicity, we ignore the time taken to do the transporting, and a factory can only deliver complete goods (no splitting of goods). Let us introduce some notation so we can formally state this as an optimization problem. Let r be the vector containing the amount of goods every store needs. Similarly, k denotes the vector of how many goods every factory produces. Often r and k represent marginal probability distributions, hence their values sum to one. We wish to find the optimal transport plan, whose total cost is equal to:

$$d_M(r,k) = \min_{P \in U(r,k)} \sum_{i,j} P_{ij} M_{ij}, \qquad (1)$$

where M is the cost matrix, U(r,k) is the set of all possible ways to match factories with stores, and P_{ij} quantifies the amount of goods that is transported from factory i to store j. This is called the optimal transport between r and k. It can be solved relatively easily using linear programming. The optimum, d_M(r,k), is called the Wasserstein metric. It is a distance between two probability distributions, sometimes also called the earth mover's distance, as it can be interpreted as how much 'dirt' you have to move to change one 'landscape' (distribution) into another (see Monge's original problem).
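To illustrate how small instances of (1) can be solved with linear programming, the sketch below uses MATLAB's linprog on made-up toy marginals; the vectorization of P and all numerical values are illustrative choices.

```matlab
% Illustrative sketch: solve the optimal transport problem (1) as a linear program.
n = 4;
rng(0);
M = rand(n);                        % M(i,j): cost of shipping from factory i to store j (toy values)
k = ones(n, 1) / n;                 % factory production (marginal), sums to one
r = [0.1; 0.2; 0.3; 0.4];           % store demand (marginal), sums to one

% Vectorize P column-wise; the equality constraints encode the marginals:
% column sums of P must equal r, row sums of P must equal k.
Acol = kron(eye(n), ones(1, n));
Arow = kron(ones(1, n), eye(n));
Aeq  = [Acol; Arow];
beq  = [r; k];

p = linprog(M(:), [], [], Aeq, beq, zeros(n*n, 1), []);
P = reshape(p, n, n);               % optimal transport plan
dM = M(:)' * p;                     % Wasserstein (earth mover) cost
```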
Consider a slightly modified form of optimal transport:
$$d_M^{\lambda}(r,k) = \min_{P \in U(r,k)} \sum_{i,j} P_{ij} M_{ij} - \frac{1}{\lambda}\, h(P), \quad \text{with} \quad h(P) = -\sum_{i,j} P_{ij} \log(P_{ij}). \qquad (2)$$
This is called the Sinkhorn distance, where the second term denotes the information entropy of P. One can increase the entropy by making the distribution more homogeneous, i.e. giving everybody a more equal share of goods. The parameter λ determines the trade-off between the two terms: trying to give every store only goods from the closest factory (lowest value in the cost matrix) or encouraging equal distributions. This is similar to regularization in, for example, ridge regression. Just as a tiny bit of shrinkage of the parameters can lead to improved performance in machine learning problems, the Sinkhorn distance is also observed to work better than the Wasserstein distance on some problems. This is because we use a very natural prior on the distribution matrix P: in the absence of a cost, everything should be homogeneous.
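The regularized problem (2) can be solved with simple matrix-scaling (Sinkhorn) iterations instead of a linear program. A minimal sketch under the same toy setup as above (the value of λ and the fixed iteration count are arbitrary illustrative choices):

```matlab
% Illustrative sketch: Sinkhorn iterations for the entropy-regularized problem (2).
n = 4;
M = rand(n);                        % cost matrix (toy values)
k = ones(n, 1) / n;                 % factory marginal
r = [0.1; 0.2; 0.3; 0.4];           % store marginal

lambda = 20;                        % regularization strength (example value)
K = exp(-lambda * M);               % element-wise kernel
u = ones(n, 1);
for it = 1:200                      % fixed number of scaling iterations for simplicity
    v = r ./ (K' * u);              % rescale so column sums match the store marginal r
    u = k ./ (K * v);               % rescale so row sums match the factory marginal k
end
Plam = diag(u) * K * diag(v);       % regularized transport plan
dSink = sum(sum(Plam .* M));        % transport-cost term of (2)
```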
In many situations the primary interest is not to obtain the optimal transportation map. Instead, we are often interested in using the
optimal transportation cost as a statistical divergence between two probability distributions. A statistical divergence is a function
that takes two probability distributions as input and outputs a non-negative number that is zero if and only if the two distributions
are identical. Statistical divergences such as the KL divergence are frequently used in statistics and machine learning as a way
of measuring dissimilarity between two probability distributions. For example, suppose you want to compare different recipes,
where every recipe is a set of different ingredients. There is a meaningful distance or similarity between two ingredients, but how
do you compare the recipes themselves? Using optimal transport boils down to finding the effort needed to turn one recipe into
another.
Figure 3: Example of the computation of OT between two 1-D measures. The third figure shows the displacement interpolation
between the two using OT.
[5] Radford, Alec, Luke Metz, and Soumith Chintala. "Unsupervised representation learning with deep convolutional generative adversarial networks." arXiv preprint arXiv:1511.06434 (2015).
[6] Peyré, Gabriel, and Marco Cuturi. "Computational optimal transport." Foundations and Trends® in Machine Learning 11.5-6 (2019): 355-607.
Exercises
Upload the file OT.ipynb and go through the code. Afterwards, answer the following question:
1. Upload your own images (of equal size) using the Files tab. Afterwards, transfer the colors between the two images using the provided notebook. Show the results and explain how the color histograms are transported. How is this different from non-optimal color swapping (e.g. just swapping the pixels)?
Upload the file WGAN.ipynb and go through the code. Afterwards, try to answer the following question:
1. Train a fully connected minimax GAN and a Wasserstein GAN on the MNIST dataset. Compare the performance of the two GANs over the different iterations. Do you see an improvement in stability and quality of the generated samples? Elaborate on the knowledge you have gained about optimal transport and the Wasserstein distance.
Write a report of maximum 2 pages (including text and figures) to discuss the
exercises of GAN and Optimal transport.