NBIS
3/23/22
A single neuron has \(n\) inputs \(x_i\) and an output \(y\). To each input is associated a weight \(w_i\).
The activity rule is given by two steps:
\[a = \sum_{i=0}^{n} w_ix_i\]
\[\begin{array}{ccc} \mathrm{activation} & & \mathrm{activity}\\ a & \rightarrow & y(a) \end{array}\]
\[a = w_0 + \sum_{i=1}^{n} w_ix_i\]
\[y = y(a) = g\left( w_0 + \sum_{i=1}^{n} w_ix_i \right)\]
or in vector notation
\[y = g\left(w_0 + \mathbf{X^T} \mathbf{W} \right)\]
where:
\[\quad\mathbf{X}= \begin{bmatrix}x_1\\ \vdots \\ x_n\end{bmatrix}, \quad \mathbf{W}=\begin{bmatrix}w_1\\ \vdots \\ w_n\end{bmatrix}\]
Vectorized versions: input \(\boldsymbol{x}\), weights \(\boldsymbol{w}\), output \(\boldsymbol{y}\)
\[a = \boldsymbol{w}^{\mathsf{T}}\boldsymbol{x}\]
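A minimal numpy sketch of this forward pass, assuming \(g = \mathsf{tanh}\) as the activation and made-up numbers for the inputs and weights:
import numpy as np

def neuron_forward(x, w, w0, g=np.tanh):
    # activation: a = w0 + sum_i w_i x_i
    a = w0 + np.dot(w, x)
    # activity: y = g(a)
    return g(a)

# illustrative numbers for n = 3 inputs
x = np.array([0.5, -1.0, 2.0])
w = np.array([0.1, 0.4, -0.2])
print(neuron_forward(x, w, w0=0.3))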
one to one: Image classification
many to one: Sentiment analysis
one to many: Image captioning
many to many: Machine translation
Assume multiple time points.
“dog bites man” vs “man bites dog”
Folded representation
Unfolded representation
Add a hidden state \(h\) that introduces a dependency on the previous step:
\[ \hat{Y}_t = f(X_t, h_{t-1}) \]
RNNs have what one could call “sequential memory” (Phi, 2020)
Exercise: say the alphabet in your head
A B C … X Y Z
Modification: start from e.g. letter F
May take time to get started, but from there on it’s easy
Now read the alphabet in reverse:
Z Y X … C B A
Memory access is associative and context-dependent
Add recurrence relation where current hidden cell state \(h_t\) depends on input \(x_t\) and previous hidden state \(h_{t-1}\) via a function \(f_W\) that defines the network parameters (weights):
\[ h_t = f_\mathbf{W}(x_t, h_{t-1}) \]
Note that the same function and weights are used across all time steps!
import numpy as np

class RNN:
    # ...
    # Description of forward pass
    # (self.h and the weight matrices self.W_hh, self.W_xh, self.W_hy
    #  are assumed to be initialized elsewhere, e.g. in __init__)
    def step(self, x):
        # update the hidden state
        self.h = np.tanh(np.dot(self.W_hh, self.h) + np.dot(self.W_xh, x))
        # compute the output vector
        y = np.dot(self.W_hy, self.h)
        return y

# illustrative usage: feed a sequence of word vectors one step at a time
rnn = RNN()
ff = FeedForwardNN()
for word in input:
    output = rnn.step(word)
    prediction = ff(output)
\[ h_t = \mathsf{tanh}(\mathbf{W_{xh}^T}X_t + \mathbf{W_{hh}^T}h_{t-1}) \]
\[ \hat{Y}_t = \mathbf{W_{hy}^T}h_t \]
Input \(X_t\), hidden state \(h_t\), output \(\hat{Y}_t\).
Note: \(\mathbf{W_{xh}}\), \(\mathbf{W_{hh}}\), and \(\mathbf{W_{hy}}\) are shared across all cells!
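As a hedged illustration of that weight sharing, the sketch below unrolls the equations over a toy sequence with random placeholder weights (following the convention of the step() code above, where the stored matrices multiply from the left without an explicit transpose):
import numpy as np

# toy sizes and random placeholder weights (not trained)
n_in, n_hid, n_out, T = 4, 3, 4, 5
rng = np.random.default_rng(0)
W_xh = rng.normal(scale=0.1, size=(n_hid, n_in))
W_hh = rng.normal(scale=0.1, size=(n_hid, n_hid))
W_hy = rng.normal(scale=0.1, size=(n_out, n_hid))

h = np.zeros(n_hid)                     # h_0
X = rng.normal(size=(T, n_in))          # toy input sequence X_1 .. X_T
for x_t in X:
    # the same three weight matrices are reused at every time step
    h = np.tanh(W_xh @ x_t + W_hh @ h)  # h_t
    y_t = W_hy @ h                      # Y_t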
1. Not all inputs are of equal length
2. Long-term dependencies: “I grew up in England, and … I speak fluent English”
3. Order matters: “dog bites man” != “man bites dog”
Addresses points 2 and 3.
from keras.models import Sequential
from keras.layers import Dense, SimpleRNN

time_steps = 3  # length of each input window (example value)
model = Sequential()
model.add(SimpleRNN(units=3, input_shape=(time_steps, 1),
                    activation="tanh"))
model.add(Dense(units=1, activation="tanh"))
model.compile(loss='mean_squared_error', optimizer='adam')
model.summary()
Model: "sequential"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
simple_rnn (SimpleRNN) (None, 3) 15
dense (Dense) (None, 1) 4
=================================================================
Total params: 19
Trainable params: 19
Non-trainable params: 0
_________________________________________________________________
Model: "sequential"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
simple_rnn (SimpleRNN) (None, 3) 15
dense (Dense) (None, 1) 4
=================================================================
Total params: 19
Trainable params: 19
Non-trainable params: 0
_________________________________________________________________
Model: "sequential"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
simple_rnn (SimpleRNN) (None, 3) 15
dense (Dense) (None, 1) 4
=================================================================
Total params: 19
Trainable params: 19
Non-trainable params: 0
_________________________________________________________________
NB! In Keras, RNN input is a 3D tensor with shape [batch, timesteps, feature].
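As an illustration of that shape requirement, a sketch of how a univariate series could be windowed into [batch, timesteps, feature] form before being passed to the model above (the series and window length are made up):
import numpy as np

series = np.arange(20, dtype="float32")    # toy univariate series
time_steps = 3                             # must match input_shape in the model above
# sliding windows of length time_steps, each predicting the next value
X = np.array([series[i:i + time_steps] for i in range(len(series) - time_steps)])
y = series[time_steps:]
X = X.reshape((X.shape[0], time_steps, 1)) # -> [batch, timesteps, feature]
print(X.shape, y.shape)                    # (17, 3, 1) (17,)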
Example network trained on “hello” showing activations in the forward pass given input “hell”. The output layer contains the network's confidence for each character in the vocabulary ({h, e, l, o}). We want the blue numbers to be high and the red numbers low. P(e) is in the context of “h”, P(l) in the context of “he”, and so on.
What is the topology of the network?
4 input units (features), 4 time steps, 3 hidden units, 4 output units
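A hedged numpy sketch of that forward pass with this topology, using random weights instead of the trained ones (so the actual confidence values will differ from the figure):
import numpy as np

vocab = ["h", "e", "l", "o"]                      # 4 input/output units
one_hot = np.eye(len(vocab))
rng = np.random.default_rng(1)
W_xh = rng.normal(scale=0.5, size=(3, 4))         # 3 hidden units
W_hh = rng.normal(scale=0.5, size=(3, 3))
W_hy = rng.normal(scale=0.5, size=(4, 3))

h = np.zeros(3)
for ch in "hell":                                 # 4 time steps
    x = one_hot[vocab.index(ch)]
    h = np.tanh(W_xh @ x + W_hh @ h)
    scores = W_hy @ h                             # confidences over {h, e, l, o}
    probs = np.exp(scores) / np.exp(scores).sum() # softmax: P(next char | context so far)
    print(ch, "->", dict(zip(vocab, np.round(probs, 2))))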
See if you can improve the airline passenger model. Some things to try:
Errors are propagated backwards in time, from the final time step \(t\) back to \(t=0\).
Problem: calculating the gradient may depend on large powers of \(\mathbf{W_{hh}}^{\mathsf{T}}\) (e.g. \(\partial\mathcal{L} / \partial h_0 \sim f\left((\mathbf{W_{hh}}^{\mathsf{T}})^t\right)\))
At layer \(i\) of the unrolled network, the gradient scales as \((\mathbf{W_{hh}}^{\mathsf{T}})^{t-i}\)
\(\downarrow\)
Weight adjustments depend on size of gradient
\(\downarrow\)
Early layers tend to “see” small gradients and do very little updating
\(\downarrow\)
Parameters become biased towards learning from recent events
\(\downarrow\)
RNNs suffer from short-term memory
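The effect can be made concrete with a small numpy sketch (assumed small random recurrent weights, purely illustrative): the gradient picks up one factor of \(\mathbf{W_{hh}}^{\mathsf{T}}\) per step back in time and its norm typically shrinks geometrically.
import numpy as np

rng = np.random.default_rng(0)
W_hh = rng.normal(scale=0.1, size=(3, 3))     # toy recurrent weights with small entries
grad = np.ones(3)                             # gradient arriving at the last time step
for t in range(1, 21):
    grad = W_hh.T @ grad                      # one factor of W_hh^T per step back in time
    if t % 5 == 0:                            # (the tanh' factor, <= 1, would shrink it further)
        print(f"after {t} back-steps: |grad| = {np.linalg.norm(grad):.2e}")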
ReLU (or leaky ReLU) instead of sigmoid or tanh.
Prevents small gradients: for \(x>0\) the gradient is a positive constant
Derivatives of \(\sigma\), \(\mathsf{tanh}\) and \(\mathsf{ReLU}\) activation functions.
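For reference, those derivatives can be evaluated directly; a short sketch (sample points chosen arbitrarily):
import numpy as np

x = np.array([-2.0, 0.5, 2.0, 5.0])
s = 1 / (1 + np.exp(-x))
d_sigmoid = s * (1 - s)              # at most 0.25, tiny for large |x|
d_tanh = 1 - np.tanh(x) ** 2         # at most 1, tiny for large |x|
d_relu = (x > 0).astype(float)       # exactly 1 for every x > 0
print(d_sigmoid.round(3))
print(d_tanh.round(3))
print(d_relu)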
Initialize the bias to 0 and the recurrent weights to the identity matrix
For example the LSTM. The idea is to control what information is retained within each RNN unit.
The gates make use of element-wise multiplication (×) and addition (+).
LSTM
GRU
Long Short Term Memory (LSTM) (Hochreiter & Schmidhuber, 1997) and Gated Recurrent Unit (GRU) (Cho et al., 2014) architectures were proposed to solve the vanishing gradient problem.
In this paper, we propose a novel neural network model called RNN Encoder-Decoder that consists of two recurrent neural networks (RNN). One RNN encodes a sequence of symbols into a fixed-length vector representation, and the other decodes the representation into another sequence of symbols. The encoder and decoder of the proposed model are jointly trained to maximize the conditional probability of a target sequence given a source sequence. The performance of a statistical machine translation system is empirically found to improve by using the conditional probabilities of phrase pairs computed by the RNN Encoder-Decoder as an additional feature in the existing log-linear model. Qualitatively, we show that the proposed model learns a semantically and syntactically meaningful representation of linguistic phrases.
Learning Phrase Representations using RNN Encoder-Decoder for Statistical Machine Translation (Cho et al., 2014)
Remember the important parts, pay less attention to (forget) the rest.
LSTM adds cell state that in effect provides the long-term memory
Information flows in the cell state from \(c_{t-1}\) to \(c_t\).
Gates affect the amount of information let through. The sigmoid layer outputs anything from 0 (nothing) to 1 (everything).
In our preliminary experiments, we found that it is crucial to use this new unit with gating units. We were not able to get meaningful result with an oft-used tanh unit without any gating.
Forget gate purpose: reset the content of the cell state
Input gate purpose: decide when to read data into the cell state
Output gate purpose: read entries from the cell state
Purpose: decide what information to keep or throw away
Sigmoid squishes vector \([\boldsymbol{h_{t-1}}, \boldsymbol{x_t}]\) (previous hidden state + input) to \((0, 1)\) for each value in cell state \(c_{t-1}\), where 0 means “forget entry”, 1 “keep it”
\[ f_t = \sigma(W_f \cdot [h_{t-1}, x_t] + b_f) \]
Two steps to adding new information:
1. the input gate \(i_t\) decides which values to update
2. a \(\mathsf{tanh}\) layer creates a vector of candidate values \(\tilde{c}_t\)
\[ i_t = \sigma (W_i \cdot [h_{t-1}, x_t] + b_i)\\ \tilde{c}_t = \mathsf{tanh}(W_c \cdot [h_{t-1}, x_t] + b_c) \]
\[ c_t = f_t * c_{t-1} + i_t * \tilde{c}_t \]
Output is filtered version of cell state.
\[ o_t = \sigma(W_o [h_{t-1}, x_t] + b_o)\\ h_t = o_t * \mathsf{tanh}(c_t) \]
\[ f_t = \sigma(W_f \cdot [h_{t-1}, x_t] + b_f)\\ i_t = \sigma (W_i \cdot [h_{t-1}, x_t] + b_i)\\ \tilde{c}_t = \mathsf{tanh}(W_c \cdot [h_{t-1}, x_t] + b_c)\\ c_t = f_t * c_{t-1} + i_t * \tilde{c}_t\\ o_t = \sigma(W_o [h_{t-1}, x_t] + b_o)\\ h_t = o_t * \mathsf{tanh}(c_t) \]
\[ x_t \in \mathbb{R}^{n\times d}, \quad h_{t-1} \in \mathbb{R}^{n \times h}, \quad i_t \in \mathbb{R}^{n\times h}, \quad f_t \in \mathbb{R}^{n\times h}, \quad o_t \in \mathbb{R}^{n\times h} \]
and
\[ W_f, W_i, W_o, W_c \in \mathbb{R}^{h \times (h+d)} \]
where \(n\) is the batch size, \(d\) the input size, and \(h\) the number of hidden units; each weight matrix acts on the concatenation \([h_{t-1}, x_t]\).
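Putting the equations and dimensions together, a single LSTM step could be sketched in numpy as below (random placeholder weights, a single example rather than a batch of \(n\)):
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

d, h_dim = 4, 3                                  # input size d, hidden size h (illustrative)
rng = np.random.default_rng(0)
# one weight matrix and bias per gate, each acting on the concatenation [h_{t-1}, x_t]
W_f, W_i, W_o, W_c = (rng.normal(scale=0.1, size=(h_dim, h_dim + d)) for _ in range(4))
b_f, b_i, b_o, b_c = (np.zeros(h_dim) for _ in range(4))

def lstm_step(x_t, h_prev, c_prev):
    z = np.concatenate([h_prev, x_t])            # [h_{t-1}, x_t]
    f_t = sigmoid(W_f @ z + b_f)                 # forget gate
    i_t = sigmoid(W_i @ z + b_i)                 # input gate
    c_tilde = np.tanh(W_c @ z + b_c)             # candidate cell state
    c_t = f_t * c_prev + i_t * c_tilde           # new cell state
    o_t = sigmoid(W_o @ z + b_o)                 # output gate
    h_t = o_t * np.tanh(c_t)                     # new hidden state (filtered cell state)
    return h_t, c_t

h_t, c_t = lstm_step(rng.normal(size=d), np.zeros(h_dim), np.zeros(h_dim))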
Modify the airline passenger model to use an LSTM and compare the results. Try out different parameters to improve test predictions.
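A possible starting point for the exercise, assuming the same time_steps window as in the earlier SimpleRNN model; units=4 is just an initial value to experiment with:
from keras.models import Sequential
from keras.layers import Dense, LSTM

model = Sequential()
model.add(LSTM(units=4, input_shape=(time_steps, 1), activation="tanh"))
model.add(Dense(units=1))
model.compile(loss="mean_squared_error", optimizer="adam")
model.summary()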
LSTM with Variable Length Input Sequences to One Character Output
Predict next character in sequence of strings
return_sequences=True
(Brownlee, 2017)