
Lab 005

Deep Learning and You

Connor Lennon

30 March 2021

1 / 67

Deep Learning and Neural Nets

2 / 67

Why so exciting?

  • Aside from having a name that sounds like it came straight out of Neuromancer, why is everyone so excited about Neural Networks?

  • Imagine trying to map a complex pattern to some outcome. Maybe you're trying to recognize whether an image is a dog, or a blueberry muffin.

3 / 67

Why so exciting?

  • How do we find the dog/muffin functional form? What kind of expert dog-ology field can we turn to in order to solve our problem?
  • However, you can recognize which is which, right?

Have you taken a course in dog-ology? Probably not.

4 / 67

Why so exciting?

You've seen dogs, you've seen muffins, and somehow your brain has found an unknown way to tell the difference. What about letters/numbers?

That's what neural networks are trying to imitate - that process of taking one set of sensory inputs, and through repetition, finding patterns that map to some internal set of labels. But, in order to understand neural networks, we should do a little foundational history on them.

5 / 67

The building block: perceptron

Where did the first 'neural network' come from

  • Neural networks (in their most basic form) are actually one of the oldest (if not THE oldest) machine learning tool still in use today.

  • Invented by McCulloch and Pitts in 1943*, the perceptron (the most fundamental element of a neural network) conceptually predates Alan Turing's machine.

* McCulloch, W. S. and Pitts, W. 1943. A logical calculus of the ideas immanent in nervous activity. Bulletin of Mathematical Biophysics 5:115–133.

You might ask yourself, why would anyone bother with designing an algorithm that takes hours of compute time on a high-end machine TODAY when at the time, they lacked even a rudimentary computer? The intent was to build a model of a single neuron.

6 / 67

The building block: perceptron

To understand the intuition for how a perceptron works mathematically, it is useful to have a basic understanding of the biological mechanics of a neuron.

* Source: Wikipedia. No, they didn't endorse my usage of this diagram.

7 / 67

The building block: perceptron

Ok, so how do we model this process using only the mathematical tools we have available?

  • We need a function that takes a consistent set of inputs and maps them efficiently to some (potentially unknown) outputs.
  • Let's start by simplifying the problem, and only modeling 'activated/not activated' once some threshold of charge is reached.
9 / 67

Perceptron, formally

Rules

  1. output $= y \in \{0, 1\}$

  2. inputs $= x_n \in \{0, 1\}$, and $|X| = N$

  3. Threshold value $= \Theta$

  4. $\sigma(X) = 1$ if $\sum_{k=1}^{N} x_k > \Theta$, else $= 0$

10 / 67
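A minimal sketch of these rules in base R (the function name and the example threshold are my own, purely for illustration):

# A threshold unit: fires (returns 1) only when the sum of its
# binary inputs exceeds the threshold Theta.
perceptron_basic <- function(x, theta) {
  as.integer(sum(x) > theta)
}

perceptron_basic(c(1, 0, 1), theta = 1)  # returns 1: total input exceeds Theta
perceptron_basic(c(0, 0, 1), theta = 1)  # returns 0: not enough input to fire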

Perceptron, formally

Put another way:

We're going to:

  • take $X$s (inputs), multiply each $x_i$ by a weight $w_i$*, and then scale that value from 0-1, using a step function where $\sum_i w_i = \Theta$

* This was a slight modification by Frank Rosenblatt who built the first computer-based perceptron

Now, we're going to make a minor change
11 / 67

Perceptron, formally

Two things to notice:

  • we now have an activation function that replaces Θ.

  • T? Why is that there? That's an in-built assumption that learning takes time. It also is critical in how we go about solving this thing.

12 / 67

Perceptron, how do they work?

  • You might have wondered: how in the world do we solve this thing?

  • The answer: we guess. We start with some random set of weights.

  • Then we update, using information from our accuracy to infer how well our weights are informing those guesses. Eventually, we'll get something meaningful.

  • This is why the change from the original step-wise function is so important:

The linear activation function buys us something that the simple step-wise function could not: differentiability.

  • This means we can identify some cost function and attempt to identify a method to minimize that cost function - using the derivative $\frac{\partial L}{\partial w_i^t}$ to update the weights from $w_i^t$ to $w_i^{t+1}$.
13 / 67

Perceptron, how do they work?

Lucky for us, we can define $\frac{1}{2}SSE$ as our loss function, and then $\frac{\partial L}{\partial w_i^t}$ is fairly easy.

for each training sample... in each iteration

1.) $z = w_0^t + w_1^t x_1 + \dots + w_n^t x_n = (W^t)^T x$

$w_0^t$ is a scalar (bias). You will sometimes see this as $b^t$

2.) $\phi(z^{(i)}) = \hat{y}^{(i)}$

3.) $\eta = \text{learning rate}$

Loss function: $J(w_i^t) = \frac{1}{2}\sum_i \big(y^{(i)} - \phi(z)^{(i)}\big)^2$

$\frac{\partial J}{\partial w_j^t} = -\big(y^{(i)} - \phi(z)^{(i)}\big) x_j^{(i)}, \qquad W^{t+1} = W^t - \eta \nabla J(W^t)$

14 / 67
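A rough base-R sketch of that update loop, assuming a linear activation $\phi(z) = z$ and made-up data (every name and number here is illustrative, not from the slides):

set.seed(42)

# toy data: 2 features plus a leading 1 so w[1] acts as the bias w_0
n <- 100
X <- cbind(1, matrix(runif(n * 2), ncol = 2))
y <- as.numeric(X[, 2] + X[, 3] > 1)

eta <- 0.01                  # learning rate
w   <- runif(3, -0.5, 0.5)   # start with a random guess for the weights

for (epoch in 1:50) {
  for (i in 1:n) {                      # for each training sample, in each iteration
    z     <- sum(w * X[i, ])            # z = (W^t)' x
    y_hat <- z                          # phi(z) = z (linear activation)
    w     <- w + eta * (y[i] - y_hat) * X[i, ]   # W^{t+1} = W^t - eta * dJ/dW
  }
}

round(w, 3)  # weights after training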

Perceptron, how do they work?

Let's see this in action:

reddit perceptron Source: Reddit
15 / 67

Perceptron, how do they work?

You'll notice that the linear 'activation function' produces a linear decision boundary.

  • So what's the big deal? We could do that, and more, with SVMs.

  • Unlike with SVM, perceptrons improve over time. Because a perceptron can hypothetically continue to improve so long as it keeps seeing any loss, the only upper limit to a perceptron's performance is how flexible the activation function is.

But there's more than that: because we're using derivatives to update the weights, we can link a bunch of neurons together and use the chain rule to optimize the model weights $W$ as they work together to find the pattern that maps $X \to Y$.

Well what do you get when you glue a bunch of neurons together? A brain!*

16 / 67

Neural Networks

By breaking down a complex mapping task into a series of steps, we can use large collections of modified perceptrons to universally approximate any functional form.

  • You already know how to do this*, you can extend the model yourself!

* with a liiitle more calculus

  • Imagine making several perceptrons that each learn patterns in parallel, all simultaneously trying to minimize your loss function. These are called layers.
17 / 67

Layer Gif

imperial college gif

Source: Imperial College

18 / 67

Layer?

Layer seems to imply we could have multiple sets of perceptrons?

  • We can!

All we do is make the y0, y1, y2 feed into a new set of perceptrons and have those new perceptrons find the patterns in the output of our old ones.

That makes the output a little more complex, because now $\hat{y}$ is equal to $\sigma_2(\sigma_1(X))$

where $\sigma_i(\cdot)$ is the perceptron weighting and activation function for layer $i$.

19 / 67
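As a sketch of that composition in base R (the sigmoid activation and the layer sizes are arbitrary assumptions):

sigmoid <- function(z) 1 / (1 + exp(-z))

# one layer: weights W, bias b, then an activation
layer <- function(x, W, b) sigmoid(W %*% x + b)

set.seed(1)
x  <- runif(3)                                      # 3 inputs
W1 <- matrix(rnorm(2 * 3), 2, 3); b1 <- rnorm(2)    # layer 1: 3 inputs -> 2 outputs
W2 <- matrix(rnorm(1 * 2), 1, 2); b2 <- rnorm(1)    # layer 2: 2 inputs -> 1 output

y_hat <- layer(layer(x, W1, b1), W2, b2)            # y_hat = sigma_2(sigma_1(X))
y_hat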

Visualizing a Neural Network

20 / 67

Backpropagation

How do we update our weights using this new function? The main difference is that we need a way of creating an error for EVERY layer. This requires...

  • the chain rule (the scariest of rules)

$\frac{d}{dx} f(g(x)) = f'(g(x)) \, g'(x)$

For the output layer, we can define our error ($\delta_j^L$) by

$\delta_j^L = \frac{\partial C}{\partial a_j^L} \, \sigma'(z_j^L)$

We can compute everything in here

21 / 67

Backpropagation

$\delta_j^L = \frac{\partial C}{\partial a_j^L} \, \sigma'(z_j^L)$

$\frac{\partial C}{\partial a_j^L} = (a_j^L - y_j)$ (for our cost function)

we already have $z_j^L$, and plugging it into $\sigma'$ is not difficult. The vector form of this is...

$\delta^L = (a^L - y) \odot \sigma'(z^L)$

Which we can also compute.

22 / 67

Backpropagation

Now all that's left is to find the derivatives for the other layer errors - however, those are going to be functions of the output error - let's look at some arbitrary layer $l \in \{1, \dots, L\}$, where $l$ is the layer before $l+1$

$\delta^l = \left((w^{l+1})^T \delta^{l+1}\right) \odot \sigma'(z^l)$

Where $w^{l+1} \equiv$ weight matrix for layer $l+1$

Now, we can use what we found to calculate $\delta^L$, which can be plugged in to find $\delta^{L-1}$ and so on, until we reach the first layer.

But what we really need is $\frac{\partial C}{\partial w_j^l}$ to update the weights.

23 / 67

Backpropagation

In order to update the weights, we really need to know how our costs are changing as a result of our chosen weights. This is actually super simple to do given what we already know:

  • $\frac{\partial C}{\partial w_{j,k}^l} = a_k^{l-1} \delta_j^l$

Or,

  • $\frac{\partial C}{\partial w^l} = a_{\text{input}} \, \delta_{\text{output}}^l$

Where $a_{\text{input}}$ is the input to the weight for layer $l$ and $\delta_{\text{output}}$ is the error of the output from the weight $w^l$

Thus, the MSE for the guess $\hat{y}$ of $y$ will trickle through the weights and update them as the algorithm learns. The error from the output trickles back through the layers, propagating changes across all the weights at once.

24 / 67
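Here is a rough base-R sketch of one backpropagation step for a tiny 3-4-1 network with sigmoid activations (the sizes, data, and names are all made up for illustration):

sigmoid       <- function(z) 1 / (1 + exp(-z))
sigmoid_prime <- function(z) sigmoid(z) * (1 - sigmoid(z))

set.seed(7)
x <- matrix(runif(3), ncol = 1)   # one training example, 3 features
y <- matrix(1)                    # its label

# layer sizes: 3 -> 4 -> 1
W1 <- matrix(rnorm(4 * 3), 4, 3); b1 <- matrix(rnorm(4), 4, 1)
W2 <- matrix(rnorm(1 * 4), 1, 4); b2 <- matrix(rnorm(1), 1, 1)

# forward pass
z1 <- W1 %*% x  + b1; a1 <- sigmoid(z1)
z2 <- W2 %*% a1 + b2; a2 <- sigmoid(z2)

# backward pass
delta2 <- (a2 - y) * sigmoid_prime(z2)            # delta^L = (a^L - y) * sigma'(z^L), elementwise
delta1 <- (t(W2) %*% delta2) * sigmoid_prime(z1)  # delta^l = ((W^{l+1})' delta^{l+1}) * sigma'(z^l)

# gradients: dC/dW^l = delta^l %*% t(a_input)
grad_W2 <- delta2 %*% t(a1); grad_b2 <- delta2
grad_W1 <- delta1 %*% t(x);  grad_b1 <- delta1

# gradient-descent update
eta <- 0.5
W2 <- W2 - eta * grad_W2; b2 <- b2 - eta * grad_b2
W1 <- W1 - eta * grad_W1; b1 <- b1 - eta * grad_b1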

Backpropagation

Let's see this in action:

backprop

* Source: Medium.com

25 / 67

Your errors

You can think of this process as finding the bottom of a bowl by rolling a ball (with no momentum) down the side of it. The method we used before is called gradient descent, but there are others (we'll see those later.)

As the ball gets close to the bottom of the bowl, it will slow down and eventually stop at the lowest point.

  • Let's look at how gradient descent finds its way to the optimal point:
backprop
26 / 67

Other Elements

Unlike other machine learning models we've seen so far, Neural Networks learn over what are called epochs.

  • Epoch: One full pass of the model over your training data set (updating the weights for each pass)

  • They do this because they can still learn from data they have already seen before, so long as the network output has a non-zero error and they haven't reached the optimization point for the data they've seen.

Let's watch some neural networks do their thing. Go here and play around with the tensorflow app: playground.

  • Check out the spiral to see the 'hardest' pattern to fit.
27 / 67

Some pitfalls of neural networks

  • They are extremely flexible - and as we know already, that means they are prone to overfitting. Even more so than the algorithms you've seen so far.

  • They also have a ton of components to consider. You have an activation function to choose, you have a number of layers, you have the number of nodes IN those layers...

28 / 67

Some pitfalls of neural networks

29 / 67

Some pitfalls of neural networks

  • This makes cross-validation much more difficult. They are much more reliant on either guessing and checking (bad) or having experience with them (better).

  • Further, because they climb to an optimum, there's no guarantee that the solution you find is the best one you could have found for your data. It's highly dependent on how you're updating your weights (optimization method) and where you started.

How we've updated our weights so far has used something called gradient descent.

These two reasons are why the methods often used to minimize a cost function are super bizarre.

30 / 67

Common Optimization Techniques

An example, Stochastic Gradient Descent:

Rather than calculate the error across all points in the sample, only calculate the error for a single point at a time to speed up the process and avoid overfit/getting stuck at the same time.

Let's see how some of these weird functions act next to gradient descent (the red ball).

optimizers
31 / 67
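A sketch of the stochastic idea in base R, reusing the kind of toy setup from the perceptron example (everything here is illustrative): instead of averaging the gradient over the whole sample, update on one randomly drawn observation at a time.

set.seed(42)
n <- 100
X <- cbind(1, matrix(runif(n * 2), ncol = 2))
y <- as.numeric(X[, 2] + X[, 3] > 1)

eta <- 0.01
w   <- runif(3, -0.5, 0.5)

for (step in 1:5000) {
  i     <- sample(n, 1)            # the stochastic part: one random observation
  y_hat <- sum(w * X[i, ])         # linear activation
  w     <- w + eta * (y[i] - y_hat) * X[i, ]
}

round(w, 3)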

Programming a Neural Network

It is perfectly possible to code a neural network by hand, but by far the most common tools used to write neural networks for production are Tensorflow and keras.

The main downside of these is that they're both written under the hood in Python, which means it may take some wrangling to get tensorflow for RStudio to work on your computer.

Both of these work together and use a strange representation of data called a 'tensor'. I don't have enough time to explain what a tensor is, but luckily this adorable human does a much better job than I ever could: tensors

Tensorflow lets us build a model sequentially, layer by layer. This is very similar to how tidymodels lets you build your data process.

32 / 67

Tensorflow & Keras

Like in tidymodels, tensorflow starts with a model-type object. For our purposes, this is keras_model_sequential().

This tells tensorflow we're going to build our neural network sequentially.

Just like parsnip, we build an abstraction of a model that will be fed data to train on later.

33 / 67

MNIST dataset

First, let's remind ourselves what our neural network is trying to do:

  • We want our neural network to read handwritten numbers and tell us what number they are supposed to represent.

Just like with all machine learning, we need to really understand our data to do a good job predicting. Let's take a closer look at one observation in our dataset.

34 / 67

MNIST dataset

library(keras)

mnist <- dataset_fashion_mnist()
mnist$train$x

  • WOOOO that was a bad idea. Here's an illustration of what each $x_i$ looks like:

35 / 67

MNIST data

You don't trust me, fine. We can look at our stuff too.

36 / 67

MNIST data

How about the first 25?

37 / 67

MNIST

Ok, that's great, but remember we need to set up this problem so we can turn a picture into an outcome (the numbers 0-9). How do we do it?

Well, we can squish the data so that each of these matrices is just a very long row of $X$s (28*28 = 784 different variables).

In practice, we can do this with color images as well, with each RGB value acting as a different pixel score.

38 / 67
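For intuition, flattening a single image in base R looks roughly like this (assuming the mnist object loaded earlier):

img <- mnist$train$x[1, , ]   # one 28x28 matrix of pixel intensities
dim(img)                      # 28 28

x_row <- as.vector(img)       # squish it into one long row
length(x_row)                 # 784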

Tensorflow & Keras

Ok- let's do this in order.

model <- keras_model_sequential()
39 / 67

Tensorflow & Keras

Ok- let's do this in order.

model <- keras_model_sequential() %>%
  layer_flatten(input_shape = c(28, 28))

This layer takes our 28x28 pixel image and flattens it into a vector so the model can read it. Think of this like a recipe step.

40 / 67

Tensorflow & Keras

Ok- let's do this in order.

model <- keras_model_sequential() %>%
  layer_flatten(input_shape = c(28, 28)) %>%
  layer_dense(units = 128, activation = "relu")

This is our first layer in our Neural network! It creates 128 different 'neurons' (or perceptrons) and sets their activation function to be a relu.

  • What's a relu? It stands for: Rectified Linear Unit. It's all the rage these days, but it is super simple: $ReLU(X) = \max(0, X)$
41 / 67
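In base R, that activation is a one-liner (a trivial sketch, not part of the keras model above):

relu <- function(x) pmax(0, x)   # ReLU(X) = max(0, X), applied elementwise
relu(c(-2, -0.5, 0, 1.5))        # 0.0 0.0 0.0 1.5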

Tensorflow & Keras

Ok- let's do this in order.

model <- keras_model_sequential() %>%
  layer_flatten(input_shape = c(28, 28)) %>%
  layer_dense(units = 128, activation = "relu", name = "hiddenlayer") %>%
  layer_dense(units = 10, activation = "softmax", name = "outputlayer")

This is our output layer, and the activation it's using is simply - "classify this into one of 10 (number of nodes) classes, and guess the one with the largest probability."

And that's a neural network! I gave the layers names, because it will be easier to see what they do if I come back later. How do I do that?

42 / 67
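For intuition, here is a hedged base-R sketch of what the softmax output computes for one observation (the logits are made up):

softmax <- function(z) exp(z) / sum(exp(z))   # turns raw scores into probabilities

logits <- c(2.0, 1.0, 0.1, -1.2, 0.5, 0.0, -0.3, 1.7, 0.2, -2.0)  # 10 raw scores
probs  <- softmax(logits)

sum(probs)             # 1: the 10 probabilities sum to one
which.max(probs) - 1   # the predicted class (labels run 0-9)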

Tensorflow & Keras

summary(model)

You can see how many parameters they have and check - it should be 784 (28 pix * 28 pix) * 128 (number of hidden neurons) + 128 ($w_0$ or $b$) = 100,480

43 / 67

Tensorflow & Keras

You can see how many parameters they have and check - it should be 784 (28 pix * 28 pix) * 128 (number of hidden neurons) + 128 ($w_0$ or $b$) = 100,480

That is what's called a:

  • Shitload: A lot.

of parameters.

44 / 67
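A quick arithmetic check in R (the output-layer count is my own addition, computed from the layer sizes defined above):

28 * 28 * 128 + 128                  # hidden layer: 100,480 parameters
128 * 10 + 10                        # output layer: 1,290 parameters
28 * 28 * 128 + 128 + 128 * 10 + 10  # total: 101,770 trainable parameters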

Tensorflow & Keras

Now we need to prepare the model.

This is done with a compile command. We need to give that command a loss function to use, an optimizer (we'll use Adam which I described earlier) and any metrics we're interested in (let's look at accuracy.)

model %>% compile(
  loss = "sparse_categorical_crossentropy",
  optimizer = "Adam",
  metrics = "accuracy"
)
45 / 67

Tensorflow & Keras

Now, we can use keras/tf to fit our model. First, let's separate our data.

mnist$train$x <- mnist$train$x / 255  # scales pixel values to [0, 1]
mnist$test$x  <- mnist$test$x / 255   # scales pixel values to [0, 1]

I want you to see what it looks like when it runs, so I'm going to move to a new slide.

46 / 67

Tensorflow & Keras

model %>%
  fit(
    x = mnist$train$x, y = mnist$train$y,
    epochs = 5,
    validation_split = 0.3
  )

47 / 67

Tensorflow & Keras

48 / 67

Tensorflow

And that's how you run a neural network! Ours is getting a ~ 92% accuracy rate classifying pictures into one of 10 different categories. If you run this for 40 epochs, you'll get ~ 95% accuracy.

It's super easy.

It's too easy.

Be careful... remember all of those things you've learned about model selection? They matter 10x more with deep learning.

49 / 67

Other types of networks

There are TONS of other cool structures, but they all work exactly like these under the hood - they just add fancy math that allows us to process our inputs in a slightly different way.

CNN: Convolutional Neural Network. You know that flattening step? That kind of sucks. We're actually losing information on the location of the pixels when we do that. If only we could hold onto that information...

By using filters and feeding in data in a special way, we can hold onto the positional information of our numbers.

50 / 67
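As a hedged sketch (not the model from these slides; the filter counts are arbitrary assumptions), a small convolutional front end in keras for R might look like this. The images would also need a trailing channel dimension, e.g. via array_reshape().

cnn <- keras_model_sequential() %>%
  layer_conv_2d(filters = 32, kernel_size = c(3, 3), activation = "relu",
                input_shape = c(28, 28, 1)) %>%    # filters slide over the 2-D pixel grid
  layer_max_pooling_2d(pool_size = c(2, 2)) %>%    # downsample while keeping rough position
  layer_flatten() %>%                              # flatten only after the filters have run
  layer_dense(units = 10, activation = "softmax")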

Other types of networks

What about timeseries? Wouldn't it be nice to be able to tell a neural network the 'order' the data should occur in? That's handled by a class of estimators called RNNs (or Transformers, more recently).

LSTM/GRU: Long-Short Term Memory/ Gated Recurrent Unit. Both of these adapt our neurons so they can 'hold onto' data further in the learning process and change it as they like. The diagram is ugly, but...

where each cell is now

51 / 67
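As a hedged sketch (the input shape of 30 timesteps with 1 feature is an arbitrary assumption, not from the slides), swapping in a recurrent layer with keras for R looks like this:

rnn <- keras_model_sequential() %>%
  layer_lstm(units = 32, input_shape = c(30, 1)) %>%  # 30 timesteps, 1 feature per step
  layer_dense(units = 1)                              # predict the next value

rnn %>% compile(loss = "mse", optimizer = "Adam")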

Basic RNN Visualized

Recurrent Neural Network

52 / 67

LSTM init

lstm step 1

53 / 67

LSTM step 2

lstm drop cell-state?

54 / 67

LSTM step 3

lstm how much to add to cell-state?

55 / 67

LSTM final

final step

56 / 67

Running an LSTM in R

Date  Time  Global_active_power  Global_reactive_power  Voltage  Global_intensity  Sub_metering_1  Sub_metering_2  Sub_metering_3  datetime
16/12/2006  17:24:00  4.22  0.418  235  18.4  0  1  17  2006-12-16 17:24:00
16/12/2006  17:25:00  5.36  0.436  234  23    0  1  16  2006-12-16 17:25:00
16/12/2006  17:26:00  5.37  0.498  233  23    0  2  17  2006-12-16 17:26:00
16/12/2006  17:27:00  5.39  0.502  234  23    0  1  17  2006-12-16 17:27:00
16/12/2006  17:28:00  3.67  0.528  236  15.8  0  1  17  2006-12-16 17:28:00
16/12/2006  17:29:00  3.52  0.522  235  15    0  2  17  2006-12-16 17:29:00
...
The same data, rescaled:

Date  Time  Global_active_power  Global_reactive_power  Voltage  Global_intensity  Sub_metering_1  Sub_metering_2  Sub_metering_3  datetime
16/12/2006  17:24:00  0.263  0.792  0.0162   0.277  NaN  0.027   0.0556  2006-12-16 17:24:00
16/12/2006  17:25:00  0.412  0.826  0.0111   0.416  NaN  0.027   0       2006-12-16 17:25:00
16/12/2006  17:26:00  0.413  0.943  0.00969  0.416  NaN  0.0541  0.0556  2006-12-16 17:26:00
16/12/2006  17:27:00  0.415  0.951  0.0116   0.416  NaN  0.027   0.0556  2006-12-16 17:27:00
...
16/12/200618:32:000.0545 0.09850.0186 0.0542NaN0.027 0.05562006-12-16 18:32:00
16/12/200618:33:000.0683 0.307 0.0161 0.0723NaN0     0.05562006-12-16 18:33:00
16/12/200618:34:000.175  0.163 0.0117 0.193 NaN0.027 0     2006-12-16 18:34:00
16/12/200618:35:000.504  0     0.0063 0.518 NaN0.73  0.05562006-12-16 18:35:00
16/12/200618:36:000.305  0     0.0107 0.307 NaN0.027 0.05562006-12-16 18:36:00
16/12/200618:37:000.288  0     0.005620.289 NaN0.027 0     2006-12-16 18:37:00
16/12/200618:38:000.094  0.09090.0128 0.114 NaN0.027 0.05562006-12-16 18:38:00
16/12/200618:39:000.0179 0.102 0.0159 0.0181NaN0.027 0.05562006-12-16 18:39:00
16/12/200618:40:000.009860.102 0.0155 0.012 NaN0.027 0.05562006-12-16 18:40:00
16/12/200618:41:000.0106 0.102 0.018  0.012 NaN0.027 0.05562006-12-16 18:41:00
16/12/200618:42:000.009080.102 0.0174 0.012 NaN0.027 0.05562006-12-16 18:42:00
16/12/200618:43:000      0.129 0.0202 0     NaN0.027 0.05562006-12-16 18:43:00
16/12/200618:44:000.103  0.314 0.0161 0.12  NaN0.027 0.05562006-12-16 18:44:00
16/12/200618:45:000.261  0.33  0.0143 0.259 NaN0.027 0.05562006-12-16 18:45:00
16/12/200618:46:000.262  0.352 0.0135 0.259 NaN0.027 0     2006-12-16 18:46:00
16/12/200618:47:000.263  0.337 0.0126 0.265 NaN0.027 0.05562006-12-16 18:47:00
16/12/200618:48:000.0776 0.356 0.0168 0.0843NaN0.05410.05562006-12-16 18:48:00
16/12/200618:49:000.0457 0.167 0.0155 0.0482NaN0.108 0.05562006-12-16 18:49:00
16/12/200618:50:000.04   0.152 0.0123 0.0422NaN0.08110.05562006-12-16 18:50:00
16/12/200618:51:000.0192 0.133 0.0106 0.0241NaN0.027 0     2006-12-16 18:51:00
16/12/200618:52:000.0174 0     0.0103 0.0181NaN0     0.05562006-12-16 18:52:00
16/12/200618:53:000.0337 0     0.0112 0.0422NaN0.027 0.05562006-12-16 18:53:00
16/12/200618:54:000.274  0     0.005920.277 NaN0.027 0     2006-12-16 18:54:00
16/12/200618:55:000.265  0.17  0.005330.271 NaN0.027 0.05562006-12-16 18:55:00
16/12/200618:56:000.265  0.17  0.005620.271 NaN0.05410     2006-12-16 18:56:00
16/12/200618:57:000.225  0.159 0.0076 0.235 NaN0.027 0.05562006-12-16 18:57:00
16/12/200618:58:000.263  0.17  0.004660.265 NaN0.027 0.05562006-12-16 18:58:00
16/12/200618:59:000.264  0.17  0.004110.271 NaN0.027 0     2006-12-16 18:59:00
16/12/200619:00:000.244  0.167 0.004240.247 NaN0.027 0.05562006-12-16 19:00:00
16/12/200619:01:000.185  0.17  0.005790.193 NaN0.05410     2006-12-16 19:01:00
16/12/200619:02:000.165  0.17  0.007260.169 NaN0.027 0.05562006-12-16 19:02:00
16/12/200619:03:000.162  0.17  0.004320.169 NaN0.027 0     2006-12-16 19:03:00
57 / 67
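The rescaled preview above is roughly what you get from min-max scaling each numeric column. As a minimal sketch (not necessarily the exact code used for these slides), assuming a hypothetical data frame power_df holding the raw readings with the column names shown above:

# Hypothetical: power_df holds the raw readings with the columns listed above
library(dplyr)

num_cols <- c("Global_active_power", "Global_reactive_power", "Voltage",
              "Global_intensity", "Sub_metering_1", "Sub_metering_2", "Sub_metering_3")

# Min-max rescaling to [0, 1]; a constant column gives 0/0 = NaN,
# which is why Sub_metering_1 shows NaN in the scaled preview
rescale01 <- function(x) (x - min(x, na.rm = TRUE)) / (max(x, na.rm = TRUE) - min(x, na.rm = TRUE))

power_scaled <- power_df %>%
  mutate(across(all_of(num_cols), rescale01))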

Running an LSTM in R

savebest <- keras::callback_early_stopping(restore_best_weights = TRUE, patience = 6)
opt <- optimizer_adam()

# A Conv1D front end feeds two stacked LSTM layers; the final dense layer
# outputs a 10-step forecast. Assumes library(keras) is loaded and that
# train_window and features were defined when the data were windowed.
model <- keras_model_sequential() %>%
  layer_conv_1d(input_shape = c(train_window, features), filters = 21, kernel_size = 1,
                strides = 1, activation = 'relu', name = 'conv-1d-1', padding = 'same') %>%
  # layer_batch_normalization(name = 'batchnorm') %>%
  # layer_activation_relu() %>%
  layer_lstm(21, name = 'lstm_layer', return_sequences = TRUE, stateful = FALSE) %>%
  layer_lstm(10, name = 'lstm_layer_2', stateful = FALSE) %>%  # input_shape is only needed on the first layer
  layer_dense(10, activation = "linear", name = 'outputlayer')
58 / 67

LSTM in Keras

summary(model)
59 / 67

Running an LSTM in R

compile(model, loss = 'mse', optimizer = opt)

# Normally we'd use a 'stateful' LSTM, call reset_states() between splits, and loop
# over the splits manually, but RStudio fights that setup; with some debugging the
# loop below works. A layer-normalization layer would also be preferable here.
for (epoch in 1:20) {
  print(paste("Beginning epoch #", epoch))
  keras::fit(model, x = data_prep_tf, y = data_prep_y_tf, epochs = 1, batch_size = 10,
             validation_data = list(data_test_tf, data_test_y_tf),
             shuffle = FALSE, callbacks = list(savebest))
  model %>% reset_states()
}
model %>% save_model_hdf5("/Users/connor/Desktop/GithubProjects/Econometrics/524/EC524W20/lab/005-Perceptrons_and_NeuralNets/lstm_model_sf.hdf5")
60 / 67
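Once the model is saved, reloading it and scoring the held-out windows might look like the sketch below. The shortened file name is illustrative, and data_test_tf / data_test_y_tf are assumed to be the same plain R arrays passed to fit() above.

# Hypothetical follow-up: reload the saved model and score the test windows
model <- load_model_hdf5("lstm_model_sf.hdf5")   # path shortened for illustration
preds <- predict(model, data_test_tf)            # one 10-step forecast per test window
mse   <- mean((preds - data_test_y_tf)^2)        # matches the 'mse' training loss
mse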

61 / 67

Other types of Networks

What if we literally don't know anything about the data, just how each object relates to one another (think DNA structures)?

62 / 67

Other types of Networks

What if we literally don't know anything about the data, just how each object relates to one another (think DNA structures)?

GCN: Graph Convolutional Networks. These read in graph data and, using the same convolution ideas you saw for CNNs, learn to predict new patterns (a minimal numeric sketch follows this slide).

62 / 67
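To get a feel for what "convolving over a graph" means, here is a minimal base-R sketch of the standard GCN propagation rule, H' = ReLU( D^(-1/2) (A + I) D^(-1/2) H W ). The 3-node graph, features, and weights are toy values invented purely for illustration.

# Toy graph-convolution step: A is the adjacency matrix, H the node features,
# W the layer weights; adding I gives each node a self-loop.
set.seed(42)
A <- matrix(c(0, 1, 0,
              1, 0, 1,
              0, 1, 0), nrow = 3, byrow = TRUE)  # 3-node chain graph
A_hat <- A + diag(3)                             # add self-loops
D_inv_sqrt <- diag(1 / sqrt(rowSums(A_hat)))     # symmetric degree normalization
H <- matrix(rnorm(3 * 4), nrow = 3)              # 4 features per node
W <- matrix(rnorm(4 * 2), nrow = 4)              # project 4 features down to 2

H_next <- pmax(D_inv_sqrt %*% A_hat %*% D_inv_sqrt %*% H %*% W, 0)  # ReLU
H_next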

Other types of Networks

Oh yeah, remember this?

63 / 67

Other types of Networks

Oh yeah, remember this?

That was generated by a neural network. Called a GAN or Generative Adversarial Network, it is trained by having two neural networks duke it out.

  • One of the networks tries to imitate the hand-drawn pictures

  • The other network tries to detect computer-generated pictures.

This is the kind of model behind the often-cited deepfake videos.

63 / 67

The only good use of a deepfake

Imagine Nicolas Cage was in every movie, playing every part.

deepfake1
64 / 67

Make Bob Ross Nightmare Fuel

backprop

The only difference between these is what data the model is trained on.

65 / 67

Make Bob Ross Nightmare Fuel

backprop

The only difference between these is what data the model is trained on.

But you basically understand how to do this yourself now.

65 / 67

GAN training process:

66 / 67

GAN training process:

Just by looking at the diagram for a while, and learning how convolutional neural nets work, you could figure this out. A minimal training-loop sketch follows below.

66 / 67
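To make the two-network tug-of-war concrete, here is a minimal, hypothetical GAN training loop in keras for R. It is a sketch, not the model that produced the images above: real_images (a matrix of flattened images scaled to [0, 1]), the layer sizes, and the number of training steps are all made up for illustration.

library(keras)

latent_dim <- 32
img_dim    <- 28 * 28   # hypothetical flattened image size

# Generator: random noise in, fake image out
generator <- keras_model_sequential() %>%
  layer_dense(128, activation = "relu", input_shape = latent_dim) %>%
  layer_dense(img_dim, activation = "sigmoid")

# Discriminator: image in, P(real) out
discriminator <- keras_model_sequential() %>%
  layer_dense(128, activation = "relu", input_shape = img_dim) %>%
  layer_dense(1, activation = "sigmoid")
discriminator %>% compile(optimizer = optimizer_adam(), loss = "binary_crossentropy")

# Stacked model used to train the generator; the discriminator is frozen here
freeze_weights(discriminator)
gan_input  <- layer_input(shape = latent_dim)
gan_output <- discriminator(generator(gan_input))
gan <- keras_model(gan_input, gan_output)
gan %>% compile(optimizer = optimizer_adam(), loss = "binary_crossentropy")

batch_size <- 64
for (step in 1:1000) {
  # 1. Train the discriminator on a half-real, half-fake batch
  noise <- matrix(rnorm(batch_size * latent_dim), nrow = batch_size)
  fake  <- predict(generator, noise)
  real  <- real_images[sample(nrow(real_images), batch_size), ]
  x <- rbind(real, fake)
  y <- c(rep(1, batch_size), rep(0, batch_size))   # 1 = real, 0 = fake
  train_on_batch(discriminator, x, y)

  # 2. Train the generator to fool the (frozen) discriminator
  noise <- matrix(rnorm(batch_size * latent_dim), nrow = batch_size)
  train_on_batch(gan, noise, rep(1, batch_size))   # label the fakes as "real"
}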

With great power...

These models are powerful.

67 / 67

With great power...

These models are powerful.

However, they aren't interpretable (yet, though they're getting MUCH better at this every year).

67 / 67

With great power...

These models are powerful.

However, they aren't interpretable (yet, though they're getting MUCH better at this every year).

They also use hundreds of thousands of parameters for even very simple models.

67 / 67

With great power...

These models are powerful.

However, they aren't interpretable (yet, though they're getting MUCH better at this every year).

They also use hundreds of thousands of parameters for even very simple models.

That means you have to be super careful in how you evaluate them.

67 / 67

