Published 2022. 7. 2. 00:10

[1.4.] Deep Neural Networks


Deep L-layer Neural Network

[what is a deep neural network?]

- Logistic regression is a very 'shallow' model

- When counting layers, the input layer is not included

[notation]

- $L = 4$ (number of layers)

- $n^{[l]}$ (number of units in layer $l$)

- $n^{[1]} = 5$, $n^{[2]} = 5$, $n^{[3]} = 3$, $n^{[4]} = n^{[L]} = 1$

- $n^{[0]} = n_{x} = 3$

- $a^{[l]} = g^{[l]}(z^{[l]})$ (activations in layer $l$)

- $W^{[l]}$ = weights for $z^{[l]}$

- input features $x = a^{[0]}$ (activations of layer 0)

- $\hat{y} = a^{[L]}$
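The notation above can be sketched in NumPy. This is a minimal sketch, not the course's reference code: the helper name `init_params`, the dictionary keys `W1`, `b1`, ..., and the `0.01` scaling are assumptions made for illustration. The shapes follow from $z^{[l]} = W^{[l]}a^{[l-1]} + b^{[l]}$: $W^{[l]}$ must be $(n^{[l]}, n^{[l-1]})$ and $b^{[l]}$ must be $(n^{[l]}, 1)$.

```python
import numpy as np

# Layer sizes from the notes: n^[0] = 3 inputs, then 5, 5, 3, 1 units.
layer_dims = [3, 5, 5, 3, 1]
L = len(layer_dims) - 1  # the input layer is not counted, so L = 4


def init_params(layer_dims, seed=0):
    """Hypothetical helper: W^[l] is (n^[l], n^[l-1]) and b^[l] is (n^[l], 1)."""
    rng = np.random.default_rng(seed)
    params = {}
    for l in range(1, len(layer_dims)):
        # Small random weights (the 0.01 scale is an assumption), zero biases.
        params[f"W{l}"] = rng.standard_normal((layer_dims[l], layer_dims[l - 1])) * 0.01
        params[f"b{l}"] = np.zeros((layer_dims[l], 1))
    return params


params = init_params(layer_dims)
for l in range(1, L + 1):
    print(l, params[f"W{l}"].shape, params[f"b{l}"].shape)
```

Printing the shapes layer by layer is a quick way to confirm that each $W^{[l]}$ lines up with the activations of the previous layer.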

Forward Propagation in a Deep Network

Given a single training example $x = a^{[0]}$:

- First layer

$z^{[1]} = w^{[1]}x + b^{[1]} = w^{[1]}a^{[0]} + b^{[1]}$

$a^{[1]} = g^{[1]}(z^{[1]})$

- Second layer

$z^{[2]} = w^{[2]}a^{[1]} + b^{[2]}$

$a^{[2]} = g^{[2]}(z^{[2]})$

...

- Fourth layer

$z^{[4]} = w^{[4]}a^{[3]} + b^{[4]}$

$a^{[4]} = g^{[4]}(z^{[4]}) = \hat{y}$

--> in general

$$z^{[l]} = w^{[l]}a^{[l-1]} + b^{[l]}$$

$$a^{[l]} = g^{[l]}(z^{[l]})$$

--> vectorized

$$Z^{[1]} = W^{[1]}X + b^{[1]} = W^{[1]}A^{[0]} + b^{[1]}$$

$$A^{[1]} = g^{[1]}(Z^{[1]})$$

$$Z^{[2]} = W^{[2]}A^{[1]} + b^{[2]}$$

$$A^{[2]} = g^{[2]}(Z^{[2]})$$

...

$$\hat{Y} = g^{[4]}(Z^{[4]}) = A^{[4]}$$

--> An explicit for-loop over the layers is fine here: $l = 1 \ldots L$

$X$, $Z^{[l]}$, and $A^{[l]}$ are matrices whose columns stack the values for each training example, from left to right
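The vectorized recursion can be sketched as a loop over layers. This is a sketch under assumptions: the function name `forward`, the parameter dictionary layout, and the choice of ReLU for hidden layers with a sigmoid output are not specified in the notes and are picked here only for illustration.

```python
import numpy as np


def sigmoid(Z):
    return 1.0 / (1.0 + np.exp(-Z))


def relu(Z):
    return np.maximum(0.0, Z)


def forward(X, params, L):
    """Compute A^[L] by looping over layers l = 1..L."""
    A = X  # A^[0] = X, shape (n^[0], m)
    for l in range(1, L + 1):
        # b^[l] has shape (n^[l], 1) and broadcasts across the m columns.
        Z = params[f"W{l}"] @ A + params[f"b{l}"]
        # Assumed activations: ReLU in hidden layers, sigmoid at the output.
        A = sigmoid(Z) if l == L else relu(Z)
    return A


rng = np.random.default_rng(0)
layer_dims = [3, 5, 5, 3, 1]
L = len(layer_dims) - 1
params = {}
for l in range(1, L + 1):
    params[f"W{l}"] = rng.standard_normal((layer_dims[l], layer_dims[l - 1])) * 0.01
    params[f"b{l}"] = np.zeros((layer_dims[l], 1))

X = rng.standard_normal((3, 7))  # m = 7 examples stacked as columns
Y_hat = forward(X, params, L)
print(Y_hat.shape)  # (1, 7)
```

The loop over `l` is the only explicit loop; the $m$ training examples are handled by the matrix product.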

Getting your Matrix Dimensions Right

Why Deep Representations?

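Why do deep neural networks work well? What is a deep neural network?

- Given a face image as input,

- First layer: feature detector or edge detector (each unit groups pixels and can pick out edges of a particular orientation)

- Next layer: groups edges to form parts of a face

- Next layer: groups face parts so that different faces can be detected

- Simple-to-complex hierarchical / compositional representation

- Applies to many other inputs as well, such as audio clips

Alternatively, circuit theory: some functions that a small deep network can compute need exponentially more hidden units in a shallow network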

Building Blocks of Deep Neural Networks

Let's look closely at the computation in the shaded layer

- layer $l$ : $w^{[l]}$, $b^{[l]}$

- forward propagation

- Input $a^{[l-1]}$, output $a^{[l]}$

- $z^{[l]} = w^{[l]}a^{[l-1]} + b^{[l]}$

- $a^{[l]} = g^{[l]}(z^{[l]})$

- cache $z^{[l]}$ --> stored so it can be reused later during backpropagation

- back propagation -> backward function

- Input $da^{[l]}$, output $da^{[l-1]}$

- The input also includes cache($z^{[l]}$), and the output also includes $dw^{[l]}$, $db^{[l]}$

- Given the derivative with respect to each activation ....

[Forward and backward functions]

Forward and Backward Propagation

[forward]

- Input $a^{[l-1]}$

- Output $a^{[l]}$, cache($z^{[l]}$); $w^{[l]}$ and $b^{[l]}$ are stored as well

- $z^{[l]} = w^{[l]} \cdot a^{[l-1]} + b^{[l]}$

- $a^{[l]} = g^{[l]}(z^{[l]})$

--> vectorized

- $Z^{[l]} = W^{[l]}A^{[l-1]} + b^{[l]}$ <- $b^{[l]}$ is broadcast across the columns

- $A^{[l]} = g^{[l]}(Z^{[l]})$

- The first input is $A^{[0]}$, i.e. $X$
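A single forward step with its cache might look like the following. This is a minimal sketch: the function name `layer_forward`, the tuple layout of the cache, and the use of `np.tanh` as $g^{[l]}$ are assumptions for illustration. The cache here also keeps $A^{[l-1]}$, $W^{[l]}$ and $b^{[l]}$ next to $Z^{[l]}$, since the backward step needs them.

```python
import numpy as np


def layer_forward(A_prev, W, b, g):
    """One forward step: returns A^[l] and a cache for the backward pass."""
    Z = W @ A_prev + b          # b broadcasts across the m columns
    A = g(Z)
    cache = (A_prev, W, b, Z)   # assumed layout; everything backprop needs
    return A, cache


rng = np.random.default_rng(1)
X = rng.standard_normal((3, 4))             # A^[0] = X with m = 4 examples
W1 = rng.standard_normal((5, 3)) * 0.01
b1 = np.zeros((5, 1))

A1, cache1 = layer_forward(X, W1, b1, np.tanh)
print(A1.shape)  # (5, 4)
```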

[back]

- Input $da^{[l]}$

- output $da^{[l-1]}$, $dW^{[l]}$, $db^{[l]}$

- $dz^{[l]} = da^{[l]} \ast g^{[l]} \prime(z^{[l]})$

- $dw^{[l]} = dz^{[l]} \cdot {a^{[l-1]}}^{T}$ <-- $a^{[l-1]}$ was not included in the cache above, but it is needed too

- $db^{[l]} = dz^{[l]}$

- $da^{[l-1]} = {w^{[l]}}^{T} \cdot dz^{[l]}$

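- Substituting $da^{[l]} = {w^{[l+1]}}^{T} \cdot dz^{[l+1]}$ (the previous equation, one layer up) into the first equation for $dz^{[l]}$ gives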

- $dz^{[l]} = {w^{[l+1]}}^{T} \cdot dz^{[l+1]} \ast g^{[l]}\prime(z^{[l]})$

--> vectorized

- $dZ^{[l]} = dA^{[l]} \ast g^{[l]} \prime (Z^{[l]})$

- $dW^{[l]} = \frac{1}{m}dZ^{[l]} \cdot {A^{[l-1]}}^{T}$

- $db^{[l]} = \frac{1}{m}np.sum(dZ^{[l]}, axis=1, keepdims=True)$

- $dA^{[l-1]} = {W^{[l]}}^{T} \cdot dZ^{[l]}$
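The four vectorized equations map almost line for line onto NumPy. The sketch below assumes a cache laid out as `(A_prev, W, b, Z)` and a ReLU activation; the names `layer_backward` and `relu_prime` are hypothetical.

```python
import numpy as np


def relu_prime(Z):
    """Derivative of ReLU (taken as 0 at Z == 0)."""
    return (Z > 0).astype(float)


def layer_backward(dA, cache, g_prime):
    """One backward step: dA^[l] in, (dA^[l-1], dW^[l], db^[l]) out."""
    A_prev, W, b, Z = cache
    m = A_prev.shape[1]
    dZ = dA * g_prime(Z)                         # element-wise product
    dW = (dZ @ A_prev.T) / m
    db = np.sum(dZ, axis=1, keepdims=True) / m
    dA_prev = W.T @ dZ
    return dA_prev, dW, db


rng = np.random.default_rng(0)
A_prev = rng.standard_normal((5, 4))   # n^[l-1] = 5, m = 4
W = rng.standard_normal((3, 5))        # n^[l] = 3
b = np.zeros((3, 1))
Z = W @ A_prev + b
dA = rng.standard_normal((3, 4))

dA_prev, dW, db = layer_backward(dA, (A_prev, W, b, Z), relu_prime)
print(dA_prev.shape, dW.shape, db.shape)  # (5, 4) (3, 5) (3, 1)
```

Each gradient has the same shape as the quantity it belongs to, which is a useful sanity check when debugging.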

- The forward recursion was initialized with the input $x$ -> so what about the backward recursion?

- As in logistic regression, $da^{[L]} = -\frac{y}{a} + \frac{(1-y)}{(1-a)}$

- In the vectorized version, $dA^{[L]} = (- \frac{y^{(1)}}{a^{(1)}} + \frac{(1-y^{(1)})}{(1-a^{(1)})},\ \ldots,\ -\frac{y^{(m)}}{a^{(m)}} + \frac{(1-y^{(m)})}{(1-a^{(m)})})$
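The initialization of the backward recursion can be checked on a tiny example. The labels and outputs below are made-up values for illustration. With a sigmoid output layer (an assumption here), $g'(z) = a(1-a)$, so $dZ^{[L]} = dA^{[L]} \ast A^{[L]}(1 - A^{[L]})$ should reduce to $A^{[L]} - Y$, the same result as in logistic regression.

```python
import numpy as np

Y = np.array([[1.0, 0.0, 1.0]])    # labels, shape (1, m); illustrative values
AL = np.array([[0.8, 0.1, 0.6]])   # hypothetical network outputs A^[L]

# Element-wise version of dA^[L] = -y/a + (1 - y)/(1 - a).
dAL = -(Y / AL) + (1 - Y) / (1 - AL)

# With a sigmoid output, g'(Z) = A(1 - A), so dZ^[L] simplifies to A^[L] - Y.
dZL = dAL * AL * (1 - AL)
print(np.allclose(dZL, AL - Y))  # True
```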

Parameters vs Hyperparameters

parameters: $W^{[1]}$, $b^{[1]}$, $W^{[2]}$, $b^{[2]}$, $W^{[3]}$, $b^{[3]}$, $\ldots$

Hyperparameters: learning rate $\alpha$, the number of iterations, the number of hidden layers $L$, the number of hidden units $n^{[1]}$, $n^{[2]}$, $\ldots$, choice of activation function

Other hyperparameters: momentum, minibatch size, regularization ...

Hyperparameters end up being chosen empirically

The best value can change at any time
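One way to keep the distinction concrete in code is to hold the hyperparameters in a plain configuration and derive the parameters from it. The values below are illustrative assumptions, not recommendations; only the layer sizes come from the notation section.

```python
# Hyperparameters: chosen by hand and tuned empirically (illustrative values).
hyperparams = {
    "learning_rate": 0.01,
    "num_iterations": 1000,
    "layer_dims": [3, 5, 5, 3, 1],  # n^[0] .. n^[4] from the notation section
    "activation": "relu",
}

dims = hyperparams["layer_dims"]

# Parameters: learned during training. Their count follows from layer_dims:
# W^[l] has n^[l] * n^[l-1] entries and b^[l] has n^[l] entries.
num_params = sum(dims[l] * dims[l - 1] + dims[l] for l in range(1, len(dims)))
print(num_params)  # 72
```

Changing a hyperparameter such as `layer_dims` changes how many parameters there are to learn, which is one sense in which hyperparameters control the parameters.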

What does this have to do with the brain?

QUIZ
