Published 2022. 7. 6. 20:44

[2.1.] Setting Up your Optimization Problem(2)


Numerical Approximation of Gradients

gradient checking

[Checking your derivative computation]

$f(\theta) = \theta^{3}$

$\theta = 1$, $\theta + \epsilon$, $\theta - \epsilon $

Here $\epsilon = 0.01$ (a very small value)

Rather than using the triangle from $\theta$ to $\theta + \epsilon$, it is better to use the triangle from $\theta - \epsilon$ to $\theta + \epsilon$

By taking into account the two small triangles inside the big triangle, we compute a "two-sided difference" instead of a one-sided difference

<Computing from the big triangle>

Height: $f(\theta + \epsilon) - f(\theta - \epsilon)$

Base: $2\epsilon$

$$\frac{f(\theta + \epsilon) - f(\theta - \epsilon)}{2\epsilon} \approx g(\theta)$$

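The approximation is very close in practice: with $\theta = 1$ and $\epsilon = 0.01$, $\frac{1.01^{3} - 0.99^{3}}{0.02} = 3.0001$, while the exact derivative is $g(\theta) = 3\theta^{2} = 3$

The approximation error here is $0.0001$ (with the one-sided difference computed earlier it was $0.03$)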
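The comparison above can be reproduced in a few lines of Python. This is only a minimal sketch: the names `f`, `g`, `one_sided`, and `two_sided` are chosen here for illustration, and `g` is the exact derivative $3\theta^{2}$ of the example function.

```python
def f(theta):
    """The example function f(theta) = theta^3."""
    return theta ** 3


def g(theta):
    """Exact derivative of f, used only as the reference value."""
    return 3 * theta ** 2


theta, eps = 1.0, 0.01

# One-sided difference: slope of the small triangle from theta to theta + eps.
one_sided = (f(theta + eps) - f(theta)) / eps

# Two-sided difference: slope of the big triangle from theta - eps to theta + eps.
two_sided = (f(theta + eps) - f(theta - eps)) / (2 * eps)

one_sided_error = abs(one_sided - g(theta))
two_sided_error = abs(two_sided - g(theta))

print(f"one-sided: {one_sided:.4f}, error {one_sided_error:.4f}")  # one-sided: 3.0301, error 0.0301
print(f"two-sided: {two_sided:.4f}, error {two_sided_error:.4f}")  # two-sided: 3.0001, error 0.0001
```

The two-sided version needs two evaluations of `f` around `theta` instead of reusing `f(theta)`, which is where the extra cost comes from.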

---

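If this method is used for gradient checking of back prop, it takes about twice as long as the one-sided difference, but it is much more accurate

<Supplementary note> - the lecture says this part only matters to people who are comfortable with calculus

Two-sided: $f\prime (\theta) = \lim_{\epsilon \rightarrow 0} \frac{f(\theta + \epsilon) - f(\theta - \epsilon)}{2\epsilon}$; for a non-zero $\epsilon$ the approximation error is $O({\epsilon}^{2})$

One-sided: $f\prime (\theta) = \lim_{\epsilon \rightarrow 0} \frac{f(\theta + \epsilon) - f(\theta)}{\epsilon}$; for a non-zero $\epsilon$ the approximation error is $O({\epsilon})$

Since $\epsilon$ is much smaller than $1$, $O({\epsilon}^{2})$ is much smaller than $O({\epsilon})$ (for $\epsilon = 0.01$, ${\epsilon}^{2} = 0.0001$)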
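The error orders of the two formulas can be read off from a Taylor expansion. The derivation below is a sketch that assumes $f$ is sufficiently smooth near $\theta$:

$$f(\theta \pm \epsilon) = f(\theta) \pm \epsilon f'(\theta) + \frac{\epsilon^{2}}{2} f''(\theta) \pm \frac{\epsilon^{3}}{6} f'''(\theta) + \ldots$$

$$\frac{f(\theta + \epsilon) - f(\theta - \epsilon)}{2\epsilon} = f'(\theta) + \frac{\epsilon^{2}}{6} f'''(\theta) + \ldots$$

$$\frac{f(\theta + \epsilon) - f(\theta)}{\epsilon} = f'(\theta) + \frac{\epsilon}{2} f''(\theta) + \ldots$$

In the two-sided formula the even-order terms cancel, so the leading error term is proportional to $\epsilon^{2}$, while in the one-sided formula it is proportional to $\epsilon$. For $f(\theta) = \theta^{3}$ at $\theta = 1$, $\epsilon = 0.01$, the leading terms are $\frac{\epsilon^{2}}{6} \cdot 6 = 0.0001$ and $\frac{\epsilon}{2} \cdot 6 = 0.03$, matching the errors computed in the example.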

Gradient Checking

to debug/verify back prop

[Gradient check for a neural network]

Take matrix $W^{[1]}$, $b^{[1]}$, ..., $W^{[L]}$, $b^{[L]}$ and reshape into a big vector $\theta$

1) reshape

2) concatenate

3) get the giant vector $\theta$

As a result, the cost function $J(W^{[1]}, b^{[1]}, \ldots, W^{[L]}, b^{[L]})$ can now be written as $J(\theta)$

Take matrix $dW^{[1]}$, $db^{[1]}$, ..., $dW^{[L]}$, $db^{[L]}$ and reshape into a big vector $d\theta$

*$dW^{[l]}$ has the same shape as $W^{[l]}$

1) reshape

2) concatenate

3) get the giant vector $d\theta$ (same shape as $\theta$)
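The reshape and concatenate steps can be sketched with NumPy as follows. The helper names `to_vector` and `from_vector` and the 2-layer network shapes are assumptions made for this illustration, not part of the lecture.

```python
import numpy as np


def to_vector(arrays):
    """Flatten each array and concatenate the pieces into one 1-D vector."""
    return np.concatenate([a.reshape(-1) for a in arrays])


def from_vector(vector, shapes):
    """Inverse of to_vector: cut the vector back into arrays of the given shapes."""
    arrays, start = [], 0
    for shape in shapes:
        size = int(np.prod(shape))
        arrays.append(vector[start:start + size].reshape(shape))
        start += size
    return arrays


# Hypothetical 2-layer network: 3 inputs -> 4 hidden units -> 1 output.
rng = np.random.default_rng(0)
params = [rng.standard_normal((4, 3)), np.zeros((4, 1)),   # W1, b1
          rng.standard_normal((1, 4)), np.zeros((1, 1))]   # W2, b2

theta = to_vector(params)
restored = from_vector(theta, [p.shape for p in params])

print(theta.shape)  # (21,) = 12 + 4 + 4 + 1 entries
```

Passing the gradients $dW^{[l]}$, $db^{[l]}$ through the same `to_vector` in the same order gives a $d\theta$ whose entries line up one-to-one with those of $\theta$.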

The question to verify: is $d\theta$ the gradient of $J(\theta)$?

[Gradient check or Grad check]

$J(\theta) = J({\theta}_{1}, {\theta}_{2}, {\theta}_{3}, \ldots)$

for each $i$:

$$d\theta_{approx}[i] = \frac{ J({\theta}_{1}, {\theta}_{2}, \dots, {\theta}_{i}+\epsilon, \ldots) - J( {\theta}_{1}, {\theta}_{2}, \dots, {\theta}_{i}-\epsilon, \ldots )}{2\epsilon}$$

$$\approx d\theta[i] = \frac{\partial J}{\partial {\theta}_{i}}$$

$d\theta_{approx} \approx d\theta$ ?
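The loop over $i$ can be sketched as below. The function name `numerical_gradient` and the toy cost `J` are assumptions for illustration; in a real check, `J` would run forward propagation and `d_theta` would come from back prop.

```python
import numpy as np


def numerical_gradient(J, theta, eps=1e-7):
    """Two-sided difference estimate of dJ/dtheta_i for every component i."""
    d_theta_approx = np.zeros_like(theta)
    for i in range(theta.size):
        theta_plus = theta.copy()
        theta_minus = theta.copy()
        theta_plus[i] += eps     # nudge only component i upward
        theta_minus[i] -= eps    # and downward
        d_theta_approx[i] = (J(theta_plus) - J(theta_minus)) / (2 * eps)
    return d_theta_approx


# Hypothetical cost with a known gradient: J(theta) = sum(theta_i^3),
# so dJ/dtheta_i = 3 * theta_i^2.
def J(theta):
    return np.sum(theta ** 3)


theta = np.array([1.0, -2.0, 0.5])
d_theta = 3 * theta ** 2                  # stands in for the back prop output
d_theta_approx = numerical_gradient(J, theta)

print(d_theta)
print(d_theta_approx)
```

Each component costs two evaluations of `J`, so the whole loop needs $2n$ cost evaluations for a $\theta$ with $n$ entries.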

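How do we define that the two vectors are close enough?

Compute the Euclidean distance between them, then normalize it by the sum of their Euclidean norms; the resulting value is a kind of ratio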

Check:

With $\epsilon = 10^{-7}$, in the ideal case:

$$\frac{\| d\theta_{approx} - d\theta \|_{2}}{\| d\theta_{approx}\|_{2} + \| d\theta\|_{2}} \approx 10^{-7}$$

A value around $10^{-5}$ calls for caution; around $10^{-3}$, check whether there is a bug
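The ratio can be computed with `np.linalg.norm`, which returns the Euclidean norm of a 1-D array by default. The sketch below uses made-up gradient values, and the "buggy" vector is a hypothetical example of a back prop mistake, only meant to show how the ratio reacts.

```python
import numpy as np


def relative_difference(d_theta_approx, d_theta):
    """||a - b||_2 / (||a||_2 + ||b||_2), the ratio used in the check."""
    numerator = np.linalg.norm(d_theta_approx - d_theta)
    denominator = np.linalg.norm(d_theta_approx) + np.linalg.norm(d_theta)
    return numerator / denominator


d_theta_approx = np.array([3.0, 12.0, 0.75])

# Back prop output that agrees up to tiny numerical noise.
good = d_theta_approx + 1e-7 * np.array([1.0, -1.0, 1.0])

# Hypothetical bug: one component is off by a factor of 2.
buggy = d_theta_approx.copy()
buggy[1] *= 0.5

good_ratio = relative_difference(d_theta_approx, good)
buggy_ratio = relative_difference(d_theta_approx, buggy)

print(good_ratio)    # far below 1e-7
print(buggy_ratio)   # far above 1e-3
```

Dividing by the sum of the norms keeps the ratio comparable whether the gradient entries are very large or very small.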

Gradient Checking Implementation Notes

grad check implementation

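- Don't use in training - only to debug! (computing $d\theta_{approx}[i]$ for every $i$ is slow)

- If grad check fails, look at the individual components to track down the bug
  - Check for which values of $i$ $d\theta_{approx}[i]$ is far from $d\theta[i]$

- Remember regularization
  - Keep in mind the regularization term added to $J$; $d\theta$ has to include its gradient as well

- Doesn't work with dropout
  - Set keep_prob to $1.0$ while running the check

- Run at random initialization; perhaps again after some training
  - Right after initialization $W,\ b \approx 0$, and a bug may only show up once they move away from $0$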
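The regularization note can be illustrated with a small experiment. This is a sketch under stated assumptions: the data `X`, `y`, the L2-regularized squared-error cost `J`, and the helper `grad_check` are all made up for this example, and `grad_without_reg` deliberately forgets the regularization term.

```python
import numpy as np

# Hypothetical data: 3 examples, 2 features.
X = np.array([[1.0, 2.0], [3.0, -1.0], [0.5, 0.0]])
y = np.array([1.0, 0.0, 2.0])
lam, m = 1.0, X.shape[0]


def J(theta):
    """Squared-error cost plus an L2 regularization term."""
    residual = X @ theta - y
    return residual @ residual / (2 * m) + lam / (2 * m) * (theta @ theta)


def grad_with_reg(theta):
    """Gradient of J including the regularization term."""
    return X.T @ (X @ theta - y) / m + lam / m * theta


def grad_without_reg(theta):
    """Deliberately incomplete: the regularization term is missing."""
    return X.T @ (X @ theta - y) / m


def grad_check(J, grad, theta, eps=1e-7):
    """Relative difference between the two-sided estimate and grad(theta)."""
    approx = np.zeros_like(theta)
    for i in range(theta.size):
        step = np.zeros_like(theta)
        step[i] = eps
        approx[i] = (J(theta + step) - J(theta - step)) / (2 * eps)
    exact = grad(theta)
    return np.linalg.norm(approx - exact) / (np.linalg.norm(approx) + np.linalg.norm(exact))


theta = np.array([0.3, -0.2])
ratio_with_reg = grad_check(J, grad_with_reg, theta)
ratio_without_reg = grad_check(J, grad_without_reg, theta)

print(ratio_with_reg)      # tiny: the gradient matches J
print(ratio_without_reg)   # large: the gradient ignores part of J
```

The second ratio fails the check even though the data term of the gradient is correct, which is why the cost and the gradient have to agree on the regularization term.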
