Paper review) Neural Networks for Machine Learning

낭낭이 2023. 7. 19. 23:10

Overview of mini-batch gradient descent

The error surface for a linear neuron

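* The error surface lies in a space with one horizontal axis per weight and one vertical axis for the error.

=> For a linear neuron with squared error it is a quadratic bowl: vertical cross-sections are parabolas and horizontal cross-sections are ellipses.

* For multi-layer, non-linear nets the error surface is much more complicated, but locally a piece of a quadratic bowl is usually a good approximation.

* If the learning rate is big, the weights oscillate across the ravine with growing amplitude. What we would like instead:

-> move quickly in directions with small but consistent gradients

-> move slowly in directions with big but inconsistent gradients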
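
The oscillation described above can be reproduced on a toy quadratic bowl. This is a minimal sketch, not something from the lecture: the curvatures (1 and 20), the starting point, and the two learning rates are assumed values picked so that one run converges and the other diverges along the steep axis.

```python
import numpy as np

def gradient_descent(curvatures, w0, lr, steps):
    """Steepest descent on E(w) = 0.5 * sum(curvatures * w**2)."""
    w = np.array(w0, dtype=float)
    path = [w.copy()]
    for _ in range(steps):
        grad = curvatures * w          # dE/dw for the quadratic bowl
        w = w - lr * grad
        path.append(w.copy())
    return np.array(path)

# Assumed toy bowl: shallow along axis 0, steep along axis 1 (elliptical contours).
curvatures = np.array([1.0, 20.0])

# lr * 20 < 2: the steep axis oscillates but shrinks each step.
small = gradient_descent(curvatures, [1.0, 1.0], lr=0.09, steps=50)
# lr * 20 > 2: the steep axis flips sign and grows each step.
large = gradient_descent(curvatures, [1.0, 1.0], lr=0.11, steps=50)

print("steep-axis weight, lr=0.09:", small[-1, 1])
print("steep-axis weight, lr=0.11:", large[-1, 1])
```

In both runs the shallow axis barely cares about the learning rate; it is the steep axis that decides whether the run blows up.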

Stochastic gradient descent

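- Instead of computing the full gradient over the whole training set, keep updating the weights with gradients computed on mini-batches.

- online: the extreme version, which updates the weights after every single case

- Mini-batches are usually better than online learning: less computation is spent on updating the weights.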
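
A minimal sketch of the idea for a single linear neuron with squared error. The data, the learning rate, and the function name `minibatch_sgd` are assumptions made for illustration; setting `batch_size=1` gives the online variant.

```python
import numpy as np

def minibatch_sgd(X, y, lr=0.05, batch_size=10, epochs=100, seed=0):
    """Fit a linear neuron y ~ X @ w with mini-batch gradient descent."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(epochs):
        order = rng.permutation(n)                 # reshuffle cases every epoch
        for start in range(0, n, batch_size):
            idx = order[start:start + batch_size]
            residual = X[idx] @ w - y[idx]
            grad = X[idx].T @ residual / len(idx)  # mean squared-error gradient on the batch
            w -= lr * grad
    return w

# Assumed toy data: noise-free targets from a known weight vector.
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 3))
true_w = np.array([2.0, -1.0, 0.5])
y = X @ true_w

w_minibatch = minibatch_sgd(X, y, batch_size=10)  # 20 weight updates per epoch
w_online = minibatch_sgd(X, y, batch_size=1)      # 200 weight updates per epoch
print(w_minibatch, w_online)
```

Both variants recover the weights here, but the online run performs ten times as many weight updates per epoch.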

A basic mini-batch gradient descent algorithm

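- Guess an initial learning rate -> if the error keeps getting worse or oscillates wildly, reduce the learning rate; if the error is falling consistently but slowly, increase the learning rate

- When the error stops decreasing, turn down the learning rate -> measure this with the error on a separate validation set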
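
The heuristic above could be sketched as a small helper. The notes give no numbers, so the shrink and grow factors, the "slow" threshold, and the name `adjust_learning_rate` are all assumptions.

```python
def adjust_learning_rate(lr, errors, shrink=0.5, grow=1.1, slow_threshold=0.01):
    """Return a new learning rate from recent validation errors (oldest first).

    shrink, grow and slow_threshold are assumed values, not from the lecture.
    """
    if len(errors) < 3:
        return lr                      # not enough history to judge
    prev2, prev, last = errors[-3:]
    if last > prev:
        return lr * shrink             # error got worse or is bouncing: reduce
    falling_consistently = last < prev < prev2
    if falling_consistently and prev > 0 and (prev - last) / prev < slow_threshold:
        return lr * grow               # steady but slow progress: increase
    return lr                          # healthy progress: leave it alone

print(adjust_learning_rate(0.1, [1.0, 0.9, 1.2]))      # error jumped up
print(adjust_learning_rate(0.1, [1.0, 0.999, 0.998]))  # slow, steady decrease
print(adjust_learning_rate(0.1, [1.0, 0.8, 0.6]))      # fast decrease
```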

A bag of tricks for mini-batch gradient descent

Be careful about turning down the learning rate
Initializing the weights
Shifting the inputs
Scaling the inputs
Decorrelate the input components

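1. Be careful about turning down the learning rate

- Turning down the learning rate reduces the random fluctuations in the error, which gives a quick drop in error, but learning then becomes slower.

- So turning down the learning rate too soon is risky.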

2. Initializing the weights

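- If two hidden units have exactly the same bias and the same incoming and outgoing weights, they always get the same gradient and can never learn different features. Initializing the weights with small random values breaks this symmetry.

- If a unit has a big fan-in (number of incoming connections), small changes on many incoming weights can make learning overshoot, so make the incoming weights smaller when the fan-in is big (proportional to 1/sqrt(fan-in)).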
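
A minimal sketch of fan-in-scaled random initialization. Drawing from a Gaussian and using exactly 1/sqrt(fan-in) as the scale are assumptions; the notes only state the proportionality.

```python
import numpy as np

def init_weights(fan_in, fan_out, seed=0):
    """Random weights whose scale shrinks as 1/sqrt(fan_in).

    Random values break the symmetry between hidden units; the Gaussian
    distribution is an assumed choice.
    """
    rng = np.random.default_rng(seed)
    return rng.normal(0.0, 1.0, size=(fan_in, fan_out)) / np.sqrt(fan_in)

W_small = init_weights(fan_in=10, fan_out=50)
W_big = init_weights(fan_in=1000, fan_out=50)
print("std with fan-in 10:  ", W_small.std())
print("std with fan-in 1000:", W_big.std())
```

With this scaling the summed input to a unit keeps roughly the same variance regardless of how many connections feed into it.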

3. Shifting the inputs

- With steepest descent, shifting the inputs makes a big difference: transform each component of the input vector so that it has zero mean over the whole training set.

- The hyperbolic tangent gives hidden activations that are roughly zero mean.

4. Scaling the inputs

- With steepest descent, scaling the inputs also makes a big difference: transform each component of the input vector so that it has unit variance over the whole training set.
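
Tricks 3 and 4 are usually applied together. A minimal sketch, assuming a small made-up training matrix; the `eps` guard against zero variance is an added assumption.

```python
import numpy as np

def standardize(X_train, eps=1e-8):
    """Shift each input component to zero mean and scale it to unit variance."""
    mean = X_train.mean(axis=0)    # statistics over the whole training set
    std = X_train.std(axis=0)
    return (X_train - mean) / (std + eps), mean, std

# Assumed toy inputs: a large offset in column 0, a small scale in column 1.
X_train = np.array([[101.0, 0.1],
                    [ 99.0, 0.3],
                    [101.0, 0.5],
                    [ 99.0, 0.7]])

X_std, mean, std = standardize(X_train)
print(X_std.mean(axis=0), X_std.std(axis=0))
```

The returned `mean` and `std` should be reused on validation and test inputs rather than recomputed there.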

5. Decorrelate the input components

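- Decorrelating each input component from the others makes gradient descent more efficient.

- One way to do this: Principal Components Analysis (PCA)

* Drop the principal components with the smallest eigenvalues to reduce the dimensionality.

* Divide the remaining principal components by the square roots of their eigenvalues. For a linear neuron, this converts an elliptical error surface into a circular one.

=> On a circular error surface the gradient points straight towards the minimum.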
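
The two PCA steps above can be sketched with NumPy's eigendecomposition. The synthetic data, the choice of two components, and the `eps` guard are assumptions for illustration.

```python
import numpy as np

def pca_whiten(X, n_components, eps=1e-8):
    """Project onto the top principal components and rescale to unit variance."""
    X_centered = X - X.mean(axis=0)
    cov = X_centered.T @ X_centered / len(X)
    eigvals, eigvecs = np.linalg.eigh(cov)             # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:n_components]   # drop the smallest eigenvalues
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    projected = X_centered @ eigvecs                    # decorrelated components
    return projected / np.sqrt(eigvals + eps)           # divide by sqrt(eigenvalue)

# Assumed toy data: 3 correlated inputs generated from 2 latent factors plus small noise.
rng = np.random.default_rng(0)
latent = rng.normal(size=(500, 2))
mixing = np.array([[3.0, 1.0, 0.5],
                   [0.0, 0.5, 1.0]])
X = latent @ mixing + 0.01 * rng.normal(size=(500, 3))

Z = pca_whiten(X, n_components=2)
print(np.round(Z.T @ Z / len(Z), 3))   # covariance of the whitened inputs
```

The printed covariance is close to the identity matrix, which is what makes the error surface of a linear neuron circular.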

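Other ways to speed up mini-batch learning

- Use momentum -> use the gradient to change the velocity rather than the position of the weights

- Use a separate learning rate suited to each parameter

- rmsprop

- Other methods that make use of curvature information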

The momentum method
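
A minimal sketch of momentum, where the gradient changes a velocity and the velocity changes the weights: v <- momentum * v - lr * grad, then w <- w + v. The quadratic bowl, the learning rate, and the momentum value 0.9 are assumed for illustration.

```python
import numpy as np

def momentum_descent(grad_fn, w0, lr=0.01, momentum=0.9, steps=200):
    """Gradient descent where the gradient updates a velocity, not the weights directly."""
    w = np.array(w0, dtype=float)
    velocity = np.zeros_like(w)
    for _ in range(steps):
        velocity = momentum * velocity - lr * grad_fn(w)  # gradient changes the velocity
        w = w + velocity                                   # velocity changes the weights
    return w

# Assumed toy bowl: shallow along axis 0, steep along axis 1.
curvatures = np.array([1.0, 20.0])

def bowl_grad(w):
    return curvatures * w

w_plain = momentum_descent(bowl_grad, [1.0, 1.0], momentum=0.0)
w_momentum = momentum_descent(bowl_grad, [1.0, 1.0], momentum=0.9)
print("distance to minimum without momentum:", np.linalg.norm(w_plain))
print("distance to minimum with momentum:   ", np.linalg.norm(w_momentum))
```

With the same small learning rate, the momentum run builds up speed along the shallow direction and ends much closer to the minimum.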

A separate, adaptive learning rate for each connection

rmsprop: Divide the gradient by a running average of its recent magnitude
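
A minimal sketch of the rule in the heading: keep a running average of the squared gradient for each weight and divide the gradient by its square root. The decay of 0.9, the learning rate, and the `eps` term are assumed values.

```python
import numpy as np

def rmsprop(grad_fn, w0, lr=0.01, decay=0.9, eps=1e-8, steps=500):
    """Divide each gradient by a running average of its recent magnitude."""
    w = np.array(w0, dtype=float)
    mean_square = np.zeros_like(w)
    for _ in range(steps):
        grad = grad_fn(w)
        # Running average of the squared gradient, kept separately per weight.
        mean_square = decay * mean_square + (1 - decay) * grad ** 2
        w = w - lr * grad / (np.sqrt(mean_square) + eps)
    return w

# Assumed toy bowl: the gradient along axis 1 is 20 times larger than along axis 0.
curvatures = np.array([1.0, 20.0])

def bowl_grad(w):
    return curvatures * w

w_first = rmsprop(bowl_grad, [1.0, 1.0], steps=1)
w_final = rmsprop(bowl_grad, [1.0, 1.0], steps=500)
print("after 1 step:   ", w_first)
print("after 500 steps:", w_final)
```

After one step both weights have moved by the same amount even though their gradients differ by a factor of 20, because the division removes the gradient's scale.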

License: Attribution, NonCommercial, NoDerivatives
