• 09-4. Batch Normalization

    2020. 3. 12.

    by. 해는선

    ๋ณธ ๊ธ€์€ '๋ชจ๋‘๋ฅผ ์œ„ํ•œ ๋”ฅ๋Ÿฌ๋‹ ์‹œ์ฆŒ 2'์™€ 'pytorch๋กœ ์‹œ์ž‘ํ•˜๋Š” ๋”ฅ ๋Ÿฌ๋‹ ์ž…๋ฌธ'์„ ๋ณด๋ฉฐ ๊ณต๋ถ€ํ•œ ๋‚ด์šฉ์„ ์ •๋ฆฌํ•œ ๊ธ€์ž…๋‹ˆ๋‹ค.

    ํ•„์ž์˜ ์˜๊ฒฌ์ด ์„ž์—ฌ ๋“ค์–ด๊ฐ€ ๋ถ€์ •ํ™•ํ•œ ๋‚ด์šฉ์ด ์กด์žฌํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.


    0. Gradient Vanishing / Gradient Exploding

    • Gradient Vanishing: gradients shrink toward zero as they are backpropagated through many layers; a typical problem when using the sigmoid function.
    • Gradient Exploding: gradients are computed as values so large that training breaks down (the values end up as NaN).
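
    A minimal sketch of why both problems appear, assuming every layer contributes the same local gradient and ignoring the weights (a simplification for illustration; `chained_gradient` and the per-layer factor of 3.0 are hypothetical). The derivative of the sigmoid never exceeds 0.25, so multiplying it across 20 layers drives the gradient toward zero, while a per-layer factor above 1 keeps growing:

```python
import math


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def sigmoid_grad(x: float) -> float:
    """Derivative of the sigmoid: s(x) * (1 - s(x))."""
    s = sigmoid(x)
    return s * (1.0 - s)


def chained_gradient(local_grad: float, depth: int) -> float:
    """Product of `depth` identical local gradients (weights are ignored)."""
    return local_grad ** depth


# The sigmoid derivative peaks at x = 0, where it equals 0.25.
max_sigmoid_grad = sigmoid_grad(0.0)

# Best case for the sigmoid across 20 layers: still a vanishingly small number.
vanishing = chained_gradient(max_sigmoid_grad, 20)

# Hypothetical per-layer factor greater than 1: the product blows up instead.
exploding = chained_gradient(3.0, 20)

print(max_sigmoid_grad, vanishing, exploding)
```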

    How can these problems be addressed?

    1. Change the activation function
    2. Initialize the weights carefully (Xavier, He initialization)
    3. Use a small learning rate

     

    But wait: is there a way to address the problem at its root, without relying on the workarounds above?

     

    1. Internal Covariate Shift (ICS)

    According to the paper that proposed batch normalization, this is the root cause of the problems above.

     

    Covariate (공변량)

    A variate that several variables share in common. In machine learning it usually refers to the input features.

     

    Covariate shift (공변량 변화)

    The case where the distribution of the training data differs from the distribution of the test data.

     

    Internal Covariate Shift (내부 공변량 변화)

    A change in the distribution of the inputs that occurs between the layers of a neural network.

     

     During training, as the weight parameters of the preceding layers change, the distribution of the activation outputs they produce changes as well. This slows training down and forces the use of a low learning rate. Because every layer has its own input and output, covariate shift occurs at every layer, and it grows larger the deeper the layer is.
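
    The shift described above can be illustrated with a toy example. This is only a sketch under stated assumptions: `first_layer` is a hypothetical single-unit layer with a ReLU, and the weight values are made up. The data fed to the network never changes, yet once the first layer's parameters are updated, the next layer sees inputs with a different mean and spread:

```python
import statistics

# Fixed data: the raw inputs to the network do not change during training.
inputs = [-2.0, -1.0, 0.0, 1.0, 2.0]


def first_layer(x: float, w: float, b: float) -> float:
    """Hypothetical one-unit layer: linear transform followed by ReLU."""
    return max(0.0, w * x + b)


def describe(values):
    """Return (mean, population standard deviation) of the values."""
    return statistics.mean(values), statistics.pstdev(values)


# Statistics of what the second layer receives before a weight update ...
before = describe([first_layer(x, w=1.0, b=0.0) for x in inputs])

# ... and after the first layer's parameters have changed (made-up values).
after = describe([first_layer(x, w=3.0, b=1.0) for x in inputs])

print("before update: mean=%.2f std=%.2f" % before)
print("after update:  mean=%.2f std=%.2f" % after)
```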

     

     When training updates the weights of the earlier layers, the distribution of the inputs passed to the current layer drifts away from the distribution the current layer saw when it was last trained. The paper that proposed batch normalization argues that instabilities in deep learning models, such as vanishing/exploding gradients, arise because the input distribution changes from layer to layer.

    Batch Normalization is the solution proposed to address this problem.

     

    2. Batch Normalization

     As the name says, it means normalizing per mini-batch. Batch normalization 'forces' the values at each layer to keep a stable distribution by normalizing them to zero mean and unit variance over the mini-batch. Strictly speaking, this standardizes the values rather than making them exactly normally distributed.

     Batch normalization is applied in each layer before the values pass through the activation function.

     

    1. Normalize the input: shift it to zero mean and scale it to unit variance.
    2. Scale and shift the normalized data. (Two learnable parameters are used here: γ for scaling and β for shifting, so that the next layer receives values in a consistent range.)
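
    The two steps above can be sketched in plain Python for a single feature. This is a minimal illustration, not a library implementation: the function name `batch_norm`, the small constant `eps` (added for numerical stability), and the sample batch are assumptions made for the example. In a real network γ and β are learned during training; here they are just passed in.

```python
import math


def batch_norm(batch, gamma=1.0, beta=0.0, eps=1e-5):
    """Normalize one feature across a mini-batch, then scale and shift it."""
    n = len(batch)
    mean = sum(batch) / n
    # Biased (population) variance of the mini-batch.
    var = sum((x - mean) ** 2 for x in batch) / n

    # Step 1: zero mean, unit variance.
    normalized = [(x - mean) / math.sqrt(var + eps) for x in batch]

    # Step 2: scale by gamma, shift by beta.
    return [gamma * x_hat + beta for x_hat in normalized]


batch = [2.0, 4.0, 6.0, 8.0]

# With gamma=1 and beta=0 the output is just the normalized batch.
out = batch_norm(batch)

# With other values the output is centered at beta and its spread is set by gamma.
shifted = batch_norm(batch, gamma=2.0, beta=0.5)

print(out)
print(shifted)
```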

     During training, batch normalization takes the mean and variance of each mini-batch in turn and stores a moving average of the mean and of the variance. At test time it does not compute the mean and variance of the current batch; it normalizes with the stored values instead.
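
    A rough sketch of this train/test difference, assuming an exponential moving average with a momentum of 0.1 and the biased batch variance (both are choices made for this example; libraries differ in such details). The class name `BatchNorm1Feature` and its `training` flag are hypothetical and handle only a single feature.

```python
import math


class BatchNorm1Feature:
    """Hypothetical single-feature batch norm that tracks running statistics."""

    def __init__(self, momentum=0.1, eps=1e-5):
        self.gamma, self.beta = 1.0, 0.0      # learnable in a real network
        self.momentum, self.eps = momentum, eps
        self.running_mean, self.running_var = 0.0, 1.0
        self.training = True

    def __call__(self, batch):
        if self.training:
            # Training: use the statistics of the current mini-batch ...
            n = len(batch)
            mean = sum(batch) / n
            var = sum((x - mean) ** 2 for x in batch) / n
            # ... and fold them into the stored moving averages.
            m = self.momentum
            self.running_mean = (1 - m) * self.running_mean + m * mean
            self.running_var = (1 - m) * self.running_var + m * var
        else:
            # Test: reuse the stored statistics; the batch itself is not measured.
            mean, var = self.running_mean, self.running_var
        scale = math.sqrt(var + self.eps)
        return [self.gamma * (x - mean) / scale + self.beta for x in batch]


bn = BatchNorm1Feature()
for _ in range(200):
    bn([2.0, 4.0, 6.0, 8.0])   # training steps on batches with mean 5, variance 5

bn.training = False
stored_mean = bn.running_mean
test_out = bn([5.0])           # a single test sample works in this mode

print(bn.running_mean, bn.running_var, test_out)
```

    Because test mode only reads the stored values, the output for a given sample does not depend on which other samples happen to be in the same batch.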

     

    The advantages are as follows.

    1. Greatly alleviates the vanishing gradient problem.
    2. Less sensitive to weight initialization.
    3. Allows a much larger learning rate -> faster training

     

    That said, although batch normalization is very effective, there is also a paper arguing that the improvement in training does not come from reducing internal covariate shift, which was the original motivation for using batch normalization.

    https://arxiv.org/pdf/1805.11604.pdf

     

    The limitations are as follows.

    1. Depends on the mini-batch size (does not work well when the batch size is too small)
    2. Difficult to apply to RNNs
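
    The first limitation can be seen in its most extreme form with a batch of one sample. The sketch below assumes a simplified `batch_norm` helper (hypothetical, single feature, no γ/β) and made-up input values. With one sample the batch mean equals the sample and the variance is zero, so every input is mapped to 0 and the information in the input is lost:

```python
import math


def batch_norm(batch, eps=1e-5):
    """Normalize one feature using only the statistics of this mini-batch."""
    n = len(batch)
    mean = sum(batch) / n
    var = sum((x - mean) ** 2 for x in batch) / n
    return [(x - mean) / math.sqrt(var + eps) for x in batch]


# Batch size 1: two very different inputs produce the same output.
single_a = batch_norm([3.0])
single_b = batch_norm([-250.0])

# Batch size 2: the two samples can be told apart again.
pair = batch_norm([3.0, -250.0])

print(single_a, single_b, pair)
```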

     


    <Reference>

    https://deeplearningzerotoall.github.io/season2/lec_pytorch.html

    https://wikidocs.net/61271

    https://data-newbie.tistory.com/356

    https://excelsior-cjh.tistory.com/178
