• 06. softmax classification

    2020. 2. 28.

    by. ํ•ด๋Š”์„ 

    ๋ณธ ๊ธ€์€ '๋ชจ๋‘๋ฅผ ์œ„ํ•œ ๋”ฅ๋Ÿฌ๋‹ ์‹œ์ฆŒ 2'์™€ 'pytorch๋กœ ์‹œ์ž‘ํ•˜๋Š” ๋”ฅ ๋Ÿฌ๋‹ ์ž…๋ฌธ'์„ ๋ณด๋ฉฐ ๊ณต๋ถ€ํ•œ ๋‚ด์šฉ์„ ์ •๋ฆฌํ•œ ๊ธ€์ž…๋‹ˆ๋‹ค.

    ํ•„์ž์˜ ์˜๊ฒฌ์ด ์„ž์—ฌ ๋“ค์–ด๊ฐ€ ๋ถ€์ •ํ™•ํ•œ ๋‚ด์šฉ์ด ์กด์žฌํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.


    3๊ฐœ ์ด์ƒ์˜ ์„ ํƒ์ง€์—์„œ 1๊ฐœ๋ฅผ ์„ ํƒ!  (softํ•˜๊ฒŒ max๊ฐ’์„ ๋ฝ‘์•„์ฃผ๋Š”)

    ⇒ ๋‹ค์ค‘ ํด๋ž˜์Šค ๋ถ„๋ฅ˜ (Multi-class classification)

    ์„ธ ๊ฐœ ์ด์ƒ์˜ ๋‹ต ์ค‘ ํ•˜๋‚˜๋ฅผ ๊ณ ๋ฅด๋Š” ๋ฌธ์ œ.

    ์‹œ๊ทธ๋ชจ์ด๋“œ ํ•จ์ˆ˜๋Š” ๋กœ์ง€์Šคํ‹ฑ ํ•จ์ˆ˜์˜ ํ•œ ์ผ€์ด์Šค๋ผ ๋ณผ ์ˆ˜ ์žˆ๊ณ , ์ธํ’‹์ด ํ•˜๋‚˜์ผ ๋•Œ ์‚ฌ์šฉ๋˜๋Š” ์‹œ๊ทธ๋ชจ์ด๋“œ ํ•จ์ˆ˜๋ฅผ ์ธํ’‹์ด ์—ฌ๋Ÿฌ๊ฐœ์ผ ๋•Œ๋„ ์‚ฌ์šฉํ•  ์ˆ˜ ์žˆ๋„๋ก ์ผ๋ฐ˜ํ™” ํ•œ ๊ฒƒ์ด ์†Œํ”„ํŠธ๋งฅ์Šค ํ•จ์ˆ˜์ž…๋‹ˆ๋‹ค.

     

    0. ์›-ํ•ซ ์ธ์ฝ”๋”ฉ(one-Hot Encoding)

    1. ์„ ํƒ์ง€์˜ ๊ฐœ์ˆ˜๋งŒํผ ์ฐจ์›์„ ๊ฐ€์ง„๋‹ค.
    2. ์„ ํƒ์ง€์— ํ•ด๋‹นํ•˜๋Š” ์ธ๋ฑ์Šค๋Š” 1, ๋‚˜๋จธ์ง€๋Š” 0์œผ๋กœ ํ‘œํ˜„ํ•œ๋‹ค.

    ex)

    dog = [1, 0, 0]
    cat = [0, 1, 0]
    refrigerator = [0, 0, 1]

     

    • ์ •์ˆ˜ ์ธ์ฝ”๋”ฉ(1, 2, 3)๊ณผ์˜ ์ฐจ์ด์ 

      ⇒ ์ •์ˆ˜ ์ธ์ฝ”๋”ฉ์€ ๊ฐ ํด๋ž˜์Šค๊ฐ€ ์ˆœ์„œ ์ •๋ณด๋ฅผ ํ•„์š”๋กœ ํ•  ๋•Œ ์œ ์šฉํ•˜๋‹ค.

      ⇒ ์› ํ•ซ ์ธ์ฝ”๋”ฉ์€ ์ผ๋ฐ˜์ ์ธ ๋ถ„๋ฅ˜๋ฌธ์ œ, ์ฆ‰ ์ˆœ์„œ๊ฐ€ ์˜๋ฏธ์—†๊ณ  ๋ฌด์ž‘์œ„์„ฑ์ด ์žˆ์„ ๋•Œ ์œ ์šฉํ•˜๋‹ค.

      (๋ชจ๋“  ํด๋ž˜์Šค์˜ ๊ด€๊ณ„๋ฅผ ๊ท ๋“ฑํ•˜๊ฒŒ ๋ถ„๋ฐฐํ•˜๊ธฐ ๋•Œ๋ฌธ!)

    1. softmax function

    ๊ฐ ์„ ํƒ์ง€๋งˆ๋‹ค ์†Œ์ˆ˜๋ฅผ ํ• ๋‹นํ•ด์„œ ๊ทธ ํ•ฉ์ด 1์ด ๋˜๊ฒŒ ๋งŒ๋“œ๋Š” ํ•จ์ˆ˜.

     

    $$p_i = \frac{e^{z_i}}{\sum_{j=1}^{k} e^{z_j}}, \qquad i = 1, 2, \ldots, k$$

     

    p_i means the probability that class i is the answer. If you add up all the p_i (i = 1 ~ k), the sum is 1. In other words, there is no need to overthink the softmax function: given some values, it simply normalizes them, in proportion, into fractions that sum to 1.
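
    For example (a quick check of mine):

    import torch
    import torch.nn.functional as F

    z = torch.FloatTensor([1.0, 2.0, 3.0])
    p = F.softmax(z, dim=0)
    print(p)       # tensor([0.0900, 0.2447, 0.6652])
    print(p.sum()) # tensor(1.)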

     

    Softmax( ( 1×f ) × ( f×C ) + ( 1×C ) ) = 1×C

    In order: input, weights, bias, and prediction. (f is the number of features, C is the number of classes.)

    ๋ฐ์ดํ„ฐ์˜ ๊ฐœ์ˆ˜์— ๋”ฐ๋ผ์„œ ์ž…๋ ฅ๊ฐ’์˜ 1์ด ๋ฐ”๋€๋‹ค.

     

    2. cost function

    In logistic regression we used binary cross-entropy (BCE), which produced one of two values as its result. But there is a more fundamental function behind BCE: CE, cross entropy!

    CE chooses one out of three or more values.

    ์—ฌ๊ธฐ์„œ ์ตœ๋Œ€๊ฐ’์ธ K๋ฅผ 2๋กœ ์ง€์ •ํ•˜๊ฒŒ ๋œ๋‹ค๋ฉด, BCE์˜ ์‹์ด ๋‚˜์˜ค๊ฒŒ ๋œ๋‹ค!

     

    3. Code implementation

    There are three ways to implement softmax together with cross-entropy.

     

    #1 (z: raw scores, y: integer labels)
    torch.log(F.softmax(z, dim=1))         # = F.log_softmax(z, dim=1)
    
    #2
    F.nll_loss(F.log_softmax(z, dim=1), y) # = F.cross_entropy(z, y)
    
    #3
    F.cross_entropy(z, y)

    ๊ฒฐ๋ก ์ ์œผ๋กœ๋Š” ํŽธํ•˜๊ฒŒ 3๋ฒˆ๋งŒ ์‚ฌ์šฉํ•˜๋ฉด ๋œ๋‹ค! ํŠน์ดํ•˜๊ฒŒ ๊ฐ€์„ค ํ•จ์ˆ˜์™€ ์†์‹ค ํ•จ์ˆ˜๋ฅผ ํ•œ๋ฒˆ์— ์“ธ ์ˆ˜ ์žˆ๋‹ค! ์ด๋Ÿด ๊ฒฝ์šฐ, ์‹ค์ œ code์—์„œ๋Š” ํ–‰๋ ฌ์˜ ๊ณฑ๋งŒ ์‹œ์ผœ์ฃผ๊ณ , ์†Œ์ˆ˜ ํ•ฉ์ด 1์ด ๋˜๋„๋ก ์ •๊ทœํ™” ์‹œ์ผœ์ฃผ๋Š” ๊ณผ์ •์€ ์†์‹คํ•จ์ˆ˜๋ฅผ ์“ธ ๋•Œ ๊ฐ™์ด ํ•  ์ˆ˜ ์žˆ๋Š” ์ € F.cross_entropy()์— ๋งž๊ฒจ์ฃผ๋ฉด ๋œ๋‹ค.

     

    4. Full Code

    import torch
    import torch.nn as nn
    import torch.nn.functional as F
    import torch.optim as optim
    
    x_train = [[1, 2, 1, 1], # 8 samples, each with 4 features
               [2, 1, 3, 2],
               [3, 1, 3, 4],
               [4, 1, 5, 5],
               [1, 7, 5, 5],
               [1, 2, 5, 6],
               [1, 6, 6, 6],
               [1, 7, 7, 7]]
    y_train = [2, 2, 2, 1, 1, 1, 0, 0]
    
    x_train = torch.FloatTensor(x_train) # convert to tensors
    y_train = torch.LongTensor(y_train)
    
    y_one_hot = torch.zeros(8, 3)
    # 8 is the number of samples; 3 is so that an answer of 2 can be written as [0, 0, 1] (for now this just allocates the space)
    
    y_one_hot.scatter_(1, y_train.unsqueeze(1), 1) # turn the ground-truth y into one-hot vectors
    print(y_one_hot.shape)
    
    
    # ๋ชจ๋ธ ์ดˆ๊ธฐํ™”
    W = torch.zeros((4, 3), requires_grad=True) #ํŠน์„ฑ์€ 4๊ฐœ, ๊ฒฐ๊ณผ ๊ฐ€์ง€์ˆ˜๋Š” 3๊ฐœ
    b = torch.zeros(1, requires_grad=True) #1๋กœ ํ•˜๋ฉด 3๊ฐœ์— ๊ฐ™์€ ๊ฐ’์ด ๋”ํ•ด์ง, 3์œผ๋กœ ํ•ด๋„ ์ƒ๊ด€์—†์Œ!
    
    # optimizer ์„ค์ •
    optimizer = optim.SGD([W, b], lr=0.1)
    
    nb_epochs = 1000
    for epoch in range(nb_epochs + 1):
    
        # hypothesis
        hypothesis = F.softmax(x_train.matmul(W) + b, dim=1)
    
        # cost function - computed by hand
        cost = (y_one_hot * -torch.log(hypothesis)).sum(dim=1).mean()
    
        # improve H(x) using the cost
        optimizer.zero_grad()
        cost.backward()
        optimizer.step()
    
        # log every 100 epochs
        if epoch % 100 == 0:
            print('Epoch {:4d}/{} Cost: {:.6f}'.format(
                epoch, nb_epochs, cost.item()
            ))
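
    Once training finishes, one way to read off the predicted classes (my own addition, not in the lecture code):

    # predicted class = index of the largest probability in each row
    with torch.no_grad():
        probs = F.softmax(x_train.matmul(W) + b, dim=1)
        print(probs.argmax(dim=1))  # compare against y_train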
    

     

    4-1. Full Code with nn.Module

    class SoftmaxClassifierModel(nn.Module):
        def __init__(self):
            super().__init__()
            self.linear = nn.Linear(4, 3) # input dim is 4, output dim is 3!
    
        def forward(self, x):
            return self.linear(x)
    
    model = SoftmaxClassifierModel() # create the model
    
    # set up the optimizer
    optimizer = optim.SGD(model.parameters(), lr=0.1)
    
    nb_epochs = 1000
    for epoch in range(nb_epochs + 1):
    
        # compute H(x) - just the matrix multiplication
        prediction = model(x_train)
    
        # compute the cost - softmax is applied automatically inside this function
        cost = F.cross_entropy(prediction, y_train)
    
        # improve H(x) using the cost
        optimizer.zero_grad()
        cost.backward()
        optimizer.step()
    
        # log every 100 epochs
        if epoch % 100 == 0:
            print('Epoch {:4d}/{} Cost: {:.6f}'.format(
                epoch, nb_epochs, cost.item()
            ))
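
    Since the model here returns raw scores, predictions can be read off by taking the argmax ourselves (my own addition):

    with torch.no_grad():
        print(model(x_train).argmax(dim=1))  # predicted classes; compare against y_train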

     


    <Reference>

    https://deeplearningzerotoall.github.io/season2/lec_pytorch.html

    https://wikidocs.net/59427

    https://wikidocs.net/60572

    https://wikidocs.net/60575

     

     
