05. Logistic Regression

    2020. 2. 28.

    by. ํ•ด๋Š”์„ 

    ๋ณธ ๊ธ€์€ '๋ชจ๋‘๋ฅผ ์œ„ํ•œ ๋”ฅ๋Ÿฌ๋‹ ์‹œ์ฆŒ 2'์™€ 'pytorch๋กœ ์‹œ์ž‘ํ•˜๋Š” ๋”ฅ ๋Ÿฌ๋‹ ์ž…๋ฌธ'์„ ๋ณด๋ฉฐ ๊ณต๋ถ€ํ•œ ๋‚ด์šฉ์„ ์ •๋ฆฌํ•œ ๊ธ€์ž…๋‹ˆ๋‹ค.

    ํ•„์ž์˜ ์˜๊ฒฌ์ด ์„ž์—ฌ ๋“ค์–ด๊ฐ€ ๋ถ€์ •ํ™•ํ•œ ๋‚ด์šฉ์ด ์กด์žฌํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.


    ๋กœ์ง€์Šคํ‹ฑ ํšŒ๊ท€ : ์ด์ง„ ๋ถ„๋ฅ˜!

     

    0. binary classification

    • ์ด์ง„ ๋ถ„๋ฅ˜, 0 ๋˜๋Š” 1๋กœ ๋ถ„๋ฅ˜ํ•˜๋Š” ํ˜•ํƒœ.

    0๋˜๋Š” 1๋กœ ๋ถ„๋ฅ˜๋ฅผ ํ•  ๋•Œ๋Š” 0์œผ๋กœ ์ญ‰ ๊ฐ€๋‹ค๊ฐ€ ํŠน์ •ํ•œ ์ ์—์„œ ๊ฐ‘์ž๊ธฐ 1์ด๋˜๊ณ , ์ดํ›„์— ์ญ‰ 1์ด ๋˜๋Š” ๊ณ„๋‹จ ํ•จ์ˆ˜, step function์ด ์ œ์ผ ์ด์ƒ์ ์ด๋‹ค. ๊ทธ๋Ÿฌ๋‚˜ ๋ฏธ๋ถ„์„ ํ•  ๋•Œ ๋“ฑ๋“ฑ ๊ณ„์‚ฐ์— ๋ถˆํŽธํ•จ์ด ๋งŽ๋‹ค. ๊ทธ๋ž˜์„œ ๊ณ„๋‹จ๊ณผ ์œ ์‚ฌํ•œ S์ž ํ˜•ํƒœ๋ฅผ ํ‘œํ˜„ ํ•  ์ˆ˜ ์žˆ๋Š” ํ•จ์ˆ˜๊ฐ€ ํ•„์š”ํ•˜๋‹ค. S์ž ํ˜•ํƒœ๋ฅผ ํ‘œํ˜„ํ•˜๊ธฐ ์œ„ํ•ด์„œ๋Š” ํŠน์ • ํ•จ์ˆ˜ f๋ฅผ ์ถ”๊ฐ€์ ์œผ๋กœ ์‚ฌ์šฉํ•ด์„œ ์•„๋ž˜์˜ ํ˜•ํƒœ๋ฅผ ์ด๋ค„์•ผ ํ•œ๋‹ค.

    ์—ฌ๊ธฐ์„œ f ์— ์ ํ•ฉํ•œ ํ•จ์ˆ˜๊ฐ€ ๋ฐ”๋กœ ์‹œ๊ทธ๋ชจ์ด๋“œ!

     

    1. Sigmoid function

    ์‹œ๊ทธ๋ชจ์ด๋“œ ํ•จ์ˆ˜์˜ ๋ฐฉ์ ์‹์€ ๋‹ค์Œ๊ณผ ๊ฐ™๋‹ค.

    ์‹œ๊ทธ๋ชจ์ด๋“œ ํ•จ์ˆ˜๋Š” W์˜ ๊ฐ’์ด ์ปค์ง€๋ฉด ๊ฒฝ์‚ฌ๊ฐ€ ์ปค์ง€๊ณ  W์˜ ๊ฐ’์ด ์ž‘์•„์ง€๋ฉด ๊ฒฝ์‚ฌ๊ฐ€ ์ž‘์•„์ง€๋ฉฐ, b์˜ ๊ฐ’์— ์˜ํ•ด ๊ทธ๋ž˜ํ”„๊ฐ€ ์ขŒ์šฐ๋กœ ์ด๋™ํ•œ๋‹ค. ์„ ํ˜• ํšŒ๊ท€์™€ ๋งˆ์ฐฌ๊ฐ€์ง€๋กœ, ์—ฌ๊ธฐ์„œ๋„ ์ตœ์ ์˜  W์™€ b๋ฅผ ์ฐพ๋Š” ๊ฒƒ์ด ๋ชฉํ‘œ๊ฐ€ ๋œ๋‹ค.

    ๋˜ํ•œ, ์‹œ๊ทธ๋ชจ์ด๋“œ๋ฅผ ์‚ฌ์šฉํ•œ๋‹ค๋ฉด, ์ž„๊ณ„๊ฐ’์„ ์กฐ์ ˆํ•ด์„œ 0๊ณผ 1์˜ ๋ถ„๋ฅ˜๊ฐ€ ๊ฐ€๋Šฅํ•˜๋‹ค. ๋งŒ์•ฝ 0.5๋ฅผ ์ž„๊ณ„๊ฐ’์œผ๋กœ ์‚ฌ์šฉํ•œ๋‹ค๋ฉด, ์ด๋ฅผ ๋„˜์œผ๋ฉด 1, ์•„๋‹ˆ๋ฉด 0์œผ๋กœ ์ฒ˜๋ฆฌํ•ด์„œ ๋ถ„๋ฅ˜ํ•  ์ˆ˜ ์žˆ๋‹ค.

     

    2. Cost function

    ์„ ํ˜• ํšŒ๊ท€ ๋•Œ ์‚ฌ์šฉํ–ˆ๋˜ ๊ฒƒ ์ฒ˜๋Ÿผ MSE๋ฅผ ์‚ฌ์šฉํ•˜๊ฒŒ ๋œ๋‹ค๋ฉด, non-convexํ˜•ํƒœ์˜ ๋ฏธ๋ถ„๊ฐ’์ด ๋‚˜์˜ค๊ฒŒ ๋œ๋‹ค. ์ด๋Ÿฐ ๊ทธ๋ž˜ํ”„์—๋Š” ๊ฒฝ์‚ฌ ํ•˜๊ฐ•๋ฒ•์„ ์‚ฌ์šฉํ•˜๊ธฐ ๋ถ€์ ์ ˆํ•ด์ง„๋‹ค. ๊ทธ๋Ÿผ, convexํ•˜๊ฒŒ ๊ทธ๋ž˜ํ”„๋ฅผ ๋งŒ๋“ค๋ ค๋ฉด ์–ด๋–ค cost function์„ ์‚ฌ์šฉํ•ด์•ผ ํ• ๊นŒ?

     

    ์‹œ๊ทธ๋ชจ์ด๋“œ ํ•จ์ˆ˜์˜ ํŠน์ง•์€ ํ•จ์ˆ˜์˜ ์ถœ๋ ฅ๊ฐ’์ด 0๊ณผ 1์‚ฌ์ด์˜ ๊ฐ’์ด๋ผ๋Š” ์ ์ด๋‹ค. ์ฆ‰, ์‹ค์ œ๊ฐ’์ด 1์ผ ๋•Œ ์˜ˆ์ธก๊ฐ’์ด 0์— ๊ฐ€๊นŒ์›Œ์ง€๋ฉด ์˜ค์ฐจ๊ฐ€ ์ปค์ ธ์•ผ ํ•˜๋ฉฐ, ์‹ค์ œ๊ฐ’์ด 0์ผ ๋•Œ, ์˜ˆ์ธก๊ฐ’์ด 1์— ๊ฐ€๊นŒ์›Œ์ง€๋ฉด ์˜ค์ฐจ๊ฐ€ ์ปค์ ธ์•ผ ํ•œ๋‹ค. ์ด๋ฅผ ์ถฉ์กฑํ•˜๋Š” ํ•จ์ˆ˜๊ฐ€ ๋ฐ”๋กœ ๋กœ๊ทธ ํ•จ์ˆ˜๋‹ค! ๋‘๊ฐœ์˜ ๋กœ๊ทธํ•จ์ˆ˜๋ฅผ 0.5๋ฅผ ๋Œ€์นญ์œผ๋กœ ๊ฒน์นœ๋‹ค๋ฉด, ์ € ์กฐ๊ฑด์„ ์ถฉ์กฑํ•˜๋Š” cost function์„ ๋งŒ๋“ค ์ˆ˜ ์žˆ๋‹ค. ๋‹ค์Œ์€ y=0.5์— ๋Œ€์นญํ•˜๋Š” ๋‘ ๊ฐœ์˜ ๋กœ๊ทธ ํ•จ์ˆ˜ ๊ทธ๋ž˜ํ”„์ด๋‹ค.

     

    ์‹์œผ๋กœ ํ‘œํ˜„ํ•œ๋‹ค๋ฉด, ์•„๋ž˜์™€ ๊ฐ™๋‹ค.

     

    ์ด ๋‘์‹์„ ํ•˜๋‚˜๋กœ ํ•ฉ์น  ์ˆ˜๋„ ์žˆ๋‹ค.

    ๊ฒฐ๊ณผ์ ์œผ๋กœ ์ด ์‹์„ ์‚ฌ์šฉํ•ด์„œ sigmoid์˜ "๋ชจ๋“  ์˜ค์ฐจ์˜ ํ‰๊ท "์„ ๊ตฌํ•  ์ˆ˜ ์žˆ๋‹ค.

     

    ์ด ๋น„์šฉ ํ•จ์ˆ˜๋ฅผ ์ฝ”๋“œ๋กœ ๊ตฌํ˜„ํ•  ๋•Œ, pytorch๋ฅผ ์ด์šฉํ•ด์„œ ๋‹ค์Œ์™€ ๊ฐ™์ด ํ•  ์ˆ˜ ์žˆ๋‹ค.

    F.binary_cross_entropy(H(x), y)

    This is also called binary cross entropy (BCE). It is used when the labels are either 0 or 1, and by default it returns the mean of the per-sample costs above.
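
    The following sketch (my own check, not from the lecture) confirms that F.binary_cross_entropy computes the same value as the formula written out by hand:

    import torch
    import torch.nn.functional as F

    h = torch.tensor([0.9, 0.1, 0.8, 0.3])   # example predictions in (0, 1)
    y = torch.tensor([1., 0., 1., 0.])       # example 0/1 labels

    manual = -(y * torch.log(h) + (1 - y) * torch.log(1 - h)).mean()
    builtin = F.binary_cross_entropy(h, y)   # defaults to the mean over samples

    print(manual.item(), builtin.item())
    print(torch.allclose(manual, builtin))   # True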

     

    3. Full code

    import torch
    import torch.nn as nn
    import torch.nn.functional as F
    import torch.optim as optim
    
    x_data = [[1, 2], [2, 3], [3, 1], [4, 3], [5, 3], [6, 2]]
    y_data = [[0], [0], [0], [1], [1], [1]] # classification, so the labels are 0 or 1
    x_train = torch.FloatTensor(x_data) # turn the data into tensors
    y_train = torch.FloatTensor(y_data)
    
    W = torch.zeros((2, 1), requires_grad=True) # shape is 2 x 1
    b = torch.zeros(1, requires_grad=True)
    
    # hypothesis = 1 / (1 + torch.exp(-(x_train.matmul(W) + b))) # the sigmoid function written out by hand
    hypothesis = torch.sigmoid(x_train.matmul(W) + b) # same as above, but computed more simply
    
    # cost function - written out as the formula
    # losses = -(y_train * torch.log(hypothesis) + (1 - y_train) * torch.log(1 - hypothesis))
    # cost = losses.mean()
    
    cost = F.binary_cross_entropy(hypothesis, y_train) # same as the two lines above, but simpler!
    
    
    #---------------------------------------------------------
    # now let's actually train it!
    
    x_data = [[1, 2], [2, 3], [3, 1], [4, 3], [5, 3], [6, 2]]
    y_data = [[0], [0], [0], [1], [1], [1]]
    x_train = torch.FloatTensor(x_data)
    y_train = torch.FloatTensor(y_data)
    
    W = torch.zeros((2, 1), requires_grad=True)
    b = torch.zeros(1, requires_grad=True)
    
    # optimizer ์„ค์ • - sgd๋ฅผ ์‚ฌ์šฉ, ํ•™์Šต๋ฅ ์€ 1
    optimizer = optim.SGD([W, b], lr=1)
    
    nb_epochs = 1000
    for epoch in range(nb_epochs + 1):
    
        #์‹œ๊ทธ๋ชจ์ด๋“œ ๊ณ„์‚ฐ(๊ฐ€์„ค)
        hypothesis = torch.sigmoid(x_train.matmul(W) + b)
       
        # Cost ๊ณ„์‚ฐ
        cost = F.binary_cross_entropy(hypothesis, y_train)
        
       
        # cost๋กœ H(x) ๊ฐœ์„ 
        
        optimizer.zero_grad() 
        cost.backward()  # ๋ฏธ๋ถ„ํ•˜๊ธฐ
        # ๊ตฌํ•œ loss๋กœ๋ถ€ํ„ฐ back propagation์„ ํ†ตํ•ด ๊ฐ ๋ณ€์ˆ˜๋งˆ๋‹ค loss์— ๋Œ€ํ•œ gradient ๋ฅผ ๊ตฌํ•ด์ฃผ๊ธฐ
        
        optimizer.step() #model์˜ ํŒŒ๋ผ๋ฏธํ„ฐ๋“ค์ด ์—…๋ฐ์ดํŠธ ๋จ
    
        # 100๋ฒˆ๋งˆ๋‹ค ๋กœ๊ทธ ์ถœ๋ ฅ
        if epoch % 100 == 0:
            print('Epoch {:4d}/{} Cost: {:.6f}'.format(
                epoch, nb_epochs, cost.item()
            ))
    
    
    prediction = hypothesis >= torch.FloatTensor([0.5]) # apply the threshold to split into 0 and 1
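
    As a small follow-up of my own (the new sample below is hypothetical, not from the original post), the trained W and b can classify an unseen point the same way:

    x_new = torch.FloatTensor([[4, 4]])            # hypothetical new sample
    prob = torch.sigmoid(x_new.matmul(W) + b)      # probability of class 1
    label = (prob >= 0.5).float()                  # 0/1 decision with the same threshold
    print(prob.item(), label.item())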
    

     

    4. Full code with nn.Module

    # the imports and the data are the same as above
    
    class BinaryClassifier(nn.Module): # define the model as a class
        def __init__(self):
            super().__init__()
            self.linear = nn.Linear(2, 1)
            self.sigmoid = nn.Sigmoid()
    
        def forward(self, x):
            return self.sigmoid(self.linear(x)) # pass the linear output through the sigmoid
    
    model = BinaryClassifier() # create the model
    
    # optimizer setup
    optimizer = optim.SGD(model.parameters(), lr=1)
    
    nb_epochs = 10000
    for epoch in range(nb_epochs + 1):
    
        # compute H(x)
        hypothesis = model(x_train)
    
        # compute the cost
        cost = F.binary_cross_entropy(hypothesis, y_train)
    
        # improve H(x) using the cost
        optimizer.zero_grad()
        cost.backward()
        optimizer.step()
        
        # print a log every 100 epochs
        if epoch % 100 == 0:
            prediction = hypothesis >= torch.FloatTensor([0.5])
            correct_prediction = prediction.float() == y_train
            accuracy = correct_prediction.sum().item() / len(correct_prediction)
            print('Epoch {:4d}/{} Cost: {:.6f} Accuracy {:2.2f}%'.format(
                epoch, nb_epochs, cost.item(), accuracy * 100,
            ))
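
    As a small addition of my own (the sample point is hypothetical), the trained model can be used directly for inference, and its learned parameters can be inspected:

    x_new = torch.FloatTensor([[4, 4]])         # hypothetical new sample
    with torch.no_grad():                       # no gradients needed for inference
        prob = model(x_new)                     # probability of class 1
    print(prob.item(), (prob >= 0.5).item())
    
    print(list(model.parameters()))             # the learned weight and bias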

     

     

    etc. regression, classification

    regression์€ ํšŒ๊ท€๋ผ๊ณ  ํ•˜๋Š”๋ฐ, ์ผ๋ฐ˜์ ์œผ๋กœ ํ‘œํ˜„ํ•˜์ž๋ฉด fitting์ด๋ผ๊ณ  ํ•  ์ˆ˜ ์žˆ๋‹ค.

    ์ฆ‰, lienar regression์ด๋ž€ ์–ด๋–ค data์˜ ๋ถ„ํฌ๊ฐ€ linearํ•˜๋‹ค๊ณ  ๊ฐ€์ •ํ•˜๊ณ , ์ด linear ํ•จ์ˆ˜๋ฅผ ์ฐพ๋Š” ๊ฒƒ์ด๋‹ค.

    1์ฐจ ํ•จ์ˆ˜๋ฅผ ๋ง‰๋Œ€๊ธฐ๋กœ ํ‘œํ˜„ ํ•˜์ง€๋ฉด data์˜ ๋ถ„ํฌ๋ฅผ ๊ฐ€์žฅ ์ž˜ ๋‚˜ํƒ€๋‚ผ ์ˆ˜ ์žˆ๋Š” ๋ง‰๋Œ€๊ธฐ์˜ ์œ„์น˜์™€ ๊ธฐ์šธ๊ธฐ๋ฅผ ์ฐพ๋Š” ๊ฒƒ์ด๋‹ค.

     

    ๋‹ค๋ฅด๊ฒŒ ๋งํ•˜์ž๋ฉด data๋ผ ์ด linear ํ•จ์ˆ˜๋ฅผ ๋”ฐ๋ฅด๊ธฐ ๋•Œ๋ฌธ์—, ๊ทธ๋Ÿฌํ•œ ๋ชจ์–‘์œผ๋กœ ๋ถ„ํฌ๋˜์–ด ์žˆ๋‹ค๊ณ ๋„ ํ•  ์ˆ˜ ์žˆ๋‹ค.

    ์ฆ‰, data์˜ ๋ถ„ํฌ๋ฅผ ์ด๋ฃจ๋Š” ํ•จ์ˆ˜์˜ ์›ํ˜•์„ ์ฐพ์•„๊ฐ€๋Š” ๊ฒƒ์ด ํšŒ๊ท€๋ผ๊ณ  ํ•  ์ˆ˜ ์žˆ๋‹ค.

     

    Logistic regression means fitting (regressing) the logistic function to the data so that its cost function is minimized.

     

    ๋ณดํ†ต continuous ํ•œ ๊ฐ’์— ๋Œ€ํ•ด์„œ๋Š” linear regression์„ ์‚ฌ์šฉํ•˜๊ณ , 0 ๋˜๋Š” 1์˜ classification์€ logistic regression์„ ์‚ฌ์šฉํ•œ๋‹ค.

     

    classification์€ 0์ด๋ƒ 1์ด๋ƒ ๊ฐ’์„ ๋งค๊ธฐ๋Š” ๊ฒƒ์ž…๋‹ˆ๋‹ค. ๊ทธ๋ ‡๊ธฐ ๋•Œ๋ฌธ์— logistic function์„ ์ด์šฉํ•œ regression์ด ๋” ์ ํ•ฉํ•˜๋‹ค.

     

     

     


    <Reference>

    https://deeplearningzerotoall.github.io/season2/lec_pytorch.html

    https://wikidocs.net/57805

    https://wikidocs.net/60037

    https://wikidocs.net/58686

    '๐Ÿ“šSTUDY > ๐Ÿ”ฅPytorch ML&DL' ์นดํ…Œ๊ณ ๋ฆฌ์˜ ๋‹ค๋ฅธ ๊ธ€

    07. Tips and MNIST data  (0) 2020.03.01
    06. softmax classification  (0) 2020.02.28
    04-2. Loading Data(Mini batch and data load)  (0) 2020.02.24
    04-1. Multivariable Linear regression  (0) 2020.02.24
    03. Deeper Look at Gradient Descent  (0) 2020.02.24

    ๋Œ“๊ธ€