05. Logistic Regression

해는선 2020. 2. 28. 16:35

This post is a summary of what I studied while following '모두를 위한 딥러닝 시즌 2' (Deep Learning for Everyone, Season 2) and 'PyTorch로 시작하는 딥 러닝 입문' (Introduction to Deep Learning with PyTorch).

Since my own opinions are mixed in, some of the content may be inaccurate.


Logistic regression: binary classification!

 

0. Binary classification

  • Binary classification: classifying each input as either 0 or 1.

When classifying into 0 or 1, the ideal shape is a step function: the output stays at 0, jumps to 1 at a particular point, and stays at 1 from then on. A step function, however, is awkward to compute with, for example when taking derivatives. So we need a function that expresses a smooth, S-shaped version of the step. To get that S shape, we apply an additional function f on top of the linear model, giving the form H(x) = f(Wx + b).

The function that fits f here is exactly the sigmoid!

 

1. Sigmoid function

์‹œ๊ทธ๋ชจ์ด๋“œ ํ•จ์ˆ˜์˜ ๋ฐฉ์ ์‹์€ ๋‹ค์Œ๊ณผ ๊ฐ™๋‹ค.

์‹œ๊ทธ๋ชจ์ด๋“œ ํ•จ์ˆ˜๋Š” W์˜ ๊ฐ’์ด ์ปค์ง€๋ฉด ๊ฒฝ์‚ฌ๊ฐ€ ์ปค์ง€๊ณ  W์˜ ๊ฐ’์ด ์ž‘์•„์ง€๋ฉด ๊ฒฝ์‚ฌ๊ฐ€ ์ž‘์•„์ง€๋ฉฐ, b์˜ ๊ฐ’์— ์˜ํ•ด ๊ทธ๋ž˜ํ”„๊ฐ€ ์ขŒ์šฐ๋กœ ์ด๋™ํ•œ๋‹ค. ์„ ํ˜• ํšŒ๊ท€์™€ ๋งˆ์ฐฌ๊ฐ€์ง€๋กœ, ์—ฌ๊ธฐ์„œ๋„ ์ตœ์ ์˜  W์™€ b๋ฅผ ์ฐพ๋Š” ๊ฒƒ์ด ๋ชฉํ‘œ๊ฐ€ ๋œ๋‹ค.

Also, with a sigmoid we can classify into 0 and 1 by adjusting a threshold. If we use 0.5 as the threshold, anything above it is treated as 1 and anything below it as 0.
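
To see this behavior concretely, here is a minimal sketch (my own toy example, assuming numpy and matplotlib are installed) that plots the sigmoid for a few values of W and b:

import numpy as np
import matplotlib.pyplot as plt

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

x = np.linspace(-5, 5, 200)

# larger W -> steeper slope
for w in [0.5, 1.0, 2.0]:
    plt.plot(x, sigmoid(w * x), label='W = {}'.format(w))

# nonzero b -> the curve shifts left/right
plt.plot(x, sigmoid(x + 2), linestyle='--', label='W = 1, b = 2')

plt.axhline(0.5, color='gray', linewidth=0.5)  # the 0.5 threshold
plt.legend()
plt.show()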

 

2. Cost function

If we used MSE as we did for linear regression, the resulting cost function would be non-convex: it has flat regions and local minima, so gradient descent becomes unsuitable. So which cost function should we use to make the graph convex?
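
To see the non-convexity concretely, here is a small sketch (the toy data and names are my own, not from the lecture) that plots the MSE cost and the cross-entropy cost of a one-weight sigmoid model over a range of weight values. The MSE curve flattens out, so the gradient vanishes even far from the optimum, while the cross-entropy cost introduced below stays convex:

import numpy as np
import matplotlib.pyplot as plt

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

# toy 1-D data: negative x -> label 0, positive x -> label 1
x = np.array([-2.0, -1.0, 1.0, 2.0])
y = np.array([0.0, 0.0, 1.0, 1.0])

ws = np.linspace(-6.0, 6.0, 300)
mse = [np.mean((sigmoid(w * x) - y) ** 2) for w in ws]
bce = [np.mean(-(y * np.log(sigmoid(w * x) + 1e-7)
                 + (1 - y) * np.log(1 - sigmoid(w * x) + 1e-7))) for w in ws]

plt.plot(ws, mse, label='MSE cost (flat region on the left)')
plt.plot(ws, bce, label='cross-entropy cost (convex)')
plt.xlabel('w')
plt.legend()
plt.show()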

 

์‹œ๊ทธ๋ชจ์ด๋“œ ํ•จ์ˆ˜์˜ ํŠน์ง•์€ ํ•จ์ˆ˜์˜ ์ถœ๋ ฅ๊ฐ’์ด 0๊ณผ 1์‚ฌ์ด์˜ ๊ฐ’์ด๋ผ๋Š” ์ ์ด๋‹ค. ์ฆ‰, ์‹ค์ œ๊ฐ’์ด 1์ผ ๋•Œ ์˜ˆ์ธก๊ฐ’์ด 0์— ๊ฐ€๊นŒ์›Œ์ง€๋ฉด ์˜ค์ฐจ๊ฐ€ ์ปค์ ธ์•ผ ํ•˜๋ฉฐ, ์‹ค์ œ๊ฐ’์ด 0์ผ ๋•Œ, ์˜ˆ์ธก๊ฐ’์ด 1์— ๊ฐ€๊นŒ์›Œ์ง€๋ฉด ์˜ค์ฐจ๊ฐ€ ์ปค์ ธ์•ผ ํ•œ๋‹ค. ์ด๋ฅผ ์ถฉ์กฑํ•˜๋Š” ํ•จ์ˆ˜๊ฐ€ ๋ฐ”๋กœ ๋กœ๊ทธ ํ•จ์ˆ˜๋‹ค! ๋‘๊ฐœ์˜ ๋กœ๊ทธํ•จ์ˆ˜๋ฅผ 0.5๋ฅผ ๋Œ€์นญ์œผ๋กœ ๊ฒน์นœ๋‹ค๋ฉด, ์ € ์กฐ๊ฑด์„ ์ถฉ์กฑํ•˜๋Š” cost function์„ ๋งŒ๋“ค ์ˆ˜ ์žˆ๋‹ค. ๋‹ค์Œ์€ y=0.5์— ๋Œ€์นญํ•˜๋Š” ๋‘ ๊ฐœ์˜ ๋กœ๊ทธ ํ•จ์ˆ˜ ๊ทธ๋ž˜ํ”„์ด๋‹ค.

 

Expressed as equations, the two curves are:

cost(H(x), y) = -log(H(x))        if y = 1
cost(H(x), y) = -log(1 - H(x))    if y = 0

 

์ด ๋‘์‹์„ ํ•˜๋‚˜๋กœ ํ•ฉ์น  ์ˆ˜๋„ ์žˆ๋‹ค.

As a result, this expression lets us compute "the average of all the errors" of the sigmoid outputs:

cost(W, b) = -(1/n) * sum_i [ y_i * log(H(x_i)) + (1 - y_i) * log(1 - H(x_i)) ]

 

When implementing this cost function in code, PyTorch lets us do it in one line:

F.binary_cross_entropy(H(x), y)

This is also called binary cross entropy, or BCE. It is the loss used when each label is either 0 or 1, and it returns the mean of the per-sample losses as a single scalar.
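
As a quick sanity check (the numbers below are made up for illustration), F.binary_cross_entropy gives the same value as the hand-written formula:

import torch
import torch.nn.functional as F

h = torch.tensor([0.9, 0.2, 0.6])  # predicted probabilities in (0, 1)
y = torch.tensor([1.0, 0.0, 1.0])  # true labels

manual = -(y * torch.log(h) + (1 - y) * torch.log(1 - h)).mean()
builtin = F.binary_cross_entropy(h, y)
print(manual.item(), builtin.item())  # the two values agree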

 

3. Full code

import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim

x_data = [[1, 2], [2, 3], [3, 1], [4, 3], [5, 3], [6, 2]]
y_data = [[0], [0], [0], [1], [1], [1]]  # classification, so each label is 0 or 1
x_train = torch.FloatTensor(x_data)  # convert the data to tensors
y_train = torch.FloatTensor(y_data)

W = torch.zeros((2, 1), requires_grad=True)  # shape is 2 x 1
b = torch.zeros(1, requires_grad=True)

# hypothesis = 1 / (1 + torch.exp(-(x_train.matmul(W) + b)))  # the sigmoid written out by hand
hypothesis = torch.sigmoid(x_train.matmul(W) + b)  # same as above, but computed more simply

# cost function - written out as the formula
# losses = -(y_train * torch.log(hypothesis) + (1 - y_train) * torch.log(1 - hypothesis))
# cost = losses.mean()

cost = F.binary_cross_entropy(hypothesis, y_train)  # same as the two lines above, but simpler!


#---------------------------------------------------------
# let's actually train it!

x_data = [[1, 2], [2, 3], [3, 1], [4, 3], [5, 3], [6, 2]]
y_data = [[0], [0], [0], [1], [1], [1]]
x_train = torch.FloatTensor(x_data)
y_train = torch.FloatTensor(y_data)

W = torch.zeros((2, 1), requires_grad=True)
b = torch.zeros(1, requires_grad=True)

# set up the optimizer - use SGD with a learning rate of 1
optimizer = optim.SGD([W, b], lr=1)

nb_epochs = 1000
for epoch in range(nb_epochs + 1):

    # compute the hypothesis (the sigmoid)
    hypothesis = torch.sigmoid(x_train.matmul(W) + b)
   
    # compute the cost
    cost = F.binary_cross_entropy(hypothesis, y_train)
    
   
    # improve H(x) using the cost

    optimizer.zero_grad()  # reset the gradients accumulated from the previous step
    cost.backward()  # differentiate
    # backpropagation from the loss computes each variable's gradient of the loss

    optimizer.step()  # the model's parameters are updated

    # print a log every 100 epochs
    if epoch % 100 == 0:
        print('Epoch {:4d}/{} Cost: {:.6f}'.format(
            epoch, nb_epochs, cost.item()
        ))


prediction = hypothesis >= torch.FloatTensor([0.5])  # apply a threshold to split into 0 and 1
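
A small follow-up check (the print statements are my addition) compares the thresholded predictions with the labels and shows the learned parameters:

print(prediction.float() == y_train)  # True wherever the model is correct
print(W)  # the learned weights
print(b)  # the learned bias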

 

4. Full code with nn.Module

# import statements and data are the same as above

class BinaryClassifier(nn.Module):  # define the model as a class
    def __init__(self):
        super().__init__()
        self.linear = nn.Linear(2, 1)
        self.sigmoid = nn.Sigmoid()

    def forward(self, x):
        return self.sigmoid(self.linear(x))  # feed the linear output through the sigmoid

model = BinaryClassifier()  # create the model
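
As an aside, the same model can also be written without a custom class using nn.Sequential (an equivalent alternative, not the version used in the lecture):

# model = nn.Sequential(
#     nn.Linear(2, 1),  # input_dim = 2, output_dim = 1
#     nn.Sigmoid()      # pass the linear output through the sigmoid
# )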

# set up the optimizer
optimizer = optim.SGD(model.parameters(), lr=1)

nb_epochs = 10000
for epoch in range(nb_epochs + 1):

    # compute H(x)
    hypothesis = model(x_train)

    # compute the cost
    cost = F.binary_cross_entropy(hypothesis, y_train)

    # improve H(x) using the cost
    optimizer.zero_grad()
    cost.backward()
    optimizer.step()
    
    # print a log every 100 epochs
    if epoch % 100 == 0:
        prediction = hypothesis >= torch.FloatTensor([0.5])
        correct_prediction = prediction.float() == y_train
        accuracy = correct_prediction.sum().item() / len(correct_prediction)
        print('Epoch {:4d}/{} Cost: {:.6f} Accuracy {:2.2f}%'.format(
            epoch, nb_epochs, cost.item(), accuracy * 100,
        ))
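
After training, the model can be applied directly to new inputs. Here is a minimal usage sketch (the sample point [4, 3] is taken from the training data above):

new_x = torch.FloatTensor([[4, 3]])
prob = model(new_x)            # predicted probability of class 1
pred = (prob >= 0.5).float()   # apply the 0.5 threshold
print(prob.item(), pred.item())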

 

 

etc. Regression and classification

Regression, put in plain terms, can be described as fitting.

In other words, linear regression assumes that the distribution of some data is linear, and finds that linear function.

If we picture a linear function as a stick, it means finding the position and slope of the stick that best represents the distribution of the data.

 

Put differently, we can also say the data is distributed in that shape because it follows this linear function.

In other words, regression is the search for the underlying function that shapes the distribution of the data.

 

Logistic regression means regressing a logistic function so as to minimize the cost function over the data.

 

Usually, linear regression is used for continuous values, and logistic regression is used for 0-or-1 classification.

 

Classification assigns a value of either 0 or 1, which is why regression with a logistic function is the better fit.

 

 

 


<Reference>

https://deeplearningzerotoall.github.io/season2/lec_pytorch.html

https://wikidocs.net/57805

https://wikidocs.net/60037

https://wikidocs.net/58686