ํ•ด๋Š”์„  2020. 3. 3. 20:59

๋ณธ ๊ธ€์€ '๋ชจ๋‘๋ฅผ ์œ„ํ•œ ๋”ฅ๋Ÿฌ๋‹ ์‹œ์ฆŒ 2'์™€ 'pytorch๋กœ ์‹œ์ž‘ํ•˜๋Š” ๋”ฅ ๋Ÿฌ๋‹ ์ž…๋ฌธ'์„ ๋ณด๋ฉฐ ๊ณต๋ถ€ํ•œ ๋‚ด์šฉ์„ ์ •๋ฆฌํ•œ ๊ธ€์ž…๋‹ˆ๋‹ค.

ํ•„์ž์˜ ์˜๊ฒฌ์ด ์„ž์—ฌ ๋“ค์–ด๊ฐ€ ๋ถ€์ •ํ™•ํ•œ ๋‚ด์šฉ์ด ์กด์žฌํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.


0. Perceptron

๋‹ค์ˆ˜์˜ ์ž…๋ ฅ์œผ๋กœ๋ถ€ํ„ฐ ํ•˜๋‚˜์˜ ๊ฒฐ๊ณผ๋ฅผ ๋‚ด๋ณด๋‚ด๋Š” ์•Œ๊ณ ๋ฆฌ์ฆ˜. ๋‰ด๋Ÿฐ์˜ ๋™์ž‘ ๋ฐฉ์‹๊ณผ ๋งค์šฐ ์œ ์‚ฌํ•˜๋‹ค.

 

๊ฐ๊ฐ์˜ ์ž…๋ ฅ ๊ฐ’์— ๊ฐ€์ค‘์น˜์— ๊ณฑํ•ด์„œ y์— ์ „๋‹ฌ ๋œ๋‹ค. ์ด๋•Œ, ์ „๋‹ฌ๋œ ๊ฐ’์ด ์ž„๊ณ„์น˜๋ฅผ ๋…ธ๋“œ๊ฐ€ ํ™œ์„ฑํ™”๋˜๊ณ , ์•„๋‹ˆ๋ผ๋ฉด ํ™œ์„ฑํ™” ๋˜์ง€ ์•Š๋Š”๋‹ค. ์ด๋ ‡๊ฒŒ ํ™œ์„ฑํ™”๋ฅผ ๊ฒฐ์ •ํ•˜๋Š” ํ•จ์ˆ˜๋ฅผ ํ™œ์„ฑํ™” ํ•จ์ˆ˜(active function)๋ผ๊ณ  ํ•˜๊ณ , ๊ทธ ์ข…๋ฅ˜์—๋Š” ๊ณ„๋‹จํ•จ์ˆ˜, ์‹œ๊ทธ๋ชจ์ด๋“œ, ReLU ๋“ฑ์ด ์žˆ๋‹ค.

 

1. Single-layer perceptron

๋‹จ์ธต ํผ์…‰ํŠธ๋ก ์€ ๊ฐ’์„ ๋ณด๋‚ด๋Š” ๋‹จ๊ณ„๊ณผ ๊ฐ’์„ ๋ฐ›์•„์„œ ์ถœ๋ ฅํ•˜๋Š” ๋‘ ๋‹จ๊ณ„๋กœ๋งŒ ์ด๋ฃจ์–ด์กŒ๋‹ค. ์ฆ‰ ์ž…๋ ฅ์ธต๊ณผ ๊ฒฐ๊ณผ์ธต์œผ๋กœ๋งŒ ์ด๋ค„์ง„๋‹ค. ์ธต์ด ํ•˜๋‚˜๋ผ ์„ ํ˜• ์˜์—ญ์— ๋Œ€ํ•ด์„œ๋งŒ ๋ถ„๋ฅ˜๊ฐ€ ๊ฐ€๋Šฅํ•˜๋‹ค. ๊ทธ๋ž˜์„œ ๊ฐ€๋Šฅํ•œ ์—ฐ์‚ฐ์€ AND์—ฐ์‚ฐ, OR์—ฐ์‚ฐ, NAND์—ฐ์‚ฐ์ด ์žˆ๋‹ค. XOR ์—ฐ์‚ฐ์€ ๋ถˆ๊ฐ€๋Šฅ ํ•œ๋ฐ, ์•„๋ž˜์˜ ๊ทธ๋ž˜ํ”„๋ฅผ ๋ณด๋ฉด ๋” ์‰ฝ๊ฒŒ ์ดํ•ด๊ฐ€ ๊ฐ€๋Šฅํ•˜๋‹ค.

AND GATE
OR, NAND GATE

If you draw the AND, OR, and NAND gates, each one can be separated with a single straight line.
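As a concrete check (a sketch of my own, with hand-picked weights rather than learned ones), one linear boundary per gate is enough for AND, OR, and NAND, while no single (w, b) pair reproduces XOR:

import torch

def perceptron(x, w, b):
    # single-layer perceptron: one linear boundary plus a step activation
    return (torch.matmul(x, w) + b > 0).float()

X = torch.FloatTensor([[0, 0], [0, 1], [1, 0], [1, 1]])

print(perceptron(X, torch.tensor([ 0.5,  0.5]), torch.tensor(-0.7)))  # AND : [0, 0, 0, 1]
print(perceptron(X, torch.tensor([ 0.5,  0.5]), torch.tensor(-0.2)))  # OR  : [0, 1, 1, 1]
print(perceptron(X, torch.tensor([-0.5, -0.5]), torch.tensor( 0.7)))  # NAND: [1, 1, 1, 0]
# XOR would have to output [0, 1, 1, 0], which no single line through the plane can produce.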

 

XOR GATE

The XOR gate, however, cannot be separated with a single straight line. So what can we do? What if we turn the straight line into a curve?

 

With a curve, the classification works!

 

2. Multi-layer perceptron

๋‹ค์ธต ํผ์…‰ํŠธ๋ก ์€ ์ž…๋ ฅ์ธต, ์ถœ๋ ฅ์ธต ์™ธ์— '์€๋‹‰์ธต(hidden layer)'์ด๋ผ๋Š” ๋˜ ๋‹ค๋ฅธ ์ธต์ด ์žˆ๋Š” ํผ์…‰ํŠธ๋ก ์„ ์˜๋ฏธํ•œ๋‹ค. ๋‹ค์ธต ํผ์…‰ํŠธ๋ก ์—์„œ๋Š” XOR์—ฐ์‚ฐ์ด ๊ฐ€๋Šฅํ•˜๋‹ค. Multylayer perceptron์ด๋ผ๊ณ ๋„ ํ•˜๋ฉฐ, ์ค„์—ฌ์„œ MLP ๋ผ๊ณ ๋„ ๋ถ€๋ฅธ๋‹ค.

 

 

A neural network like this with two or more hidden layers is called a deep neural network (DNN).

 

๊ทธ๋Ÿฐ๋ฐ ๋ง์ž…๋‹ˆ๋‹ค, ๋‹จ์ผ ํผ์…‰ํŠธ๋ก ์˜ ๊ฒฝ์šฐ ์˜ค์ฐจ๋ฅผ ๋ฐ”๋กœ๋ฐ”๋กœ ๊ฐœ์„ ์‹œํ‚ฌ ์ˆ˜ ์žˆ๋‹ค. (h(x) = xw + b ์—์„œ w๊ฐ™์€ ๊ฒฝ์šฐ, ๋ฐ”๋กœ๋ฐ”๋กœ ๊ฒฝ์‚ฌํ•˜๊ฐ•๋ฒ•์„ ์ ์šฉํ–ˆ๋‹ค) ๊ทธ๋Ÿฐ๋ฐ ๋‹ค์ธต ํผ์…‰ํŠธ๋ก ์˜ ๊ฒฝ์šฐ, ๊ฐ๊ฐ์˜ ๊ฐ€์ค‘์น˜์˜ ์˜ค์ฐจ๋ฅผ ์–ด๋–ป๊ฒŒ ๊ฐœ์„ ํ•  ์ˆ˜ ์žˆ์„๊นŒ?

 

์ด์ œ, ์ธ๊ณต ์‹ ๊ฒฝ๋ง์ด ์ˆœ์ „ํŒŒ ๊ณผ์ •์„ ์ง„ํ–‰ํ•˜์—ฌ ์˜ˆ์ธก๊ฐ’๊ณผ ์‹ค์ œ๊ฐ’์˜ ์˜ค์ฐจ๋ฅผ ๊ณ„์‚ฐํ•˜์˜€์„ ๋•Œ ์–ด๋–ป๊ฒŒ ์—ญ์ „ํŒŒ ๊ณผ์ •์—์„œ ๊ฒฝ์‚ฌ ํ•˜๊ฐ•๋ฒ•์„ ์‚ฌ์šฉํ•˜์—ฌ ๊ฐ€์ค‘์น˜๋ฅผ ์—…๋ฐ์ดํŠธํ•˜๋Š”์ง€ ์•Œ์•„๋ณด์ž. (์ธ๊ณต ์‹ ๊ฒฝ๋ง์˜ ํ•™์Šต์€ ์˜ค์ฐจ๋ฅผ ์ตœ์†Œํ™”ํ•˜๋Š” ๊ฐ€์ค‘์น˜๋ฅผ ์ฐพ๋Š” ๋ชฉ์ ์œผ๋กœ ์ˆœ์ „ํŒŒ์™€ ์—ญ์ „ํŒŒ๋ฅผ ๋ฐ˜๋ณตํ•˜๋Š” ๊ฒƒ์„ ๋งํ•œ๋‹ค.)

 

3. Forward propagation

Values flow in the direction input layer -> hidden layer -> output layer, producing an output. The prediction obtained this way is compared with the true value to compute the error.
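In the notation of the code in section 5 below, the forward pass of the two-layer network is simply

l1 = X·w1 + b1,   a1 = sigmoid(l1)
l2 = a1·w2 + b2,  y_pred = sigmoid(l2)

and the error against the true labels Y is the binary cross-entropy cost,

cost = -mean( Y·log(y_pred) + (1 - Y)·log(1 - y_pred) ).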

 

4. Backpropagation

Weights are updated while moving in the direction output layer -> hidden layer -> input layer. The error obtained from the forward pass is used to update the weights and reduce the error. The computation relies on the chain rule of differentiation.

 

์ฆ‰, ๊ฐ๊ฐ์˜ ๊ฐ€์ค‘์น˜๋“ค์ด ๊ฒฐ๊ณผ๊ฐ’์— ์˜ํ–ฅ์„ ๋ฏธ์น˜๋Š” ๋น„์œจ(๋ฏธ๋ถ„๊ฐ’)์„ ๊ตฌํ•œ ๋’ค, ์˜ค์ฐจ๋ฅผ ์ค„์—ฌ๋‚˜๊ฐˆ ์ˆ˜ ์žˆ๋Š” ๋ฐฉํ–ฅ์œผ๋กœ ๋นผ ์ฃผ๋Š” ๊ฒƒ์ด๋‹ค. 

 

In the end, the rate of change of the cost with respect to a weight = the rate of change of the hypothesis with respect to the weight × the rate of change of the activation with respect to the hypothesis × the rate of change of the cost with respect to the activation. This is exactly where the chain rule of differentiation comes in. (The rate of change of B with respect to A means the derivative dB/dA.)
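Written out for the output-layer weight w2 of the code below (where l2 is the hypothesis, y_pred its activation, and cost the BCE cost), the chain rule reads

∂cost/∂w2 = ∂l2/∂w2 × ∂y_pred/∂l2 × ∂cost/∂y_pred

and the update subtracts this gradient scaled by the learning rate:

w2 ← w2 - learning_rate × ∂cost/∂w2

In the code, the three factors show up as a1 (transposed), sigmoid_prime(l2), and d_y_pred.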

 

5.  XOR code (MLP)

import torch

device = 'cuda' if torch.cuda.is_available() else 'cpu'

# for reproducibility
torch.manual_seed(777)
if device == 'cuda':
    torch.cuda.manual_seed_all(777)
    
X = torch.FloatTensor([[0, 0], [0, 1], [1, 0], [1, 1]]).to(device)
Y = torch.FloatTensor([[0], [1], [1], [0]]).to(device)

#๋ ˆ์ด์–ด ์„ ์–ธ (w์™€ b๋ฅผ ์ง์ ‘ ์„ค์ •)
# nn.Linear๋ฅผ 2๊ฐœ ์‚ฌ์šฉํ•œ ๊ฒƒ๊ณผ ๋งˆ์ฐฌ๊ฐ€์ง€

w1 = torch.Tensor(2, 2).to(device) # 2 inputs -> 2 hidden units
b1 = torch.Tensor(2).to(device)
w2 = torch.Tensor(2, 1).to(device) # 2 hidden units -> 1 output
b2 = torch.Tensor(1).to(device)

################################## (added part)
torch.nn.init.normal_(w1)  # fill the tensors with values drawn from a normal distribution
torch.nn.init.normal_(b1)
torch.nn.init.normal_(w2)
torch.nn.init.normal_(b2)
###################################

def sigmoid(x):
    return 1.0/(1.0 + torch.exp(-x))
    
def sigmoid_prime(x): # derivative of the sigmoid
    return sigmoid(x)*(1-sigmoid(x))
    
learning_rate = 1

for step in range(10001):
    #forward
    l1 = torch.add(torch.matmul(X, w1), b1) # linear step (XW + b), shape (4, 2)
    a1 = sigmoid(l1) # activation
    l2 = torch.add(torch.matmul(a1, w2), b2) # shape (4, 1)
    y_pred = sigmoid(l2) # activation
    
    # use BCE loss
    cost = -torch.mean(Y * torch.log(y_pred) + (1 - Y) * torch.log(1 - y_pred))
    
    
    #back prop (chain rule)
    d_y_pred = (y_pred - Y)/(y_pred * (1.0 - y_pred) + 1e-7) # derivative of BCE w.r.t. y_pred (1e-7 avoids division by zero)
    
    # output layer -> hidden layer
    d_l2 = d_y_pred * sigmoid_prime(l2) # (4, 1)
    d_b2 = d_l2
    d_w2 = torch.matmul(torch.transpose(a1, 0, 1), d_b2) # (2, 1)
    # a1 is (4, 2), so transpose it to (2, 4) so the matmul (2x4 x 4x1) works
    
    # hidden layer -> input layer
    d_a1 = torch.matmul(d_b2, torch.transpose(w2, 0, 1)) # (4, 1) x (1, 2) = (4, 2)
    d_l1 = d_a1 * sigmoid_prime(l1) # (4, 2)
    d_b1 = d_l1 # (4, 2)
    d_w1 = torch.matmul(torch.transpose(X, 0, 1), d_b1) # (2, 4) x (4, 2) = (2, 2)
    
    
    # weight update
    w1 = w1 - learning_rate * d_w1 # subtract the gradient (the direction matters!)
    b1 = b1 - learning_rate * torch.mean(d_b1, 0)
    # b1 is a vector of shape (2,) but d_b1 has one row per sample, shape (4, 2),
    # so average over the batch dimension to match the shapes
    w2 = w2 - learning_rate * d_w2
    b2 = b2 - learning_rate * torch.mean(d_b2, 0)
    
    if step%100==0:
        print(step, cost.item())
        
        
l1 = torch.add(torch.matmul(X, w1), b1) # linear step (XW + b), shape (4, 2)
a1 = sigmoid(l1) # activation
l2 = torch.add(torch.matmul(a1, w2), b2) # shape (4, 1)
y_pred = sigmoid(l2) # activation

predicted = (y_pred > 0.5).float()
accuracy = (predicted == Y).float().mean()
    
    
print('Hypothesis:', y_pred.detach().cpu().numpy(), '\nCorrect: ', Y.detach().cpu().numpy(), '\nAccuracy: ', accuracy.item())

 

5-1.  XOR code (MLP) with nn.Module

import torch

device = 'cuda' if torch.cuda.is_available() else 'cpu'

# for reproducibility
torch.manual_seed(777)
if device == 'cuda':
    torch.cuda.manual_seed_all(777)
    
X = torch.FloatTensor([[0, 0], [0, 1], [1, 0], [1, 1]]).to(device)
Y = torch.FloatTensor([[0], [1], [1], [0]]).to(device)

# nn layers
linear1 = torch.nn.Linear(2, 10, bias=True)
linear2 = torch.nn.Linear(10, 10, bias=True)
linear3 = torch.nn.Linear(10, 10, bias=True)
linear4 = torch.nn.Linear(10, 1, bias=True)
sigmoid = torch.nn.Sigmoid()

# model
model = torch.nn.Sequential(linear1, sigmoid, linear2, sigmoid, linear3, sigmoid, linear4, sigmoid).to(device)

# define cost/loss & optimizer
criterion = torch.nn.BCELoss().to(device)
optimizer = torch.optim.SGD(model.parameters(), lr=3)  # modified learning rate from 0.1 to 3

for step in range(10001):
    optimizer.zero_grad()
    hypothesis = model(X)

    # cost/loss function
    cost = criterion(hypothesis, Y)
    cost.backward()
    optimizer.step()

    if step % 100 == 0:
        print(step, cost.item())
        
# Accuracy computation
# True if hypothesis>0.5 else False
with torch.no_grad():
    hypothesis = model(X)
    predicted = (hypothesis > 0.5).float()
    accuracy = (predicted == Y).float().mean()
    print('\nHypothesis: ', hypothesis.detach().cpu().numpy(), '\nCorrect: ', predicted.detach().cpu().numpy(), '\nAccuracy: ', accuracy.item())

 

 

<Rough summary of the machine-learning training process>

  1. Create layers whose shapes fit the learning problem (n of them)
  2. Define the model (in what order do they run? layer order + activation functions)
  3. Define the cost function and the optimizer
  4. Run it (forward + backward); a sketch of these steps follows right after this list
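A minimal skeleton of those four steps (the layer sizes and the learning rate here are placeholders, not recommendations):

import torch

# 1. create layers whose shapes fit the problem
linear1 = torch.nn.Linear(2, 10)
linear2 = torch.nn.Linear(10, 1)
sigmoid = torch.nn.Sigmoid()

# 2. define the model: the order in which layers and activations are applied
model = torch.nn.Sequential(linear1, sigmoid, linear2, sigmoid)

# 3. define the cost function and the optimizer
criterion = torch.nn.BCELoss()
optimizer = torch.optim.SGD(model.parameters(), lr=1)

# 4. run it: forward pass + backward pass, repeated
X = torch.FloatTensor([[0, 0], [0, 1], [1, 0], [1, 1]])
Y = torch.FloatTensor([[0], [1], [1], [0]])
for step in range(1000):
    optimizer.zero_grad()
    cost = criterion(model(X), Y)  # forward: prediction and its cost
    cost.backward()                # backward: gradients for every parameter
    optimizer.step()               # update the weights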

 


<Reference>

https://deeplearningzerotoall.github.io/season2/lec_pytorch.html

https://wikidocs.net/60680

https://ko.wikipedia.org/wiki/%ED%8D%BC%EC%85%89%ED%8A%B8%EB%A1%A0

https://wikidocs.net/61010