• 08. Perceptron

    2020. 3. 3.

    by. ํ•ด๋Š”์„ 

    ๋ณธ ๊ธ€์€ '๋ชจ๋‘๋ฅผ ์œ„ํ•œ ๋”ฅ๋Ÿฌ๋‹ ์‹œ์ฆŒ 2'์™€ 'pytorch๋กœ ์‹œ์ž‘ํ•˜๋Š” ๋”ฅ ๋Ÿฌ๋‹ ์ž…๋ฌธ'์„ ๋ณด๋ฉฐ ๊ณต๋ถ€ํ•œ ๋‚ด์šฉ์„ ์ •๋ฆฌํ•œ ๊ธ€์ž…๋‹ˆ๋‹ค.

    ํ•„์ž์˜ ์˜๊ฒฌ์ด ์„ž์—ฌ ๋“ค์–ด๊ฐ€ ๋ถ€์ •ํ™•ํ•œ ๋‚ด์šฉ์ด ์กด์žฌํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.


    0. Perceptron

    ๋‹ค์ˆ˜์˜ ์ž…๋ ฅ์œผ๋กœ๋ถ€ํ„ฐ ํ•˜๋‚˜์˜ ๊ฒฐ๊ณผ๋ฅผ ๋‚ด๋ณด๋‚ด๋Š” ์•Œ๊ณ ๋ฆฌ์ฆ˜. ๋‰ด๋Ÿฐ์˜ ๋™์ž‘ ๋ฐฉ์‹๊ณผ ๋งค์šฐ ์œ ์‚ฌํ•˜๋‹ค.

     

    ๊ฐ๊ฐ์˜ ์ž…๋ ฅ ๊ฐ’์— ๊ฐ€์ค‘์น˜์— ๊ณฑํ•ด์„œ y์— ์ „๋‹ฌ ๋œ๋‹ค. ์ด๋•Œ, ์ „๋‹ฌ๋œ ๊ฐ’์ด ์ž„๊ณ„์น˜๋ฅผ ๋…ธ๋“œ๊ฐ€ ํ™œ์„ฑํ™”๋˜๊ณ , ์•„๋‹ˆ๋ผ๋ฉด ํ™œ์„ฑํ™” ๋˜์ง€ ์•Š๋Š”๋‹ค. ์ด๋ ‡๊ฒŒ ํ™œ์„ฑํ™”๋ฅผ ๊ฒฐ์ •ํ•˜๋Š” ํ•จ์ˆ˜๋ฅผ ํ™œ์„ฑํ™” ํ•จ์ˆ˜(active function)๋ผ๊ณ  ํ•˜๊ณ , ๊ทธ ์ข…๋ฅ˜์—๋Š” ๊ณ„๋‹จํ•จ์ˆ˜, ์‹œ๊ทธ๋ชจ์ด๋“œ, ReLU ๋“ฑ์ด ์žˆ๋‹ค.

     

    1. Single-Layer Perceptron

    ๋‹จ์ธต ํผ์…‰ํŠธ๋ก ์€ ๊ฐ’์„ ๋ณด๋‚ด๋Š” ๋‹จ๊ณ„๊ณผ ๊ฐ’์„ ๋ฐ›์•„์„œ ์ถœ๋ ฅํ•˜๋Š” ๋‘ ๋‹จ๊ณ„๋กœ๋งŒ ์ด๋ฃจ์–ด์กŒ๋‹ค. ์ฆ‰ ์ž…๋ ฅ์ธต๊ณผ ๊ฒฐ๊ณผ์ธต์œผ๋กœ๋งŒ ์ด๋ค„์ง„๋‹ค. ์ธต์ด ํ•˜๋‚˜๋ผ ์„ ํ˜• ์˜์—ญ์— ๋Œ€ํ•ด์„œ๋งŒ ๋ถ„๋ฅ˜๊ฐ€ ๊ฐ€๋Šฅํ•˜๋‹ค. ๊ทธ๋ž˜์„œ ๊ฐ€๋Šฅํ•œ ์—ฐ์‚ฐ์€ AND์—ฐ์‚ฐ, OR์—ฐ์‚ฐ, NAND์—ฐ์‚ฐ์ด ์žˆ๋‹ค. XOR ์—ฐ์‚ฐ์€ ๋ถˆ๊ฐ€๋Šฅ ํ•œ๋ฐ, ์•„๋ž˜์˜ ๊ทธ๋ž˜ํ”„๋ฅผ ๋ณด๋ฉด ๋” ์‰ฝ๊ฒŒ ์ดํ•ด๊ฐ€ ๊ฐ€๋Šฅํ•˜๋‹ค.

    AND GATE
    OR, NAND GATE

    If you plot the AND, OR, and NAND gates, every case can be separated with a single straight line.
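
    As a rough check of this (my own sketch; the weights and biases are just hand-picked values that happen to work, not taken from the post), a single perceptron reproduces the AND, OR, and NAND truth tables, while no single set of weights can produce XOR:

    import torch
    
    def perceptron(X, w, b):
        # single-layer perceptron: weighted sum + step activation
        return (X @ w + b > 0).float()
    
    X = torch.tensor([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
    
    # hand-picked weights/biases (illustration only)
    gates = {
        'AND':  (torch.tensor([0.5, 0.5]), -0.7),
        'OR':   (torch.tensor([0.5, 0.5]), -0.2),
        'NAND': (torch.tensor([-0.5, -0.5]), 0.7),
    }
    
    for name, (w, b) in gates.items():
        print(name, perceptron(X, w, b))
    # AND  tensor([0., 0., 0., 1.])
    # OR   tensor([0., 1., 1., 1.])
    # NAND tensor([1., 1., 1., 0.])
    # XOR would need [0., 1., 1., 0.], which no single (w, b) can produce.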

     

    XOR GATE

    However, the XOR gate cannot be separated with a single straight line. So what should we do? What if we bend the line into a curve?

     

    With a curve, the classification works!

     

    2. Multi-Layer Perceptron

    ๋‹ค์ธต ํผ์…‰ํŠธ๋ก ์€ ์ž…๋ ฅ์ธต, ์ถœ๋ ฅ์ธต ์™ธ์— '์€๋‹‰์ธต(hidden layer)'์ด๋ผ๋Š” ๋˜ ๋‹ค๋ฅธ ์ธต์ด ์žˆ๋Š” ํผ์…‰ํŠธ๋ก ์„ ์˜๋ฏธํ•œ๋‹ค. ๋‹ค์ธต ํผ์…‰ํŠธ๋ก ์—์„œ๋Š” XOR์—ฐ์‚ฐ์ด ๊ฐ€๋Šฅํ•˜๋‹ค. Multylayer perceptron์ด๋ผ๊ณ ๋„ ํ•˜๋ฉฐ, ์ค„์—ฌ์„œ MLP ๋ผ๊ณ ๋„ ๋ถ€๋ฅธ๋‹ค.

     

     

    ์ €๋ ‡๊ฒŒ ์€๋‹‰์ธต์ด 2๊ฐœ ์ด์ƒ์ธ ์‹ ๊ฒฝ๋ง์„ ์‹ฌ์ธต ์‹ ๊ฒฝ๋ง(Deep Neural Network, DNN)๋ผ๊ณ  ํ•œ๋‹ค.

     

    ๊ทธ๋Ÿฐ๋ฐ ๋ง์ž…๋‹ˆ๋‹ค, ๋‹จ์ผ ํผ์…‰ํŠธ๋ก ์˜ ๊ฒฝ์šฐ ์˜ค์ฐจ๋ฅผ ๋ฐ”๋กœ๋ฐ”๋กœ ๊ฐœ์„ ์‹œํ‚ฌ ์ˆ˜ ์žˆ๋‹ค. (h(x) = xw + b ์—์„œ w๊ฐ™์€ ๊ฒฝ์šฐ, ๋ฐ”๋กœ๋ฐ”๋กœ ๊ฒฝ์‚ฌํ•˜๊ฐ•๋ฒ•์„ ์ ์šฉํ–ˆ๋‹ค) ๊ทธ๋Ÿฐ๋ฐ ๋‹ค์ธต ํผ์…‰ํŠธ๋ก ์˜ ๊ฒฝ์šฐ, ๊ฐ๊ฐ์˜ ๊ฐ€์ค‘์น˜์˜ ์˜ค์ฐจ๋ฅผ ์–ด๋–ป๊ฒŒ ๊ฐœ์„ ํ•  ์ˆ˜ ์žˆ์„๊นŒ?

     

    ์ด์ œ, ์ธ๊ณต ์‹ ๊ฒฝ๋ง์ด ์ˆœ์ „ํŒŒ ๊ณผ์ •์„ ์ง„ํ–‰ํ•˜์—ฌ ์˜ˆ์ธก๊ฐ’๊ณผ ์‹ค์ œ๊ฐ’์˜ ์˜ค์ฐจ๋ฅผ ๊ณ„์‚ฐํ•˜์˜€์„ ๋•Œ ์–ด๋–ป๊ฒŒ ์—ญ์ „ํŒŒ ๊ณผ์ •์—์„œ ๊ฒฝ์‚ฌ ํ•˜๊ฐ•๋ฒ•์„ ์‚ฌ์šฉํ•˜์—ฌ ๊ฐ€์ค‘์น˜๋ฅผ ์—…๋ฐ์ดํŠธํ•˜๋Š”์ง€ ์•Œ์•„๋ณด์ž. (์ธ๊ณต ์‹ ๊ฒฝ๋ง์˜ ํ•™์Šต์€ ์˜ค์ฐจ๋ฅผ ์ตœ์†Œํ™”ํ•˜๋Š” ๊ฐ€์ค‘์น˜๋ฅผ ์ฐพ๋Š” ๋ชฉ์ ์œผ๋กœ ์ˆœ์ „ํŒŒ์™€ ์—ญ์ „ํŒŒ๋ฅผ ๋ฐ˜๋ณตํ•˜๋Š” ๊ฒƒ์„ ๋งํ•œ๋‹ค.)

     

    3. Forward propagation

    Values flow in the direction input layer -> hidden layer -> output layer to produce a prediction. The error between this prediction and the actual value is then computed.

     

    4. Backpropagation

    Moving in the direction output layer -> hidden layer -> input layer, the weights are updated. The error obtained from forward propagation is used to update the weights and reduce the error; the chain rule of calculus is used for the computation.

     

    ์ฆ‰, ๊ฐ๊ฐ์˜ ๊ฐ€์ค‘์น˜๋“ค์ด ๊ฒฐ๊ณผ๊ฐ’์— ์˜ํ–ฅ์„ ๋ฏธ์น˜๋Š” ๋น„์œจ(๋ฏธ๋ถ„๊ฐ’)์„ ๊ตฌํ•œ ๋’ค, ์˜ค์ฐจ๋ฅผ ์ค„์—ฌ๋‚˜๊ฐˆ ์ˆ˜ ์žˆ๋Š” ๋ฐฉํ–ฅ์œผ๋กœ ๋นผ ์ฃผ๋Š” ๊ฒƒ์ด๋‹ค. 

     

    In the end, the rate of change of the cost function with respect to a weight = the rate of change of the hypothesis with respect to the weight x the rate of change of the activation with respect to the hypothesis x the rate of change of the cost function with respect to the activation, i.e. dC/dw = dz/dw * da/dz * dC/da. This is where the chain rule of calculus comes in. (The rate of change of B with respect to A => dB/dA.)
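
    As a quick sanity check (my own sketch, not from the post), for a single sigmoid neuron with a squared-error cost, the product of the three local derivatives above matches what PyTorch's autograd computes for dC/dw:

    import torch
    
    # single neuron: hypothesis z = x*w + b, activation a = sigmoid(z), cost C = (a - y)^2
    x, y = torch.tensor(1.5), torch.tensor(0.0)
    w = torch.tensor(0.8, requires_grad=True)
    b = torch.tensor(0.2, requires_grad=True)
    
    z = x * w + b
    a = torch.sigmoid(z)
    C = (a - y) ** 2
    C.backward()
    
    # chain rule by hand: dC/dw = dz/dw * da/dz * dC/da
    dz_dw = x
    da_dz = torch.sigmoid(z) * (1 - torch.sigmoid(z))
    dC_da = 2 * (a - y)
    print((dz_dw * da_dz * dC_da).item())  # manual chain rule
    print(w.grad.item())                   # autograd gives the same value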

     

    5.  XOR code (MLP)

    import torch
    
    device = 'cuda' if torch.cuda.is_available() else 'cpu'
    
    # for reproducibility
    torch.manual_seed(777)
    if device == 'cuda':
        torch.cuda.manual_seed_all(777)
        
    X = torch.FloatTensor([[0, 0], [0, 1], [1, 0], [1, 1]]).to(device)
    Y = torch.FloatTensor([[0], [1], [1], [0]]).to(device)
    
    #๋ ˆ์ด์–ด ์„ ์–ธ (w์™€ b๋ฅผ ์ง์ ‘ ์„ค์ •)
    # nn.Linear๋ฅผ 2๊ฐœ ์‚ฌ์šฉํ•œ ๊ฒƒ๊ณผ ๋งˆ์ฐฌ๊ฐ€์ง€
    
    w1 = torch.Tensor(2, 2).to(device) # 2->2
    b1 = torch.Tensor(2).to(device)
    w2 = torch.Tensor(2, 1).to(device) #2->1
    b2 = torch.Tensor(1).to(device)
    
    ################################## (added part)
    torch.nn.init.normal_(w1)  # fill the tensor with values drawn from a normal distribution
    torch.nn.init.normal_(b1)
    torch.nn.init.normal_(w2)
    torch.nn.init.normal_(b2)
    ###################################
    
    def sigmoid(x):
        return 1.0/(1.0 + torch.exp(-x))
        
    def sigmoid_prime(x): # derivative of the sigmoid
        return sigmoid(x)*(1-sigmoid(x))
        
    learning_rate = 1
    
    for step in range(10001):
        #forward
        l1 = torch.add(torch.matmul(X, w1), b1) # linear computation (XW + b)   (4*2)
        a1 = sigmoid(l1) # activation
        l2 = torch.add(torch.matmul(a1, w2), b2) #(4*1)
        y_pred = sigmoid(l2) # activation
        
        # use BCE (binary cross-entropy) loss
        cost = -torch.mean(Y * torch.log(y_pred) + (1 - Y) * torch.log(1 - y_pred))
        
        
        #back prop (chain rule)
        d_y_pred = (y_pred - Y)/(y_pred * (1.0 - y_pred) + 1e-7) # derivative of the BCE loss w.r.t. y_pred
        
        # output layer -> hidden layer
        d_l2 = d_y_pred * sigmoid_prime(l2) #(4*1)
        d_b2 = d_l2
        d_w2 = torch.matmul(torch.transpose(a1, 0, 1), d_b2) #(2*1)
        # a1 is 4*2, so transpose it to 2*4 so that the matrix product (2*4 x 4*1) works
        
        # hidden layer -> input layer
        d_a1 = torch.matmul(d_b2, torch.transpose(w2, 0, 1)) #(4*1) x (1*2) = (4*2)
        d_l1 = d_a1 * sigmoid_prime(l1) # 4*2
        d_b1 = d_l1 # 4*2
        d_w1 = torch.matmul(torch.transpose(X, 0, 1), d_b1) #(2*4) * (4*2) = (2*2)
        
        
        # weight update
        w1 = w1 - learning_rate * d_w1 # subtract the gradient (the direction matters!)
        b1 = b1 - learning_rate * torch.mean(d_b1, 0)
        # d_b1 holds one gradient per sample (4*2), so average over the batch dimension to match b1's shape (2,)
        w2 = w2 - learning_rate * d_w2
        b2 = b2 - learning_rate * torch.mean(d_b2, 0)
        
        if step%100==0:
            print(step, cost.item())
            
            
    # final forward pass with the trained weights
    l1 = torch.add(torch.matmul(X, w1), b1) # linear computation (XW + b)   (4*2)
    a1 = sigmoid(l1) # activation
    l2 = torch.add(torch.matmul(a1, w2), b2) #(4*1)
    y_pred = sigmoid(l2) # activation
    
    predicted = (y_pred > 0.5).float()
    accuracy = (predicted == Y).float().mean()
        
        
    print('Hypothesis:', y_pred.detach().cpu().numpy(), '\nCorrect: ', Y.detach().cpu().numpy(), '\nAccuracy: ', accuracy.item())

     

    5-1.  XOR code (MLP) with nn.Module

    import torch
    
    device = 'cuda' if torch.cuda.is_available() else 'cpu'
    
    # for reproducibility
    torch.manual_seed(777)
    if device == 'cuda':
        torch.cuda.manual_seed_all(777)
        
    X = torch.FloatTensor([[0, 0], [0, 1], [1, 0], [1, 1]]).to(device)
    Y = torch.FloatTensor([[0], [1], [1], [0]]).to(device)
    
    # nn layers
    linear1 = torch.nn.Linear(2, 10, bias=True)
    linear2 = torch.nn.Linear(10, 10, bias=True)
    linear3 = torch.nn.Linear(10, 10, bias=True)
    linear4 = torch.nn.Linear(10, 1, bias=True)
    sigmoid = torch.nn.Sigmoid()
    
    # model
    model = torch.nn.Sequential(linear1, sigmoid, linear2, sigmoid, linear3, sigmoid, linear4, sigmoid).to(device)
    
    # define cost/loss & optimizer
    criterion = torch.nn.BCELoss().to(device)
    optimizer = torch.optim.SGD(model.parameters(), lr=3)  # learning rate (modified from the original code)
    
    for step in range(10001):
        optimizer.zero_grad()
        hypothesis = model(X)
    
        # cost/loss function
        cost = criterion(hypothesis, Y)
        cost.backward()
        optimizer.step()
    
        if step % 100 == 0:
            print(step, cost.item())
            
    # Accuracy computation
    # True if hypothesis>0.5 else False
    with torch.no_grad():
        hypothesis = model(X)
        predicted = (hypothesis > 0.5).float()
        accuracy = (predicted == Y).float().mean()
        print('\nHypothesis: ', hypothesis.detach().cpu().numpy(), '\nCorrect: ', predicted.detach().cpu().numpy(), '\nAccuracy: ', accuracy.item())

     

     

    <A rough summary of the machine-learning training process>

    1. ํ•™์Šต์— ์ ํ•ฉํ•œ ์‹์„ ๊ฐ€์ง„ ๋ ˆ์ด์–ด ์ƒ์„ฑ( n๊ฐœ)
    2. model์ •์˜ (์–ด๋–ค์ˆœ์„œ๋กœ ๋Œ๋ฆฌ๋Š”์ง€? ๋ ˆ์ด์–ด ์ˆœ์„œ? + ํ™œ์„ฑํ™”ํ•จ์ˆ˜)
    3. ๋น„์šฉํ•จ์ˆ˜์™€ ์ตœ์ ํ™” ํ•จ์ˆ˜ ์ •์˜
    4. ๋Œ๋ฆฐ๋‹ค (ํฌ์›Œ๋“œ + ๋ฐฑ์›Œ๋“œ)
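
    A minimal sketch of these four steps (the layer sizes, learning rate, and data below are placeholders of my own, not from the post):

    import torch
    
    # placeholder data: 4 samples, 2 features, binary labels (illustration only)
    X = torch.FloatTensor([[0, 0], [0, 1], [1, 0], [1, 1]])
    Y = torch.FloatTensor([[0], [1], [1], [0]])
    
    # 1. create the layers
    linear1 = torch.nn.Linear(2, 4)
    linear2 = torch.nn.Linear(4, 1)
    
    # 2. define the model (layer order + activation functions)
    model = torch.nn.Sequential(linear1, torch.nn.Sigmoid(), linear2, torch.nn.Sigmoid())
    
    # 3. define the cost function and the optimizer
    criterion = torch.nn.BCELoss()
    optimizer = torch.optim.SGD(model.parameters(), lr=1)
    
    # 4. run it (forward + backward)
    for step in range(1000):
        optimizer.zero_grad()
        cost = criterion(model(X), Y)  # forward
        cost.backward()                # backward
        optimizer.step()               # weight update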

     


    <Reference>

    https://deeplearningzerotoall.github.io/season2/lec_pytorch.html

    https://wikidocs.net/60680

    https://ko.wikipedia.org/wiki/%ED%8D%BC%EC%85%89%ED%8A%B8%EB%A1%A0

    https://wikidocs.net/61010

     

     
