07. Tips and MNIST data

해는선 2020. 3. 1. 14:55

๋ณธ ๊ธ€์€ '๋ชจ๋‘๋ฅผ ์œ„ํ•œ ๋”ฅ๋Ÿฌ๋‹ ์‹œ์ฆŒ 2'์™€ 'pytorch๋กœ ์‹œ์ž‘ํ•˜๋Š” ๋”ฅ ๋Ÿฌ๋‹ ์ž…๋ฌธ'์„ ๋ณด๋ฉฐ ๊ณต๋ถ€ํ•œ ๋‚ด์šฉ์„ ์ •๋ฆฌํ•œ ๊ธ€์ž…๋‹ˆ๋‹ค.

ํ•„์ž์˜ ์˜๊ฒฌ์ด ์„ž์—ฌ ๋“ค์–ด๊ฐ€ ๋ถ€์ •ํ™•ํ•œ ๋‚ด์šฉ์ด ์กด์žฌํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.


1. Learning rate

In gradient descent, the learning rate sets the size of each step, i.e., how far the parameters move per update.

Start at around 0.1, then lower it if the cost diverges and raise it if training makes too little progress (the cost barely changes), and keep adjusting until you find a learning rate that works well.
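To make the tuning advice concrete, here is a minimal sketch that runs plain gradient descent on the toy function f(w) = w**2 with three learning rates. The function, the starting point, and the helper name gradient_descent are assumptions chosen for illustration; they are not from the lecture.

```python
def gradient_descent(lr, steps=20, w0=1.0):
    """Minimize f(w) = w**2 (gradient is 2*w) and return the final w."""
    w = w0
    for _ in range(steps):
        grad = 2 * w
        w = w - lr * grad  # the learning rate scales how far each step moves
    return w


w_small = gradient_descent(lr=0.001)  # too small: w barely moves from 1.0
w_good = gradient_descent(lr=0.1)     # reasonable: w shrinks toward the minimum at 0
w_large = gradient_descent(lr=1.1)    # too large: |w| grows every step (divergence)

print(w_small, w_good, w_large)
```

On this toy problem each update multiplies w by (1 - 2 * lr), so any lr above 1.0 overshoots further on every step, which is the "diverges, so decrease it" case described above.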

 

2. Data Preprocessing

If the features differ too much in magnitude, training is dominated by the large values and may not go well.

Standardization puts the features on a comparable scale: subtract each feature's mean and divide by its standard deviation.

mu = x_train.mean(dim=0)
sigma = x_train.std(dim=0)
norm_x_train = (x_train - mu) / sigma
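As a quick check of the three lines above, the sketch below applies them to a small made-up x_train whose two columns have very different scales. The numbers are hypothetical and serve only to show the effect.

```python
import torch

# Hypothetical training data: column 0 is in the tens, column 1 in the thousands.
x_train = torch.tensor([[73., 8000.],
                        [93., 8800.],
                        [89., 9100.],
                        [96., 9800.],
                        [73., 6600.]])

mu = x_train.mean(dim=0)       # per-feature mean
sigma = x_train.std(dim=0)     # per-feature standard deviation
norm_x_train = (x_train - mu) / sigma

# After standardization every column has mean ~0 and standard deviation ~1.
col_mean = norm_x_train.mean(dim=0)
col_std = norm_x_train.std(dim=0)
print(col_mean, col_std)
```

dim=0 reduces over the samples, so mu and sigma hold one value per feature and broadcast across all rows.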

A related technique is weight preprocessing, which is covered in 09-2.

 

3. MNIST

A large dataset of handwritten digit images.

 

1) torchvision

Provides popular datasets, model architectures, transforms, and more.

 

2) introduction

  • 28 x 28 image => 784 (number of inputs)
  • 1 channel gray image (grayscale, so a single channel)
  • 0~9 digits (the digits 0 through 9, so 10 outputs)

3) Reading data

import torchvision.datasets as dsets
import torchvision.transforms as transforms

...

mnist_train = dsets.MNIST(root="MNIST_data/", train=True, transform=transforms.ToTensor(), download=True)
mnist_test = dsets.MNIST(root="MNIST_data/", train=False, transform=transforms.ToTensor(), download=True)

dsets.MNIST

  • root = where the MNIST dataset is (or will be) stored, i.e., the file path
  • train = True uses the training data, False uses the test data
  • transform = how the data is converted. ToTensor() converts each image into the tensor format PyTorch uses.
  • download = True means the dataset is downloaded if it is not found at the root path.

data_loader = torch.utils.data.DataLoader(dataset=mnist_train, batch_size=batch_size, shuffle=True, drop_last=True)

Data loader : used to build mini-batches.

  • dataset = the dataset to load data from.
  • batch_size = how many samples go into each mini-batch.
  • shuffle = reshuffle the data at every epoch.
  • drop_last = if True, drop the last batch when the dataset size is not divisible by the batch size. (This keeps a final batch that is smaller than the others from being given relatively too much weight.)
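The effect of drop_last is easiest to see on a tiny dataset. The sketch below assumes a hypothetical toy dataset of 10 samples wrapped in TensorDataset, with a batch size of 3 so that one sample is left over.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Hypothetical toy data: 10 samples with 4 features each, labels 0..9.
features = torch.arange(40, dtype=torch.float32).view(10, 4)
labels = torch.arange(10)
toy_dataset = TensorDataset(features, labels)

keep_loader = DataLoader(dataset=toy_dataset, batch_size=3, shuffle=True, drop_last=False)
drop_loader = DataLoader(dataset=toy_dataset, batch_size=3, shuffle=True, drop_last=True)

keep_sizes = [len(y) for _, y in keep_loader]  # the final batch is the short one
drop_sizes = [len(y) for _, y in drop_loader]  # the short batch is discarded

print(len(keep_loader), keep_sizes)
print(len(drop_loader), drop_sizes)
```

len() of a DataLoader is the number of batches per epoch, which is why the training code can use len(data_loader) as total_batch.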

for epoch in range(training_epochs):
...
    for X, Y in data_loader:
        X = X.view(-1, 28*28).to(device)

X holds the MNIST images and Y holds the labels (0~9).

view reshapes the data: (B, 1, 28, 28) -> (B, 784)
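The reshape can be tried on a random tensor with the same shape as one batch. The tensor batch below is a hypothetical stand-in for what the data loader yields, assuming a batch size of 100.

```python
import torch

# Hypothetical batch shaped like a batch of MNIST images: (B, 1, 28, 28).
batch = torch.rand(100, 1, 28, 28)

# -1 lets PyTorch infer the batch dimension from the remaining size.
flat = batch.view(-1, 28 * 28)

print(batch.shape, '->', flat.shape)

# view only changes the shape: row 0 holds the pixels of image 0 in row-major order.
same_pixels = torch.equal(flat[0], batch[0].reshape(-1))
print(same_pixels)
```

Using -1 instead of a fixed batch size keeps the same line working for any batch size.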

 

4) etc...

  • Epoch : one epoch has passed once the entire dataset has been used for training.
  • batch size : how many samples each chunk of the dataset contains.
  • Iteration : one pass over a single batch. Dividing the dataset size by the batch size gives the number of batches, and running that many iterations completes 1 epoch.
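As a worked example of these three terms, the arithmetic below uses the 60,000 images of the MNIST training set together with the batch size (100) and epoch count (15) used in this post. The second batch size of 128 is an assumption added to show a case that does not divide evenly.

```python
import math

num_samples = 60000   # size of the MNIST training set
batch_size = 100
training_epochs = 15

# Iterations needed to see the whole dataset once (1 epoch).
iterations_per_epoch = num_samples // batch_size
# Total number of parameter updates over the whole training run.
total_updates = iterations_per_epoch * training_epochs
print(iterations_per_epoch, total_updates)

# Hypothetical batch size that leaves a remainder of 96 samples.
uneven_batch_size = 128
iterations_dropping_last = num_samples // uneven_batch_size           # partial batch discarded
iterations_keeping_last = math.ceil(num_samples / uneven_batch_size)  # partial batch kept
print(iterations_dropping_last, iterations_keeping_last)
```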

5) softmax train

 

import torch
import torchvision.datasets as dsets
import torchvision.transforms as transforms
import matplotlib.pyplot as plt
import random

device = 'cuda' if torch.cuda.is_available() else 'cpu'

# for reproducibility
random.seed(777)
torch.manual_seed(777)
if device == 'cuda':
    torch.cuda.manual_seed_all(777)

# hyperparameters
training_epochs = 15
batch_size = 100

# MNIST dataset
mnist_train = dsets.MNIST(root='MNIST_data/',
                          train=True,
                          transform=transforms.ToTensor(),
                          download=True)

mnist_test = dsets.MNIST(root='MNIST_data/',
                         train=False,
                         transform=transforms.ToTensor(),
                         download=True)

data_loader = torch.utils.data.DataLoader(dataset=mnist_train,
                                          batch_size=batch_size,
                                          shuffle=True,
                                          drop_last=True)




# MNIST data image of shape 28 * 28 = 784 (input), output => 0~9 (10 classes)

linear = torch.nn.Linear(784, 10, bias=True).to(device)
# to() selects the device the computation runs on (CPU by default)
# bias: whether to use a bias term (default is True)

# define cost/loss & optimizer
criterion = torch.nn.CrossEntropyLoss().to(device)    # Softmax is internally computed.
optimizer = torch.optim.SGD(linear.parameters(), lr=0.1)

for epoch in range(training_epochs):  # training_epochs was set to 15 above
    avg_cost = 0
    total_batch = len(data_loader)

    for X, Y in data_loader:  # one iteration per mini-batch (a full pass is 1 epoch)
        # The batch size is 100, so X becomes a (100, 784) tensor here.
        X = X.view(-1, 28 * 28).to(device)
        # Labels are integers 0 ~ 9, not one-hot encoded.
        Y = Y.to(device)

        optimizer.zero_grad()
        hypothesis = linear(X)
        cost = criterion(hypothesis, Y)
        cost.backward()
        optimizer.step()

        # .item() gives a plain float, so the running average keeps no autograd graph
        avg_cost += cost.item() / total_batch

    print('Epoch:', '%04d' % (epoch + 1), 'cost =', '{:.9f}'.format(avg_cost))

print('Learning finished')



# Test the model using test sets
with torch.no_grad():
    # .data holds the raw uint8 pixels (0~255), so cast to float before the linear layer
    X_test = mnist_test.data.view(-1, 28 * 28).float().to(device)
    Y_test = mnist_test.targets.to(device)

    prediction = linear(X_test)
    # The predicted class is the index of the largest output
    correct_prediction = torch.argmax(prediction, 1) == Y_test
    accuracy = correct_prediction.float().mean()
    print('Accuracy:', accuracy.item())

    # Get one and predict
    r = random.randint(0, len(mnist_test) - 1)
    X_single_data = mnist_test.data[r:r + 1].view(-1, 28 * 28).float().to(device)
    Y_single_data = mnist_test.targets[r:r + 1].to(device)

    print('Label: ', Y_single_data.item())
    single_prediction = linear(X_single_data)
    print('Prediction: ', torch.argmax(single_prediction, 1).item())

    plt.imshow(mnist_test.data[r:r + 1].view(28, 28), cmap='Greys', interpolation='nearest')
    plt.show()
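The comment "Softmax is internally computed" in the training code can be checked directly. This sketch compares torch.nn.CrossEntropyLoss on raw outputs with a manual log-softmax followed by the negative log-likelihood. The random logits and the four labels are hypothetical values, not MNIST outputs.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
logits = torch.randn(4, 10)          # hypothetical raw outputs of linear(X) for 4 samples
labels = torch.tensor([3, 0, 9, 1])  # integer class labels, not one-hot

# CrossEntropyLoss takes raw logits and applies log-softmax itself.
loss_builtin = torch.nn.CrossEntropyLoss()(logits, labels)

# The same quantity by hand: log-softmax, pick the true class, average the negatives.
log_probs = F.log_softmax(logits, dim=1)
loss_manual = -log_probs[torch.arange(len(labels)), labels].mean()

print(loss_builtin.item(), loss_manual.item())
```

This is why the model above ends in a plain Linear layer: adding an explicit softmax before CrossEntropyLoss would apply it twice.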

<Reference>

https://deeplearningzerotoall.github.io/season2/lec_pytorch.html

https://wikidocs.net/60324