• 04-2. Loading Data (Mini Batch and Data Load)

    2020. 2. 24.

    by. 해는선

    This post summarizes what I studied while following 'Deep Learning Zero to All: Season 2' (모두를 위한 딥러닝 시즌 2) and 'PyTorch로 시작하는 딥 러닝 입문' (Introduction to Deep Learning with PyTorch).

    Since my own opinions are mixed in, some of the content may be inaccurate.


    Complex models need enormous amounts of data!

    ⇒ But for various reasons (time, and so on), it is impossible to train on all of it at once.

    ⇒ Then, if we train on only part of the data at a time, wouldn't training be faster?

    1. Mini Batch and Batch Size

    Mini batch: when the entire dataset is split into smaller units and training is done unit by unit, each such unit is a mini batch.

     

    With mini-batch training, we take just one mini batch and train on it, then take the next mini batch and train on that, and so on....

    When one full pass over the entire dataset finishes this way, 1 epoch is complete.

     

     ๋ฏธ๋‹ˆ ๋ฐฐ์น˜์˜ ํฌ๊ธฐ๋ฅผ ๋ฐฐ์น˜ ํฌ๊ธฐ(batch size)๋ผ๊ณ  ํ•œ๋‹ค.

     

    Performing gradient descent one mini batch at a time is called 'mini-batch gradient descent'. Ordinary (full-)batch gradient descent converges to the optimum very stably but is computationally expensive, whereas mini-batch gradient descent trains faster, though its path to the optimum is noisier.
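
    Below is a minimal sketch of the difference (my own illustration, using made-up toy tensors and a learning rate chosen only for demonstration): batch gradient descent computes the gradient over all samples and updates once per epoch, while mini-batch gradient descent updates once per mini batch.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F
    from torch.utils.data import TensorDataset, DataLoader
    
    # toy data: 5 samples with 3 features each (same shape as the example later in this post)
    x = torch.randn(5, 3)
    y = torch.randn(5, 1)
    
    model = nn.Linear(3, 1)
    optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
    
    # (a) batch gradient descent: the gradient is taken over ALL samples,
    #     so there is exactly one update of W and b per epoch
    cost = F.mse_loss(model(x), y)
    optimizer.zero_grad()
    cost.backward()
    optimizer.step()
    
    # (b) mini-batch gradient descent: one update per mini batch,
    #     so with batch_size=2 there are 3 updates per epoch
    loader = DataLoader(TensorDataset(x, y), batch_size=2, shuffle=True)
    for x_batch, y_batch in loader:
        cost = F.mse_loss(model(x_batch), y_batch)
        optimizer.zero_grad()
        cost.backward()
        optimizer.step()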

     

     ๋ฐฐ์น˜ ํฌ๊ธฐ๋Š” ์ฃผ๋กœ 2์˜ ์ œ๊ณฑ์ˆ˜๋ฅผ ์‚ฌ์šฉํ•œ๋‹ค. (๋ฐ์ดํ„ฐ ์†ก์ˆ˜์‹ ์˜ ํšจ์œจ์„ ๋†’์ด๊ธฐ ์œ„ํ•ด)

     

    The total number of samples divided by the batch size gives the number of iterations; this is how many times the parameters W and b are updated within one epoch (i.e., how many times learning actually happens).
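
    As a quick sanity check (my own example with made-up numbers), the sketch below counts the iterations per epoch, including the 5-sample / batch-size-2 setting used in the code later in this post.

    import math
    
    # hypothetical sizes, for illustration only
    num_samples, batch_size = 1000, 100
    print(math.ceil(num_samples / batch_size))  # 10 iterations (W, b updates) per epoch
    
    # the 5-sample, batch_size=2 example used below
    print(math.ceil(5 / 2))                     # 3 iterations per epoch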

     

    2. Data Load

    PyTorch has many useful tools that make it easy to work with data; among them, let's look at Dataset and DataLoader.

    With these, mini-batch training, data shuffling, and even parallel data loading can be done with little effort.

     

    The basic usage is to define a Dataset and pass it to a DataLoader.

     

    import torch
    import torch.nn as nn
    import torch.nn.functional as F
    
    from torch.utils.data import TensorDataset # dataset that wraps tensors
    from torch.utils.data import DataLoader # loads the dataset in mini batches
    
    x_train = torch.FloatTensor([[73, 80, 75],
                                 [93, 88, 93],
                                 [89, 91, 90],
                                 [96, 98, 100],
                                 [73, 66, 70]])
    y_train = torch.FloatTensor([[152], [185], [180], [196], [142]])
    
    # wrap the training tensors in a dataset
    dataset = TensorDataset(x_train, y_train)

     

    Once the dataset has been created, a DataLoader can be used. The DataLoader takes the dataset and the mini-batch size as its basic arguments. There is also a shuffle argument: if it is set to True, the dataset is reshuffled every epoch, which keeps the model from getting used to the order of the samples.

     

    # configure the data loader
    dataloader = DataLoader(dataset, batch_size=2, shuffle=True)
    
    model = nn.Linear(3,1)
    optimizer = torch.optim.SGD(model.parameters(), lr=1e-5)
    
    nb_epochs = 20
    for epoch in range(nb_epochs + 1):
      for batch_idx, samples in enumerate(dataloader):
        # each item from the dataloader unpacks into an x mini batch and a y mini batch
        x_train, y_train = samples
        
        # compute H(x)
        prediction = model(x_train)
    
        # compute the cost
        cost = F.mse_loss(prediction, y_train)
    
        # improve H(x) using the cost
        optimizer.zero_grad()
        cost.backward()
        optimizer.step()
    
        print('Epoch {:4d}/{} Batch {}/{} Cost: {:.6f}'.format(
            epoch, nb_epochs, batch_idx+1, len(dataloader),
            cost.item()
            ))
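
    As a quick check of the comment above that x and y come out bundled together (my own addition, reusing the dataloader defined above), each item the DataLoader yields unpacks into an x mini batch and a y mini batch with batch_size rows:

    # inspect a single mini batch from the dataloader defined above
    x_batch, y_batch = next(iter(dataloader))
    print(x_batch.shape)  # torch.Size([2, 3]) -> batch_size x number of features
    print(y_batch.shape)  # torch.Size([2, 1])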

    3. Custom Dataset

    In some cases, you inherit from torch.utils.data.Dataset and build a custom dataset yourself.

    torch.utils.data.Dataset is the abstract class PyTorch provides for representing a dataset.

    class CustomDataset(torch.utils.data.Dataset):
      def __init__(self):
        # preprocess / load the dataset here
        ...
    
      def __len__(self):
        # return the total number of samples in the dataset
        ...
    
      def __getitem__(self, idx):
        # return one specific sample from the dataset
        ...
    
    The full example below fills in these three methods.
    
    import torch
    import torch.nn as nn
    import torch.nn.functional as F
    import torch.optim as optim
    
    from torch.utils.data import Dataset
    
    class CustomDataset(Dataset): # build a custom dataset
        def __init__(self):
            self.x_data= [[73, 80, 75],
                          [93, 88, 93],
                          [89, 91, 90],
                          [96, 98, 100],
                          [73, 66, 70]]
            self.y_data = [[152], [185], [180], [196], [142]]
            
        def __len__(self):
            return len(self.x_data)
        
        def __getitem__(self, idx):
            x = torch.FloatTensor(self.x_data[idx])
            y = torch.FloatTensor(self.y_data[idx])
            
            return x, y
        
    dataset = CustomDataset() # create the dataset object
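    
    # quick check (my own addition): len() calls __len__ and indexing calls __getitem__
    print(len(dataset))  # 5
    print(dataset[0])    # (tensor([73., 80., 75.]), tensor([152.]))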
    
    from torch.utils.data import DataLoader
    
    dataloader = DataLoader(
        dataset,
        batch_size = 2,
        shuffle=True, # reshuffle every epoch (so the model can't memorize the sample order)
    )
    
    #---------------------------------------------
    class MultivariateLinearRegressionModel(nn.Module):
        def __init__(self):
            super().__init__()
            self.linear = nn.Linear(3, 1)
    
        def forward(self, x):
            return self.linear(x)
        
    
    # data (not actually used below; the mini batches come from the custom dataset)
    x_train = torch.FloatTensor([[73, 80, 75],
                                 [93, 88, 93],
                                 [89, 91, 90],
                                 [96, 98, 100],
                                 [73, 66, 70]])
    y_train = torch.FloatTensor([[152], [185], [180], [196], [142]])
    # initialize the model
    model = MultivariateLinearRegressionModel()
    # set up the optimizer
    optimizer = optim.SGD(model.parameters(), lr=1e-5)    
    
    
    nb_epochs = 20
    for epoch in range(nb_epochs+1):
        for batch_idx, samples in enumerate(dataloader):
            x_train, y_train = samples
    
            # compute H(x)
            prediction = model(x_train)
    
            # compute the cost
            cost = F.mse_loss(prediction, y_train)
    
            # improve H(x) using the cost
            optimizer.zero_grad()
            cost.backward()
            optimizer.step()
    
            # print the training log for every mini batch
            print('Epoch {:4d}/{} Batch {}/{} Cost: {:.6f}'.format(
                epoch, nb_epochs, batch_idx+1, len(dataloader), cost.item()
            )) # 3 iterations per epoch here (= number of W, b updates per epoch)

     


    <Reference>

    https://deeplearningzerotoall.github.io/season2/lec_pytorch.html

    https://wikidocs.net/55580

    https://wikidocs.net/57165
