Morvan PyTorch Study Notes

  • torch.dot() takes two 1-D tensors and returns their dot product as a scalar; torch.mm(a, b) performs matrix multiplication; there are also torch.normal(...), torch.zeros(...), x.pow(2) and similar functions
  • torch.unsqueeze() inserts a dimension of size 1, turning an i * j tensor into 1 * i * j or i * j * 1, etc.; the position can be specified with dim
  • torch.linspace(a, b, c) is the equivalent of MATLAB's linspace(a, b, c): it produces c evenly spaced points between a and b; see the sketch below
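    These calls are often combined to build a toy regression dataset; a minimal sketch (the variable names are illustrative, not from the original notes):
    import torch

    # 100 evenly spaced points in [-1, 1], shape (100,)
    x = torch.linspace(-1, 1, 100)
    # insert a feature dimension so the shape becomes (100, 1), which nn.Linear expects
    x = torch.unsqueeze(x, dim=1)
    # a noisy quadratic target built with pow() and uniform noise
    y = x.pow(2) + 0.2 * torch.rand(x.size())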
  • torch.nn.Module is the base class for custom networks; you need to define your own __init__(self) and forward(self, x), and __init__ must first call the parent constructor:
    class Net(torch.nn.Module):
        def __init__(self):
            super(Net, self).__init__()
            ##....
    

     

  • torch.nn.Linear(a, b) returns a fully connected (linear) layer mapping a input features to b output features; it can be used like this (a complete sketch follows below):
    hidden = torch.nn.Linear(n_features, n_hidden)
    y = F.relu(hidden(x))
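    Putting the two points above together, a minimal regression network might look like this (a sketch; the layer sizes are placeholders, not from the original notes):
    import torch
    import torch.nn.functional as F

    class Net(torch.nn.Module):
        def __init__(self, n_features, n_hidden, n_output):
            super(Net, self).__init__()
            self.hidden = torch.nn.Linear(n_features, n_hidden)   # hidden layer
            self.out = torch.nn.Linear(n_hidden, n_output)        # output layer

        def forward(self, x):
            x = F.relu(self.hidden(x))   # activation on the hidden layer
            return self.out(x)           # raw output, no activation

    net = Net(n_features=1, n_hidden=10, n_output=1)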

     

  • You can print the structure of a custom network with print(net), where net is an instance of the class

  • loss_func = torch.nn.MSELoss() returns a mean-squared-error loss function, which is then used as loss = loss_func(pred, y) followed by loss.backward(); loss_func = torch.nn.CrossEntropyLoss() returns a cross-entropy loss, and the cross-entropy loss is computed in the same way

  • optimizer = torch.optim.SGD(net.parameters(), lr=0.1); on each iteration remember to call optimizer.zero_grad(), then loss.backward() where needed, and finally optimizer.step() to update the parameters (a minimal training step is sketched below)
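    A minimal sketch of one training iteration combining the loss function and optimizer above (net, x, y are assumed to come from the earlier regression sketch):
    loss_func = torch.nn.MSELoss()
    optimizer = torch.optim.SGD(net.parameters(), lr=0.1)

    for t in range(100):
        pred = net(x)                 # forward pass
        loss = loss_func(pred, y)     # mean squared error
        optimizer.zero_grad()         # clear gradients from the previous step
        loss.backward()               # backpropagate
        optimizer.step()              # update the parameters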

  • import torch.nn.functional as F to use F.relu, F.softmax, and so on

  • import torch.utils.data as Data to use Data.TensorDataset() and Data.DataLoader():
## x and y are the tensors of the dataset
torch_dataset = Data.TensorDataset(x, y)   # newer PyTorch versions take the tensors positionally
loader = Data.DataLoader(
    dataset=torch_dataset,
    batch_size=BATCH_SIZE,
    shuffle=True,    ## whether to shuffle the order
    num_workers=2    ## number of worker subprocesses
)
for epoch in range(3):
    for step, (batch_x, batch_y) in enumerate(loader):
        ##  train .....

 

  • torch.save(net, 'net.pkl')
    net  = torch.load('net.pkl')
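    Besides saving the whole network object as above, the parameters alone can be saved via the state dict; a minimal sketch (net2 is assumed to be a freshly constructed network with the same architecture):
    torch.save(net.state_dict(), 'net_params.pkl')       # save parameters only
    net2.load_state_dict(torch.load('net_params.pkl'))   # restore them into net2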
  • nn.Conv2d(in_channels=1, out_channels=16, kernel_size=5, stride=1, padding=2)
    nn.MaxPool2d(kernel_size=2)
  • Note that the Variable wrapper used in Morvan's PyTorch videos has been deprecated and merged into Tensor, so tensors can now be trained on directly without wrapping them in Variable; volatile has also been removed, and with torch.no_grad() is used instead so that tensors under that with block do not track gradients and no parameters are updated (see the sketch below)
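    A minimal sketch of the torch.no_grad() replacement for volatile, e.g. during evaluation (net and test_x are placeholder names):
    with torch.no_grad():           # gradients are not tracked inside this block
        test_output = net(test_x)   # plain forward pass for evaluation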

Here is the code for the CNN. Things to note: define the individual layers in __init__() and connect them in forward(); flatten the tensor after the conv layers while keeping the batch dimension, and only account for the batch dimension in forward, not when defining the layers. view() works like reshape: a -1 means that dimension is inferred automatically

class CNN(nn.Module):
    def __init__(self):
        super(CNN, self).__init__()
        self.conv1 = nn.Sequential(#(1, 28, 28)
            nn.Conv2d(in_channels=1, out_channels=16, kernel_size=5, stride=1, padding=2),#(16, 28, 28)
            nn.ReLU(), 
            nn.MaxPool2d(kernel_size=2),#(16, 14, 14)
            )
        self.conv2 = nn.Sequential(
            nn.Conv2d(in_channels=16, out_channels=32, kernel_size=5, stride=1, padding=2),#(32, 14, 14)
            nn.ReLU(), 
            nn.MaxPool2d(kernel_size=2),#(32, 7, 7)
            )
        self.out = nn.Linear(32*7*7, 10)
        
    def forward(self, x):
        x = self.conv1(x)
        x = self.conv2(x)  #(batch, 32, 7, 7)
        x = x.view(x.size(0), -1) #(batch, 32*7*7)
        output = self.out(x)
        return output
  • Here is the training code. Notes:
  1. torch.max() returns two tensors: [0] is the array of maximum values and [1] is the array of their indices, which is why [1] is used here
  2. To pull data out of a tensor, use tensor.data.numpy()
  3. Cast the boolean comparison array to int (1/0) with astype first, then convert to float when dividing to compute the accuracy
for epoch in range(EPOCH):
    for step, (x, y) in enumerate(train_loader):
        output = cnn(x)
        loss = loss_func(output, y)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        
        if step % 50 == 0:
            test_output = cnn(test_x)
            pred_y = torch.max(test_output, 1)[1].data.numpy()
            accuracy = float((pred_y == test_y.data.numpy()).astype(int).sum()) / float(test_y.size(0))
            print('Epoch: ', epoch, 
                  '| train loss: %.4f' % loss.data.numpy(),
                  '| test accuracy: %.4f' % accuracy)

 

 

  • An RNN is similar: define an nn.LSTM layer in __init__ and specify its parameters; the code is as follows, with a usage sketch after it:
    class RNN(nn.Module):
        def __init__(self):
            super(RNN, self).__init__()
    
            self.rnn = nn.LSTM(         # if use nn.RNN(), it hardly learns
                input_size=INPUT_SIZE,
                hidden_size=64,         # rnn hidden unit
                num_layers=1,           # number of rnn layer
                batch_first=True,       # input & output have batch size as the first dimension
            )
    
            self.out = nn.Linear(64, 10)
    
        def forward(self, x):
            # x shape (batch, time_step, input_size)
            # r_out shape (batch, time_step, output_size)
            # h_n shape (n_layers, batch, hidden_size)
            # h_c shape (n_layers, batch, hidden_size)
            r_out, (h_n, h_c) = self.rnn(x, None)   # None represents zero initial hidden state
    
            # choose r_out at the last time step
            out = self.out(r_out[:, -1, :])
            return out
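    A hypothetical usage sketch, assuming MNIST images are fed row by row so that the sequence length and INPUT_SIZE are both 28 (these values and the names b_x, b_y, loss_func are assumptions, not from the original notes):
    rnn = RNN()
    # b_x from a MNIST DataLoader has shape (batch, 1, 28, 28);
    # reshape it to (batch, time_step, input_size) = (batch, 28, 28)
    output = rnn(b_x.view(-1, 28, 28))
    loss = loss_func(output, b_y)    # e.g. nn.CrossEntropyLoss() on the 10-class output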
     
  • A GAN is slightly more complicated: it needs two networks, a generator and a discriminator.
    Note the form of the losses: there is a G loss and a D loss.
    During training, for each batch, update D's parameters first and then G's; while updating one network, make sure the other's parameters are not changed (here this is guaranteed because each optimizer only holds its own network's parameters).
    Also, pass retain_graph=True to the first backward call (on D_loss in the code below; the retain_variable mentioned in the video has been replaced by retain_graph in newer PyTorch versions) so the computational graph is kept and can be reused for G_loss.backward(), avoiding redundant computation.
    G = nn.Sequential(                      # Generator
        nn.Linear(N_IDEAS, 128),            # random ideas (could from normal distribution)
        nn.ReLU(),
        nn.Linear(128, ART_COMPONENTS),     # making a painting from these random ideas
    )
    
    D = nn.Sequential(                      # Discriminator
        nn.Linear(ART_COMPONENTS, 128),     # receive art work either from the famous artist or a newbie like G
        nn.ReLU(),
        nn.Linear(128, 1),
        nn.Sigmoid(),                       # tell the probability that the art work is made by artist
    )
    
    opt_D = torch.optim.Adam(D.parameters(), lr=LR_D)
    opt_G = torch.optim.Adam(G.parameters(), lr=LR_G)
    
    for step in range(10000):
        artist_paintings = artist_works()           # real painting from artist
        G_ideas = torch.randn(BATCH_SIZE, N_IDEAS)    # random ideas
        G_paintings = G(G_ideas)                    # fake painting from G (random ideas)
    
        prob_artist0 = D(artist_paintings)          # D try to increase this prob
        prob_artist1 = D(G_paintings)               # D try to reduce this prob
    
        D_loss = - torch.mean(torch.log(prob_artist0) + torch.log(1. - prob_artist1))
        G_loss = torch.mean(torch.log(1. - prob_artist1))
    
        opt_D.zero_grad()
        D_loss.backward(retain_graph=True)      # keep the graph so G_loss.backward() can reuse it
        opt_D.step()
    
        opt_G.zero_grad()
        G_loss.backward()
        opt_G.step()

     

  • Dropout is easy: just add a Dropout layer between layers; its argument is the probability of dropping a unit:

    layer = torch.nn.Sequential(
        torch.nn.Linear(10,64),
        torch.nn.Dropout(0.5),
        torch.nn.ReLU(),
        torch.nn.Linear(64, 128),
        #...
    )

    Note that during training you should call layer.train() so the dropout layer is active, and at test time call layer.eval() to disable it, as in the sketch below.
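    A minimal sketch of the train/eval toggle (test_x is a placeholder test tensor):
    layer.train()                    # training mode: dropout is applied
    # ... run training steps here ...

    layer.eval()                     # evaluation mode: dropout is disabled
    with torch.no_grad():
        test_pred = layer(test_x)    # forward pass without dropout or gradients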

