Morvan PyTorch Study Notes

  • torch.dot() takes two 1-D tensors and returns their dot product as a scalar; torch.mm(a, b) performs matrix multiplication; there are also torch.normal(...), torch.zeros(...), x.pow(2) and similar functions
  • torch.unsqueeze() inserts a dimension of size 1, turning an i * j tensor into 1 * i * j or i * j * 1, etc.; the position can be specified with dim
  • torch.linspace(a, b, c) is the equivalent of MATLAB's linspace(a, b, c): it produces c evenly spaced points between a and b; see the sketch below
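    These calls are often combined to build a toy regression dataset; a minimal sketch (the variable names are illustrative, not from the original notes):
    import torch

    # 100 evenly spaced points in [-1, 1], shape (100,)
    x = torch.linspace(-1, 1, 100)
    # insert a feature dimension so the shape becomes (100, 1), which nn.Linear expects
    x = torch.unsqueeze(x, dim=1)
    # a noisy quadratic target built with pow() and uniform noise
    y = x.pow(2) + 0.2 * torch.rand(x.size())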
  • torch.nn.Module is the base class for custom networks; you need to define your own __init__(self) and forward(self, x), and __init__ must first call the parent constructor:
    class Net(torch.nn.Module):
        def __init__(self):
            super(Net, self).__init__()
            ##....
    

     

  • torch.nn.Linear(a, b) returns a fully connected (linear) layer mapping a input features to b output features; it can be used like this (a complete sketch follows below):
    hidden = torch.nn.Linear(n_features, n_hidden)
    y = F.relu(hidden(x))
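    Putting the two points above together, a minimal regression network might look like this (a sketch; the layer sizes are placeholders, not from the original notes):
    import torch
    import torch.nn.functional as F

    class Net(torch.nn.Module):
        def __init__(self, n_features, n_hidden, n_output):
            super(Net, self).__init__()
            self.hidden = torch.nn.Linear(n_features, n_hidden)   # hidden layer
            self.out = torch.nn.Linear(n_hidden, n_output)        # output layer

        def forward(self, x):
            x = F.relu(self.hidden(x))   # activation on the hidden layer
            return self.out(x)           # raw output, no activation

    net = Net(n_features=1, n_hidden=10, n_output=1)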

     

  • You can print the structure of a custom network with print(net), where net is an instance of the class

  • loss_func = torch.nn.MSELoss() returns a mean-squared-error loss function, which is then used as loss = loss_func(pred, y) followed by loss.backward(); loss_func = torch.nn.CrossEntropyLoss() returns a cross-entropy loss, and the cross-entropy loss is computed in the same way

  • optimizer = torch.optim.SGD(net.parameters(), lr=0.1); on each iteration remember to call optimizer.zero_grad(), then loss.backward() where needed, and finally optimizer.step() to update the parameters (a minimal training step is sketched below)
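    A minimal sketch of one training iteration combining the loss function and optimizer above (net, x, y are assumed to come from the earlier regression sketch):
    loss_func = torch.nn.MSELoss()
    optimizer = torch.optim.SGD(net.parameters(), lr=0.1)

    for t in range(100):
        pred = net(x)                 # forward pass
        loss = loss_func(pred, y)     # mean squared error
        optimizer.zero_grad()         # clear gradients from the previous step
        loss.backward()               # backpropagate
        optimizer.step()              # update the parameters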

  • import torch.nn.functional as F to use F.relu, F.softmax, and so on

  • import torch.utils.data as Data to use Data.TensorDataset() and Data.DataLoader():
## x and y are the tensors of the dataset
torch_dataset = Data.TensorDataset(x, y)   # newer PyTorch versions take the tensors positionally
loader = Data.DataLoader(
    dataset=torch_dataset,
    batch_size=BATCH_SIZE,
    shuffle=True,    ## whether to shuffle the order
    num_workers=2    ## number of worker subprocesses
)
for epoch in range(3):
    for step, (batch_x, batch_y) in enumerate(loader):
        ##  train .....

 

  • torch.save(net, 'net.pkl')
    net  = torch.load('net.pkl')
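    Besides saving the whole network object as above, the parameters alone can be saved via the state dict; a minimal sketch (net2 is assumed to be a freshly constructed network with the same architecture):
    torch.save(net.state_dict(), 'net_params.pkl')       # save parameters only
    net2.load_state_dict(torch.load('net_params.pkl'))   # restore them into net2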
  • nn.Conv2d(in_channels=1, out_channels=16, kernel_size=5, stride=1, padding=2)
    nn.MaxPool2d(kernel_size=2)
  • Note that the Variable wrapper used in Morvan's PyTorch videos has been deprecated and merged into Tensor, so tensors can now be trained on directly without wrapping them in Variable; volatile has also been removed, and with torch.no_grad() is used instead so that tensors under that with block do not track gradients and no parameters are updated (see the sketch below)
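    A minimal sketch of the torch.no_grad() replacement for volatile, e.g. during evaluation (net and test_x are placeholder names):
    with torch.no_grad():           # gradients are not tracked inside this block
        test_output = net(test_x)   # plain forward pass for evaluation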

Here is the code for the CNN. Things to note: define the individual layers in __init__() and connect them in forward(); flatten the tensor after the conv layers while keeping the batch dimension, and only account for the batch dimension in forward, not when defining the layers. view() works like reshape: a -1 means that dimension is inferred automatically

class CNN(nn.Module):
    def __init__(self):
        super(CNN, self).__init__()
        self.conv1 = nn.Sequential(#(1, 28, 28)
            nn.Conv2d(in_channels=1, out_channels=16, kernel_size=5, stride=1, padding=2),#(16, 28, 28)
            nn.ReLU(), 
            nn.MaxPool2d(kernel_size=2),#(16, 14, 14)
            )
        self.conv2 = nn.Sequential(
            nn.Conv2d(in_channels=16, out_channels=32, kernel_size=5, stride=1, padding=2),#(32, 14, 14)
            nn.ReLU(), 
            nn.MaxPool2d(kernel_size=2),#(32, 7, 7)
            )
        self.out = nn.Linear(32*7*7, 10)
        
    def forward(self, x):
        x = self.conv1(x)
        x = self.conv2(x)  #(batch, 32, 7, 7)
        x = x.view(x.size(0), -1) #(batch, 32*7*7)
        output = self.out(x)
        return output
  • Here is the training code. Notes:
  1. torch.max() returns two tensors: [0] is the array of maximum values and [1] is the array of their indices, which is why [1] is used here
  2. To pull data out of a tensor, use tensor.data.numpy()
  3. Cast the boolean comparison array to int (1/0) with astype first, then convert to float when dividing to compute the accuracy
for epoch in range(EPOCH):
    for step, (x, y) in enumerate(train_loader):
        output = cnn(x)
        loss = loss_func(output, y)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        
        if step % 50 == 0:
            test_output = cnn(test_x)
            pred_y = torch.max(test_output, 1)[1].data.numpy()
            accuracy = float((pred_y == test_y.data.numpy()).astype(int).sum()) / float(test_y.size(0))
            print('Epoch: ', epoch, 
                  '| train loss: %.4f' % loss.data.numpy(),
                  '| test accuracy: %.4f' % accuracy)

 

 

  • An RNN is similar: define an nn.LSTM layer in __init__ and specify its parameters; the code is as follows, with a usage sketch after it:
    class RNN(nn.Module):
        def __init__(self):
            super(RNN, self).__init__()
    
            self.rnn = nn.LSTM(         # if use nn.RNN(), it hardly learns
                input_size=INPUT_SIZE,
                hidden_size=64,         # rnn hidden unit
                num_layers=1,           # number of rnn layer
                batch_first=True,       # input & output have batch size as the first dimension
            )
    
            self.out = nn.Linear(64, 10)
    
        def forward(self, x):
            # x shape (batch, time_step, input_size)
            # r_out shape (batch, time_step, output_size)
            # h_n shape (n_layers, batch, hidden_size)
            # h_c shape (n_layers, batch, hidden_size)
            r_out, (h_n, h_c) = self.rnn(x, None)   # None represents zero initial hidden state
    
            # choose r_out at the last time step
            out = self.out(r_out[:, -1, :])
            return out
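    A hypothetical usage sketch, assuming MNIST images are fed row by row so that the sequence length and INPUT_SIZE are both 28 (these values and the names b_x, b_y, loss_func are assumptions, not from the original notes):
    rnn = RNN()
    # b_x from a MNIST DataLoader has shape (batch, 1, 28, 28);
    # reshape it to (batch, time_step, input_size) = (batch, 28, 28)
    output = rnn(b_x.view(-1, 28, 28))
    loss = loss_func(output, b_y)    # e.g. nn.CrossEntropyLoss() on the 10-class output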
     
  • A GAN is slightly more complicated: it needs two networks, a generator and a discriminator.
    Note the form of the losses: there is a G loss and a D loss.
    During training, for each batch, update D's parameters first and then G's; while updating one network, make sure the other's parameters are not changed (here this is guaranteed because each optimizer only holds its own network's parameters).
    Also, pass retain_graph=True to the first backward call (on D_loss in the code below; the retain_variable mentioned in the video has been replaced by retain_graph in newer PyTorch versions) so the computational graph is kept and can be reused for G_loss.backward(), avoiding redundant computation.
    G = nn.Sequential(                      # Generator
        nn.Linear(N_IDEAS, 128),            # random ideas (could from normal distribution)
        nn.ReLU(),
        nn.Linear(128, ART_COMPONENTS),     # making a painting from these random ideas
    )
    
    D = nn.Sequential(                      # Discriminator
        nn.Linear(ART_COMPONENTS, 128),     # receive art work either from the famous artist or a newbie like G
        nn.ReLU(),
        nn.Linear(128, 1),
        nn.Sigmoid(),                       # tell the probability that the art work is made by artist
    )
    
    opt_D = torch.optim.Adam(D.parameters(), lr=LR_D)
    opt_G = torch.optim.Adam(G.parameters(), lr=LR_G)
    
    for step in range(10000):
        artist_paintings = artist_works()           # real painting from artist
        G_ideas = torch.randn(BATCH_SIZE, N_IDEAS)    # random ideas
        G_paintings = G(G_ideas)                    # fake painting from G (random ideas)
    
        prob_artist0 = D(artist_paintings)          # D try to increase this prob
        prob_artist1 = D(G_paintings)               # D try to reduce this prob
    
        D_loss = - torch.mean(torch.log(prob_artist0) + torch.log(1. - prob_artist1))
        G_loss = torch.mean(torch.log(1. - prob_artist1))
    
        opt_D.zero_grad()
        D_loss.backward(retain_graph=True)      # keep the graph so G_loss.backward() can reuse it
        opt_D.step()
    
        opt_G.zero_grad()
        G_loss.backward()
        opt_G.step()

     

  • Dropout is easy: just add a Dropout layer between layers; its argument is the probability of dropping a unit:

    layer = torch.nn.Sequential(
        torch.nn.Linear(10,64),
        torch.nn.Dropout(0.5),
        torch.nn.ReLU(),
        torch.nn.Linear(64, 128),
        #...
    )

    Note that during training you should call layer.train() so the dropout layer is active, and at test time call layer.eval() to disable it, as in the sketch below.
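    A minimal sketch of the train/eval toggle (test_x is a placeholder test tensor):
    layer.train()                    # training mode: dropout is applied
    # ... run training steps here ...

    layer.eval()                     # evaluation mode: dropout is disabled
    with torch.no_grad():
        test_pred = layer(test_x)    # forward pass without dropout or gradients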

