- torch.dot() takes two 1-D tensors and returns a scalar; torch.mm(a, b) performs matrix multiplication; there are also torch.normal(...), torch.zeros(...), x.pow(2), and so on
- torch.unsqueeze() inserts a dimension of size 1, turning an i * j tensor into 1 * i * j or i * j * 1, etc.; the position is chosen with dim
- torch.linspace(a, b, c) is the equivalent of MATLAB's linspace(a, b, c): it produces c evenly spaced points from a to b
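The tensor helpers above can be checked in a few lines (a minimal sketch with made-up values):

```python
import torch

a = torch.tensor([1., 2., 3.])
b = torch.tensor([4., 5., 6.])
s = torch.dot(a, b)                 # 1-D . 1-D -> scalar tensor (1*4 + 2*5 + 3*6 = 32)
m = torch.mm(torch.ones(2, 3), torch.ones(3, 4))  # (2,3) @ (3,4) -> (2,4)
x1 = torch.zeros(2, 3).unsqueeze(0)  # (2,3) -> (1,2,3); dim picks where the new axis goes
pts = torch.linspace(0, 1, 5)        # 5 evenly spaced points from 0 to 1
```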
- torch.nn.Module is the base class for custom networks; you must define __init__(self) and forward(self, x), and __init__ must first call the parent constructor:
    class Net(torch.nn.Module):
        def __init__(self):
            super(Net, self).__init__()
            # ... define layers here
- torch.nn.Linear(a, b) returns a linear (fully connected) layer mapping a inputs to b outputs; it can be used like this:
    hidden = torch.nn.Linear(n_features, n_hidden)
    y = F.relu(hidden(x))
- print(net) prints out the structure of a custom network
- loss_func = torch.nn.MSELoss() returns the mean-squared-error loss; use it as loss = loss_func(pred, y), then loss.backward(). Likewise loss_func = torch.nn.CrossEntropyLoss() returns the cross-entropy loss, and computing it follows the same pattern
- optimizer = torch.optim.SGD(net.parameters(), lr=0.1); on each iteration remember to call optimizer.zero_grad(), then after the loss.backward() call use optimizer.step() to update the parameters
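Putting the loss and optimizer pieces together, one full update step looks like this (a minimal sketch on made-up random data, with a toy linear model):

```python
import torch

torch.manual_seed(0)
net = torch.nn.Linear(3, 1)                  # toy model
loss_func = torch.nn.MSELoss()
optimizer = torch.optim.SGD(net.parameters(), lr=0.01)

x = torch.randn(8, 3)                        # fake inputs
y = torch.randn(8, 1)                        # fake targets

loss_before = loss_func(net(x), y)
optimizer.zero_grad()    # clear gradients left over from the previous iteration
loss_before.backward()   # compute gradients of the loss w.r.t. the parameters
optimizer.step()         # apply one SGD update

loss_after = loss_func(net(x), y)            # loss on the same batch after the step
```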
- import torch.nn.functional as F to use F.relu, F.softmax, etc.
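A quick check of what these functional ops do (made-up input values):

```python
import torch
import torch.nn.functional as F

t = torch.tensor([-1.0, 0.0, 2.0])
r = F.relu(t)            # negative entries clipped to 0
p = F.softmax(t, dim=0)  # normalized into a probability distribution over dim 0
```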
- import torch.utils.data as Data to use Data.TensorDataset() and Data.DataLoader():
    # here x, y are the dataset tensors
    torch_dataset = Data.TensorDataset(x, y)  # current PyTorch takes positional tensors (the old data_tensor=/target_tensor= keywords were removed)
    loader = Data.DataLoader(
        dataset=torch_dataset,
        batch_size=BATCH_SIZE,
        shuffle=True,    # whether to shuffle the order each epoch
        num_workers=2,   # number of loader worker processes
    )
    for epoch in range(3):
        for step, (batch_x, batch_y) in enumerate(loader):
            # train .....
- torch.save(net, 'net.pkl') saves the whole network; net = torch.load('net.pkl') loads it back
- nn.Conv2d(in_channels=1, out_channels=16, kernel_size=5, stride=1, padding=2) defines a convolution layer (note the argument is kernel_size, not channel_size); nn.MaxPool2d() defines a pooling layer
- note: the Variable wrapper used in Mofan's PyTorch tutorial has been deprecated and merged into Tensor, so you can now train on tensors directly without wrapping them; volatile is also gone, replaced by with torch.no_grad():, under which tensors are created without gradient tracking, so no gradients are computed and no parameters are updated
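The no_grad replacement for volatile can be seen directly on a tensor's requires_grad flag (a minimal sketch with a toy layer):

```python
import torch

net = torch.nn.Linear(4, 2)
x = torch.randn(3, 4)

y_train = net(x)          # normal forward pass: output tracks gradients
with torch.no_grad():     # evaluation mode: no computation graph is built
    y_eval = net(x)       # output does not track gradients
```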
- Here is the code for the CNN layers. Points to note: define each layer in __init__() and wire them together in forward(); remember to flatten the tensor after the conv stacks, keeping the batch dimension; the batch dimension only appears in forward(), not in the layer definitions; view() works like reshape, and a -1 means that dimension is inferred automatically
    class CNN(nn.Module):
        def __init__(self):
            super(CNN, self).__init__()
            self.conv1 = nn.Sequential(  # input: (1, 28, 28)
                nn.Conv2d(in_channels=1, out_channels=16, kernel_size=5, stride=1, padding=2),  # (16, 28, 28)
                nn.ReLU(),
                nn.MaxPool2d(kernel_size=2),  # (16, 14, 14)
            )
            self.conv2 = nn.Sequential(
                nn.Conv2d(in_channels=16, out_channels=32, kernel_size=5, stride=1, padding=2),  # (32, 14, 14)
                nn.ReLU(),
                nn.MaxPool2d(kernel_size=2),  # (32, 7, 7)
            )
            self.out = nn.Linear(32 * 7 * 7, 10)

        def forward(self, x):
            x = self.conv1(x)
            x = self.conv2(x)          # (batch, 32, 7, 7)
            x = x.view(x.size(0), -1)  # flatten to (batch, 32*7*7), keeping the batch dimension
            output = self.out(x)
            return output
- Here is the training code. Note:
    - torch.max() returns two tensors: [0] is the max values and [1] is the argmax indices, which is why [1] is used here
    - to pull data out of a tensor, use tensor.data.numpy()
    - cast the boolean array to int (1/0) first, then convert to float before dividing
    for epoch in range(EPOCH):
        for step, (x, y) in enumerate(train_loader):
            output = cnn(x)
            loss = loss_func(output, y)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
            if step % 50 == 0:
                test_output = cnn(test_x)
                pred_y = torch.max(test_output, 1)[1].data.numpy()
                accuracy = float((pred_y == test_y.data.numpy()).astype(int).sum()) / float(test_y.size(0))
                print('Epoch: ', epoch,
                      '| train loss: %.4f' % loss.data.numpy(),
                      '| test accuracy: %.4f' % accuracy)
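The torch.max bookkeeping in the accuracy computation can be checked in isolation (made-up logits):

```python
import torch

logits = torch.tensor([[0.1, 2.0, 0.3],
                       [1.5, 0.2, 0.1]])
values, indices = torch.max(logits, 1)  # max over dim 1: per-row max values and argmax indices
pred = torch.max(logits, 1)[1].numpy()  # [1] selects the index array, as in the training loop
```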
- RNNs are similar: add an LSTM layer and specify its parameters. The code:
    class RNN(nn.Module):
        def __init__(self):
            super(RNN, self).__init__()
            self.rnn = nn.LSTM(          # if use nn.RNN(), it hardly learns
                input_size=INPUT_SIZE,
                hidden_size=64,          # rnn hidden unit
                num_layers=1,            # number of rnn layers
                batch_first=True,        # input & output will have batch size as 1st dimension
            )
            self.out = nn.Linear(64, 10)

        def forward(self, x):
            # x shape (batch, time_step, input_size)
            # r_out shape (batch, time_step, output_size)
            # h_n shape (n_layers, batch, hidden_size)
            # h_c shape (n_layers, batch, hidden_size)
            r_out, (h_n, h_c) = self.rnn(x, None)  # None represents zero initial hidden state
            # choose r_out at the last time step
            out = self.out(r_out[:, -1, :])
            return out
- GANs are slightly more involved: you need two networks, a generator and a discriminator:
    - note the form of the losses: there is a G loss and a D loss
    - during training, for each batch update D's parameters first and then G's; while updating one network, make sure the other's parameters are not changed
    - in addition, when backpropagating D's loss, pass retain_graph=True (the retain_variable from the video has been replaced by retain_graph in newer PyTorch versions) so the computational graph is kept for G's backward pass, avoiding recomputation
    G = nn.Sequential(                   # Generator
        nn.Linear(N_IDEAS, 128),         # random ideas (could be from a normal distribution)
        nn.ReLU(),
        nn.Linear(128, ART_COMPONENTS),  # making a painting from these random ideas
    )
    D = nn.Sequential(                   # Discriminator
        nn.Linear(ART_COMPONENTS, 128),  # receives art work either from the famous artist or a newbie like G
        nn.ReLU(),
        nn.Linear(128, 1),
        nn.Sigmoid(),                    # tells the probability that the art work was made by the artist
    )
    opt_D = torch.optim.Adam(D.parameters(), lr=LR_D)
    opt_G = torch.optim.Adam(G.parameters(), lr=LR_G)
    for step in range(10000):
        artist_paintings = artist_works()           # real paintings from the artist
        G_ideas = torch.randn(BATCH_SIZE, N_IDEAS)  # random ideas
        G_paintings = G(G_ideas)                    # fake paintings from G (random ideas)
        prob_artist0 = D(artist_paintings)          # D tries to increase this prob
        prob_artist1 = D(G_paintings)               # D tries to reduce this prob
        D_loss = - torch.mean(torch.log(prob_artist0) + torch.log(1. - prob_artist1))
        G_loss = torch.mean(torch.log(1. - prob_artist1))
        opt_D.zero_grad()
        D_loss.backward(retain_graph=True)          # retain the graph so G_loss can reuse it
        opt_D.step()
        opt_G.zero_grad()
        G_loss.backward()
        opt_G.step()
- dropout is easy: just insert a Dropout layer between layers; its argument is the fraction of units to drop:
    layer = torch.nn.Sequential(
        torch.nn.Linear(10, 64),
        torch.nn.Dropout(0.5),
        torch.nn.ReLU(),
        torch.nn.Linear(64, 128),
        # ...
    )
    Note: during training, call layer.train() first so the dropout layer is active; at test time, call layer.eval() so it is disabled
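The effect of the train()/eval() switch on Dropout can be verified directly: in training mode entries are zeroed and the survivors scaled by 1/(1-p), while in eval mode the layer is the identity (a minimal sketch):

```python
import torch

drop = torch.nn.Dropout(0.5)
x = torch.ones(4, 4)

drop.train()          # training mode: ~50% of entries zeroed, the rest scaled by 2
y_train = drop(x)

drop.eval()           # eval mode: Dropout passes the input through unchanged
y_eval = drop(x)
```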