https://www.youtube.com/watch?v=_H3aw6wkCv0
set_trace() -> to debug or see value at any point
numpy -> ndarray
pytorch -> tensor
@ = Matrix Multiplication
x.T @ x -> x transpose matmul x :numpy
x.t() @ x : pytorch
inv(x) : numpy (from numpy.linalg import inv)
torch.inverse(x) : pytorch
x.add(1) : returns a new tensor, x is unchanged
x.add_(1) : _ means inplace operation, ex: x.t_() will change x
torch to numpy: A.numpy()
numpy to torch: torch.from_numpy(x)
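A small side-by-side sketch of the operations listed above. The matrix values and variable names are made up for illustration; the point is only to show the NumPy and PyTorch spellings next to each other and what the trailing underscore does.

```python
import numpy as np
import torch

# NumPy: ndarray
x_np = np.array([[2.0, 0.0], [0.0, 4.0]])
gram_np = x_np.T @ x_np            # transpose, then matrix multiply
inv_np = np.linalg.inv(x_np)       # matrix inverse

# PyTorch: tensor
x_t = torch.tensor([[2.0, 0.0], [0.0, 4.0]])
gram_t = x_t.t() @ x_t
inv_t = torch.inverse(x_t)

# Out-of-place vs in-place
y = x_t.add(1)     # new tensor, x_t is unchanged at this point
x_t.add_(1)        # trailing underscore: x_t itself is modified

# Conversions: on CPU the tensor and the ndarray share memory
a = torch.ones(3)
b = a.numpy()
a.add_(1)          # b sees this change as well
c = torch.from_numpy(np.zeros(3))   # keeps NumPy's float64 dtype
```

Because of the shared memory, an in-place change on one side of a.numpy() or torch.from_numpy() shows up on the other side too.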
Difference between .data, detach() and with torch.no_grad():
https://pytorch.org/blog/pytorch-0_4_0-migration-guide/
always use detach() instead of .data to get the tensor data because it is safer.
example:
x = 1,2,3
y = x.data
y is changed in place to 4,5,6
then, x also becomes 4,5,6 (y shares memory with x)
and when we call loss.backward, the gradient is silently computed from the new value - harmful because autograd does not know we changed it.
x = 1,2,3
y = x.detach()
y is changed in place to 4,5,6
then, x also becomes 4,5,6
but when we call loss.backward, there is an error (if x is needed for the gradient) which indicates the value has changed.
So in short, do not change in place the tensor returned by detach() while the original is still needed for backward.
torch.no_grad():
use it if you want no graph node to be created for the operations inside the block.
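The behaviour described above can be reproduced with a few lines. This sketch follows the spirit of the example in the linked migration guide; the variable names (grad_via_data, detach_raised, w, z) are chosen here for illustration. Sigmoid is used because its backward pass needs its own output, so overwriting that output matters.

```python
import torch

# Case 1: .data -- the in-place change is invisible to autograd.
a = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
out = a.sigmoid()
c = out.data
c.zero_()                        # out now holds zeros too (shared memory)
out.sum().backward()             # no error, but the gradient is wrong
grad_via_data = a.grad.clone()   # computed from the overwritten values

# Case 2: .detach() -- the same in-place change is detected.
a2 = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
out2 = a2.sigmoid()
c2 = out2.detach()
c2.zero_()
try:
    out2.sum().backward()
    detach_raised = False
except RuntimeError:
    detach_raised = True         # autograd reports the modified tensor

# torch.no_grad(): operations inside the block are not recorded.
w = torch.ones(3, requires_grad=True)
with torch.no_grad():
    z = w * 2                    # z has no grad history
```

In the first case the gradient comes out as all zeros without any warning, which is exactly the silent failure the notes warn about.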
Move data to GPU:
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")  # with two GPUs the second one is "cuda:1"
assert device.type == "cuda"  # to make sure it's using the GPU
data = data.to(device)  # for tensors .to() returns a new tensor, so keep the result
Previously we needed to use:
.cuda() and .cpu(), which caused code with many checks:
if CUDA:
    model = model.cuda()
Now it's easier: model.to(device), you don't have to use if/else anymore
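A minimal device-agnostic sketch of the pattern above. The layer sizes and the random batch are placeholders; the same code runs on a machine with or without a GPU.

```python
import torch
import torch.nn as nn

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

# Modules are moved in place; reassigning the result is harmless.
model = nn.Linear(4, 2).to(device)

# Tensors are not moved in place: .to() returns a new tensor.
data = torch.randn(8, 4)
data = data.to(device)

out = model(data)
print(device.type, out.shape)
```

Model and data have to be on the same device, otherwise the forward pass raises an error.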
If you install torch with conda, it puts the CUDA libraries in the Anaconda path
I installed PyCharm for autocomplete
model
    __init__
    forward
optimizer
    model.parameters()
loss
    backward
optimizer.step()
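The outline above (model, optimizer, loss, backward, optimizer step) can be sketched as one training step. TinyNet, the SGD optimizer, the MSE loss and the random data are all hypothetical stand-ins chosen for illustration, not the model from the video.

```python
import torch
import torch.nn as nn

class TinyNet(nn.Module):            # hypothetical model
    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(3, 1)

    def forward(self, x):
        return self.fc(x)

torch.manual_seed(0)
model = TinyNet()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = nn.MSELoss()

X = torch.randn(16, 3)               # fake inputs
y = torch.randn(16, 1)               # fake targets

before = model.fc.weight.detach().clone()
optimizer.zero_grad()                # clear gradients from the previous step
loss = loss_fn(model(X), y)          # forward pass + loss
loss.backward()                      # compute gradients
optimizer.step()                     # update the parameters
after = model.fc.weight.detach().clone()
```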
Why GPU?
the model's parameters live on the GPU, so the data has to be on the GPU as well
better to put all the data on the GPU once and run everything there
sometimes the data does not fit into GPU memory, so we copy batches to the GPU one by one while training - a bit slower, but the only way when memory is limited.
Training step:
model.train() <- always
Eval:
model.eval()
from IPython.core.debugger import set_trace [There is a cheat sheet]
set_trace()
some commands: next, exit, locals()
Datasets in pytorch must have __len__ and __getitem__ (like __init__ and forward for modules) so that you can feed them to a DataLoader
train_ds[14][0]
train_set[1]['LR']
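A minimal sketch of a custom dataset with the two required methods. SquaresDataset is a made-up example (each item is a number and its square); a real dataset would load images or other samples in __getitem__.

```python
import torch
from torch.utils.data import Dataset

class SquaresDataset(Dataset):       # hypothetical dataset
    def __init__(self, n):
        self.xs = torch.arange(n, dtype=torch.float32)

    def __len__(self):
        # number of samples
        return len(self.xs)

    def __getitem__(self, idx):
        # return one (input, target) pair
        x = self.xs[idx]
        return x, x ** 2

train_ds = SquaresDataset(20)
sample_x, sample_y = train_ds[14]    # same indexing style as train_ds[14][0]
```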
implementing a transform function is easy in pytorch:
from torchvision import transforms
_image_size = 224
_mean = [0.485, 0.456, 0.406]
_std = [0.229, 0.224, 0.225]
trans = transforms.Compose([
    transforms.RandomCrop(_image_size),
    # transforms.RandomHorizontalFlip(),
    # transforms.ColorJitter(.3, .3, .3),
    transforms.ToTensor(),
    transforms.Normalize(_mean, _std),
])  # the order can be changed, but Normalize must come after ToTensor because it expects a tensor
then:
trans(train_ds[13][0])  # train_ds[13][0] is an image
train_iter = iter(train_dl)
X, y = next(train_iter)
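A self-contained sketch of pulling one batch from a DataLoader, as in the two lines above. The TensorDataset with random features is a stand-in for the real train_ds; batch_size=4 is an arbitrary choice.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

features = torch.randn(10, 3)        # 10 fake samples with 3 features
labels = torch.arange(10)            # fake labels 0..9
train_ds = TensorDataset(features, labels)
train_dl = DataLoader(train_ds, batch_size=4, shuffle=False)

train_iter = iter(train_dl)
X, y = next(train_iter)              # first batch
n_batches = len(train_dl)            # the last batch is smaller (2 samples)
```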
Transfer learning (video time: 1:03:00)
Transfer learning: using the weights of a pretrained network.
Say we have a pretrained model which classifies into 1000 categories.
We now want it to classify 2 labels.
So we only need to change the last fc layer to output 2 instead of 1000
And then train the last layer only, no need to train the whole network. Very fast.
First we load the model:
from torch import nn
from torchvision import models
model = models.resnet18(pretrained=True)  # torchvision 0.13+ prefers weights=models.ResNet18_Weights.DEFAULT
To display model:
model
Freeze all trainable parameters manually
for param in model.parameters():
    param.requires_grad = False
or:
# Or use our convenient functions from before (these are user-defined, see below)
freeze_all(model.parameters())
assert all_frozen(model.parameters())
Replace the last layer with a linear layer. New layers have requires_grad = True.
model.fc = nn.Linear(512, n_classes)
assert not all_frozen(model.parameters())
So in general:
def get_model(n_classes=2):
    model = models.resnet18(pretrained=True)
    freeze_all(model.parameters())
    model.fc = nn.Linear(512, n_classes)
    model = model.to(DEVICE)  # DEVICE: the torch.device chosen earlier (named device above)
    return model
model = get_model()
# %load my_train_helper.py
def get_trainable(model_params):
    return (p for p in model_params if p.requires_grad)
def get_frozen(model_params):
    return (p for p in model_params if not p.requires_grad)
def all_trainable(model_params):
    return all(p.requires_grad for p in model_params)
def all_frozen(model_params):
    return all(not p.requires_grad for p in model_params)
def freeze_all(model_params):
    for param in model_params:
        param.requires_grad = False
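To try the freeze-and-replace idea without downloading ResNet weights, here is a sketch on a tiny stand-in network. TinyBackbone is hypothetical; it only mimics the one property that matters here, a final layer called fc with 512 inputs. The helper functions are repeated from the notes so the block runs on its own.

```python
import torch.nn as nn

def freeze_all(model_params):
    for param in model_params:
        param.requires_grad = False

def all_frozen(model_params):
    return all(not p.requires_grad for p in model_params)

def get_trainable(model_params):
    return (p for p in model_params if p.requires_grad)

class TinyBackbone(nn.Module):       # hypothetical stand-in for resnet18
    def __init__(self):
        super().__init__()
        self.features = nn.Linear(8, 512)
        self.fc = nn.Linear(512, 1000)

    def forward(self, x):
        return self.fc(self.features(x).relu())

model = TinyBackbone()
freeze_all(model.parameters())
assert all_frozen(model.parameters())

# A freshly created layer has requires_grad=True by default.
model.fc = nn.Linear(512, 2)
trainable = list(get_trainable(model.parameters()))   # weight and bias of the new fc
```

Only the two tensors of the new head end up trainable, which is why training is fast.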
Or, instead of freezing, we can also use a high learning rate for the last layers and a low learning rate for the first layers
optimizer = torch.optim.Adam(
    get_trainable(model.parameters()),
    lr=0.001,
    # momentum=0.9,  # only for SGD; Adam has no momentum argument
)
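If the note about high and low learning rates refers to per-layer learning rates, one way to express it is with optimizer parameter groups. This is a sketch under that assumption; the small nn.Sequential model and the two rates (1e-5 and 1e-3) are illustrative choices, not values from the video.

```python
import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Linear(8, 16),   # stands in for the early, pretrained layers
    nn.ReLU(),
    nn.Linear(16, 2),   # stands in for the new last layer
)

# Each dict is one parameter group with its own learning rate.
optimizer = torch.optim.Adam([
    {"params": model[0].parameters(), "lr": 1e-5},   # small steps early
    {"params": model[2].parameters(), "lr": 1e-3},   # larger steps at the end
])
lrs = [group["lr"] for group in optimizer.param_groups]
```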
the way I used model.train() and model.eval() was good.
epoch loop:
    model.train()
    train over one epoch
    model.eval()
    validate
    repeat for the next epoch
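The epoch loop above could look roughly like this. Everything concrete here is an assumption for illustration: the small model with a Dropout layer (so that train and eval modes actually differ), the random data, the cross-entropy loss and three epochs.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

torch.manual_seed(0)
model = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Dropout(0.5), nn.Linear(8, 2))
loss_fn = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=0.01)

train_dl = DataLoader(TensorDataset(torch.randn(32, 4), torch.randint(0, 2, (32,))), batch_size=8)
valid_dl = DataLoader(TensorDataset(torch.randn(16, 4), torch.randint(0, 2, (16,))), batch_size=8)

history = []
for epoch in range(3):
    model.train()                    # enable dropout etc.
    for X, y in train_dl:
        optimizer.zero_grad()
        loss = loss_fn(model(X), y)
        loss.backward()
        optimizer.step()

    model.eval()                     # disable dropout etc.
    total, n = 0.0, 0
    with torch.no_grad():            # no graph needed for validation
        for X, y in valid_dl:
            total += loss_fn(model(X), y).item() * len(X)
            n += len(X)
    history.append(total / n)        # mean validation loss for this epoch
```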
.item() -> returns the value of a one-element tensor as a plain Python number
loss.item()
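A tiny sketch of .item() on a scalar tensor. The numbers are arbitrary; the typical use is accumulating a running loss as a Python float so that no computation graph is kept alive.

```python
import torch

loss = torch.tensor(0.25, requires_grad=True) * 2   # scalar tensor with grad history
value = loss.item()                                 # plain Python float

running = 0.0
running += loss.item()                              # accumulate without keeping the graph
```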
tensorboardX SummaryWriter