Section 1: Introducing Pytorch, CUDA and GPU

PyTorch is a deep learning framework and a scientific computing package.
The scientific computing aspect of PyTorch is primarily a result PyTorch’s tensor library and associated tensor operations. That means you can take advantage of Pytorch for many computing tasks, thanks to its supporting tensor operation, without touching deep learning modules.

Important to note that PyTorch tensors and their associated operations are very similar to numpy n-dimensional arrays. A tensor is actually an n-dimensional array.

Pytorch build its library around Object Oriented Programming(OOP) concept. With object oriented programming, we orient our program design and structure around objects (take a look at Random topics for more information). The tensor in Pytorch is presented by the object torch.Tensor which is created from numpy ndarray objects. Two objects share memory. This makes the transition between PyTorch and NumPy very cheap from a performance perspective.

With PyTorch tensors, GPU support is built-in. It’s very easy with PyTorch to move tensors to and from a GPU if we have one installed on our system. Tensors are super important for deep learning and neural networks because they are the data structure that we ultimately use for building and training our neural networks.

Talking a bit about history.
The initial release of PyTorch was in October of 2016, and before PyTorch was created, there was and still is, another framework called Torch which is also a machine learning framework but is based on the Lua programming language. The connection between PyTorch and this Lua version, called Torch, exists because many of the developers who maintain the Lua version are the individuals who created PyTorch. And they have been working for Facebook since then till now.

Note: Facebook Created PyTorch

Below are the primary PyTorch modules we’ll be learning about and using as we build neural networks along the way.

Package	Description
torch	The top-level PyTorch package and tensor library.
torch.nn	A subpackage that contains modules and extensible classes for building neural networks.
torch.autograd	A subpackage that supports all the differentiable Tensor operations in PyTorch.
torch.nn.functional	A functional interface that contains operations used for building neural net like loss, activation, layer operations...
torch.optim	A subpackage that contains standard optimization operations like SGD and Adam.
torch.utils	A subpackage that contains utility classes like data sets and data loaders that make data preprocessing easier.
torchvision	A package that provides access to popular datasets, models, and image transformations for computer vision.

Why use PyTorch for deep learning?

PyTorch’s design is modern, Pythonic. When we build neural networks with PyTorch, we are super close to programming neural networks from scratch. When we write PyTorch code, we are just writing and extending standard Python classes, and when we debug PyTorch code, we are using the standard Python debugger. It’s written mostly in Python, and only drops into C++ and CUDA code for operations that are performance bottlenecks.
It is a thin framework, which makes it more likely that PyTorch will be capable of adapting to the rapidly evolving deep learning environment as things change quickly over time.
Stays out of the way and this makes it so that we can focus on neural networks and less on the actual framework.

Why PyTorch is great for deep learning research

The reason for this research suitability is that Pytorch use dynamic computational graph, in contrast with tensorfow which uses static computational graph, in order to calculate derivatives.

Computational graphs are used to graph the function operations that occur on tensors inside neural networks. These graphs are then used to compute the derivatives needed to optimize the neural network. Dynamic computational graph means that the graph is generated on the fly as the operations are created. Static graphs that are fully determined before the actual operations occur.

It just so happens that many of the cutting edge research topics in deep learning are requiring or benefiting greatly from dynamic graphs.

Installing PyTorch

The recommended best option is to use the Anaconda Python package manager. With Anaconda, it's easy to get and manage Python, Jupyter Notebook, and other commonly used packages for scientific computing and data science, like PyTorch!

Let’s go over the steps:

Download and install Anaconda (choose the latest Python version).
Go to PyTorch's site and find the get started locally section.
Specify the appropriate configuration options for your particular environment.
Run the presented command in the terminal to install PyTorch

For the example: conda install pytorch torchvision cudatoolkit=10.0 -c pytorch

Notice that we are installing both PyTorch and torchvision. Also, there is no need to install CUDA separately. The needed CUDA software comes installed with PyTorch if a CUDA version is selected in step (3). All we need to do is select a version of CUDA if we have a supported Nvidia GPU on our system.

!conda list torch

# packages in environment at /Users/phucnsp/anaconda3/envs/fastai2:
#
# Name                    Version                   Build  Channel
pytorch                   1.4.0                   py3.7_0    pytorch
torchsummary              1.5.1                    pypi_0    pypi
torchvision               0.5.0                  py37_cpu    pytorch

import torch
torch.__version__ # to verify pytorch version

'1.4.0'

torch.cuda.is_available() # to verify our GPU capabilities

False

Why deep learning uses GPUs

To understand CUDA, we need to have a working knowledge of graphics processing units (GPUs). A GPU is a processor that is good at handling specialized computations. This is in contrast to a central processing unit (CPU), which is a processor that is good at handling general computations. CPUs are the processors that power most of the typical computations on our electronic devices.

A GPU can be much faster at computing than a CPU. However, this is not always the case. The speed of a GPU relative to a CPU depends on the type of computation being performed. The type of computation most suitable for a GPU is a computation that can be done in parallel.

Parallel computing is a type of computation where by a particular computation is broken into independent smaller computations that can be carried out simultaneously. The resulting computations are then recombined, or synchronized, to form the result of the original larger computation. The number of tasks that a larger task can be broken into depends on the number of cores contained on a particular piece of hardware. Cores are the units that actually do the computation within a given processor, and CPUs typically have four, eight, or sixteen cores while GPUs have potentially thousands.

So why deep learning uses them - Neural networks are embarrassingly parallel. Tasks that embarrassingly parallel are ones where it’s easy to see that the set of smaller tasks are independent with respect to each other. Many of the computations that we do with neural networks can be easily broken into smaller computations in such a way that the set of smaller computations do not depend on one another. One such example is a convolution.

GPU, CUDA and Nvidia

GPU computing
In the beginning, the main tasks that were accelerated using GPUs were computer graphics. That's why we have the name graphics processing unit. But in recent years, many more varieties parallel tasks have emerged. One such task as we have seen is deep learning.
Deep learning along with many other scientific computing tasks that use parallel programming techniques are leading to a new type of programming model called GPGPU or general purpose GPU computing.

GPU computing practically began with the introduction of CUDA by NVIDIA and Stream by AMD. These are APIs designed by the GPU vendors to be used together with the hardware that they provide.
Nvidia is a technology company that designs GPUs, and they have created CUDA as a software platform that pairs with their GPU hardware making it easier for developers to build software that accelerates computations using the parallel processing power of Nvidia GPUs.

GPU computing stack concept
The stack comprises of:

GPU as the hardware on the bottom
CUDA as the software architecture on top of the GPU
And finally libraries like cuDNN on top of CUDA.

Sitting on top of CUDA and cuDNN is PyTorch, which is the framework were we’ll be working that ultimately supports applications on top. Developers use CUDA by downloading the CUDA toolkit which comes with specialized libraries like cuDNN - the CUDA Deep Neural Network library. With PyTorch, CUDA comes baked in from the start. There are no additional downloads required. All we need is to have a supported Nvidia GPU, and we can leverage CUDA using PyTorch. We don’t need to know how to use the CUDA API directly.

After all, PyTorch is written in all of these: Python, C++, CUDA

Suppose we have the following code:

t = torch.tensor([1,2,3])

The tensor object created in this way is on the CPU by default. As a result, any operations that we do using this tensor object will be carried out on the CPU. Now, to move the tensor onto the GPU, we just write:

t = t.cuda()

This ability makes PyTorch very flexible because computations can be selectively carried out either on the CPU or on the GPU.

Note: GPU Can Be Slower Than CPU. The answer is that a GPU is only faster for particular (specialized) tasks. For example, moving data from the CPU to the GPU is costly, so in this case, the overall performance might be slower if the computation task is a simple one. Moving relatively small computational tasks to the GPU won’t speed us up very much and may indeed slow us down. Remember, the GPU works well for tasks that can be broken into many smaller tasks, and if a compute task is already small, we won’t have much to gain by moving the task to the GPU.For this reason, it’s often acceptable to simply use a CPU when just starting out, and as we tackle larger more complicated problems, begin using the GPU more heavily.

Section 2: Introducing Tensors

Tensors - Data Structures of Deep Learning

A tensor is the primary data structure used by neural networks. The inputs, outputs, and transformations within neural networks are all represented using tensors, and as a result, neural network programming utilizes tensors heavily.

The below concepts, that we met in math or computer science, are all refered to tensor in deep learning.

indexes required	math	computer science
0	scalar	number
1	vector	array
2	matrix	2d-array

The relationship within each of these pairs, for example vector and array, is that both elements require the same number of indexes to refer to a specific element within the data structure.

# an array or a vector requires 1 index to access its element
a = [1,2,3,4]
a[3]

4

# an matrix or 2d-array requires 2 index to access its element
a = [
    [1, 2, 3, 4],
    [5, 6, 7, 8]
]
a[0][2]

3

When more than two indexes are required to access a specific element, we stop giving specific names to the structures, and we begin using more general language.

In mathematics, we stop using words like scalar, vector, and matrix, and we start using the word tensor or nd-tensor. The n tells us the number of indexes required to access a specific element within the structure.
In computer science, we stop using words like, number, array, 2d-array, and start using the word multidimensional array or nd-array. The n tells us the number of indexes required to access a specific element within the structure.

The reason we say a tensor is a generalization form is because we use the word tensor for all values of n like so:

A scalar is a 0 dimensional tensor
A vector is a 1 dimensional tensor
A matrix is a 2 dimensional tensor
A nd-array is an n dimensional tensor

Note: Tensors and nd-arrays are the same thing!

Fundamental tensor attributes for deep learning - Rank, Axes, and Shape.

These concepts build on one another starting with rank, then axes, and building up to shape.

The rank of a tensor refers to the number of dimensions present within the tensor. A rank-2 tensor means all of the following: a matrix, a 2d-array, a 2d-tensor.

An axis of a tensor is a specific dimension of a tensor.
Let's get an example how to access elements of an axis.

dd = [
[1,2,3],
[4,5,6],
[7,8,9]
]

Each element along the first axis, is an array:

dd[0], dd[1], dd[2]

([1, 2, 3], [4, 5, 6], [7, 8, 9])

Each element along the second axis, is a number:

dd[0][0], dd[1][0], dd[2][0]

(1, 4, 7)

Note: with tensors, the elements of the last axis are always numbers. Every other axis will contain n-dimensional arrays.

The shape of a tensor gives us the length of each axis of the tensor.

Note: The shape of a tensor is important because it encodes all of the relevant information about axes, rank, and therefore indexes. Additionally, one of the types of operations we must perform frequently when we are programming our neural networks is called reshaping.

t = torch.tensor([
    [1,2,3],
    [5,6,7]
], dtype=torch.float)
t.shape

torch.Size([2, 3])

Note: size and shape of a tensor are the same thing.

Section 3: Pytorch Tensors

PyTorch tensors are the data structures we'll be using when programming neural networks in PyTorch.

The tensor in Pytorch is presented by the object torch.Tensor which is created from numpy ndarray objects. Two objects share memory. This makes the transition between PyTorch and NumPy very cheap from a performance perspective.

When programming neural networks, data preprocessing is often one of the first steps in the overall process, and one goal of data preprocessing is to transform the raw input data into tensor form.

torch.Tensor class and its attributes

PyTorch tensors are instances of the torch.Tensor Python class.
First, let’s look at a few torch.Tensor's tensor attributes.

tensor.dtype:
tensor.device
tensor.layout

t = torch.Tensor()

The dtype specifies the type of the data that is contained within the tensor.

t.dtype

torch.float32

Table below shows all the tensor types that Pytorch supports. Each type has a CPU and GPU version. Tensors contain uniform (of the same type) numerical data.

pytorch datatype

The device specifies the device (CPU or GPU) where the tensor's data is allocated. This determines where tensor computations for the given tensor will be performed.

t.device

device(type='cpu')

PyTorch supports the use of multiple devices, and they are specified using an index like so:

device = torch.device('cuda:0')
device

device(type='cuda', index=0)

If we have a device like above, we can create a tensor on the device by passing the device to the tensor’s constructor.

import torch
t = torch.tensor([1,2,3], dtype=torch.int, device=device)

The layout specifies how the tensor is stored in memory.

t.layout

torch.strided

To learn more about stride check here.

Create a new tensor using data

These are the primary ways of creating tensor objects (instances of the torch.Tensor class), with data (array-like) in PyTorch:

torch.Tensor(data) is the constructor of the torch.Tensor class
torch.tensor(data): is the factory function that constructs torch.Tensor objects.
torch.as_tensor(data)
torch.from_numpy(data)

Let’s look at each of these. They all accept some form of data and give us an instance of the torch.Tensor class. Sometimes when there are multiple ways to achieve the same result, things can get confusing, so let’s break this down.

import numpy as np
data = np.array([1,2,3])

o1 = torch.Tensor(data)
o2 = torch.tensor(data)
o3 = torch.as_tensor(data)
o4 = torch.from_numpy(data)

print(o1)
print(o2)
print(o3)
print(o4)

tensor([1., 2., 3.])
tensor([1, 2, 3])
tensor([1, 2, 3])
tensor([1, 2, 3])

The table below compare 4 options and propose which one to use.

torch.tensor is best option to go daily ^^. It copy data which help us prevent hidden mistake caused by sharing data. In addition, it is better than torch.Tensor thanks to better doc and more config options.
torch.as_tensor is recommened in case we want to improve the performance and want to levarage data share memory characteristic of Pytorch. However, it is always better to start with copy data to make sure your program works first and go to performance improvement later. This option is better than torch.from_numpy because it accepts a wide variety of array-like objects including other Pytorch tensor.

method	which one to use	dtype	data in memory
torch.Tensor(data)		infer from default dtype.	copy
torch.tensor(data)	best option to go	inferred from input or explicitly set.	copy
torch.as_tensor(data)	use for improve performance	inferred from input or explicitly set.	share
torch.from_numpy(data)		inferred from input or explicitly set.	share

Data in memory is shared means that the actual data in memory exists in a single place. As a result, any changes that occur in the underlying data will be reflected in both objects, the torch.Tensor and the numpy.ndarray.
Sharing data is more efficient and uses less memory than copying data because the data is not written to two locations in memory. However, there are something to keep in mind about memory sharing:

Since numpy.ndarray objects are allocated on the CPU, the as_tensor() function must copy the data from the CPU to the GPU when a GPU is being used.
The memory sharing of as_tensor() doesn’t work with built-in Python data structures like list.
The as_tensor() call requires developer knowledge of the sharing feature. This is necessary so we don’t inadvertently make an unwanted change in the underlying data without realizing the change impacts multiple objects.
The as_tensor() performance improvement will be greater if there are a lot of back and forth operations between numpy.ndarray objects and tensor objects. However, if there is just a single load operation, there shouldn’t be much impact from a performance perspective.

Tips, In order to convert multiple arrays to tensor we can use map

import numpy as np
import torch

a = np.array([1,2,3])
b = np.array([3,4,5])
c = np.array([1])

a, b, c = map(torch.tensor, (a, b, c))
a, b, c

(tensor([1, 2, 3]), tensor([3, 4, 5]), tensor([1]))

Create a new tensor without data

Here are some other creation options that are available.

# create identity matrix
torch.eye(2)

tensor([[1., 0.],
        [0., 1.]])

# create a tensor of zeros with the shape of specified shape argument
torch.zeros([2,2])

tensor([[0., 0.],
        [0., 0.]])

# create a tensor of ones with the shape of specified shape argument
torch.ones([2,2])

tensor([[1., 1.],
        [1., 1.]])

# Returns a tensor filled with random numbers from a uniform distribution on the interval [0, 1)
# The shape of the tensor is defined by the variable argument size.
torch.rand([2,2])

tensor([[0.3088, 0.4226],
        [0.8102, 0.9129]])

# Returns a tensor filled with random numbers from a normal distribution with mean 0 and variance 1 
# (also called the standard normal distribution).
torch.randn(2, 3)

tensor([[-1.1778, -1.0388, -0.1459],
        [-0.1746,  0.5764, -1.1491]])

This is a small subset of the available creation functions that don’t require data. Check with the PyTorch documentation for the full list.

Section 4: Pytorch Tensor Operation Types

We have the following high-level categories of tensor operations:

Reshaping operation type: gave us the ability to position our elements along particular axes.
Element-wise operation type: allow us to perform operations on elements between two tensors.
Reduction operation type: allow us to perform operations on elements within a single tensor.
Access operation type allow us to access to each numerical elements within a single tensor.

There are a lot of individual operations out there, so much so that it can sometimes be intimidating when you're just beginning, but grouping similar operations into categories based on their likeness can help make learning about tensor operations more manageable.

Important: Tensor operations between tensors must happen between tensors with the same type of data and on the same device

Before going to each category, firstly we will take a look on the concept of broadcasting

Broadcasting Tensors

To understand this concept, let's take a look at an example. Suppose we have the following tensors.

t1 = torch.tensor([
    [1,1],
    [1,1]
], dtype=torch.float32)

t2 = torch.tensor([2,4], dtype=torch.float32)

t1.shape, t2.shape

(torch.Size([2, 2]), torch.Size([2]))

What will be the result of this element-wise addition operation, t1 + t2 ?

t1 + t2

tensor([[3., 5.],
        [3., 5.]])

Even though these two tenors have differing shapes, the element-wise operation is possible, and broadcasting is what makes the operation possible. The lower rank tensor t2 will be transformed via broadcasting to match the shape of the higher rank tensor t1, and the element-wise operation will be performed as usual.

we can check the broadcast transformation using the broadcast_to() numpy function.

import numpy as np
np.broadcast_to(t2.numpy(), t1.shape)

array([[2., 4.],
       [2., 4.]], dtype=float32)

t1 + t2

tensor([[3., 5.],
        [3., 5.]])

After broadcasting, the addition operation between these two tensors is a regular element-wise operation between tensors of the same shape.

Tensor reshape operation type

Reshaping operations are perhaps the most important type of tensor operations because the shape of a tensor gives us something concrete we can use to shape an intuition for our tensors.

Note: Reshaping changes the shape but not the underlying data elements.

Using the reshape() function, we can specify the row x column shape that we are seeking. Notice that the product of the shape's components has to be equal to the number of elements in the original tensor.

Pytorch has another function called view() that does the same thing as reshape function.

import torch
t = torch.tensor([
    [1,1,1,1],
    [2,2,2,2],
    [3,3,3,3]
], dtype=torch.float32)

t.reshape([3,4])

tensor([[1., 1., 1., 1.],
        [2., 2., 2., 2.],
        [3., 3., 3., 3.]])

Another common reshape type operation is squeeze, unsqueeze. Those operations change the shape of our tensors is by squeezing and unsqueezing them.

Squeezing a tensor removes the dimensions or axes that have a length of one.
Unsqueezing a tensor adds a dimension with a length of one.

t.reshape([1,12]).shape

torch.Size([1, 12])

t.reshape([1,12]).squeeze().shape

torch.Size([12])

t.reshape([1,12]).unsqueeze(dim=0).shape

torch.Size([1, 1, 12])

Another common reshape type operation is flattening.
A tensor flatten operation is a common operation inside convolutional neural networks. This is because convolutional layer outputs that are passed to fully connected layers must be flatted out before the fully connected layer will accept the input.
A flatten operation on a tensor reshapes the tensor to have a shape that is equal to the number of elements contained in the tensor. This is the same thing as a 1d-array of elements.

These are some ways to flatten a tensor.

t.reshape(1,-1)[0]

tensor([1., 1., 1., 1., 2., 2., 2., 2., 3., 3., 3., 3.])

t.reshape(-1)

tensor([1., 1., 1., 1., 2., 2., 2., 2., 3., 3., 3., 3.])

t.view(t.numel())

tensor([1., 1., 1., 1., 2., 2., 2., 2., 3., 3., 3., 3.])

t.flatten()

tensor([1., 1., 1., 1., 2., 2., 2., 2., 3., 3., 3., 3.])

In addition, it is possible to flatten only specific parts of a tensor.

t = torch.tensor([[
    [1,1,1,1],
    [2,2,2,2],
    [3,3,3,3]
]], dtype=torch.float32)
t.shape

torch.Size([1, 3, 4])

t.flatten(start_dim=1).shape

torch.Size([1, 12])

Take a deeper look inside the flatten operation, it is actually a composition of reshape and squeeze operation.

Note: flatten operation = reshape operation + squeeze operation

def flatten_ex(t):
    t = t.reshape(1, -1)
    t = t.squeeze()
    return t
flatten_ex(t)

tensor([1., 1., 1., 1., 2., 2., 2., 2., 3., 3., 3., 3.])

Tensor element-wise operation type

An element-wise operation is an operation between two tensors that operates on corresponding elements within the respective tensors. Two tensors must have the same shape in order to perform element-wise operations on them.

Some common element-wise operations

t1 = torch.tensor([[1,2],
                   [3,4]], dtype=torch.float32)
t2 = torch.tensor([[9,8],
                   [7,6]], dtype=torch.float32)

t1 + t2 # equivalent with t1.add(t2)

tensor([[10., 10.],
        [10., 10.]])

t1 + 2 # equivalent with t1.add(2)

tensor([[3., 4.],
        [5., 6.]])

t1 - 2 # equivalent with t1.sub(2)

tensor([[-1.,  0.],
        [ 1.,  2.]])

t1 * 2 # equivalent with t1.mul(2)

tensor([[2., 4.],
        [6., 8.]])

t1 / 2 # equivalent with t1.div(2)

tensor([[0.5000, 1.0000],
        [1.5000, 2.0000]])

Comparison Operation is element-wise type operation

t.eq(0)

tensor([[ True, False,  True],
        [False,  True, False],
        [ True, False,  True]])

t.gt(0)

tensor([[False,  True, False],
        [ True, False,  True],
        [False,  True, False]])

t.lt(0)

tensor([[False, False, False],
        [False, False, False],
        [False, False, False]])

With element-wise operations that are functions, it’s fine to assume that the function is applied to each element of the tensor.

t.abs()

tensor([[0., 1., 0.],
        [2., 0., 2.],
        [0., 3., 0.]])

t.sqrt()

tensor([[0.0000, 1.0000, 0.0000],
        [1.4142, 0.0000, 1.4142],
        [0.0000, 1.7321, 0.0000]])

t.neg()

tensor([[-0., -1., -0.],
        [-2., -0., -2.],
        [-0., -3., -0.]])

t.neg().abs()

tensor([[0., 1., 0.],
        [2., 0., 2.],
        [0., 3., 0.]])

Note: Element-wise = Component-wise = Point-wise

Tensor reduction operations type

A reduction operation on a tensor is an operation that reduces the number of elements contained within the tensor. Tensors give us the ability to manage our data. The tensor can be reduced to a single scalar value or reduced along an axis.

Let’s look at common tensor reduction operations:

import torch

t = torch.tensor([
    [0,1,0],
    [2,0,2],
    [0,3,0]
], dtype=torch.float32)

Reducing to a tensor with a single element

t.sum(), t.prod(), t.mean(), t.std()

(tensor(8.), tensor(0.), tensor(0.8889), tensor(1.1667))

Reducing tensors By Axes

t.sum(dim=0)

tensor([2., 4., 2.])

Argmax tensor reduction operation is very common in neural network.
This operation returns the index location of the maximum value inside a tensor. In practice, we often use the argmax() function on a network’s output prediction tensor, to determine which category has the highest prediction value.

t.max(dim=1)

torch.return_types.max(
values=tensor([1., 2., 3.]),
indices=tensor([1, 2, 1]))

t.argmax(dim=1)

tensor([1, 2, 1])

Tensor access operation type

This operation provides the ability to access data within the tensor.
Common tensor access operations are item(), tolist(), numpy()

t.mean().item()

0.8888888955116272

t.mean(dim=0).tolist()

[0.6666666865348816, 1.3333333730697632, 0.6666666865348816]

t.mean(dim=0).numpy()

array([0.6666667, 1.3333334, 0.6666667], dtype=float32)

Random topics

Object Oriented Programming and Why Pytorch select it.

When we’re writing programs or building software, there are two key components, code and data. With object oriented programming, we orient our program design and structure around objects.
Objects are defined in code using classes. A class defines the object's specification or spec, which specifies what data and code each object of the class should have.
When we create an object of a class, we call the object an instance of the class, and all instances of a given class have two core components:

Methods(code)
Attributes(data)

In a given program, many objects, a.k.a instances of a given class have the same available attributes and the same available methods. The difference between objects of the same class is the values contained within the object for each attribute. Each object has its own attribute values. These values determine the internal state of the object. The code and data of each object is said to be encapsulated within the object.

Let’s build a simple class to demonstrate how classes encapsulate data and code:

class Sample: #class declaration
    def __init__(self, name): #class constructor (code)
        self.name = name #attribute (data)
    
    def set_name(self, name): #method declaration (code)
        self.name = name #method implementation (code)

Let's switch gears now and look at how object oriented programming fits in with PyTorch.

The primary component we'll need to build a neural network is a layer, and so, as we might expect, PyTorch's neural network library contains classes that aid us in constructing layers. As we know, deep neural networks are built using multiple layers. This is what makes the network deep. Each layer in a neural network has two primary components:

A transformation (code)
A collection of weights (data)

Like many things in life, this fact makes layers great candidates to be represented as objects using Object Oriented Programming - OOP.

References

Some good sources:

pytorch zero to all
deeplizard
effective pytorch
what is torch.nn really?
recommend walk with pytorch
official tutorial
DL(with Pytorch)
Pytorch project template
nlp turorial with pytorch
UDACITY course
awesome pytorch list
deep learning with pytorch
others:
- https://medium.com/pytorch/get-started-with-pytorch-cloud-tpus-and-colab-a24757b8f7fc
- Grokking Algorithms: An illustrated guide for programmers and other curious people 1st Edition