# nn

crypten.nn provides modules for defining and training neural networks similar to torch.nn.

## From PyTorch to CrypTen

The simplest way to create a CrypTen network is to start with a PyTorch network, and use the from_pytorch function to convert it to a CrypTen network. This is particularly useful for pre-trained PyTorch networks that need to be encrypted before use.

crypten.nn.from_pytorch(pytorch_model, dummy_input)

Static function that converts a PyTorch model into a CrypTen model.

Note

In addition to the PyTorch network, the from_pytorch function requires a dummy input with the same shape as the model's input. The dummy input only needs to be a torch tensor of that shape; the values inside the tensor do not matter. For a complete example of how to use the from_pytorch function, please see Tutorial 4.

## Custom CrypTen Modules

crypten.nn also provides several modules and containers for directly building neural networks. Here is an example of how to use these objects to build a CrypTen network:

model = crypten.nn.Sequential(
    [
        crypten.nn.Linear(num_inputs, num_intermediate),
        crypten.nn.ReLU(),
        crypten.nn.Linear(num_intermediate, num_outputs),
    ]
)


Alternatively, you can create a custom CrypTen network in much the same way as you create a custom PyTorch network: subclass crypten.nn.Module and let it contain other crypten.nn.Module instances, nesting them in a tree structure. Submodules are assigned as regular attributes of the parent module. For example:

class CrypTenModel(crypten.nn.Module):
    def __init__(self):
        super(CrypTenModel, self).__init__()
        self.fc1 = crypten.nn.Linear(20, 5)
        self.fc2 = crypten.nn.Linear(5, 2)

    def forward(self, x):
        x = self.fc1(x)
        x = self.fc2(x)
        return x


### Generic Modules

class crypten.nn.module.Module

Base Module class that mimics the torch.nn.Module class.

decrypt()

Decrypts the model.

encrypt(mode=True, src=0)

Encrypts the model.

eval()

Sets the module in evaluation mode.

forward(*args)

Perform forward pass on model.

modules()

Returns iterator over modules.

named_modules()

Returns iterator over named modules (non-recursively).

named_parameters(recurse=True)

Iterator over named parameters.

train(mode=True)

Sets the module in the specified training mode.

update_parameters(learning_rate)

zero_grad()

Sets gradients of all parameters to zero.

class crypten.nn.module.Container

Container allows distinguishing between individual modules and containers.

class crypten.nn.module.Graph(input_name, output_name, modules=None, graph=None)

Acyclic graph of modules.

The module maintains a dict of named modules and a graph structure stored in a dict where each key is a module name, and the associated value is a list of module names that provide the input into the module.

forward(input)

Perform forward pass on model.
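
The dict-based graph structure described above can be illustrated with a small plain-Python sketch. It does not use CrypTen; `run_graph` and the three toy "modules" are hypothetical names chosen for illustration, and the sketch only assumes what the description states: each key is a module name and each value lists the names that feed it.

```python
from typing import Callable, Dict, List


def run_graph(
    modules: Dict[str, Callable[..., float]],
    graph: Dict[str, List[str]],
    input_name: str,
    output_name: str,
    input_value: float,
) -> float:
    """Evaluate an acyclic graph of callables, caching each node's output."""
    cache = {input_name: input_value}

    def evaluate(name: str) -> float:
        if name not in cache:
            # Compute every predecessor first, then apply this node.
            args = [evaluate(parent) for parent in graph[name]]
            cache[name] = modules[name](*args)
        return cache[name]

    return evaluate(output_name)


# Hypothetical three-node graph computing (2 * x) + (x + 1).
modules = {
    "double": lambda x: 2 * x,
    "plus_one": lambda x: x + 1,
    "sum": lambda a, b: a + b,
}
graph = {
    "double": ["input"],
    "plus_one": ["input"],
    "sum": ["double", "plus_one"],
}
result = run_graph(modules, graph, "input", "sum", 3.0)
```

Because every node lists its inputs explicitly, branches and merges need no special handling, which is what distinguishes this structure from a plain sequence of modules.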

class crypten.nn.module.Sequential(module_list)

Sequence of modules.

### Modules for Encrypted Layers

class crypten.nn.module.Constant(value, trainable=False)

Module that returns a constant.

forward(input)

Perform forward pass on model.

class crypten.nn.module.Add

Module that sums two values.

forward(input)

Perform forward pass on model.

class crypten.nn.module.Sub

Module that subtracts two values.

forward(input)

Perform forward pass on model.

class crypten.nn.module.Squeeze(dimension)

Returns a tensor with all the dimensions of input of size 1 removed.

For example, if input is of shape: $$(A \times 1 \times B \times C \times 1 \times D)$$ then the out tensor will be of shape: $$(A \times B \times C \times D)$$.

When dimension is given, a squeeze operation is done only in the given dimension. If input is of shape: $$(A \times 1 \times B)$$, squeeze(input, 0) leaves the tensor unchanged, but squeeze(input, 1) will squeeze the tensor to the shape $$(A \times B)$$.

Note

The returned tensor shares the storage with the input tensor, so changing the contents of one will change the contents of the other.

Parameters

dimension (int, optional) – if given, the input will be squeezed only in this dimension

forward(input)

Perform forward pass on model.
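
The shape rules for squeezing can be checked with a short shape-only sketch. It works on plain tuples rather than tensors, and `squeeze_shape` is a hypothetical helper, not part of CrypTen.

```python
from typing import Optional, Tuple


def squeeze_shape(
    shape: Tuple[int, ...], dimension: Optional[int] = None
) -> Tuple[int, ...]:
    """Return the shape that results from squeezing `shape`."""
    if dimension is None:
        # Remove every dimension of size 1.
        return tuple(size for size in shape if size != 1)
    if dimension < 0:
        dimension += len(shape)
    if shape[dimension] != 1:
        # Only size-1 dimensions can be squeezed; otherwise nothing changes.
        return shape
    return shape[:dimension] + shape[dimension + 1:]


all_squeezed = squeeze_shape((2, 1, 3, 4, 1, 5))  # (A, 1, B, C, 1, D)
unchanged = squeeze_shape((2, 1, 3), 0)           # dimension 0 has size 2
one_removed = squeeze_shape((2, 1, 3), 1)         # dimension 1 has size 1
```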

class crypten.nn.module.Unsqueeze(dimension)

Module that unsqueezes a tensor. Returns a new tensor with a dimension of size one inserted at the specified position.

The returned tensor shares the same underlying data with this tensor. A dimension value within the range [-input.dim() - 1, input.dim() + 1) can be used. Negative dimension will correspond to unsqueeze() applied at dimension = dim + input.dim() + 1.

Parameters

dimension (int) – the index at which to insert the singleton dimension

forward(input)

Perform forward pass on model.
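
The valid range for dimension and the handling of negative values can be sketched on shapes alone. `unsqueeze_shape` is a hypothetical helper operating on tuples, written only to mirror the rule stated above.

```python
from typing import Tuple


def unsqueeze_shape(shape: Tuple[int, ...], dimension: int) -> Tuple[int, ...]:
    """Return the shape after inserting a size-1 dimension at `dimension`."""
    ndim = len(shape)
    # Valid range is [-ndim - 1, ndim + 1).
    if not -ndim - 1 <= dimension < ndim + 1:
        raise IndexError(f"dimension {dimension} out of range for {ndim}-D input")
    if dimension < 0:
        # Negative values map to dimension + ndim + 1.
        dimension += ndim + 1
    return shape[:dimension] + (1,) + shape[dimension:]


front = unsqueeze_shape((2, 3), 0)
back = unsqueeze_shape((2, 3), -1)
```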

class crypten.nn.module.Flatten(axis=1)

Module that flattens the input tensor into a 2D matrix.

Parameters

axis (int, optional) – must not be larger than dimension

forward(x)

Perform forward pass on model.

class crypten.nn.module.Shape

Module that returns the shape of a tensor. If the input tensor is encrypted, the output size vector will be encrypted, too.

forward(x)

Perform forward pass on model.

class crypten.nn.module.Concat(dimension)

Module that concatenates tensors along a dimension.

Parameters

dimension (int) – the dimension over which to concatenate

forward(input)

Perform forward pass on model.

class crypten.nn.module.Gather(dimension)

Module that gathers elements from a tensor according to indices. Given a data tensor of rank $$r \geq 1$$ and an indices tensor of rank $$q$$, it gathers the entries along the axis dimension of data (by default the outermost one, axis = 0) indexed by indices, and concatenates them into an output tensor of rank $$q + (r - 1)$$. For example, for axis = 0: let $$k = indices[i_{0}, ..., i_{q-1}]$$. Then $$output[i_{0}, ..., i_{q-1}, j_{0}, ..., j_{r-2}] = input[k, j_{0}, ..., j_{r-2}]$$. This is an operation from the ONNX specification.

Parameters
• dimension (int) – the axis along which to index

• index (tensor) – the indices to select along the dimension

forward(input)

Perform forward pass on model.
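
The axis = 0 case of the gather rule can be traced with nested Python lists. This is a minimal sketch of the indexing rule only; `gather_axis0` is a hypothetical helper and says nothing about how CrypTen implements the operation on encrypted data.

```python
def gather_axis0(data, indices):
    """Gather entries of `data` along axis 0 for arbitrarily nested indices."""
    if isinstance(indices, int):
        # A single index selects one slice along the outermost dimension.
        return data[indices]
    return [gather_axis0(data, index) for index in indices]


data = [[1.0, 1.2], [2.3, 3.4], [4.5, 5.7]]  # rank r = 2, shape (3, 2)
indices = [[0, 1], [1, 2]]                    # rank q = 2, shape (2, 2)
output = gather_axis0(data, indices)          # rank q + (r - 1) = 3
```

Each entry `output[i0][i1]` equals `data[indices[i0][i1]]`, so the output here has shape (2, 2, 2).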

class crypten.nn.module.Reshape

Module that reshapes tensors to new dimensions. Returns a tensor with same data and number of elements as self, but with the specified shape.

When possible, the returned tensor will be a view of self. Otherwise, it will be a copy. Contiguous inputs and inputs with compatible strides can be reshaped without copying, but you should not depend on the copying vs. viewing behavior.

See torch.Tensor.view() on when it is possible to return a view. A single dimension may be -1, in which case it’s inferred from the remaining dimensions and the number of elements in self.

Parameters

input (tuple of ints) – the new shape

forward(input)

Perform forward pass on model.
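
How a single -1 entry is inferred from the remaining dimensions can be sketched as follows. `infer_shape` is a hypothetical helper that works on element counts and tuples, not on tensors.

```python
from math import prod
from typing import Tuple


def infer_shape(num_elements: int, shape: Tuple[int, ...]) -> Tuple[int, ...]:
    """Resolve at most one -1 entry so the shape holds `num_elements`."""
    if sum(1 for size in shape if size == -1) > 1:
        raise ValueError("only one dimension can be inferred")
    known = prod(size for size in shape if size != -1)
    if -1 in shape:
        if known == 0 or num_elements % known != 0:
            raise ValueError(f"shape {shape} is invalid for {num_elements} elements")
        # The missing dimension is whatever is left after the known ones.
        return tuple(num_elements // known if size == -1 else size for size in shape)
    if known != num_elements:
        raise ValueError(f"shape {shape} is invalid for {num_elements} elements")
    return tuple(shape)


resolved = infer_shape(24, (2, -1, 3))
```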

class crypten.nn.module.ConstantPad1d(padding, value, mode='constant')

Module that pads a 1D tensor.

forward(input)

Perform forward pass on model.

class crypten.nn.module.ConstantPad2d(padding, value, mode='constant')

Module that pads a 2D tensor.

forward(input)

Perform forward pass on model.

class crypten.nn.module.ConstantPad3d(padding, value, mode='constant')

Module that pads a 3D tensor.

forward(input)

Perform forward pass on model.

class crypten.nn.module.Linear(in_features, out_features, bias=True)

Module that performs linear transformation. Applies a linear transformation to the incoming data: $$y = xA^T + b$$

Parameters
• in_features – size of each input sample

• out_features – size of each output sample

• bias – If set to False, the layer will not learn an additive bias. Default: True

Shape:
• Input: $$(N, *, H_{in})$$ where $$*$$ means any number of additional dimensions and $$H_{in} = \text{in\_features}$$

• Output: $$(N, *, H_{out})$$ where all but the last dimension are the same shape as the input and $$H_{out} = \text{out\_features}$$.

forward(x)

Perform forward pass on model.
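
The transformation $$y = xA^T + b$$ can be worked through on plaintext lists. This is a sketch of the arithmetic only, assuming a weight of shape (out_features, in_features) as in torch.nn.Linear; `linear` is a hypothetical helper.

```python
from typing import List, Optional


def linear(
    x: List[List[float]],
    weight: List[List[float]],
    bias: Optional[List[float]] = None,
) -> List[List[float]]:
    """Compute y = x A^T + b for x of shape (N, in) and A of shape (out, in)."""
    output = []
    for row in x:
        # Each output feature is the dot product of the input row with one weight row.
        y = [sum(w * v for w, v in zip(weight_row, row)) for weight_row in weight]
        if bias is not None:
            y = [value + b for value, b in zip(y, bias)]
        output.append(y)
    return output


x = [[1.0, 2.0, 3.0]]                          # N = 1, in_features = 3
weight = [[1.0, 0.0, -1.0], [0.5, 0.5, 0.5]]   # out_features = 2
bias = [1.0, -1.0]
y = linear(x, weight, bias)
```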

class crypten.nn.module.Conv2d(in_channels, out_channels, kernel_size, stride=1, padding=0, bias=True)

Module that performs 2D convolution.

Applies a 2D convolution over an input signal composed of several input planes.

In the simplest case, the output value of the layer with input size $$(N, C_{\text{in}}, H, W)$$ and output $$(N, C_{\text{out}}, H_{\text{out}}, W_{\text{out}})$$ can be precisely described as:

$\text{out}(N_i, C_{\text{out}_j}) = \text{bias}(C_{\text{out}_j}) + \sum_{k = 0}^{C_{\text{in}} - 1} \text{weight}(C_{\text{out}_j}, k) \star \text{input}(N_i, k)$

where $$\star$$ is the valid 2D cross-correlation operator, $$N$$ is a batch size, $$C$$ denotes a number of channels, $$H$$ is a height of input planes in pixels, and $$W$$ is width in pixels.

• stride controls the stride for the cross-correlation, a single number or a tuple.

• padding controls the amount of implicit zero-paddings on both sides for padding number of points for each dimension.

• dilation controls the spacing between the kernel points; this is also known as the à trous algorithm.

• groups controls the connections between inputs and outputs. in_channels and out_channels must both be divisible by groups. For example,

• At groups=1, all inputs are convolved to all outputs.

• At groups=2, the operation becomes equivalent to having two conv layers side by side, each seeing half the input channels, and producing half the output channels, and both subsequently concatenated.

• At groups=in_channels, each input channel is convolved with its own set of filters, of size: $$\left\lfloor\frac{out\_channels}{in\_channels}\right\rfloor$$.

The parameters kernel_size, stride, padding, dilation can either be:

• a single int – in which case the same value is used for the height and width dimension

• a tuple of two ints – in which case, the first int is used for the height dimension, and the second int for the width dimension

Parameters
• in_channels (int) – Number of channels in the input image

• out_channels (int) – Number of channels produced by the convolution

• kernel_size (int or tuple) – Size of the convolving kernel

• stride (int or tuple, optional) – Stride of the convolution. Default: 1

• padding (int or tuple, optional) – Zero-padding added to both sides of the input. Default: 0

• padding_mode (string, optional) – zeros

• dilation (int or tuple, optional) – Spacing between kernel elements. Default: 1

• bias (bool, optional) – If True, adds a learnable bias to the output. Default: True

Shape:
• Input: $$(N, C_{in}, H_{in}, W_{in})$$

• Output: $$(N, C_{out}, H_{out}, W_{out})$$ where

$H_{out} = \left\lfloor\frac{H_{in} + 2 \times \text{padding}[0] - \text{dilation}[0] \times (\text{kernel\_size}[0] - 1) - 1}{\text{stride}[0]} + 1\right\rfloor$

$W_{out} = \left\lfloor\frac{W_{in} + 2 \times \text{padding}[1] - \text{dilation}[1] \times (\text{kernel\_size}[1] - 1) - 1}{\text{stride}[1]} + 1\right\rfloor$

forward(x)

Perform forward pass on model.
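
The output-shape formulas for Conv2d are easy to get wrong by one, so a small calculator can help when sizing layers. This is a sketch of the arithmetic only; `conv2d_output_shape` and `_pair` are hypothetical helpers, not CrypTen functions.

```python
from typing import Tuple, Union

IntOrPair = Union[int, Tuple[int, int]]


def _pair(value: IntOrPair) -> Tuple[int, int]:
    """Expand a single int to (height, width); pass tuples through."""
    return value if isinstance(value, tuple) else (value, value)


def conv2d_output_shape(
    in_shape: Tuple[int, int, int, int],
    out_channels: int,
    kernel_size: IntOrPair,
    stride: IntOrPair = 1,
    padding: IntOrPair = 0,
    dilation: IntOrPair = 1,
) -> Tuple[int, int, int, int]:
    """Apply the H_out / W_out formulas to an (N, C_in, H_in, W_in) shape."""
    n, _, h_in, w_in = in_shape
    k, s, p, d = (_pair(v) for v in (kernel_size, stride, padding, dilation))
    # Floor division implements the floor in the formula.
    h_out = (h_in + 2 * p[0] - d[0] * (k[0] - 1) - 1) // s[0] + 1
    w_out = (w_in + 2 * p[1] - d[1] * (k[1] - 1) - 1) // s[1] + 1
    return (n, out_channels, h_out, w_out)


same_size = conv2d_output_shape((8, 3, 32, 32), 16, kernel_size=3, padding=1)
downsampled = conv2d_output_shape((1, 3, 224, 224), 64, kernel_size=7, stride=2, padding=3)
```

A 3x3 kernel with padding 1 and stride 1 preserves the spatial size, while the 7x7 kernel with stride 2 and padding 3 halves it.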

class crypten.nn.module.AvgPool2d(kernel_size, stride=1, padding=0)

Module that applies 2D average pooling over an input signal composed of several input planes.

In the simplest case, the output value of the layer with input size $$(N, C, H, W)$$, output $$(N, C, H_{out}, W_{out})$$ and kernel_size $$(kH, kW)$$ can be precisely described as:

$out(N_i, C_j, h, w) = \frac{1}{kH * kW} \sum_{m=0}^{kH-1} \sum_{n=0}^{kW-1} input(N_i, C_j, stride[0] \times h + m, stride[1] \times w + n)$

If padding is non-zero, then the input is implicitly zero-padded on both sides for padding number of points.

The parameters kernel_size, stride, padding can either be:

• a single int – in which case the same value is used for the height and width dimension

• a tuple of two ints – in which case, the first int is used for the height dimension, and the second int for the width dimension

Parameters
• kernel_size – the size of the window

• stride – the stride of the window. Default value is kernel_size

Shape:
• Input: $$(N, C, H_{in}, W_{in})$$

• Output: $$(N, C, H_{out}, W_{out})$$, where

$H_{out} = \left\lfloor\frac{H_{in} + 2 \times \text{padding}[0] - \text{kernel\_size}[0]}{\text{stride}[0]} + 1\right\rfloor$

$W_{out} = \left\lfloor\frac{W_{in} + 2 \times \text{padding}[1] - \text{kernel\_size}[1]}{\text{stride}[1]} + 1\right\rfloor$

forward(x)

Perform forward pass on model.
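
The averaging formula for AvgPool2d can be traced on a single plane (one sample, one channel). The sketch below assumes zero padding and uses plain nested lists; `avg_pool2d_plane` is a hypothetical helper.

```python
from typing import List, Tuple


def avg_pool2d_plane(
    plane: List[List[float]],
    kernel_size: Tuple[int, int],
    stride: Tuple[int, int],
) -> List[List[float]]:
    """Average-pool one 2D plane without padding."""
    k_h, k_w = kernel_size
    s_h, s_w = stride
    h_out = (len(plane) - k_h) // s_h + 1
    w_out = (len(plane[0]) - k_w) // s_w + 1
    return [
        [
            # Mean of the k_h x k_w window whose top-left corner is (s_h*h, s_w*w).
            sum(plane[s_h * h + m][s_w * w + n] for m in range(k_h) for n in range(k_w))
            / (k_h * k_w)
            for w in range(w_out)
        ]
        for h in range(h_out)
    ]


plane = [
    [1.0, 2.0, 3.0, 4.0],
    [5.0, 6.0, 7.0, 8.0],
    [9.0, 10.0, 11.0, 12.0],
    [13.0, 14.0, 15.0, 16.0],
]
pooled = avg_pool2d_plane(plane, kernel_size=(2, 2), stride=(2, 2))
```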

class crypten.nn.module.MaxPool2d(kernel_size, stride=1, padding=0)

Module that performs 2D max pooling (see AvgPool2d())

forward(x)

Perform forward pass on model.

class crypten.nn.module.GlobalAveragePool

GlobalAveragePool consumes an input tensor and applies average pooling across the values in the same channel. This is equivalent to AveragePool with kernel size equal to the spatial dimension of input tensor. This is an operation from the ONNX specification.

forward(input)

Perform forward pass on model.

class crypten.nn.module.BatchNorm1d(num_features, eps=1e-05, momentum=0.1)

Module that performs batch normalization on 1D tensors.

forward(input)

Perform forward pass on model.

class crypten.nn.module.BatchNorm2d(num_features, eps=1e-05, momentum=0.1)

Module that performs batch normalization on 2D tensors.

forward(input)

Perform forward pass on model.

class crypten.nn.module.BatchNorm3d(num_features, eps=1e-05, momentum=0.1)

Module that performs batch normalization on 3D tensors.

forward(input)

Perform forward pass on model.

class crypten.nn.module.Dropout(p=0.5, inplace=False)

During training, randomly zeroes some of the elements of the input tensor with probability p using samples from a Bernoulli distribution. Furthermore, the outputs are scaled by a factor of $$\frac{1}{1-p}$$ during training. This means that during evaluation the module simply computes an identity function.

Parameters

p – probability of an element to be zeroed. Default: 0.5

Shape:
• Input: $$(*)$$. Input can be of any shape

• Output: $$(*)$$. Output is of the same shape as input

forward(input)

Perform forward pass on model.
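
The relationship between training-time scaling and the evaluation-time identity can be seen in a plaintext sketch. It uses Python's `random` module on a flat list; `dropout` is a hypothetical helper and not how CrypTen samples masks on encrypted tensors.

```python
import random
from typing import List, Optional


def dropout(
    values: List[float],
    p: float = 0.5,
    training: bool = True,
    rng: Optional[random.Random] = None,
) -> List[float]:
    """Zero each element with probability p and scale survivors by 1 / (1 - p)."""
    if not 0.0 <= p < 1.0:
        raise ValueError("p must satisfy 0 <= p < 1")
    if not training:
        # Evaluation mode: identity function.
        return list(values)
    rng = rng or random.Random()
    scale = 1.0 / (1.0 - p)
    return [0.0 if rng.random() < p else value * scale for value in values]


values = [1.0, 2.0, 3.0, 4.0]
train_out = dropout(values, p=0.5, training=True, rng=random.Random(0))
eval_out = dropout(values, p=0.5, training=False)
```

Scaling the surviving elements keeps the expected value of each output equal to its input, which is why no rescaling is needed at evaluation time.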

class crypten.nn.module.Dropout2d(p=0.5, inplace=False)

Randomly zero out entire channels (a channel is a 2D feature map, e.g., the $$j$$-th channel of the $$i$$-th sample in the batched input is a 2D tensor $$\text{input}[i, j]$$). Each channel will be zeroed out independently on every forward call with probability p using samples from a Bernoulli distribution.

Usually the input comes from nn.Conv2d modules.

Parameters
• p (float, optional) – probability of an element to be zeroed.

• inplace (bool, optional) – If set to True, will do this operation in-place

Shape:
• Input: $$(N, C, H, W)$$

• Output: $$(N, C, H, W)$$ (same shape as input)

forward(input)

Perform forward pass on model.

class crypten.nn.module.Dropout3d(p=0.5, inplace=False)

Randomly zero out entire channels (a channel is a 3D feature map, e.g., the $$j$$-th channel of the $$i$$-th sample in the batched input is a 3D tensor $$\text{input}[i, j]$$). Each channel will be zeroed out independently on every forward call with probability p using samples from a Bernoulli distribution.

Usually the input comes from nn.Conv3d modules.

Parameters

p (float, optional) – probability of an element to be zeroed.

Shape:
• Input: $$(N, C, D, H, W)$$

• Output: $$(N, C, D, H, W)$$ (same shape as input)

forward(input)

Perform forward pass on model.

class crypten.nn.module.DropoutNd(p=0.5, inplace=False)

Randomly zero out entire channels (a channel is an nD feature map, e.g., the $$j$$-th channel of the $$i$$-th sample in the batched input is an nD tensor $$\text{input}[i, j]$$). Each channel will be zeroed out independently on every forward call with probability p using samples from a Bernoulli distribution.

Parameters

p (float, optional) – probability of an element to be zeroed.

forward(input)

Perform forward pass on model.

class crypten.nn.module.Softmax(dim)

Applies the Softmax function to an n-dimensional input Tensor, rescaling its elements so that the elements of the n-dimensional output Tensor lie in the range [0,1] and sum to 1.

Softmax is defined as:

$\text{Softmax}(x_{i}) = \frac{\exp(x_i)}{\sum_j \exp(x_j)}$

Shape:
• Input: $$(*)$$ where * means, any number of additional dimensions

• Output: $$(*)$$, same shape as the input

Returns

a Tensor of the same dimension and shape as the input with values in the range [0, 1]

Parameters

dim (int) – A dimension along which Softmax will be computed (so every slice along dim will sum to 1).

forward(input)

Perform forward pass on model.
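
The Softmax definition can be evaluated directly on a plaintext list. The sketch below subtracts the maximum before exponentiating, a common numerical-stability step that leaves the result unchanged; it is an assumption of this example and not a statement about CrypTen's encrypted implementation.

```python
import math
from typing import List


def softmax(xs: List[float]) -> List[float]:
    """Compute exp(x_i) / sum_j exp(x_j) along a single slice."""
    shift = max(xs)
    # exp(x - shift) / sum exp(x - shift) equals the unshifted ratio.
    exps = [math.exp(x - shift) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


probabilities = softmax([1.0, 2.0, 3.0])
uniform = softmax([0.0, 0.0])
```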

class crypten.nn.module.LogSoftmax(dim)

Applies the $$\log(\text{Softmax}(x))$$ function to an n-dimensional input Tensor. The LogSoftmax formulation can be simplified as:

$\text{LogSoftmax}(x_{i}) = \log\left(\frac{\exp(x_i) }{ \sum_j \exp(x_j)} \right)$

Shape:
• Input: $$(*)$$ where * means, any number of additional dimensions

• Output: $$(*)$$, same shape as the input

Parameters

dim (int) – A dimension along which LogSoftmax will be computed.

Returns

a Tensor of the same dimension and shape as the input with values in the range [-inf, 0)

forward(input)

Perform forward pass on model.
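
Expanding the logarithm in the LogSoftmax formula gives $$x_i - \log \sum_j \exp(x_j)$$, which the following plaintext sketch evaluates. The max-shift inside the log-sum-exp is an assumption added for numerical stability; `log_softmax` is a hypothetical helper.

```python
import math
from typing import List


def log_softmax(xs: List[float]) -> List[float]:
    """Compute x_i - log(sum_j exp(x_j)) along a single slice."""
    shift = max(xs)
    log_sum_exp = shift + math.log(sum(math.exp(x - shift) for x in xs))
    return [x - log_sum_exp for x in xs]


log_probabilities = log_softmax([1.0, 2.0, 3.0])
```

Exponentiating the result recovers probabilities that sum to 1, and every entry is at most 0.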

## Loss Functions

CrypTen also provides a number of encrypted loss functions similar to those in torch.nn.

class crypten.nn.loss.BCELoss(reduction='mean')

Creates a criterion that measures the Binary Cross Entropy between the prediction $$x$$ and the target $$y$$.

The loss can be described as:

$\ell(x, y) = mean(L) = mean(\{l_1,\dots,l_N\}^\top), \quad l_n = - \left [ y_n \cdot \log x_n + (1 - y_n) \cdot \log (1 - x_n) \right ],$

where $$N$$ is the batch size, $$x$$ and $$y$$ are tensors of arbitrary shapes with a total of $$n$$ elements each.

This is used for measuring the error of a reconstruction in, for example, an auto-encoder. Note that the targets $$y$$ should be numbers between 0 and 1.

forward(x, y)

Perform forward pass on model.
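
The binary cross-entropy formula with mean reduction can be evaluated on plaintext values as a sanity check. The sketch assumes every prediction lies strictly between 0 and 1 so that both logarithms are defined; `bce_loss` is a hypothetical helper.

```python
import math
from typing import List


def bce_loss(x: List[float], y: List[float]) -> float:
    """Mean of -[y_n * log(x_n) + (1 - y_n) * log(1 - x_n)]."""
    losses = [
        -(y_n * math.log(x_n) + (1.0 - y_n) * math.log(1.0 - x_n))
        for x_n, y_n in zip(x, y)
    ]
    return sum(losses) / len(losses)


uninformed = bce_loss([0.5, 0.5], [1.0, 0.0])   # equals log(2)
confident = bce_loss([0.9, 0.1], [1.0, 0.0])    # smaller loss for better predictions
```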

class crypten.nn.loss.CrossEntropyLoss(reduction='mean')

Creates a criterion that measures cross-entropy loss between the prediction $$x$$ and the target $$y$$. It is useful when training a classification problem with C classes.

The prediction x is expected to contain raw, unnormalized scores for each class.

The prediction x has to be a Tensor of size either $$(N, C)$$ or $$(N, C, d_1, d_2, ..., d_K)$$, where $$N$$ is the size of the minibatch, and with $$K \geq 1$$ for the K-dimensional case (described later).

This criterion expects a class index in the range $$[0, C-1]$$ as the target y for each value of a 1D tensor of size N.

The loss can be described as:

$\text{loss}(x, class) = -\log \left( \frac{\exp(x[class])}{\sum_j \exp(x[j])} \right ) = -x[class] + \log \left (\sum_j \exp(x[j]) \right)$

The losses are averaged across observations for each batch.

Can also be used for higher dimension inputs, such as 2D images, by providing an input of size $$(N, C, d_1, d_2, ..., d_K)$$ with $$K \geq 1$$, where $$K$$ is the number of dimensions, and a target of appropriate shape.

forward(x, y)

Perform forward pass on model.
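
The second form of the cross-entropy formula, $$-x[class] + \log \sum_j \exp(x[j])$$, can be evaluated on raw scores of size (N, C) with a plaintext sketch. The max-shift is an added stability assumption and `cross_entropy` is a hypothetical helper.

```python
import math
from typing import List


def cross_entropy(scores: List[List[float]], targets: List[int]) -> float:
    """Average of -x[class] + log(sum_j exp(x[j])) over the batch."""
    per_sample = []
    for x, cls in zip(scores, targets):
        shift = max(x)
        log_sum_exp = shift + math.log(sum(math.exp(v - shift) for v in x))
        per_sample.append(-x[cls] + log_sum_exp)
    return sum(per_sample) / len(per_sample)


uniform_loss = cross_entropy([[0.0, 0.0, 0.0]], [1])        # equals log(3)
batch_loss = cross_entropy([[2.0, 0.5], [0.1, 3.0]], [0, 1])
```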

class crypten.nn.loss.L1Loss(reduction='mean')

Creates a criterion that measures the mean absolute error between each element in the prediction $$x$$ and target $$y$$.

The loss can be described as:

$\ell(x, y) = mean(L) = mean(\{l_1,\dots,l_N\}^\top), \quad l_n = \left | x_n - y_n \right |,$

where $$N$$ is the batch size, $$x$$ and $$y$$ are tensors of arbitrary shapes with a total of $$n$$ elements each.

forward(x, y)

Perform forward pass on model.

class crypten.nn.loss.MSELoss(reduction='mean')

Creates a criterion that measures the mean squared error (squared L2 norm) between each element in the prediction $$x$$ and target $$y$$.

The loss can be described as:

$\ell(x, y) = mean(L) = mean(\{l_1,\dots,l_N\}^\top), \quad l_n = (x_n - y_n)^2,$

where $$N$$ is the batch size, $$x$$ and $$y$$ are tensors of arbitrary shapes with a total of $$n$$ elements each.

forward(x, y)

Perform forward pass on model.
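
The element-wise formulas for L1Loss and MSELoss differ only in the per-element term, which a plaintext sketch with mean reduction makes concrete. `l1_loss` and `mse_loss` are hypothetical helpers operating on flat lists.

```python
from typing import List


def l1_loss(x: List[float], y: List[float]) -> float:
    """Mean of |x_n - y_n|."""
    return sum(abs(x_n - y_n) for x_n, y_n in zip(x, y)) / len(x)


def mse_loss(x: List[float], y: List[float]) -> float:
    """Mean of (x_n - y_n)^2."""
    return sum((x_n - y_n) ** 2 for x_n, y_n in zip(x, y)) / len(x)


prediction = [1.0, 2.0, 3.0]
target = [1.0, 0.0, 6.0]
mean_absolute_error = l1_loss(prediction, target)   # (0 + 2 + 3) / 3
mean_squared_error = mse_loss(prediction, target)   # (0 + 4 + 9) / 3
```

Squaring penalizes the larger error more heavily, so the same predictions yield a larger MSE than L1 value here.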