nn¶
crypten.nn provides modules for defining and training neural
networks, similar to torch.nn.
From PyTorch to CrypTen¶
The simplest way to create a CrypTen network is to start with a
PyTorch network and use the from_pytorch function to convert it
to a CrypTen network. This is particularly useful for pre-trained
PyTorch networks that need to be encrypted before use.
-
crypten.nn.
from_pytorch
(pytorch_model, dummy_input)¶ Static function that converts a PyTorch model into a CrypTen model.
Note
In addition to the PyTorch network, the from_pytorch function
also requires a dummy input of the shape of the model’s input.
The dummy input simply needs to be a torch tensor of the same
shape; the values inside the tensor do not matter. For a complete
example of how to use the from_pytorch function, please see
Tutorial 4.
Custom CrypTen Modules¶
crypten.nn also provides several modules and containers
for directly building neural networks. Here is an example of how
to use these objects to build a CrypTen network:
    model = crypten.nn.Sequential(
        crypten.nn.Linear(num_inputs, num_intermediate),
        crypten.nn.ReLU(),
        crypten.nn.Linear(num_intermediate, num_outputs),
    )
Alternatively, you can create a custom CrypTen network in much
the same way as you create a custom PyTorch network, i.e., you
can subclass crypten.nn.Module and allow it to contain other
crypten.nn.Module instances, nesting them in a tree structure. You can
assign the submodules as regular attributes. For example:
    class CrypTenModel(crypten.nn.Module):
        def __init__(self):
            super(CrypTenModel, self).__init__()
            self.fc1 = crypten.nn.Linear(20, 5)
            self.fc2 = crypten.nn.Linear(5, 2)

        def forward(self, x):
            x = self.fc1(x)
            x = self.fc2(x)
            return x
Generic Modules¶
-
class
crypten.nn.module.
Module
¶ Base Module class that mimics the torch.nn.Module class.
-
decrypt
()¶ Decrypts model.
-
encrypt
(mode=True, src=0)¶ Encrypts the model.
-
eval
()¶ Sets the module in evaluation mode.
-
forward
(*args, **kwargs)¶ Perform forward pass on model.
-
modules
()¶ Returns iterator over modules (non-recursively).
-
named_modules
()¶ Returns iterator over named modules (non-recursively).
-
named_parameters
(recurse=True, prefix=None)¶ Iterator over named parameters.
-
train
(mode=True)¶ Sets the module in the specified training mode.
-
update_parameters
(learning_rate, grad_threshold=100)¶ Performs gradient step on parameters.
- Parameters
grad_threshold – Because arithmetic operations can extremely rarely return large incorrect results, we zero-out all elements with magnitude larger than this given threshold. To turn off thresholding, set to None.
-
zero_grad
()¶ Sets gradients of all parameters to zero.
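The thresholding described for update_parameters can be illustrated on plaintext numbers. This is only a sketch: the function name sgd_step_with_threshold is hypothetical, and the plain update rule p - learning_rate * g is an assumption, since the text above only says that a gradient step is performed and that large-magnitude elements are zeroed out.

```python
def sgd_step_with_threshold(params, grads, learning_rate, grad_threshold=100):
    """Plaintext sketch of a gradient step that ignores implausibly large gradients.

    Hypothetical helper, not part of CrypTen. The update rule is assumed to be
    plain SGD: p <- p - learning_rate * g.
    """
    updated = []
    for p, g in zip(params, grads):
        # Zero out gradient elements whose magnitude exceeds the threshold.
        if grad_threshold is not None and abs(g) > grad_threshold:
            g = 0.0
        updated.append(p - learning_rate * g)
    return updated


params = [1.0, 2.0, 3.0]
grads = [0.5, 1e6, -2.0]  # the second gradient mimics a rare incorrect result

new_params = sgd_step_with_threshold(params, grads, learning_rate=0.1)
unthresholded = sgd_step_with_threshold(
    params, grads, learning_rate=0.1, grad_threshold=None
)
```

With the default threshold, the second parameter is left unchanged; with grad_threshold=None the corrupted gradient is applied in full.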
-
-
class
crypten.nn.module.
Container
¶ Container allows distinguishing between individual modules and containers.
-
class
crypten.nn.module.
Graph
(input_name, output_name, modules=None, graph=None)¶ Acyclic graph of modules.
The module maintains a dict of named modules and a graph structure stored in a dict where each key is a module name, and the associated value is a list of module names that provide the input into the module.
-
forward
(input)¶ Perform forward pass on model.
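The data layout described for Graph (a dict of named modules plus a dict mapping each module name to the names of its inputs) can be sketched in plain Python. The function run_graph and the toy modules below are hypothetical stand-ins with plain callables in place of CrypTen modules; they are not the library's implementation.

```python
def run_graph(input_name, output_name, modules, graph, input_value):
    """Evaluate an acyclic graph stored as {module name: [input names]}."""
    computed = {input_name: input_value}

    def evaluate(name):
        # Compute each node once, after recursively computing its inputs.
        if name not in computed:
            args = [evaluate(dep) for dep in graph[name]]
            computed[name] = modules[name](*args)
        return computed[name]

    return evaluate(output_name)


# Toy "modules": plain callables.
modules = {
    "double": lambda x: 2 * x,
    "square": lambda x: x * x,
    "sum": lambda a, b: a + b,
}
# Each key is a module name; the value lists the modules that feed into it.
graph = {
    "double": ["input"],
    "square": ["input"],
    "sum": ["double", "square"],
}

result = run_graph("input", "sum", modules, graph, 3)
```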
-
-
class
crypten.nn.module.
Sequential
(*module_list)¶ Sequence of modules.
Modules for Encrypted Layers¶
-
class
crypten.nn.module.
Constant
(value, trainable=False)¶ Module that returns a constant.
-
forward
(input)¶ Perform forward pass on model.
-
-
class
crypten.nn.module.
Add
¶ Module that sums two values.
-
forward
(input)¶ Perform forward pass on model.
-
-
class
crypten.nn.module.
Sub
¶ Module that subtracts two values.
-
forward
(input)¶ Perform forward pass on model.
-
-
class
crypten.nn.module.
Squeeze
(dimension)¶ Returns a tensor with all the dimensions of input of size 1 removed.
For example, if input is of shape \((A \times 1 \times B \times C \times 1 \times D)\), then the output tensor will be of shape \((A \times B \times C \times D)\).
When dimension is given, a squeeze operation is done only in the given dimension. If input is of shape \((A \times 1 \times B)\), squeeze(input, 0) leaves the tensor unchanged, but squeeze(input, 1) will squeeze the tensor to the shape \((A \times B)\).
Note
The returned tensor shares the storage with the input tensor, so changing the contents of one will change the contents of the other.
- Parameters
dimension (int, optional) – if given, the input will be squeezed only in this dimension
-
forward
(input)¶ Perform forward pass on model.
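The shape rules described for Squeeze can be checked with a small shape-only sketch. The helper squeezed_shape is hypothetical and works on plain tuples rather than tensors.

```python
def squeezed_shape(shape, dimension=None):
    """Return the shape that results from squeezing a tensor of the given shape."""
    if dimension is None:
        # Remove every dimension of size 1.
        return tuple(s for s in shape if s != 1)
    dimension = dimension % len(shape)  # allow negative indices
    if shape[dimension] != 1:
        # The chosen dimension is not of size 1: the shape is unchanged.
        return tuple(shape)
    return tuple(s for i, s in enumerate(shape) if i != dimension)


all_squeezed = squeezed_shape((2, 1, 3, 4, 1, 5))
unchanged = squeezed_shape((2, 1, 3), 0)
one_squeezed = squeezed_shape((2, 1, 3), 1)
```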
-
class
crypten.nn.module.
Unsqueeze
(dimension)¶ Module that unsqueezes a tensor. Returns a new tensor with a dimension of size one inserted at the specified position.
The returned tensor shares the same underlying data with this tensor. A dimension value within the range [-input.dim() - 1, input.dim() + 1) can be used. A negative dimension will correspond to unsqueeze() applied at dimension = dimension + input.dim() + 1.
- Parameters
dimension (int) – the index at which to insert the singleton dimension
-
forward
(input)¶ Perform forward pass on model.
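The valid range and the handling of negative values for dimension can be sketched on shapes alone. The helper unsqueezed_shape is hypothetical and operates on tuples, not tensors.

```python
def unsqueezed_shape(shape, dimension):
    """Return the shape after inserting a size-one dimension at `dimension`."""
    ndim = len(shape)
    # Valid values lie in the half-open range [-ndim - 1, ndim + 1).
    if not (-ndim - 1 <= dimension < ndim + 1):
        raise IndexError("dimension out of range")
    if dimension < 0:
        dimension = dimension + ndim + 1
    return tuple(shape[:dimension]) + (1,) + tuple(shape[dimension:])


front = unsqueezed_shape((2, 3), 0)
back = unsqueezed_shape((2, 3), -1)
also_front = unsqueezed_shape((2, 3), -3)
```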
-
class
crypten.nn.module.
Flatten
(axis=1)¶ Module that flattens the input tensor into a 2D matrix.
- Parameters
axis (int, optional) – must not be larger than the number of dimensions of the input. Default: 1
-
forward
(x)¶ Perform forward pass on model.
-
class
crypten.nn.module.
Shape
¶ Module that returns the shape of a tensor. If the input tensor is encrypted, the output size vector will be encrypted, too.
-
forward
(x)¶ Perform forward pass on model.
-
-
class
crypten.nn.module.
Concat
(dimension)¶ Module that concatenates tensors along a dimension.
- Parameters
dimension (int) – the dimension over which to concatenate
-
forward
(input)¶ Perform forward pass on model.
-
class
crypten.nn.module.
Gather
(dimension)¶ Module that gathers elements from tensor according to indices. Given data tensor of rank \(r >= 1\), and indices tensor of rank \(q\), gather entries of the axis dimension of data (by default outer-most one as axis = 0) indexed by indices, and concatenates them in an output tensor of rank \(q + (r - 1)\). For example, for axis = 0: Let \(k = indices[i_{0}, ..., i_{q-1}]\). Then \(output[i_{0}, ..., i_{q-1}, j_{0}, ..., j_{r-2}] = input[k, j_{0}, ..., j_{r-2}]\). This is an operation from the ONNX specification.
- Parameters
dimension (int) – the axis along which to index
index (tensor) – the indices to select along the dimension
-
forward
(input)¶ Perform forward pass on model.
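For the axis = 0 case described above, the gather rule can be sketched on nested Python lists. The helper gather_axis0 is hypothetical and only illustrates the indexing rule, not the encrypted implementation.

```python
def gather_axis0(data, indices):
    """Replace every integer k in the (nested) indices by data[k]."""
    if isinstance(indices, int):
        return data[indices]
    return [gather_axis0(data, idx) for idx in indices]


data = [[1.0, 1.2], [2.3, 3.4], [4.5, 5.7]]  # rank r = 2, shape (3, 2)
indices = [[0, 1], [1, 2]]                   # rank q = 2, shape (2, 2)

# Output has rank q + (r - 1) = 3, shape (2, 2, 2).
output = gather_axis0(data, indices)
```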
-
class
crypten.nn.module.
Reshape
(shape)¶ Module that reshapes tensors to new dimensions. Returns a tensor with same data and number of elements as self, but with the specified shape.
When possible, the returned tensor will be a view of self. Otherwise, it will be a copy. Contiguous inputs and inputs with compatible strides can be reshaped without copying, but you should not depend on the copying vs. viewing behavior.
See torch.Tensor.view() on when it is possible to return a view. A single dimension may be -1, in which case it’s inferred from the remaining dimensions and the number of elements in self.
- Parameters
input (tuple) – contains input tensor and shape (torch.Size)
-
forward
(tensor)¶ Perform forward pass on model.
-
class
crypten.nn.module.
ConstantPad1d
(padding, value, mode='constant')¶ Module that pads a 1D tensor.
-
forward
(input)¶ Perform forward pass on model.
-
-
class
crypten.nn.module.
ConstantPad2d
(padding, value, mode='constant')¶ Module that pads a 2D tensor.
-
forward
(input)¶ Perform forward pass on model.
-
-
class
crypten.nn.module.
ConstantPad3d
(padding, value, mode='constant')¶ Module that pads a 3D tensor.
-
forward
(input)¶ Perform forward pass on model.
-
-
class
crypten.nn.module.
Linear
(in_features, out_features, bias=True)¶ Module that performs linear transformation. Applies a linear transformation to the incoming data: \(y = xA^T + b\)
- Parameters
in_features – size of each input sample
out_features – size of each output sample
bias – If set to False, the layer will not learn an additive bias. Default: True
- Shape:
Input: \((N, *, H_{in})\) where \(*\) means any number of additional dimensions and \(H_{in} = \text{in\_features}\)
Output: \((N, *, H_{out})\) where all but the last dimension are the same shape as the input and \(H_{out} = \text{out\_features}\).
-
forward
(x)¶ Perform forward pass on model.
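The transformation \(y = xA^T + b\) can be written out on plain nested lists. This sketch assumes, following the PyTorch convention, that the weight matrix \(A\) has shape (out_features, in_features); the helper name linear is hypothetical.

```python
def linear(x, weight, bias=None):
    """Compute y = x A^T + b for a batch x of shape (N, in_features)."""
    out = []
    for row in x:
        # Each output feature is the dot product of the input row with one weight row.
        y = [sum(xi * wi for xi, wi in zip(row, w_row)) for w_row in weight]
        if bias is not None:
            y = [yi + bi for yi, bi in zip(y, bias)]
        out.append(y)
    return out


x = [[1.0, 2.0, 3.0]]                        # N = 1, in_features = 3
weight = [[1.0, 0.0, 0.0], [0.0, 1.0, 1.0]]  # out_features = 2
bias = [0.5, -0.5]

y = linear(x, weight, bias)
y_no_bias = linear(x, weight)
```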
-
class
crypten.nn.module.
Conv2d
(in_channels, out_channels, kernel_size, stride=1, padding=0, bias=True)¶ Module that performs 2D convolution.
Applies a 2D convolution over an input signal composed of several input planes.
In the simplest case, the output value of the layer with input size \((N, C_{\text{in}}, H, W)\) and output \((N, C_{\text{out}}, H_{\text{out}}, W_{\text{out}})\) can be precisely described as:
\[\text{out}(N_i, C_{\text{out}_j}) = \text{bias}(C_{\text{out}_j}) + \sum_{k = 0}^{C_{\text{in}} - 1} \text{weight}(C_{\text{out}_j}, k) \star \text{input}(N_i, k)\]
where \(\star\) is the valid 2D cross-correlation operator, \(N\) is a batch size, \(C\) denotes a number of channels, \(H\) is a height of input planes in pixels, and \(W\) is width in pixels.
stride controls the stride for the cross-correlation, a single number or a tuple.
padding controls the amount of implicit zero-padding on both sides for padding number of points for each dimension.
dilation controls the spacing between the kernel points; also known as the à trous algorithm. It is harder to describe, but this link has a nice visualization of what dilation does.
groups controls the connections between inputs and outputs. in_channels and out_channels must both be divisible by groups. For example,
At groups=1, all inputs are convolved to all outputs.
At groups=2, the operation becomes equivalent to having two conv layers side by side, each seeing half the input channels, and producing half the output channels, and both subsequently concatenated.
At groups=in_channels, each input channel is convolved with its own set of filters, of size: \(\left\lfloor\frac{out\_channels}{in\_channels}\right\rfloor\).
The parameters kernel_size, stride, padding, dilation can either be:
a single int – in which case the same value is used for the height and width dimension
a tuple of two ints – in which case, the first int is used for the height dimension, and the second int for the width dimension
- Parameters
in_channels (int) – Number of channels in the input image
out_channels (int) – Number of channels produced by the convolution
kernel_size (int or tuple) – Size of the convolving kernel
stride (int or tuple, optional) – Stride of the convolution. Default: 1
padding (int or tuple, optional) – Zero-padding added to both sides of the input. Default: 0
padding_mode (string, optional) – Default: zeros
dilation (int or tuple, optional) – Spacing between kernel elements. Default: 1
bias (bool, optional) – If True, adds a learnable bias to the output. Default: True
- Shape:
Input: \((N, C_{in}, H_{in}, W_{in})\)
Output: \((N, C_{out}, H_{out}, W_{out})\) where
\[H_{out} = \left\lfloor\frac{H_{in} + 2 \times \text{padding}[0] - \text{dilation}[0] \times (\text{kernel\_size}[0] - 1) - 1}{\text{stride}[0]} + 1\right\rfloor\]
\[W_{out} = \left\lfloor\frac{W_{in} + 2 \times \text{padding}[1] - \text{dilation}[1] \times (\text{kernel\_size}[1] - 1) - 1}{\text{stride}[1]} + 1\right\rfloor\]
-
forward
(x)¶ Perform forward pass on model.
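The output-shape formulas for Conv2d can be evaluated directly. The helper conv2d_output_size below is a hypothetical sketch that accepts either an int or a pair for each parameter, as described above.

```python
def _pair(value):
    """Expand a single int to (value, value); pass tuples through."""
    return (value, value) if isinstance(value, int) else tuple(value)


def conv2d_output_size(h_in, w_in, kernel_size, stride=1, padding=0, dilation=1):
    """Return (H_out, W_out) according to the shape formulas above."""
    k, s, p, d = (_pair(v) for v in (kernel_size, stride, padding, dilation))
    # Integer floor division implements the floor in the formula.
    h_out = (h_in + 2 * p[0] - d[0] * (k[0] - 1) - 1) // s[0] + 1
    w_out = (w_in + 2 * p[1] - d[1] * (k[1] - 1) - 1) // s[1] + 1
    return h_out, w_out


no_padding = conv2d_output_size(32, 32, kernel_size=3)
same_size = conv2d_output_size(32, 32, kernel_size=3, padding=1)
strided = conv2d_output_size(32, 32, kernel_size=3, stride=2, padding=1)
mixed = conv2d_output_size(28, 28, kernel_size=(3, 5), stride=(2, 1), padding=(1, 2))
```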
-
class
crypten.nn.module.
AvgPool2d
(kernel_size, stride=None, padding=0)¶ Module that applies 2D average pooling over an input signal composed of several input planes.
In the simplest case, the output value of the layer with input size \((N, C, H, W)\), output \((N, C, H_{out}, W_{out})\) and kernel_size \((kH, kW)\) can be precisely described as:
\[out(N_i, C_j, h, w) = \frac{1}{kH * kW} \sum_{m=0}^{kH-1} \sum_{n=0}^{kW-1} input(N_i, C_j, stride[0] \times h + m, stride[1] \times w + n)\]
If padding is non-zero, then the input is implicitly zero-padded on both sides for padding number of points.
The parameters kernel_size, stride, padding can either be:
a single int – in which case the same value is used for the height and width dimension
a tuple of two ints – in which case, the first int is used for the height dimension, and the second int for the width dimension
- Parameters
kernel_size – the size of the window
stride – the stride of the window. Default value is
kernel_size
padding – implicit zero padding to be added on both sides
- Shape:
Input: \((N, C, H_{in}, W_{in})\)
Output: \((N, C, H_{out}, W_{out})\), where
\[H_{out} = \left\lfloor\frac{H_{in} + 2 \times \text{padding}[0] - \text{kernel\_size}[0]}{\text{stride}[0]} + 1\right\rfloor\]
\[W_{out} = \left\lfloor\frac{W_{in} + 2 \times \text{padding}[1] - \text{kernel\_size}[1]}{\text{stride}[1]} + 1\right\rfloor\]
-
forward
(x)¶ Perform forward pass on model.
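The pooling formula for AvgPool2d can be traced on a single plaintext plane. This is a sketch for the padding = 0 case only; the helper avg_pool2d is hypothetical and works on a nested list representing one (H, W) plane.

```python
def avg_pool2d(plane, kernel_size, stride=None):
    """Average-pool one 2D plane (no padding); stride defaults to kernel_size."""
    kh, kw = kernel_size
    sh, sw = stride if stride is not None else kernel_size
    h_out = (len(plane) - kh) // sh + 1
    w_out = (len(plane[0]) - kw) // sw + 1
    return [
        [
            # Mean over the kH x kW window whose top-left corner is (sh*h, sw*w).
            sum(plane[sh * h + m][sw * w + n] for m in range(kh) for n in range(kw))
            / (kh * kw)
            for w in range(w_out)
        ]
        for h in range(h_out)
    ]


plane = [
    [1, 2, 3, 4],
    [5, 6, 7, 8],
    [9, 10, 11, 12],
    [13, 14, 15, 16],
]
pooled = avg_pool2d(plane, (2, 2))
overlapping = avg_pool2d(plane, (2, 2), stride=(1, 1))
```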
-
class
crypten.nn.module.
MaxPool2d
(kernel_size, stride=None, padding=0)¶ Module that performs 2D max pooling (see AvgPool2d()).
-
forward
(x)¶ Perform forward pass on model.
-
-
class
crypten.nn.module.
GlobalAveragePool
¶ GlobalAveragePool consumes an input tensor and applies average pooling across the values in the same channel. This is equivalent to AveragePool with kernel size equal to the spatial dimension of input tensor. This is an operation from the ONNX specification.
-
forward
(input)¶ Perform forward pass on model.
-
-
class
crypten.nn.module.
BatchNorm1d
(num_features, eps=1e-05, momentum=0.1)¶ Module that performs batch normalization on 1D tensors.
-
forward
(input)¶ Perform forward pass on model.
-
-
class
crypten.nn.module.
BatchNorm2d
(num_features, eps=1e-05, momentum=0.1)¶ Module that performs batch normalization on 2D tensors.
-
forward
(input)¶ Perform forward pass on model.
-
-
class
crypten.nn.module.
BatchNorm3d
(num_features, eps=1e-05, momentum=0.1)¶ Module that performs batch normalization on 3D tensors.
-
forward
(input)¶ Perform forward pass on model.
-
-
class
crypten.nn.module.
Dropout
(p=0.5)¶ During training, randomly zeroes some of the elements of the input tensor with probability p using samples from a Bernoulli distribution. Furthermore, the outputs are scaled by a factor of \(\frac{1}{1-p}\) during training. This means that during evaluation the module simply computes an identity function.
- Parameters
p – probability of an element to be zeroed. Default: 0.5
- Shape:
Input: \((*)\). Input can be of any shape
Output: \((*)\). Output is of the same shape as input
-
forward
(input)¶ Perform forward pass on model.
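The training-time scaling and the evaluation-time identity of Dropout can be sketched on a plaintext list. The helper dropout and its rng argument are hypothetical; the sketch assumes 0 <= p < 1 so that the scale factor is finite.

```python
import random


def dropout(values, p=0.5, training=True, rng=random):
    """Zero each element with probability p and scale survivors by 1 / (1 - p)."""
    if not 0 <= p < 1:
        raise ValueError("this sketch assumes 0 <= p < 1")
    if not training:
        # Evaluation mode: identity function.
        return list(values)
    scale = 1.0 / (1.0 - p)
    return [0.0 if rng.random() < p else v * scale for v in values]


values = [1.0, 2.0, 3.0, 4.0]
train_out = dropout(values, p=0.5, rng=random.Random(0))
eval_out = dropout(values, p=0.5, training=False)
```

Scaling the surviving elements keeps the expected value of each output equal to its input, which is why no rescaling is needed at evaluation time.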
-
class
crypten.nn.module.
Dropout2d
(p=0.5)¶ Randomly zero out entire channels (a channel is a 2D feature map, e.g., the \(j\)-th channel of the \(i\)-th sample in the batched input is a 2D tensor \(\text{input}[i, j]\)). Each channel will be zeroed out independently on every forward call with probability p using samples from a Bernoulli distribution.
Usually the input comes from nn.Conv2d modules.
- Parameters
p (float, optional) – probability of an element to be zeroed.
- Shape:
Input: \((N, C, H, W)\)
Output: \((N, C, H, W)\) (same shape as input)
-
forward
(input)¶ Perform forward pass on model.
-
class
crypten.nn.module.
Dropout3d
(p=0.5)¶ Randomly zero out entire channels (a channel is a 3D feature map, e.g., the \(j\)-th channel of the \(i\)-th sample in the batched input is a 3D tensor \(\text{input}[i, j]\)). Each channel will be zeroed out independently on every forward call with probability p using samples from a Bernoulli distribution.
Usually the input comes from nn.Conv3d modules.
- Parameters
p (float, optional) – probability of an element to be zeroed.
- Shape:
Input: \((N, C, D, H, W)\)
Output: \((N, C, D, H, W)\) (same shape as input)
-
forward
(input)¶ Perform forward pass on model.
-
class
crypten.nn.module.
DropoutNd
(p=0.5)¶ Randomly zero out entire channels (a channel is an nD feature map, e.g., the \(j\)-th channel of the \(i\)-th sample in the batched input is an nD tensor \(\text{input}[i, j]\)). Each channel will be zeroed out independently on every forward call with probability p using samples from a Bernoulli distribution.
- Parameters
p (float, optional) – probability of an element to be zeroed.
-
forward
(input)¶ Perform forward pass on model.
-
class
crypten.nn.module.
Softmax
(dim)¶ Applies the Softmax function to an n-dimensional input Tensor rescaling them so that the elements of the n-dimensional output Tensor lie in the range [0,1] and sum to 1.
Softmax is defined as:
\[\text{Softmax}(x_{i}) = \frac{\exp(x_i)}{\sum_j \exp(x_j)}\]
- Shape:
Input: \((*)\) where * means, any number of additional dimensions
Output: \((*)\), same shape as the input
- Returns
a Tensor of the same dimension and shape as the input with values in the range [0, 1]
- Parameters
dim (int) – A dimension along which Softmax will be computed (so every slice along dim will sum to 1).
-
forward
(input)¶ Perform forward pass on model.
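The Softmax definition can be evaluated on a plaintext vector as follows. The helper softmax is a hypothetical sketch; subtracting the maximum before exponentiating is a common numerical-stability choice that does not change the result, not something the text above prescribes.

```python
import math


def softmax(xs):
    """Softmax over a 1D list: exp(x_i) / sum_j exp(x_j)."""
    m = max(xs)  # shift for numerical stability; cancels in the ratio
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


probs = softmax([1.0, 2.0, 3.0])
uniform = softmax([5.0, 5.0, 5.0, 5.0])
```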
-
class
crypten.nn.module.
LogSoftmax
(dim)¶ Applies the \(\log(\text{Softmax}(x))\) function to an n-dimensional input Tensor. The LogSoftmax formulation can be simplified as:
\[\text{LogSoftmax}(x_{i}) = \log\left(\frac{\exp(x_i) }{ \sum_j \exp(x_j)} \right)\]
- Shape:
Input: \((*)\) where * means, any number of additional dimensions
Output: \((*)\), same shape as the input
- Parameters
dim (int) – A dimension along which LogSoftmax will be computed.
- Returns
a Tensor of the same dimension and shape as the input with values in the range [-inf, 0)
-
forward
(input)¶ Perform forward pass on model.
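The LogSoftmax formula simplifies to \(x_i - \log \sum_j \exp(x_j)\), which the following plaintext sketch computes. The helper log_softmax is hypothetical, and the max-shift is a standard stability trick rather than something stated above.

```python
import math


def log_softmax(xs):
    """Log-softmax over a 1D list: x_i - log(sum_j exp(x_j))."""
    m = max(xs)
    # log-sum-exp, shifted by the maximum to avoid overflow
    lse = m + math.log(sum(math.exp(x - m) for x in xs))
    return [x - lse for x in xs]


log_probs = log_softmax([1.0, 2.0, 3.0])
large_inputs = log_softmax([1000.0, 1000.0])
```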
Loss Functions¶
CrypTen also provides a number of encrypted loss functions similar to torch.nn.
-
class
crypten.nn.loss.
BCELoss
(reduction='mean', skip_forward=False)¶ Creates a criterion that measures the Binary Cross Entropy between the prediction \(x\) and the target \(y\).
The loss can be described as:
\[\ell(x, y) = mean(L) = mean(\{l_1,\dots,l_N\}^\top), \quad l_n = - \left [ y_n \cdot \log x_n + (1 - y_n) \cdot \log (1 - x_n) \right ],\]
where \(N\) is the batch size, \(x\) and \(y\) are tensors of arbitrary shapes with a total of \(n\) elements each.
This is used for measuring the error of a reconstruction in, for example, an auto-encoder. Note that the targets \(y\) should be numbers between 0 and 1.
-
forward
(x, y)¶ Perform forward pass on model.
-
-
class
crypten.nn.loss.
BCEWithLogitsLoss
(reduction='mean', skip_forward=False)¶ This loss combines a Sigmoid layer and the BCELoss in one single class.
The loss can be described as:
\[\ell(x, y) = mean(L) = mean(\{l_1,\dots,l_N\}^\top), \quad l_n = - \left [ y_n \cdot \log \sigma(x_n) + (1 - y_n) \cdot \log (1 - \sigma(x_n)) \right ],\]
where \(\sigma\) is the sigmoid function applied to the prediction \(x\).
This is used for measuring the error of a reconstruction in, for example, an auto-encoder. Note that the targets \(y\) should be numbers between 0 and 1.
-
forward
(x, y)¶ Perform forward pass on model.
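The relationship between BCELoss and BCEWithLogitsLoss (a Sigmoid followed by binary cross entropy) can be checked on plaintext numbers. The helpers sigmoid, bce_loss and bce_with_logits_loss are hypothetical sketches with mean reduction, not the encrypted implementations.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def bce_loss(x, y):
    """Mean binary cross entropy; x holds probabilities strictly between 0 and 1."""
    losses = [
        -(yn * math.log(xn) + (1.0 - yn) * math.log(1.0 - xn))
        for xn, yn in zip(x, y)
    ]
    return sum(losses) / len(losses)


def bce_with_logits_loss(logits, y):
    """Apply a sigmoid to raw scores, then compute the mean binary cross entropy."""
    return bce_loss([sigmoid(z) for z in logits], y)


targets = [1.0, 0.0, 1.0]
logits = [2.0, -1.0, 0.0]

loss_from_logits = bce_with_logits_loss(logits, targets)
loss_from_probs = bce_loss([sigmoid(z) for z in logits], targets)
```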
-
-
class
crypten.nn.loss.
CrossEntropyLoss
(reduction='mean', skip_forward=False)¶ Creates a criterion that measures cross-entropy loss between the prediction \(x\) and the target \(y\). It is useful when training a classification problem with C classes.
The prediction x is expected to contain raw, unnormalized scores for each class.
The prediction x has to be a Tensor of size either \((N, C)\) or \((N, C, d_1, d_2, ..., d_K)\), where \(N\) is the size of the minibatch, and with \(K \geq 1\) for the K-dimensional case (described later).
This criterion expects a class index in the range \([0, C-1]\) as the target y for each value of a 1D tensor of size N.
The loss can be described as:
\[\text{loss}(x, class) = -\log \left( \frac{\exp(x[class])}{\sum_j \exp(x[j])} \right ) = -x[class] + \log \left (\sum_j \exp(x[j]) \right)\]
The losses are averaged across observations for each batch.
Can also be used for higher dimension inputs, such as 2D images, by providing an input of size \((N, C, d_1, d_2, ..., d_K)\) with \(K \geq 1\), where \(K\) is the number of dimensions, and a target of appropriate shape.
-
forward
(x, y)¶ Perform forward pass on model.
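The cross-entropy formula for the \((N, C)\) case with class-index targets can be sketched in plain Python. The helper cross_entropy_loss is hypothetical; it averages over the batch as described above.

```python
import math


def cross_entropy_loss(scores, targets):
    """Mean of -x[class] + log(sum_j exp(x[j])) over a batch of raw scores."""
    total = 0.0
    for x, cls in zip(scores, targets):
        m = max(x)
        # log-sum-exp, shifted by the maximum for numerical stability
        lse = m + math.log(sum(math.exp(v - m) for v in x))
        total += -x[cls] + lse
    return total / len(scores)


scores = [[2.0, 0.5, -1.0], [0.0, 0.0, 0.0]]  # N = 2 samples, C = 3 classes
targets = [0, 2]                              # class indices in [0, C - 1]

loss = cross_entropy_loss(scores, targets)
uniform_loss = cross_entropy_loss([[0.0, 0.0, 0.0, 0.0]], [1])
```

For uniform scores over \(C\) classes, the loss equals \(\log C\), which is a convenient sanity check.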
-
-
class
crypten.nn.loss.
L1Loss
(reduction='mean', skip_forward=False)¶ Creates a criterion that measures the mean absolute error between each element in the prediction \(x\) and target \(y\).
The loss can be described as:
\[\ell(x, y) = mean(L) = mean(\{l_1,\dots,l_N\}^\top), \quad l_n = \left | x_n - y_n \right |,\]
where \(N\) is the batch size, \(x\) and \(y\) are tensors of arbitrary shapes with a total of \(n\) elements each.
-
forward
(x, y)¶ Perform forward pass on model.
-
-
class
crypten.nn.loss.
MSELoss
(reduction='mean', skip_forward=False)¶ Creates a criterion that measures the mean squared error (squared L2 norm) between each element in the prediction \(x\) and target \(y\).
The loss can be described as:
\[\ell(x, y) = mean(L) = mean(\{l_1,\dots,l_N\}^\top), \quad l_n = (x_n - y_n)^2,\]
where \(N\) is the batch size, \(x\) and \(y\) are tensors of arbitrary shapes with a total of \(n\) elements each.
-
forward
(x, y)¶ Perform forward pass on model.
-
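The L1Loss and MSELoss formulas differ only in the per-element term, as the following plaintext sketch shows. The helpers l1_loss and mse_loss are hypothetical and operate on flat lists with mean reduction.

```python
def l1_loss(x, y):
    """Mean absolute error: mean of |x_n - y_n|."""
    return sum(abs(a - b) for a, b in zip(x, y)) / len(x)


def mse_loss(x, y):
    """Mean squared error: mean of (x_n - y_n)^2."""
    return sum((a - b) ** 2 for a, b in zip(x, y)) / len(x)


prediction = [1.0, 2.0, 3.0]
target = [1.0, 4.0, 0.0]

l1 = l1_loss(prediction, target)   # (0 + 2 + 3) / 3
mse = mse_loss(prediction, target)  # (0 + 4 + 9) / 3
```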