Encoders

Base Classes

class curated_transformers.models.EncoderModule(config)

Bases: Generic[ConfigT], TransformerModule[ConfigT]

Base class for encoder modules.

property config: ConfigT

Returns the model’s configuration.

abstract forward(piece_ids, attention_mask, *, positions=None, type_ids=None)

Apply the encoder to the input.

Parameters:
  • piece_ids (Tensor) –

    Piece identifiers to apply the encoder to.

    Shape: (batch_size, seq_len)

  • attention_mask (AttentionMask) – Attention mask. Sequence elements for which the corresponding mask element is set to False are ignored during attention calculation.

  • positions (Optional[Tensor]) –

    Input positions. Positions are used to look up position embeddings. Normally, these positions are calculated automatically. But if the positions deviate for some reason, they can be provided through this argument.

    Shape: (batch_size, seq_len)

  • type_ids (Optional[Tensor]) –

    Type identifiers to indicate the spans of different sequences in the input. Useful when performing tasks like sequence classification and question answering.

    Shape: (batch_size, seq_len)

Return type:

ModelOutput

Returns:

Encoder output.
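
For example, a forward pass with one of the concrete subclasses documented below might look like the following sketch. The checkpoint name and piece identifiers are illustrative, and it is assumed that AttentionMask can be imported from curated_transformers.layers and that ModelOutput exposes a last_hidden_layer_state property:

    import torch

    from curated_transformers.layers import AttentionMask
    from curated_transformers.models import BERTEncoder

    # Example checkpoint; any supported BERT model on the Hub should work.
    encoder = BERTEncoder.from_hf_hub(name="bert-base-uncased")
    encoder.eval()

    # Two sequences of piece identifiers padded to the same length.
    # Shape: (batch_size=2, seq_len=4); the identifiers are illustrative.
    piece_ids = torch.tensor([[101, 7592, 2088, 102],
                              [101, 2023, 102, 0]])

    # Mask out the padding piece in the second sequence.
    attention_mask = AttentionMask(
        torch.tensor([[True, True, True, True],
                      [True, True, True, False]])
    )

    with torch.no_grad():
        output = encoder(piece_ids, attention_mask)

    # Hidden states of the last layer: (batch_size, seq_len, hidden_width).
    print(output.last_hidden_layer_state.shape)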

class curated_transformers.models.TransformerEncoder(config)

Bases: Generic[ConfigT], EncoderModule[ConfigT]

Transformer encoder (Vaswani et al., 2017) base class.

This class provides an implementation of the forward method. Subclasses must set the given member attributes.

property config: ConfigT

Returns the model’s configuration.

forward(piece_ids, attention_mask, *, positions=None, type_ids=None)

Apply the encoder to the input.

Parameters:
  • piece_ids (Tensor) –

    Piece identifiers to apply the encoder to.

    Shape: (batch_size, seq_len)

  • attention_mask (AttentionMask) – Attention mask. Sequence elements for which the corresponding mask element is set to False are ignored during attention calculation.

  • positions (Optional[Tensor]) –

    Input positions. Positions are used to look up position embeddings. Normally, these positions are calculated automatically. But if the positions deviate for some reason, they can be provided through this argument.

    Shape: (batch_size, seq_len)

  • type_ids (Optional[Tensor]) –

    Type identifiers to indicate the spans of different sequences in the input. Useful when performing tasks like sequence classification and question answering.

    Shape: (batch_size, seq_len)

Return type:

ModelOutput

Returns:

Encoder output.

Architectures

These modules represent the supported encoder-only architectures.

class curated_transformers.models.ALBERTEncoder(config, *, device=None)

Bases: EncoderModule[ALBERTConfig], FromHF[ALBERTConfig]

ALBERT (Lan et al., 2022) encoder.

Construct an ALBERT encoder.

Parameters:
  • config (ALBERTConfig) – Encoder configuration.

  • device (Optional[device]) – Device to which the module is to be moved.

Returns:

The encoder.

classmethod config_from_hf(hf_config)

Convert a Hugging Face model configuration to the module’s configuration.

Parameters:

hf_config (Mapping[str, Any]) – The Hugging Face model configuration.

Return type:

ALBERTConfig

Returns:

The converted Curated Transformer configuration.

classmethod config_to_hf(curated_config)

Convert the module’s configuration to a Hugging Face model configuration.

Parameters:

curated_config (ALBERTConfig) – The Curated Transformer model configuration.

Return type:

Mapping[str, Any]

Returns:

The converted Hugging Face configuration.

forward(piece_ids, attention_mask, *, type_ids=None, positions=None)

Apply the encoder to the input.

Parameters:
  • piece_ids (Tensor) –

    Piece identifiers to apply the encoder to.

    Shape: (batch_size, seq_len)

  • attention_mask (AttentionMask) – Attention mask. Sequence elements for which the corresponding mask element is set to False are ignored during attention calculation.

  • positions (Optional[Tensor]) –

    Input positions. Positions are used to look up position embeddings. Normally, these positions are calculated automatically. But if the positions deviate for some reason, they can be provided through this argument.

    Shape: (batch_size, seq_len)

  • type_ids (Optional[Tensor]) –

    Type identifiers to indicate the spans of different sequences in the input. Useful when performing tasks like sequence classification and question answering.

    Shape: (batch_size, seq_len)

Return type:

ModelOutput

Returns:

Encoder output.

classmethod from_hf_config(*, hf_config, device=None)

Create the module from a JSON-deserialized Hugging Face model configuration.

Parameters:
  • hf_config (Any) – Hugging Face model configuration.

  • device (Optional[device]) – Device on which to initialize the model.

Return type:

Self (bound to ALBERTEncoder)

Returns:

Module constructed using the configuration.

classmethod is_supported(config)

Check if the model with the given configuration is supported by this class.

Parameters:

config (Dict[str, Any]) – Hugging Face model configuration.

Return type:

bool

Returns:

Whether the model is supported by this class.

classmethod state_dict_from_hf(params)

Convert a state dict of a Hugging Face model to a valid state dict for the module.

Parameters:

params (Mapping[str, Tensor]) – The state dict to convert.

Return type:

Mapping[str, Tensor]

Returns:

The converted state dict.

classmethod state_dict_to_hf(params)

Convert the module’s state dict to a format compatible with Hugging Face models.

Parameters:

params (Mapping[str, Tensor]) – The state dict to convert.

Return type:

Mapping[str, Tensor]

Returns:

The converted state dict.

class curated_transformers.models.BERTEncoder(config, *, device=None)

Bases: TransformerEncoder[BERTConfig], FromHF[BERTConfig]

BERT (Devlin et al., 2018) encoder.

Construct a BERT encoder.

Parameters:
  • config (BERTConfig) – Encoder configuration.

  • device (Optional[device]) – Device to which the module is to be moved.

Returns:

The encoder.

property config: ConfigT

Returns the model’s configuration.

classmethod config_from_hf(hf_config)

Convert a Hugging Face model configuration to the module’s configuration.

Parameters:

hf_config (Mapping[str, Any]) – The Hugging Face model configuration.

Return type:

BERTConfig

Returns:

The converted Curated Transformer configuration.

classmethod config_to_hf(curated_config)

Convert the module’s configuration to a Hugging Face model configuration.

Parameters:

curated_config (BERTConfig) – The Curated Transformer model configuration.

Return type:

Mapping[str, Any]

Returns:

The converted Hugging Face configuration.

forward(piece_ids, attention_mask, *, positions=None, type_ids=None)

Apply the encoder to the input.

Parameters:
  • piece_ids (Tensor) –

    Piece identifiers to apply the encoder to.

    Shape: (batch_size, seq_len)

  • attention_mask (AttentionMask) – Attention mask. Sequence elements for which the corresponding mask element is set to False are ignored during attention calculation.

  • positions (Optional[Tensor]) –

    Input positions. Positions are used to look up position embeddings. Normally, these positions are calculated automatically. But if the positions deviate for some reason, they can be provided through this argument.

    Shape: (batch_size, seq_len)

  • type_ids (Optional[Tensor]) –

    Type identifiers to indicate the spans of different sequences in the input. Useful when performing tasks like sequence classification and question answering.

    Shape: (batch_size, seq_len)

Return type:

ModelOutput

Returns:

Encoder output.

classmethod from_fsspec(*, fs, model_path, fsspec_args=None, device=None, quantization_config=None)

Construct a module and load its parameters from an fsspec filesystem.

Parameters:
  • fs (AbstractFileSystem) – The filesystem to load the model from.

  • model_path (str) – The path of the model on the filesystem.

  • fsspec_args (Optional[FsspecArgs]) – Implementation-specific keyword arguments to pass to fsspec filesystem operations.

  • device (Optional[device]) – Device on which the model is initialized.

  • quantization_config (Optional[BitsAndBytesConfig]) – Configuration for loading quantized weights.

Return type:

Self (bound to FromHF)

Returns:

Module with the parameters loaded.
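
A sketch of loading from a local directory through fsspec; the path is hypothetical and is assumed to contain a Hugging Face-style checkpoint (config.json plus model weights):

    from fsspec.implementations.local import LocalFileSystem

    from curated_transformers.models import BERTEncoder

    # Hypothetical local directory holding config.json and the model weights.
    fs = LocalFileSystem()
    encoder = BERTEncoder.from_fsspec(
        fs=fs,
        model_path="/data/checkpoints/bert-base-uncased",
    )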

from_fsspec_(*, fs, model_path, fsspec_args=None, device=None, quantization_config=None)

Load parameters from an fsspec filesystem in-place into the model.

Parameters:
  • fs (AbstractFileSystem) – The filesystem to load the model from.

  • model_path (str) – The path of the model on the filesystem.

  • fsspec_args (Optional[FsspecArgs]) – Implementation-specific keyword arguments to pass to fsspec filesystem operations.

  • device (Optional[device]) – Device on which the model is initialized.

  • quantization_config (Optional[BitsAndBytesConfig]) – Configuration for loading quantized weights.

Return type:

Self (bound to FromHF)

Returns:

Module with the parameters loaded.

classmethod from_hf_config(*, hf_config, device=None)

Create the module from a JSON-deserialized Hugging Face model configuration.

Parameters:
  • hf_config (Any) – Hugging Face model configuration.

  • device (Optional[device]) – Device on which to initialize the model.

Return type:

Self (bound to BERTEncoder)

Returns:

Module constructed using the configuration.

classmethod from_hf_hub(*, name, revision='main', device=None, quantization_config=None)

Construct a module and load its parameters from Hugging Face Hub.

Parameters:
  • name (str) – Model name.

  • revision (str) – Model revision.

  • device (Optional[device]) – Device on which the model is initialized.

  • quantization_config (Optional[BitsAndBytesConfig]) – Configuration for loading quantized weights.

Return type:

Self (bound to FromHF)

Returns:

Module with the parameters loaded.

from_hf_hub_(*, name, revision='main', device=None, quantization_config=None)

Load parameters from Hugging Face Hub in-place into the model.

Parameters:
  • name (str) – Model name.

  • revision (str) – Model revision.

  • device (Optional[device]) – Device on which the model is initialized.

  • quantization_config (Optional[BitsAndBytesConfig]) – Configuration for loading quantized weights.

Return type:

Self (bound to FromHF)

Returns:

Module with the parameters loaded.

classmethod from_hf_hub_to_cache(*, name, revision='main')

Download the model’s weights from Hugging Face Hub into the local Hugging Face cache directory. Subsequent loading of the model will read the weights from disk. If the weights are already cached, this is a no-op.

Parameters:
  • name (str) – Model name.

  • revision (str) – Model revision.
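
For example, the weights can be fetched ahead of time and then loaded from the local cache later; the checkpoint name is only an example:

    from curated_transformers.models import BERTEncoder

    # Download the weights into the local Hugging Face cache (a no-op if they
    # are already cached)...
    BERTEncoder.from_hf_hub_to_cache(name="bert-base-uncased", revision="main")

    # ...so that constructing the encoder later only reads from disk.
    encoder = BERTEncoder.from_hf_hub(name="bert-base-uncased", revision="main")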

classmethod from_repo(*, repo, device=None, quantization_config=None)

Construct and load a model from a repository.

Parameters:
  • repo – The repository to load from.

  • device (Optional[device]) – Device on which to initialize the model.

  • quantization_config (Optional[BitsAndBytesConfig]) – Configuration for loading quantized weights.

Return type:

Self (bound to FromHF)

Returns:

Loaded model.

from_repo_(*, repo, device=None, quantization_config=None)

Load parameters from a repository in-place into the model.

Parameters:
  • repo – The repository to load from.

  • device (Optional[device]) – Device on which to initialize the model.

  • quantization_config (Optional[BitsAndBytesConfig]) – Configuration for loading quantized weights.

Return type:

Self (bound to FromHF)

Returns:

Loaded model.

classmethod is_supported(config)

Check if the model with the given configuration is supported by this class.

Parameters:

config (Dict[str, Any]) – Hugging Face model configuration.

Return type:

bool

Returns:

Whether the model is supported by this class.
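
As a sketch, the raw Hugging Face configuration can be fetched with huggingface_hub, checked with is_supported, and then used to construct a module via from_hf_config; the checkpoint name is an example, and from_hf_config does not load any parameters:

    import json

    from huggingface_hub import hf_hub_download

    from curated_transformers.models import BERTEncoder

    # Fetch and parse the raw configuration of an example checkpoint.
    config_path = hf_hub_download(repo_id="bert-base-uncased", filename="config.json")
    with open(config_path, "r", encoding="utf-8") as f:
        hf_config = json.load(f)

    if BERTEncoder.is_supported(hf_config):
        # Construct the module with this configuration; parameters are not
        # loaded from the checkpoint.
        encoder = BERTEncoder.from_hf_config(hf_config=hf_config)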

classmethod state_dict_from_hf(params)

Convert a state dict of a Hugging Face model to a valid state dict for the module.

Parameters:

params (Mapping[str, Tensor]) – The state dict to convert.

Return type:

Mapping[str, Tensor]

Returns:

The converted state dict.

classmethod state_dict_to_hf(params)

Convert the module’s state dict to a format compatible with Hugging Face models.

Parameters:

params (Mapping[str, Tensor]) – The state dict to convert.

Return type:

Mapping[str, Tensor]

Returns:

The converted state dict.
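
A sketch of converting parameters by hand, assuming a local safetensors file (the path is hypothetical) that holds a state dict in the Hugging Face BertModel layout:

    from safetensors.torch import load_file

    from curated_transformers.models import BERTEncoder

    # Hypothetical local checkpoint in the Hugging Face parameter layout.
    hf_params = load_file("checkpoints/bert/model.safetensors")

    # Rename the parameters into the layout expected by BERTEncoder...
    curated_params = BERTEncoder.state_dict_from_hf(hf_params)

    # ...and back again, e.g. to export a fine-tuned encoder.
    exported = BERTEncoder.state_dict_to_hf(curated_params)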

class curated_transformers.models.CamemBERTEncoder(config, *, device=None)

Bases: RoBERTaEncoder

CamemBERT (Martin et al., 2020) encoder.

Construct a CamemBERT encoder.

Parameters:
  • config (RoBERTaConfig) – Encoder configuration.

  • device (Optional[device]) – Device to which the module is to be moved.

Returns:

The encoder.

property config: ConfigT

Returns the model’s configuration.

classmethod config_from_hf(hf_config)

Convert a Hugging Face model configuration to the module’s configuration.

Parameters:

hf_config (Mapping[str, Any]) – The Hugging Face model configuration.

Return type:

RoBERTaConfig

Returns:

The converted Curated Transformer configuration.

classmethod config_to_hf(curated_config)

Convert the module’s configuration to a Hugging Face model configuration.

Parameters:

curated_config (RoBERTaConfig) – The Curated Transformer model configuration.

Return type:

Mapping[str, Any]

Returns:

The converted Hugging Face configuration.

forward(piece_ids, attention_mask, *, positions=None, type_ids=None)

Apply the encoder to the input.

Parameters:
  • piece_ids (Tensor) –

    Piece identifiers to apply the encoder to.

    Shape: (batch_size, seq_len)

  • attention_mask (AttentionMask) – Attention mask. Sequence elements for which the corresponding mask element is set to False are ignored during attention calculation.

  • positions (Optional[Tensor]) –

    Input positions. Positions are used to look up position embeddings. Normally, these positions are calculated automatically. But if the positions deviate for some reason, they can be provided through this argument.

    Shape: (batch_size, seq_len)

  • type_ids (Optional[Tensor]) –

    Type identifiers to indicate the spans of different sequences in the input. Useful when performing tasks like sequence classification and question answering.

    Shape: (batch_size, seq_len)

Return type:

ModelOutput

Returns:

Encoder output.

classmethod from_fsspec(*, fs, model_path, fsspec_args=None, device=None, quantization_config=None)

Construct a module and load its parameters from an fsspec filesystem.

Parameters:
  • fs (AbstractFileSystem) – The filesystem to load the model from.

  • model_path (str) – The path of the model on the filesystem.

  • fsspec_args (Optional[FsspecArgs]) – Implementation-specific keyword arguments to pass to fsspec filesystem operations.

  • device (Optional[device]) – Device on which the model is initialized.

  • quantization_config (Optional[BitsAndBytesConfig]) – Configuration for loading quantized weights.

Return type:

Self (bound to FromHF)

Returns:

Module with the parameters loaded.

from_fsspec_(*, fs, model_path, fsspec_args=None, device=None, quantization_config=None)

Load parameters from an fsspec filesystem in-place into the model.

Parameters:
  • fs (AbstractFileSystem) – The filesystem to load the model from.

  • model_path (str) – The path of the model on the filesystem.

  • fsspec_args (Optional[FsspecArgs]) – Implementation-specific keyword arguments to pass to fsspec filesystem operations.

  • device (Optional[device]) – Device on which the model is initialized.

  • quantization_config (Optional[BitsAndBytesConfig]) – Configuration for loading quantized weights.

Return type:

Self (bound to FromHF)

Returns:

Module with the parameters loaded.

classmethod from_hf_config(*, hf_config, device=None)

Create the module from a JSON-deserialized Hugging Face model configuration.

Parameters:
  • hf_config (Any) – Hugging Face model configuration.

  • device (Optional[device]) – Device on which to initialize the model.

Return type:

Self (bound to RoBERTaEncoder)

Returns:

Module constructed using the configuration.

classmethod from_hf_hub(*, name, revision='main', device=None, quantization_config=None)

Construct a module and load its parameters from Hugging Face Hub.

Parameters:
  • name (str) – Model name.

  • revision (str) – Model revision.

  • device (Optional[device]) – Device on which the model is initialized.

  • quantization_config (Optional[BitsAndBytesConfig]) – Configuration for loading quantized weights.

Return type:

Self (bound to FromHF)

Returns:

Module with the parameters loaded.

from_hf_hub_(*, name, revision='main', device=None, quantization_config=None)

Load parameters from Hugging Face Hub in-place into the model.

Parameters:
  • name (str) – Model name.

  • revision (str) – Model revision.

  • device (Optional[device]) – Device on which the model is initialized.

  • quantization_config (Optional[BitsAndBytesConfig]) – Configuration for loading quantized weights.

Return type:

Self (bound to FromHF)

Returns:

Module with the parameters loaded.

classmethod from_hf_hub_to_cache(*, name, revision='main')

Download the model’s weights from Hugging Face Hub into the local Hugging Face cache directory. Subsequent loading of the model will read the weights from disk. If the weights are already cached, this is a no-op.

Parameters:
  • name (str) – Model name.

  • revision (str) – Model revision.

classmethod from_repo(*, repo, device=None, quantization_config=None)

Construct and load a model from a repository.

Parameters:
  • repo – The repository to load from.

  • device (Optional[device]) – Device on which to initialize the model.

  • quantization_config (Optional[BitsAndBytesConfig]) – Configuration for loading quantized weights.

Return type:

Self (bound to FromHF)

Returns:

Loaded model.

from_repo_(*, repo, device=None, quantization_config=None)

Load parameters from a repository in-place into the model.

Parameters:
  • repo – The repository to load from.

  • device (Optional[device]) – Device on which to initialize the model.

  • quantization_config (Optional[BitsAndBytesConfig]) – Configuration for loading quantized weights.

Return type:

Self (bound to FromHF)

Returns:

Loaded model.

classmethod is_supported(config)

Check if the model with the given configuration is supported by this class.

Parameters:

config (Dict[str, Any]) – Hugging Face model configuration.

Return type:

bool

Returns:

Whether the model is supported by this class.

classmethod state_dict_from_hf(params)

Convert a state dict of a Hugging Face model to a valid state dict for the module.

Parameters:

params (Mapping[str, Tensor]) – The state dict to convert.

Return type:

Mapping[str, Tensor]

Returns:

The converted state dict.

classmethod state_dict_to_hf(params)

Convert the module’s state dict to a format compatible with Hugging Face models.

Parameters:

params (Mapping[str, Tensor]) – The state dict to convert.

Return type:

Mapping[str, Tensor]

Returns:

The converted state dict.

class curated_transformers.models.RoBERTaEncoder(config, *, device=None)

Bases: TransformerEncoder[RoBERTaConfig], FromHF[RoBERTaConfig]

RoBERTa (Liu et al., 2019) encoder.

Construct a RoBERTa encoder.

Parameters:
  • config (RoBERTaConfig) – Encoder configuration.

  • device (Optional[device]) – Device to which the module is to be moved.

Returns:

The encoder.

property config: ConfigT

Returns the model’s configuration.

classmethod config_from_hf(hf_config)

Convert a Hugging Face model configuration to the module’s configuration.

Parameters:

hf_config (Mapping[str, Any]) – The Hugging Face model configuration.

Return type:

RoBERTaConfig

Returns:

The converted Curated Transformer configuration.

classmethod config_to_hf(curated_config)

Convert the module’s configuration to a Hugging Face model configuration.

Parameters:

curated_config (RoBERTaConfig) – The Curated Transformer model configuration.

Return type:

Mapping[str, Any]

Returns:

The converted Hugging Face configuration.

forward(piece_ids, attention_mask, *, positions=None, type_ids=None)

Apply the encoder to the input.

Parameters:
  • piece_ids (Tensor) –

    Piece identifiers to apply the encoder to.

    Shape: (batch_size, seq_len)

  • attention_mask (AttentionMask) – Attention mask. Sequence elements for which the corresponding mask element is set to False are ignored during attention calculation.

  • positions (Optional[Tensor]) –

    Input positions. Positions are used to look up position embeddings. Normally, these positions are calculated automatically. But if the positions deviate for some reason, they can be provided through this argument.

    Shape: (batch_size, seq_len)

  • type_ids (Optional[Tensor]) –

    Type identifiers to indicate the spans of different sequences in the input. Useful when performing tasks like sequence classification and question answering.

    Shape: (batch_size, seq_len)

Return type:

ModelOutput

Returns:

Encoder output.

classmethod from_fsspec(*, fs, model_path, fsspec_args=None, device=None, quantization_config=None)

Construct a module and load its parameters from an fsspec filesystem.

Parameters:
  • fs (AbstractFileSystem) – The filesystem to load the model from.

  • model_path (str) – The path of the model on the filesystem.

  • fsspec_args (Optional[FsspecArgs]) – Implementation-specific keyword arguments to pass to fsspec filesystem operations.

  • device (Optional[device]) – Device on which the model is initialized.

  • quantization_config (Optional[BitsAndBytesConfig]) – Configuration for loading quantized weights.

Return type:

Self (bound to FromHF)

Returns:

Module with the parameters loaded.

from_fsspec_(*, fs, model_path, fsspec_args=None, device=None, quantization_config=None)

Load parameters from an fsspec filesystem in-place into the model.

Parameters:
  • fs (AbstractFileSystem) – The filesystem to load the model from.

  • model_path (str) – The path of the model on the filesystem.

  • fsspec_args (Optional[FsspecArgs]) – Implementation-specific keyword arguments to pass to fsspec filesystem operations.

  • device (Optional[device]) – Device on which the model is initialized.

  • quantization_config (Optional[BitsAndBytesConfig]) – Configuration for loading quantized weights.

Return type:

Self (bound to FromHF)

Returns:

Module with the parameters loaded.

classmethod from_hf_config(*, hf_config, device=None)

Create the module from a JSON-deserialized Hugging Face model configuration.

Parameters:
  • hf_config (Any) – Hugging Face model configuration.

  • device (Optional[device]) – Device on which to initialize the model.

Return type:

Self (bound to RoBERTaEncoder)

Returns:

Module constructed using the configuration.

classmethod from_hf_hub(*, name, revision='main', device=None, quantization_config=None)

Construct a module and load its parameters from Hugging Face Hub.

Parameters:
  • name (str) – Model name.

  • revision (str) – Model revision.

  • device (Optional[device]) – Device on which the model is initialized.

  • quantization_config (Optional[BitsAndBytesConfig]) – Configuration for loading quantized weights.

Return type:

Self (bound to FromHF)

Returns:

Module with the parameters loaded.

from_hf_hub_(*, name, revision='main', device=None, quantization_config=None)

Load parameters from Hugging Face Hub in-place into the model.

Parameters:
  • name (str) – Model name.

  • revision (str) – Model revision.

  • device (Optional[device]) – Device on which the model is initialized.

  • quantization_config (Optional[BitsAndBytesConfig]) – Configuration for loading quantized weights.

Return type:

Self (bound to FromHF)

Returns:

Module with the parameters loaded.

classmethod from_hf_hub_to_cache(*, name, revision='main')

Download the model’s weights from Hugging Face Hub into the local Hugging Face cache directory. Subsequent loading of the model will read the weights from disk. If the weights are already cached, this is a no-op.

Parameters:
  • name (str) – Model name.

  • revision (str) – Model revision.

classmethod from_repo(*, repo, device=None, quantization_config=None)

Construct and load a model from a repository.

Parameters:
  • repo – The repository to load from.

  • device (Optional[device]) – Device on which to initialize the model.

  • quantization_config (Optional[BitsAndBytesConfig]) – Configuration for loading quantized weights.

Return type:

Self (bound to FromHF)

Returns:

Loaded model.

from_repo_(*, repo, device=None, quantization_config=None)

Load parameters from a repository in-place into the model.

Parameters:
  • repo – The repository to load from.

  • device (Optional[device]) – Device on which to initialize the model.

  • quantization_config (Optional[BitsAndBytesConfig]) – Configuration for loading quantized weights.

Return type:

Self (bound to FromHF)

Returns:

Loaded model.

classmethod is_supported(config)

Check if the model with the given configuration is supported by this class.

Parameters:

config (Dict[str, Any]) – Hugging Face model configuration.

Return type:

bool

Returns:

Whether the model is supported by this class.

classmethod state_dict_from_hf(params)

Convert a state dict of a Hugging Face model to a valid state dict for the module.

Parameters:

params (Mapping[str, Tensor]) – The state dict to convert.

Return type:

Mapping[str, Tensor]

Returns:

The converted state dict.

classmethod state_dict_to_hf(params)

Convert the module’s state dict to a format compatible with Hugging Face models.

Parameters:

params (Mapping[str, Tensor]) – The state dict to convert.

Return type:

Mapping[str, Tensor]

Returns:

The converted state dict.

class curated_transformers.models.XLMREncoder(config, *, device=None)

Bases: RoBERTaEncoder

XLM-RoBERTa (Conneau et al., 2019) encoder.

Construct an XLM-RoBERTa encoder.

Parameters:
  • config (RoBERTaConfig) – Encoder configuration.

  • device (Optional[device]) – Device to which the module is to be moved.

Returns:

The encoder.

property config: ConfigT

Returns the model’s configuration.

classmethod config_from_hf(hf_config)

Convert a Hugging Face model configuration to the module’s configuration.

Parameters:

hf_config (Mapping[str, Any]) – The Hugging Face model configuration.

Return type:

RoBERTaConfig

Returns:

The converted Curated Transformer configuration.

classmethod config_to_hf(curated_config)

Convert the module’s configuration to a Hugging Face model configuration.

Parameters:

curated_config (RoBERTaConfig) – The Curated Transformer model configuration.

Return type:

Mapping[str, Any]

Returns:

The converted Hugging Face configuration.

forward(piece_ids, attention_mask, *, positions=None, type_ids=None)

Apply the encoder to the input.

Parameters:
  • piece_ids (Tensor) –

    Piece identifiers to apply the encoder to.

    Shape: (batch_size, seq_len)

  • attention_mask (AttentionMask) – Attention mask. Sequence elements for which the corresponding mask element is set to False are ignored during attention calculation.

  • positions (Optional[Tensor]) –

    Input positions. Positions are used to look up position embeddings. Normally, these positions are calculated automatically. But if the positions deviate for some reason, they can be provided through this argument.

    Shape: (batch_size, seq_len)

  • type_ids (Optional[Tensor]) –

    Type identifiers to indicate the spans of different sequences in the input. Useful when performing tasks like sequence classification and question answering.

    Shape: (batch_size, seq_len)

Return type:

ModelOutput

Returns:

Encoder output.

classmethod from_fsspec(*, fs, model_path, fsspec_args=None, device=None, quantization_config=None)

Construct a module and load its parameters from an fsspec filesystem.

Parameters:
  • fs (AbstractFileSystem) – The filesystem to load the model from.

  • model_path (str) – The path of the model on the filesystem.

  • fsspec_args (Optional[FsspecArgs]) – Implementation-specific keyword arguments to pass to fsspec filesystem operations.

  • device (Optional[device]) – Device on which the model is initialized.

  • quantization_config (Optional[BitsAndBytesConfig]) – Configuration for loading quantized weights.

Return type:

Self (bound to FromHF)

Returns:

Module with the parameters loaded.

from_fsspec_(*, fs, model_path, fsspec_args=None, device=None, quantization_config=None)

Load parameters from an fsspec filesystem in-place into the model.

Parameters:
  • fs (AbstractFileSystem) – The filesystem to load the model from.

  • model_path (str) – The path of the model on the filesystem.

  • fsspec_args (Optional[FsspecArgs]) – Implementation-specific keyword arguments to pass to fsspec filesystem operations.

  • device (Optional[device]) – Device on which the model is initialized.

  • quantization_config (Optional[BitsAndBytesConfig]) – Configuration for loading quantized weights.

Return type:

Self (bound to FromHF)

Returns:

Module with the parameters loaded.

classmethod from_hf_config(*, hf_config, device=None)

Create the module from a JSON-deserialized Hugging Face model configuration.

Parameters:
  • hf_config (Any) – Hugging Face model configuration.

  • device (Optional[device]) – Device on which to initialize the model.

Return type:

Self (bound to RoBERTaEncoder)

Returns:

Module constructed using the configuration.

classmethod from_hf_hub(*, name, revision='main', device=None, quantization_config=None)

Construct a module and load its parameters from Hugging Face Hub.

Parameters:
  • name (str) – Model name.

  • revision (str) – Model revision.

  • device (Optional[device]) – Device on which the model is initialized.

  • quantization_config (Optional[BitsAndBytesConfig]) – Configuration for loading quantized weights.

Return type:

Self (bound to FromHF)

Returns:

Module with the parameters loaded.

from_hf_hub_(*, name, revision='main', device=None, quantization_config=None)

Load parameters from Hugging Face Hub in-place into the model.

Parameters:
  • name (str) – Model name.

  • revision (str) – Model revision.

  • device (Optional[device]) – Device on which the model is initialized.

  • quantization_config (Optional[BitsAndBytesConfig]) – Configuration for loading quantized weights.

Return type:

Self (bound to FromHF)

Returns:

Module with the parameters loaded.

classmethod from_hf_hub_to_cache(*, name, revision='main')

Download the model’s weights from Hugging Face Hub into the local Hugging Face cache directory. Subsequent loading of the model will read the weights from disk. If the weights are already cached, this is a no-op.

Parameters:
  • name (str) – Model name.

  • revision (str) – Model revision.

classmethod from_repo(*, repo, device=None, quantization_config=None)

Construct and load a model from a repository.

Parameters:
  • repo – The repository to load from.

  • device (Optional[device]) – Device on which to initialize the model.

  • quantization_config (Optional[BitsAndBytesConfig]) – Configuration for loading quantized weights.

Return type:

Self (bound to FromHF)

Returns:

Loaded model.

from_repo_(*, repo, device=None, quantization_config=None)

Load parameters from a repository in-place into the model.

Parameters:
  • repo – The repository to load from.

  • device (Optional[device]) – Device on which to initialize the model.

  • quantization_config (Optional[BitsAndBytesConfig]) – Configuration for loading quantized weights.

Return type:

Self (bound to FromHF)

Returns:

Loaded model.

classmethod is_supported(config)

Check if the model with the given configuration is supported by this class.

Parameters:

config (Dict[str, Any]) – Hugging Face model configuration.

Return type:

bool

Returns:

Whether the model is supported by this class.

classmethod state_dict_from_hf(params)

Convert a state dict of a Hugging Face model to a valid state dict for the module.

Parameters:

params (Mapping[str, Tensor]) – The state dict to convert.

Return type:

Mapping[str, Tensor]

Returns:

The converted state dict.

classmethod state_dict_to_hf(params)

Convert the module’s state dict to a format compatible with Hugging Face models.

Parameters:

params (Mapping[str, Tensor]) – The state dict to convert.

Return type:

Mapping[str, Tensor]

Returns:

The converted state dict.

Downloading

Each encoder type provides a from_hf_hub method that loads a model from Hugging Face Hub. If you want to load an encoder without committing to a specific encoder type, you can use the AutoEncoder class. This class also provides a from_hf_hub method, but it infers the correct encoder type automatically.
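
For example, the sketch below loads the same checkpoint both ways; the model name is only an example:

    import torch

    from curated_transformers.models import AutoEncoder, XLMREncoder

    # Load with an explicit encoder class...
    encoder = XLMREncoder.from_hf_hub(name="xlm-roberta-base", device=torch.device("cpu"))

    # ...or let AutoEncoder infer the encoder type from the Hub configuration.
    auto_encoder = AutoEncoder.from_hf_hub(name="xlm-roberta-base", device=torch.device("cpu"))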

class curated_transformers.models.AutoEncoder

Encoder model loaded from the Hugging Face Model Hub.

classmethod from_fsspec(*, fs, model_path, fsspec_args=None, device=None, quantization_config=None)

Construct a module and load its parameters from an fsspec filesystem.

Parameters:
  • fs (AbstractFileSystem) – The filesystem to load the model from.

  • model_path (str) – The path of the model on the filesystem.

  • fsspec_args (Optional[FsspecArgs]) – Implementation-specific keyword arguments to pass to fsspec filesystem operations.

  • device (Optional[device]) – Device on which the model is initialized.

  • quantization_config (Optional[BitsAndBytesConfig]) – Configuration for loading quantized weights.

Return type:

ModelT

Returns:

Module with the parameters loaded.

classmethod from_hf_hub(*, name, revision='main', device=None, quantization_config=None)

Construct and load a model or a generator from Hugging Face Hub.

Parameters:
  • name (str) – Model name.

  • revision (str) – Model revision.

  • device (Optional[device]) – Device on which to initialize the model.

  • quantization_config (Optional[BitsAndBytesConfig]) – Configuration for loading quantized weights.

Return type:

ModelT

Returns:

Loaded model or generator.

classmethod from_hf_hub_to_cache(*, name, revision='main')

Download the model’s weights from Hugging Face Hub into the local Hugging Face cache directory. Subsequent loading of the model will read the weights from disk. If the weights are already cached, this is a no-op.

Parameters:
  • name (str) – Model name.

  • revision (str) – Model revision.

classmethod from_repo(*, repo, device=None, quantization_config=None)

Construct and load a model or a generator from a repository.

Parameters:
  • repository – The repository to load from.

  • device (Optional[device]) – Device on which to initialize the model.

  • quantization_config (Optional[BitsAndBytesConfig]) – Configuration for loading quantized weights.

Return type:

EncoderModule[TransformerConfig]

Returns:

Loaded model or generator.

Configuration

ALBERT

class curated_transformers.models.ALBERTConfig(*, dtype=torch.float32, embedding_width=128, hidden_width=768, n_layers_per_group=1, intermediate_width=3072, n_attention_heads=12, n_hidden_layers=12, n_hidden_groups=1, attention_probs_dropout_prob=0.0, hidden_dropout_prob=0.0, activation=Activation.GELUNew, n_pieces=30000, n_types=2, n_positions=512, model_max_length=512, layer_norm_eps=1e-12)

ALBERT (Lan et al., 2022) model configuration.

Parameters:
  • dtype (dtype) – Data type to use for model parameters.

  • embedding_width (int) – Width of the embedding representations.

  • hidden_width (int) – Width of the transformer hidden layers.

  • n_layers_per_group (int) – Number of layers per layer group.

  • intermediate_width (int) – Width of the intermediate projection layer in the point-wise feed-forward layer.

  • n_attention_heads (int) – Number of self-attention heads.

  • n_hidden_layers (int) – Number of hidden layers.

  • n_hidden_groups (int) – Number of hidden groups.

  • attention_probs_dropout_prob (float) – Dropout probability of the self-attention layers.

  • hidden_dropout_prob (float) – Dropout probability of the point-wise feed-forward and embedding layers.

  • activation (Activation) – Activation used by the pointwise feed-forward layers.

  • n_pieces (int) – Size of main vocabulary.

  • n_types (int) – Size of token type vocabulary.

  • n_positions (int) – Maximum length of position embeddings.

  • model_max_length (int) – Maximum length of model inputs.

  • layer_norm_eps (float) – Epsilon for layer normalization.
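
As a sketch, a randomly initialized ALBERT encoder can be constructed directly from a configuration; the overrides below are illustrative and highlight ALBERT’s factorized embeddings (a small embedding_width feeding a larger hidden_width):

    from curated_transformers.models import ALBERTConfig, ALBERTEncoder

    # Illustrative hyperparameters; embeddings are kept narrow and projected
    # up to the hidden width, and all layers share one hidden group.
    config = ALBERTConfig(
        embedding_width=128,
        hidden_width=768,
        n_hidden_layers=12,
        n_hidden_groups=1,
    )
    encoder = ALBERTEncoder(config)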

BERT

class curated_transformers.models.BERTConfig(*, dtype=torch.float32, embedding_width=768, hidden_width=768, intermediate_width=3072, n_attention_heads=12, n_hidden_layers=12, attention_probs_dropout_prob=0.1, hidden_dropout_prob=0.1, activation=Activation.GELU, n_pieces=30000, n_types=2, n_positions=512, model_max_length=512, layer_norm_eps=1e-12)

BERT (Devlin et al., 2018) model configuration.

Parameters:
  • dtype (dtype) – Data type to use for model parameters.

  • embedding_width (int) – Width of the embedding representations.

  • hidden_width (int) – Width of the transformer hidden layers.

  • intermediate_width (int) – Width of the intermediate projection layer in the point-wise feed-forward layer.

  • n_attention_heads (int) – Number of self-attention heads.

  • n_hidden_layers (int) – Number of hidden layers.

  • attention_probs_dropout_prob (float) – Dropout probability of the self-attention layers.

  • hidden_dropout_prob (float) – Dropout probability of the point-wise feed-forward and embedding layers.

  • activation (Activation) – Activation used by the pointwise feed-forward layers.

  • n_pieces (int) – Size of main vocabulary.

  • n_types (int) – Size of token type vocabulary.

  • n_positions (int) – Maximum length of position embeddings.

  • model_max_length (int) – Maximum length of model inputs.

  • layer_norm_eps (float) – Epsilon for layer normalization.
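
A small, randomly initialized BERT encoder for experimentation might be configured as follows; the values are illustrative and do not correspond to any published checkpoint:

    import torch

    from curated_transformers.models import BERTConfig, BERTEncoder

    # A deliberately small configuration for quick experiments.
    config = BERTConfig(
        embedding_width=256,
        hidden_width=256,
        intermediate_width=1024,
        n_attention_heads=4,
        n_hidden_layers=4,
    )
    encoder = BERTEncoder(config, device=torch.device("cpu"))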

CamemBERT

See RoBERTa.

RoBERTa

class curated_transformers.models.RoBERTaConfig(*args, dtype=torch.float32, layer_norm_eps=1e-05, n_positions=514, padding_id=1, n_types=1, n_pieces=50265, **kwargs)

Bases: BERTConfig

RoBERTa (Liu et al., 2019) model configuration.

Parameters:
  • dtype (dtype) – Data type to use for model parameters.

  • embedding_width – Width of the embedding representations.

  • hidden_width – Width of the transformer hidden layers.

  • intermediate_width – Width of the intermediate projection layer in the point-wise feed-forward layer.

  • n_attention_heads – Number of self-attention heads.

  • n_hidden_layers – Number of hidden layers.

  • attention_probs_dropout_prob – Dropout probability of the self-attention layers.

  • hidden_dropout_prob – Dropout probability of the point-wise feed-forward and embedding layers.

  • activation – Activation used by the pointwise feed-forward layers.

  • n_pieces – Size of main vocabulary.

  • n_types – Size of token type vocabulary.

  • n_positions – Maximum length of position embeddings.

  • model_max_length – Maximum length of model inputs.

  • layer_norm_eps – Epsilon for layer normalization.

  • padding_id – Index of the padding meta-token.

XLM-RoBERTa

See RoBERTa.