Decoders

Base Classes

class curated_transformers.models.DecoderModule(config)

Bases: Generic[ConfigT, CacheT], TransformerModule[ConfigT]

Base class for decoder modules.

property config: ConfigT

Returns the model’s configuration.

abstract forward(piece_ids, attention_mask, *, cache=None, positions=None, store_cache=False)

Apply the decoder to the given piece identifiers.

Parameters:
  • piece_ids (Tensor) –

    Piece identifiers to apply the decoder to.

    Shape: (batch_size, seq_len)

  • attention_mask (AttentionMask) – Attention mask. Sequence elements for which the corresponding mask element is set to False are ignored during attention calculation.

  • cache (Optional[List[TypeVar(CacheT, bound= CacheProtocol)]]) – Key/value cache to avoid recomputing key/value representations for tokens that were previously seen.

  • positions (Optional[Tensor]) –

    Input positions. Positions are needed to look up position embeddings. Normally, these positions are calculated automatically. But if the positions deviate for some reason, they can be provided through this argument.

    Shape: (batch_size, seq_len)

  • store_cache (bool) – Whether to cache the key/value representations for future reuse.

Return type:

ModelOutputWithCache[TypeVar(CacheT, bound= CacheProtocol)]

Returns:

Decoder output with key/value cache.
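
For example, a single forward pass might look as follows. This is a minimal sketch: the concrete decoder class, the model name, and the AttentionMask import path are illustrative assumptions, not part of this reference.

import torch

from curated_transformers.layers import AttentionMask
from curated_transformers.models import GPTNeoXDecoder

# Any concrete decoder implements this interface.
decoder = GPTNeoXDecoder.from_hf_hub(name="EleutherAI/pythia-70m")

piece_ids = torch.tensor([[0, 1, 2, 3]])  # (batch_size, seq_len)
mask = AttentionMask(torch.ones_like(piece_ids, dtype=torch.bool))

output = decoder(piece_ids, mask)
hidden = output.last_hidden_layer_state  # (batch_size, seq_len, hidden_width)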

class curated_transformers.models.TransformerDecoder(config)

Bases: Generic[ConfigT], DecoderModule[ConfigT, KeyValueCache]

Transformer decoder (Vaswani et al., 2017) base class.

This class provides an implementation of the forward method. Subclasses are responsible for setting the member attributes that this implementation relies on.

property config: ConfigT

Returns the model’s configuration.

forward(piece_ids, attention_mask, *, cache=None, positions=None, store_cache=False)

Apply the decoder to the given piece identifiers.

Parameters:
  • piece_ids (Tensor) –

    Piece identifiers to apply the decoder to.

    Shape: (batch_size, seq_len)

  • attention_mask (AttentionMask) – Attention mask. Sequence elements for which the corresponding mask element is set to False are ignored during attention calculation.

  • cache (Optional[List[KeyValueCache]]) – Key/value cache to avoid recomputing key/value representations for tokens that were previously seen.

  • positions (Optional[Tensor]) –

    Input positions. Positions are needed to look up position embeddings. Normally, these positions are calculated automatically. But if the positions deviate for some reason, they can be provided through this argument.

    Shape: (batch_size, seq_len)

  • store_cache (bool) – Whether to cache the key/value representations for future reuse.

Return type:

ModelOutputWithCache[KeyValueCache]

Returns:

Decoder output with key/value cache.
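
The cache and store_cache arguments enable incremental decoding. A hedged sketch follows; it assumes that the attention mask on later steps covers the full sequence processed so far, cached pieces included, and the model name is an example.

import torch

from curated_transformers.layers import AttentionMask
from curated_transformers.models import GPTNeoXDecoder

decoder = GPTNeoXDecoder.from_hf_hub(name="EleutherAI/pythia-70m")

piece_ids = torch.tensor([[0, 1, 2, 3]])
bool_mask = torch.ones_like(piece_ids, dtype=torch.bool)

# Process the prompt once, storing the key/value representations.
output = decoder(piece_ids, AttentionMask(bool_mask), store_cache=True)

# Feed only the new piece; cached keys/values are reused rather than
# recomputed.
next_ids = torch.tensor([[4]])
bool_mask = torch.cat([bool_mask, torch.ones_like(next_ids, dtype=torch.bool)], dim=1)
output = decoder(next_ids, AttentionMask(bool_mask), cache=output.cache, store_cache=True)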

Architectures

These modules represent the supported decoder-only architectures.

class curated_transformers.models.FalconDecoder(config, *, device=None)

Bases: TransformerDecoder[FalconConfig], FromHF[FalconConfig]

Falcon (Penedo et al., 2023) decoder.

Construct a Falcon decoder.

Parameters:
  • config (FalconConfig) – Decoder configuration.

  • device (Optional[device]) – Device to which the module is to be moved.

Returns:

The decoder.

property config: ConfigT

Returns the model’s configuration.

classmethod config_from_hf(hf_config)

Convert a Hugging Face model configuration to the module’s configuration.

Parameters:

hf_config (Mapping[str, Any]) – The Hugging Face model configuration.

Return type:

FalconConfig

Returns:

The converted Curated Transformer configuration.

classmethod config_to_hf(curated_config)

Convert the module’s configuration to a Hugging Face model configuration.

Parameters:

curated_config (FalconConfig) – The Curated Transformer model configuration.

Return type:

Mapping[str, Any]

Returns:

The converted Hugging Face configuration.
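
A sketch of how the two converters fit together, assuming the huggingface_hub package is available for fetching the raw config.json (the model name is an example):

import json

from huggingface_hub import hf_hub_download

from curated_transformers.models import FalconDecoder

with open(hf_hub_download("tiiuae/falcon-7b", "config.json")) as f:
    hf_config = json.load(f)

config = FalconDecoder.config_from_hf(hf_config)  # FalconConfig
hf_again = FalconDecoder.config_to_hf(config)     # Mapping[str, Any]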

forward(piece_ids, attention_mask, *, cache=None, positions=None, store_cache=False)

Apply the decoder to the given piece identifiers.

Parameters:
  • piece_ids (Tensor) –

    Piece identifiers to apply the decoder to.

    Shape: (batch_size, seq_len)

  • attention_mask (AttentionMask) – Attention mask. Sequence elements for which the corresponding mask element is set to False are ignored during attention calculation.

  • cache (Optional[List[KeyValueCache]]) – Key/value cache to avoid recomputing key/value representations for tokens that were previously seen.

  • positions (Optional[Tensor]) –

    Input positions. Positions are needed to look up position embeddings. Normally, these positions are calculated automatically. But if the positions deviate for some reason, they can be provided through this argument.

    Shape: (batch_size, seq_len)

  • store_cache (bool) – Whether to cache the key/value representations for future reuse.

Return type:

ModelOutputWithCache[KeyValueCache]

Returns:

Decoder output with key/value cache.

classmethod from_fsspec(*, fs, model_path, fsspec_args=None, device=None, quantization_config=None)

Construct a module and load its parameters from an fsspec filesystem.

Parameters:
  • fs (AbstractFileSystem) – The filesystem to load the model from.

  • model_path (str) – The path of the model on the filesystem.

  • fsspec_args (Optional[FsspecArgs]) – Implementation-specific keyword arguments to pass to fsspec filesystem operations.

  • device (Optional[device]) – Device on which the model is initialized.

  • quantization_config (Optional[BitsAndBytesConfig]) – Configuration for loading quantized weights.

Return type:

TypeVar(Self, bound= FromHF)

Returns:

Module with the parameters loaded.
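
A sketch using the local filesystem; any fsspec AbstractFileSystem implementation (s3fs, gcsfs, ...) can be substituted, and the model path is a placeholder:

from fsspec.implementations.local import LocalFileSystem

from curated_transformers.models import FalconDecoder

fs = LocalFileSystem()
decoder = FalconDecoder.from_fsspec(fs=fs, model_path="/models/falcon-7b")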

from_fsspec_(*, fs, model_path, fsspec_args=None, device=None, quantization_config=None)

Load parameters from an fsspec filesystem in-place into the model.

Parameters:
  • fs (AbstractFileSystem) – The filesystem to load the model from.

  • model_path (str) – The path of the model on the filesystem.

  • fsspec_args (Optional[FsspecArgs]) – Implementation-specific keyword arguments to pass to fsspec filesystem operations.

  • device (Optional[device]) – Device on which the model is initialized.

  • quantization_config (Optional[BitsAndBytesConfig]) – Configuration for loading quantized weights.

Return type:

TypeVar(Self, bound= FromHF)

Returns:

Module with the parameters loaded.

classmethod from_hf_config(*, hf_config, device=None)

Create the module from a JSON-deserialized Hugging Face model configuration.

Parameters:
  • hf_config (Any) – Hugging Face model configuration.

  • device (Optional[device]) – Device on which to initialize the model.

Return type:

TypeVar(Self, bound= FalconDecoder)

Returns:

Module constructed using the configuration.

classmethod from_hf_hub(*, name, revision='main', device=None, quantization_config=None)

Construct a module and load its parameters from Hugging Face Hub.

Parameters:
  • name (str) – Model name.

  • revision (str) – Model revision.

  • device (Optional[device]) – Device on which the model is initialized.

  • quantization_config (Optional[BitsAndBytesConfig]) – Configuration for loading quantized weights.

Return type:

TypeVar(Self, bound= FromHF)

Returns:

Module with the parameters loaded.
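
For example, loading onto the first CUDA device with 8-bit quantized weights. This is a sketch: BitsAndBytesConfig.for_8bit() is assumed from the library's quantization module.

import torch

from curated_transformers.models import FalconDecoder
from curated_transformers.quantization import BitsAndBytesConfig

decoder = FalconDecoder.from_hf_hub(
    name="tiiuae/falcon-7b",
    revision="main",
    device=torch.device("cuda", index=0),
    quantization_config=BitsAndBytesConfig.for_8bit(),
)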

from_hf_hub_(*, name, revision='main', device=None, quantization_config=None)

Load parameters from Hugging Face Hub in-place into the model.

Parameters:
  • name (str) – Model name.

  • revision (str) – Model revision.

  • device (Optional[device]) – Device on which the model is initialized.

  • quantization_config (Optional[BitsAndBytesConfig]) – Configuration for loading quantized weights.

Return type:

TypeVar(Self, bound= FromHF)

Returns:

Module with the parameters loaded.

classmethod from_hf_hub_to_cache(*, name, revision='main')

Download the model’s weights from Hugging Face Hub into the local Hugging Face cache directory. Subsequent loading of the model will read the weights from disk. If the weights are already cached, this is a no-op.

Parameters:
  • name (str) – Model name.

  • revision (str) – Model revision.
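
A sketch of the typical pattern, e.g. prefetching the weights in a build step so that later loads read from the local cache:

from curated_transformers.models import FalconDecoder

# Download once; a no-op if the weights are already cached.
FalconDecoder.from_hf_hub_to_cache(name="tiiuae/falcon-7b", revision="main")

# Later constructions read the cached weights from disk.
decoder = FalconDecoder.from_hf_hub(name="tiiuae/falcon-7b", revision="main")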

classmethod from_repo(*, repo, device=None, quantization_config=None)

Construct and load a model from a repository.

Parameters:
  • repo (Repository) – The repository to load from.

  • device (Optional[device]) – Device on which to initialize the model.

  • quantization_config (Optional[BitsAndBytesConfig]) – Configuration for loading quantized weights.

Return type:

TypeVar(Self, bound= FromHF)

Returns:

Loaded model.

from_repo_(*, repo, device=None, quantization_config=None)

Load parameters from a repository in-place into the model.

Parameters:
  • repo (Repository) – The repository to load from.

  • device (Optional[device]) – Device on which to initialize the model.

  • quantization_config (Optional[BitsAndBytesConfig]) – Configuration for loading quantized weights.

Return type:

TypeVar(Self, bound= FromHF)

Returns:

Loaded model.

classmethod is_supported(config)

Check if the model with the given configuration is supported by this class.

Parameters:

config (Dict[str, Any]) – Hugging Face model configuration.

Return type:

bool

Returns:

Whether the model is supported by this class.

classmethod state_dict_from_hf(params)

Convert a state dict of a Hugging Face model to a valid state dict for the module.

Parameters:

params (Mapping[str, Tensor]) – The state dict to convert.

Return type:

Mapping[str, Tensor]

Returns:

The converted state dict.

classmethod state_dict_to_hf(params)

Convert the state dict of the module to a compatible Hugging Face model’s format.

Parameters:

params (Mapping[str, Tensor]) – The state dict to convert.

Return type:

Mapping[str, Tensor]

Returns:

The converted state dict.
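
A hedged sketch of the converters, assuming the transformers package is installed and that the checkpoint's parameter names match what the converter expects:

from transformers import AutoModelForCausalLM

from curated_transformers.models import FalconDecoder

hf_model = AutoModelForCausalLM.from_pretrained("tiiuae/falcon-7b")

# To curated-transformers parameter names and back again.
params = FalconDecoder.state_dict_from_hf(hf_model.state_dict())
hf_params = FalconDecoder.state_dict_to_hf(params)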

class curated_transformers.models.GPTNeoXDecoder(config, *, device=None)

Bases: TransformerDecoder[GPTNeoXConfig], FromHF[GPTNeoXConfig]

GPT-NeoX (Black et al., 2022) decoder.

Construct a GPT-NeoX decoder.

Parameters:
  • config (GPTNeoXConfig) – Decoder configuration.

  • device (Optional[device]) – Device to which the module is to be moved.

Returns:

The decoder.

property config: ConfigT

Returns the model’s configuration.

classmethod config_from_hf(hf_config)

Convert a Hugging Face model configuration to the module’s configuration.

Parameters:

hf_config (Mapping[str, Any]) – The Hugging Face model configuration.

Return type:

GPTNeoXConfig

Returns:

The converted Curated Transformer configuration.

classmethod config_to_hf(curated_config)

Convert the module’s configuration to a Hugging Face model configuration.

Parameters:

curated_config (GPTNeoXConfig) – The Curated Transformer model configuration.

Return type:

Mapping[str, Any]

Returns:

The converted Hugging Face configuration.

forward(piece_ids, attention_mask, *, cache=None, positions=None, store_cache=False)

Apply the decoder to the given piece identifiers.

Parameters:
  • piece_ids (Tensor) –

    Piece identifiers to apply the decoder to.

    Shape: (batch_size, seq_len)

  • attention_mask (AttentionMask) – Attention mask. Sequence elements for which the corresponding mask element is set to False are ignored during attention calculation.

  • cache (Optional[List[KeyValueCache]]) – Key/value cache to avoid recomputing key/value representations for tokens that were previously seen.

  • positions (Optional[Tensor]) –

    Input positions. Positions are needed to look up position embeddings. Normally, these positions are calculated automatically. But if the positions deviate for some reason, they can be provided through this argument.

    Shape: (batch_size, seq_len)

  • store_cache (bool) – Whether to cache the key/value representations for future reuse.

Return type:

ModelOutputWithCache[KeyValueCache]

Returns:

Decoder output with key/value cache.

classmethod from_fsspec(*, fs, model_path, fsspec_args=None, device=None, quantization_config=None)

Construct a module and load its parameters from an fsspec filesystem.

Parameters:
  • fs (AbstractFileSystem) – The filesystem to load the model from.

  • model_path (str) – The path of the model on the filesystem.

  • fsspec_args (Optional[FsspecArgs]) – Implementation-specific keyword arguments to pass to fsspec filesystem operations.

  • device (Optional[device]) – Device on which the model is initialized.

  • quantization_config (Optional[BitsAndBytesConfig]) – Configuration for loading quantized weights.

Return type:

TypeVar(Self, bound= FromHF)

Returns:

Module with the parameters loaded.

from_fsspec_(*, fs, model_path, fsspec_args=None, device=None, quantization_config=None)

Load parameters from an fsspec filesystem in-place into the model.

Parameters:
  • fs (AbstractFileSystem) – The filesystem to load the model from.

  • model_path (str) – The path of the model on the filesystem.

  • fsspec_args (Optional[FsspecArgs]) – Implementation-specific keyword arguments to pass to fsspec filesystem operations.

  • device (Optional[device]) – Device on which the model is initialized.

  • quantization_config (Optional[BitsAndBytesConfig]) – Configuration for loading quantized weights.

Return type:

TypeVar(Self, bound= FromHF)

Returns:

Module with the parameters loaded.

classmethod from_hf_config(*, hf_config, device=None)

Create the module from a JSON-deserialized Hugging Face model configuration.

Parameters:
  • hf_config (Any) – Hugging Face model configuration.

  • device (Optional[device]) – Device on which to initialize the model.

Return type:

TypeVar(Self, bound= GPTNeoXDecoder)

Returns:

Module constructed using the configuration.

classmethod from_hf_hub(*, name, revision='main', device=None, quantization_config=None)

Construct a module and load its parameters from Hugging Face Hub.

Parameters:
  • name (str) – Model name.

  • revision (str) – Model revision.

  • device (Optional[device]) – Device on which the model is initialized.

  • quantization_config (Optional[BitsAndBytesConfig]) – Configuration for loading quantized weights.

Return type:

TypeVar(Self, bound= FromHF)

Returns:

Module with the parameters loaded.

from_hf_hub_(*, name, revision='main', device=None, quantization_config=None)

Load parameters from Hugging Face Hub in-place into the model.

Parameters:
  • name (str) – Model name.

  • revision (str) – Model revision.

  • device (Optional[device]) – Device on which the model is initialized.

  • quantization_config (Optional[BitsAndBytesConfig]) – Configuration for loading quantized weights.

Return type:

TypeVar(Self, bound= FromHF)

Returns:

Module with the parameters loaded.

classmethod from_hf_hub_to_cache(*, name, revision='main')

Download the model’s weights from Hugging Face Hub into the local Hugging Face cache directory. Subsequent loading of the model will read the weights from disk. If the weights are already cached, this is a no-op.

Parameters:
  • name (str) – Model name.

  • revision (str) – Model revision.

classmethod from_repo(*, repo, device=None, quantization_config=None)

Construct and load a model from a repository.

Parameters:
  • repo (Repository) – The repository to load from.

  • device (Optional[device]) – Device on which to initialize the model.

  • quantization_config (Optional[BitsAndBytesConfig]) – Configuration for loading quantized weights.

Return type:

TypeVar(Self, bound= FromHF)

Returns:

Loaded model.

from_repo_(*, repo, device=None, quantization_config=None)

Load parameters from a repository in-place into the model.

Parameters:
  • repo (Repository) – The repository to load from.

  • device (Optional[device]) – Device on which to initialize the model.

  • quantization_config (Optional[BitsAndBytesConfig]) – Configuration for loading quantized weights.

Return type:

TypeVar(Self, bound= FromHF)

Returns:

Loaded model.

classmethod is_supported(config)

Check if the model with the given configuration is supported by this class.

Parameters:

config (Dict[str, Any]) – Hugging Face model configuration.

Return type:

bool

Returns:

Whether the model is supported by this class.

classmethod state_dict_from_hf(params)

Convert a state dict of a Hugging Face model to a valid state dict for the module.

Parameters:

params (Mapping[str, Tensor]) – The state dict to convert.

Return type:

Mapping[str, Tensor]

Returns:

The converted state dict.

classmethod state_dict_to_hf(params)

Convert the state dict of the module to a compatible Hugging Face model’s format.

Parameters:

params (Mapping[str, Tensor]) – The state dict to convert.

Return type:

Mapping[str, Tensor]

Returns:

The converted state dict.

class curated_transformers.models.LlamaDecoder(config, *, device=None)

Bases: TransformerDecoder[LlamaConfig], FromHF[LlamaConfig]

Llama (Touvron et al., 2023 [a], Touvron et al., 2023 [b]) decoder.

Construct a Llama decoder.

Parameters:
  • config (LlamaConfig) – Decoder configuration.

  • device (Optional[device]) – Device to which the module is to be moved.

Returns:

The decoder.

property config: ConfigT

Returns the model’s configuration.

classmethod config_from_hf(hf_config)

Convert a Hugging Face model configuration to the module’s configuration.

Parameters:

hf_config (Mapping[str, Any]) – The Hugging Face model configuration.

Return type:

LlamaConfig

Returns:

The converted Curated Transformer configuration.

classmethod config_to_hf(curated_config)

Convert the module’s configuration to a Hugging Face model configuration.

Parameters:

curated_config (LlamaConfig) – The Curated Transformer model configuration.

Return type:

Mapping[str, Any]

Returns:

The converted Hugging Face configuration.

forward(piece_ids, attention_mask, *, cache=None, positions=None, store_cache=False)

Apply the decoder to the given piece identifiers.

Parameters:
  • piece_ids (Tensor) –

    Piece identifiers to apply the decoder to.

    Shape: (batch_size, seq_len)

  • attention_mask (AttentionMask) – Attention mask. Sequence elements for which the corresponding mask element is set to False are ignored during attention calculation.

  • cache (Optional[List[KeyValueCache]]) – Key/value cache to avoid recomputing key/value representations for tokens that were previously seen.

  • positions (Optional[Tensor]) –

    Input positions. Positions are needed to look up position embeddings. Normally, these positions are calculated automatically. But if the positions deviate for some reason, they can be provided through this argument.

    Shape: (batch_size, seq_len)

  • store_cache (bool) – Whether to cache the key/value representations for future reuse.

Return type:

ModelOutputWithCache[KeyValueCache]

Returns:

Decoder output with key/value cache.

classmethod from_fsspec(*, fs, model_path, fsspec_args=None, device=None, quantization_config=None)

Construct a module and load its parameters from an fsspec filesystem.

Parameters:
  • fs (AbstractFileSystem) – The filesystem to load the model from.

  • model_path (str) – The path of the model on the filesystem.

  • fsspec_args (Optional[FsspecArgs]) – Implementation-specific keyword arguments to pass to fsspec filesystem operations.

  • device (Optional[device]) – Device on which the model is initialized.

  • quantization_config (Optional[BitsAndBytesConfig]) – Configuration for loading quantized weights.

Return type:

TypeVar(Self, bound= FromHF)

Returns:

Module with the parameters loaded.

from_fsspec_(*, fs, model_path, fsspec_args=None, device=None, quantization_config=None)

Load parameters from an fsspec filesystem in-place into the model.

Parameters:
  • fs (AbstractFileSystem) – The filesystem to load the model from.

  • model_path (str) – The path of the model on the filesystem.

  • fsspec_args (Optional[FsspecArgs]) – Implementation-specific keyword arguments to pass to fsspec filesystem operations.

  • device (Optional[device]) – Device on which the model is initialized.

  • quantization_config (Optional[BitsAndBytesConfig]) – Configuration for loading quantized weights.

Return type:

TypeVar(Self, bound= FromHF)

Returns:

Module with the parameters loaded.

classmethod from_hf_config(*, hf_config, device=None)

Create the module from a JSON-deserialized Hugging Face model configuration.

Parameters:
  • hf_config (Any) – Hugging Face model configuration.

  • device (Optional[device]) – Device on which to initialize the model.

Return type:

TypeVar(Self, bound= LlamaDecoder)

Returns:

Module constructed using the configuration.

classmethod from_hf_hub(*, name, revision='main', device=None, quantization_config=None)

Construct a module and load its parameters from Hugging Face Hub.

Parameters:
  • name (str) – Model name.

  • revision (str) – Model revision.

  • device (Optional[device]) – Device on which the model is initialized.

  • quantization_config (Optional[BitsAndBytesConfig]) – Configuration for loading quantized weights.

Return type:

TypeVar(Self, bound= FromHF)

Returns:

Module with the parameters loaded.

from_hf_hub_(*, name, revision='main', device=None, quantization_config=None)

Load parameters from Hugging Face Hub in-place into the model.

Parameters:
  • name (str) – Model name.

  • revision (str) – Model revision.

  • device (Optional[device]) – Device on which the model is initialized.

  • quantization_config (Optional[BitsAndBytesConfig]) – Configuration for loading quantized weights.

Return type:

TypeVar(Self, bound= FromHF)

Returns:

Module with the parameters loaded.

classmethod from_hf_hub_to_cache(*, name, revision='main')

Download the model’s weights from Hugging Face Hub into the local Hugging Face cache directory. Subsequent loading of the model will read the weights from disk. If the weights are already cached, this is a no-op.

Parameters:
  • name (str) – Model name.

  • revision (str) – Model revision.

classmethod from_repo(*, repo, device=None, quantization_config=None)

Construct and load a model from a repository.

Parameters:
  • repo (Repository) – The repository to load from.

  • device (Optional[device]) – Device on which to initialize the model.

  • quantization_config (Optional[BitsAndBytesConfig]) – Configuration for loading quantized weights.

Return type:

TypeVar(Self, bound= FromHF)

Returns:

Loaded model.

from_repo_(*, repo, device=None, quantization_config=None)

Load parameters from a repository in-place into the model.

Parameters:
  • repo (Repository) – The repository to load from.

  • device (Optional[device]) – Device on which to initialize the model.

  • quantization_config (Optional[BitsAndBytesConfig]) – Configuration for loading quantized weights.

Return type:

TypeVar(Self, bound= FromHF)

Returns:

Loaded model.

classmethod is_supported(config)

Check if the model with the given configuration is supported by this class.

Parameters:

config (Dict[str, Any]) – Hugging Face model configuration.

Return type:

bool

Returns:

Whether the model is supported by this class.

classmethod state_dict_from_hf(params)

Convert a state dict of a Hugging Face model to a valid state dict for the module.

Parameters:

params (Mapping[str, Tensor]) – The state dict to convert.

Return type:

Mapping[str, Tensor]

Returns:

The converted state dict.

classmethod state_dict_to_hf(params)

Convert the state dict of the module to a compatible Hugging Face model’s format.

Parameters:

params (Mapping[str, Tensor]) – The state dict to convert.

Return type:

Mapping[str, Tensor]

Returns:

The converted state dict.

class curated_transformers.models.MPTDecoder(config, *, device=None)

Bases: TransformerDecoder[MPTConfig], FromHF[MPTConfig]

MosaicML MPT decoder.

Construct an MPT decoder.

Parameters:
  • config (MPTConfig) – Decoder configuration.

  • device (Optional[device]) – Device to which the module is to be moved.

Returns:

The decoder.

property config: ConfigT

Returns the model’s configuration.

classmethod config_from_hf(hf_config)

Convert a Hugging Face model configuration to the module’s configuration.

Parameters:

hf_config (Mapping[str, Any]) – The Hugging Face model configuration.

Return type:

MPTConfig

Returns:

The converted Curated Transformer configuration.

classmethod config_to_hf(curated_config)

Convert the module’s configuration to a Hugging Face model configuration.

Parameters:

curated_config (MPTConfig) – The Curated Transformer model configuration.

Return type:

Mapping[str, Any]

Returns:

The converted Hugging Face configuration.

forward(piece_ids, attention_mask, *, cache=None, positions=None, store_cache=False)

Apply the decoder to the given piece identifiers.

Parameters:
  • piece_ids (Tensor) –

    Piece identifiers to apply the decoder to.

    Shape: (batch_size, seq_len)

  • attention_mask (AttentionMask) – Attention mask. Sequence elements for which the corresponding mask element is set to False are ignored during attention calculation.

  • cache (Optional[List[KeyValueCache]]) – Key/value cache to avoid recomputing key/value representations for tokens that were previously seen.

  • positions (Optional[Tensor]) –

    Input positions. Positions are needed to look up position embeddings. Normally, these positions are calculated automatically. But if the positions deviate for some reason, they can be provided through this argument.

    Shape: (batch_size, seq_len)

  • store_cache (bool) – Whether to cache the key/value representations for future reuse.

Return type:

ModelOutputWithCache[KeyValueCache]

Returns:

Decoder output with key/value cache.

classmethod from_fsspec(*, fs, model_path, fsspec_args=None, device=None, quantization_config=None)

Construct a module and load its parameters from an fsspec filesystem.

Parameters:
  • fs (AbstractFileSystem) – The filesystem to load the model from.

  • model_path (str) – The path of the model on the filesystem.

  • fsspec_args (Optional[FsspecArgs]) – Implementation-specific keyword arguments to pass to fsspec filesystem operations.

  • device (Optional[device]) – Device on which the model is initialized.

  • quantization_config (Optional[BitsAndBytesConfig]) – Configuration for loading quantized weights.

Return type:

TypeVar(Self, bound= FromHF)

Returns:

Module with the parameters loaded.

from_fsspec_(*, fs, model_path, fsspec_args=None, device=None, quantization_config=None)

Load parameters from an fsspec filesystem in-place into the model.

Parameters:
  • fs (AbstractFileSystem) – The filesystem to load the model from.

  • model_path (str) – The path of the model on the filesystem.

  • fsspec_args (Optional[FsspecArgs]) – Implementation-specific keyword arguments to pass to fsspec filesystem operations.

  • device (Optional[device]) – Device on which the model is initialized.

  • quantization_config (Optional[BitsAndBytesConfig]) – Configuration for loading quantized weights.

Return type:

TypeVar(Self, bound= FromHF)

Returns:

Module with the parameters loaded.

classmethod from_hf_config(*, hf_config, device=None)

Create the module from a JSON-deserialized Hugging Face model configuration.

Parameters:
  • hf_config (Any) – Hugging Face model configuration.

  • device (Optional[device]) – Device on which to initialize the model.

Return type:

TypeVar(Self, bound= MPTDecoder)

Returns:

Module constructed using the configuration.

classmethod from_hf_hub(*, name, revision='main', device=None, quantization_config=None)

Construct a module and load its parameters from Hugging Face Hub.

Parameters:
  • name (str) – Model name.

  • revision (str) – Model revision.

  • device (Optional[device]) – Device on which the model is initialized.

  • quantization_config (Optional[BitsAndBytesConfig]) – Configuration for loading quantized weights.

Return type:

TypeVar(Self, bound= FromHF)

Returns:

Module with the parameters loaded.

from_hf_hub_(*, name, revision='main', device=None, quantization_config=None)

Load parameters from Hugging Face Hub in-place into the model.

Parameters:
  • name (str) – Model name.

  • revision (str) – Model revision.

  • device (Optional[device]) – Device on which the model is initialized.

  • quantization_config (Optional[BitsAndBytesConfig]) – Configuration for loading quantized weights.

Return type:

TypeVar(Self, bound= FromHF)

Returns:

Module with the parameters loaded.

classmethod from_hf_hub_to_cache(*, name, revision='main')

Download the model’s weights from Hugging Face Hub into the local Hugging Face cache directory. Subsequent loading of the model will read the weights from disk. If the weights are already cached, this is a no-op.

Parameters:
  • name (str) – Model name.

  • revision (str) – Model revision.

classmethod from_repo(*, repo, device=None, quantization_config=None)

Construct and load a model from a repository.

Parameters:
  • repo (Repository) – The repository to load from.

  • device (Optional[device]) – Device on which to initialize the model.

  • quantization_config (Optional[BitsAndBytesConfig]) – Configuration for loading quantized weights.

Return type:

TypeVar(Self, bound= FromHF)

Returns:

Loaded model.

from_repo_(*, repo, device=None, quantization_config=None)

Load parameters from a repository in-place into the model.

Parameters:
  • repo (Repository) – The repository to load from.

  • device (Optional[device]) – Device on which to initialize the model.

  • quantization_config (Optional[BitsAndBytesConfig]) – Configuration for loading quantized weights.

Return type:

TypeVar(Self, bound= FromHF)

Returns:

Loaded model.

classmethod is_supported(config)

Check if the model with the given configuration is supported by this class.

Parameters:

config (Dict[str, Any]) – Hugging Face model configuration.

Return type:

bool

Returns:

Whether the model is supported by this class.

classmethod state_dict_from_hf(params)

Convert a state dict of a Hugging Face model to a valid state dict for the module.

Parameters:

params (Mapping[str, Tensor]) – The state dict to convert.

Return type:

Mapping[str, Tensor]

Returns:

The converted state dict.

classmethod state_dict_to_hf(params)

Convert the state dict of the module to a compatible Hugging Face model’s format.

Parameters:

params (Mapping[str, Tensor]) – The state dict to convert.

Return type:

Mapping[str, Tensor]

Returns:

The converted state dict.

Downloading

Each decoder type provides a from_hf_hub method that loads a model from Hugging Face Hub. If you want to load a decoder without committing to a specific decoder type, you can use the AutoDecoder class. This class also provides a from_hf_hub method, but it infers the correct decoder type automatically, as in the sketch below.
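
For example (the model name is illustrative; AutoDecoder resolves it to FalconDecoder because the checkpoint's configuration identifies a Falcon model):

import torch

from curated_transformers.models import AutoDecoder

decoder = AutoDecoder.from_hf_hub(
    name="tiiuae/falcon-7b",
    device=torch.device("cuda", index=0),
)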

class curated_transformers.models.AutoDecoder

Decoder module loaded from the Hugging Face Model Hub.

classmethod from_fsspec(*, fs, model_path, fsspec_args=None, device=None, quantization_config=None)

Construct a module and load its parameters from an fsspec filesystem.

Parameters:
  • fs (AbstractFileSystem) – The filesystem to load the model from.

  • model_path (str) – The path of the model on the filesystem.

  • fsspec_args (Optional[FsspecArgs]) – Implementation-specific keyword arguments to pass to fsspec filesystem operations.

  • device (Optional[device]) – Device on which the model is initialized.

  • quantization_config (Optional[BitsAndBytesConfig]) – Configuration for loading quantized weights.

Return type:

TypeVar(ModelT)

Returns:

Module with the parameters loaded.

classmethod from_hf_hub(*, name, revision='main', device=None, quantization_config=None)

Construct and load a model or a generator from Hugging Face Hub.

Parameters:
  • name (str) – Model name.

  • revision (str) – Model revision.

  • device (Optional[device]) – Device on which to initialize the model.

  • quantization_config (Optional[BitsAndBytesConfig]) – Configuration for loading quantized weights.

Return type:

TypeVar(ModelT)

Returns:

Loaded model or generator.

classmethod from_hf_hub_to_cache(*, name, revision='main')

Download the model’s weights from Hugging Face Hub into the local Hugging Face cache directory. Subsequent loading of the model will read the weights from disk. If the weights are already cached, this is a no-op.

Parameters:
  • name (str) – Model name.

  • revision (str) – Model revision.

classmethod from_repo(*, repo, device=None, quantization_config=None)

Construct and load a model or a generator from a repository.

Parameters:
  • repo (Repository) – The repository to load from.

  • device (Optional[device]) – Device on which to initialize the model.

  • quantization_config (Optional[BitsAndBytesConfig]) – Configuration for loading quantized weights.

Return type:

DecoderModule[TransformerConfig, KeyValueCache]

Returns:

Loaded model or generator.

Configuration

Falcon

class curated_transformers.models.FalconConfig(*, attention_probs_dropout_prob=0.0, dtype=torch.bfloat16, hidden_dropout_prob=0.0, hidden_width=2560, layer_norm_eps=1e-05, new_decoder_architecture=False, n_query_heads=71, n_key_value_heads=1, n_hidden_layers=32, rotary_embedding_base=10000, rotary_embedding_fraction=0.25, use_alibi=False, use_bias=False, use_parallel_attention=True, n_pieces=50280)

Falcon (Penedo et al., 2023) model configuration.

Parameters:
  • attention_probs_dropout_prob (float) – Dropout to apply after attention.

  • dtype (dtype) – Data type to use for model parameters.

  • hidden_dropout_prob (float) – Dropout to apply to the hidden and embedding layers.

  • hidden_width (int) – Hidden width of the transformer.

  • layer_norm_eps (float) – Epsilon for layer normalization.

  • new_decoder_architecture (bool) – Use the new decoder architecture.

  • n_query_heads (int) – Number of query heads.

  • n_key_value_heads (int) – Number of key and value heads.

  • n_hidden_layers (int) – Number of hidden layers.

  • rotary_embedding_base (int) – Base used to compute the rotary embedding period.

  • rotary_embedding_fraction (float) – Fraction of hidden width to apply rotary embeddings to. Must be in [0,1].

  • use_alibi (bool) – Use ALiBi linear biases in self-attention.

  • use_bias (bool) – Use bias in linear layers.

  • use_parallel_attention (bool) – Use parallel attention.

  • n_pieces (int) – Vocabulary size (number of embeddings).
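
A sketch of constructing a small, randomly initialized Falcon decoder from a configuration instead of loading pretrained weights (all sizes are arbitrary):

from curated_transformers.models import FalconConfig, FalconDecoder

config = FalconConfig(
    hidden_width=256,
    n_query_heads=8,
    n_key_value_heads=1,
    n_hidden_layers=4,
    n_pieces=1024,
)
decoder = FalconDecoder(config)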

GPT-NeoX

class curated_transformers.models.GPTNeoXConfig(*, attention_probs_dropout_prob=0.0, activation=Activation.GELU, dtype=torch.float16, hidden_dropout_prob=0.0, hidden_width=2560, intermediate_width=10240, layer_norm_eps=1e-05, n_positions=2048, model_max_length=2048, n_attention_heads=32, n_hidden_layers=32, rotary_embedding_base=10000, rotary_embedding_fraction=0.25, n_pieces=50280)

GPT-NeoX (Black et al., 2022) model configuration.

Parameters:
  • attention_probs_dropout_prob (float) – Dropout to apply after attention.

  • activation (Activation) – Activation used by the pointwise feed-forward layers.

  • dtype (dtype) – Data type to use for model parameters.

  • hidden_dropout_prob (float) – Dropout to apply to the hidden and embedding layers.

  • hidden_width (int) – Hidden width of the transformer.

  • intermediate_width (int) – Intermediate width in the feed-forward layer. The non-linearity is applied in this intermediate width.

  • layer_norm_eps (float) – Epsilon for layer normalization.

  • model_max_length (int) – Maximum sequence length of the model.

  • n_positions (int) – Maximum length of position embeddings.

  • n_attention_heads (int) – Number of attention heads.

  • n_hidden_layers (int) – Number of hidden layers.

  • rotary_embedding_base (int) – Base used to compute the rotary embedding period.

  • rotary_embedding_fraction (float) – Fraction of hidden width to apply rotary embeddings to. Must be in [0,1].

  • n_pieces (int) – Vocabulary size (number of embeddings).

Llama

class curated_transformers.models.LlamaConfig(*, attention_probs_dropout_prob=0.0, activation=Activation.GELU, dtype=torch.float16, hidden_dropout_prob=0.0, hidden_width=2560, intermediate_width=10240, rms_norm_eps=1e-05, n_query_heads=32, n_hidden_layers=32, n_key_value_heads=32, rotary_embedding_base=10000, rotary_embedding_fraction=0.25, n_pieces=50280)

Llama (Touvron et al., 2023 [a], Touvron et al., 2023 [b]) model configuration.

Parameters:
  • attention_probs_dropout_prob (float) – Dropout to apply after attention.

  • activation (Activation) – Activation used by the pointwise feed-forward layers.

  • dtype (dtype) – Data type to use for model parameters.

  • hidden_dropout_prob (float) – Dropout to apply to the hidden and embedding layers.

  • hidden_width (int) – Hidden width of the transformer.

  • intermediate_width (int) – Intermediate width in the feed-forward layer. The non-linearity is applied in this intermediate width.

  • rms_norm_eps (float) – Epsilon for RMS normalization.

  • n_query_heads (int) – Number of query heads.

  • n_hidden_layers (int) – Number of hidden layers.

  • n_key_value_heads (int) – Number of key-value heads.

  • rotary_embedding_base (int) – Base used to compute the rotary embedding period.

  • rotary_embedding_fraction (float) – Fraction of hidden width to apply rotary embeddings to. Must be in [0,1].

  • n_pieces (int) – Vocabulary size (number of embeddings).
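
A sketch showing grouped-query attention: setting n_key_value_heads lower than n_query_heads shares each key/value head across a group of query heads. The sizes are arbitrary, and Activation.SiLU is assumed to be the enum member used by Llama-style checkpoints.

from curated_transformers.layers import Activation
from curated_transformers.models import LlamaConfig, LlamaDecoder

config = LlamaConfig(
    activation=Activation.SiLU,
    hidden_width=512,
    intermediate_width=1376,
    n_query_heads=16,
    n_key_value_heads=4,  # 4 query heads per key/value head
    n_hidden_layers=4,
)
decoder = LlamaDecoder(config)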

MPT

class curated_transformers.models.MPTConfig(*, attention_probs_dropout_prob=0.0, activation=Activation.GELU, dtype=torch.bfloat16, hidden_dropout_prob=0.0, hidden_width=4096, intermediate_width_multiplier=4, layer_norm_eps=1e-05, model_max_length=2048, n_attention_heads=32, n_hidden_layers=32, n_pieces=50432, use_bias=False)

MosaicML MPT model configuration.

Parameters:
  • attention_probs_dropout_prob (float) – Dropout to apply after attention.

  • activation (Activation) – Activation used by the pointwise feed-forward layers.

  • dtype (dtype) – Data type to use for model parameters.

  • hidden_dropout_prob (float) – Dropout to apply to the hidden and embedding layers.

  • hidden_width (int) – Hidden width of the transformer.

  • intermediate_width_multiplier (int) – Multiplier for the intermediate width. The hidden width is multiplied by this value to get the intermediate width.

  • layer_norm_eps (float) – Epsilon for layer normalization.

  • model_max_length (int) – Maximum sequence length of the model.

  • n_attention_heads (int) – Number of attention heads.

  • n_hidden_layers (int) – Number of hidden layers.

  • n_pieces (int) – Vocabulary size (number of embeddings).

  • use_bias (bool) – Use bias in linear layers.
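
Note that MPT derives the feed-forward width from the hidden width rather than taking it directly; a quick sketch:

from curated_transformers.models import MPTConfig

config = MPTConfig(hidden_width=2048, intermediate_width_multiplier=4)
# The feed-forward intermediate width is 2048 * 4 = 8192.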