Causal Language Models
Base Classes
- class curated_transformers.models.CausalLMModule(config)
Bases: Generic[ConfigT, CacheT], TransformerModule[ConfigT]
Base class for causal language model modules.
- property config: ConfigT
Returns the model’s configuration.
- abstract forward(piece_ids, attention_mask, *, cache=None, positions=None, store_cache=False)
Apply the causal language model to the given piece identifiers.
- Parameters:
  piece_ids (Tensor) – Piece identifiers to apply the decoder to. Shape: (batch_size, seq_len)
  attention_mask (AttentionMask) – Attention mask. Sequence elements for which the corresponding mask element is set to False are ignored during attention calculation.
  cache (Optional[List[CacheT]]) – Key/value cache to avoid recomputing key/value representations for tokens that were previously seen.
  positions (Optional[Tensor]) – Input positions. Positions are needed to look up rotary embeddings. Normally, these positions are calculated automatically, but if the positions deviate for some reason, they can be provided through this argument. Shape: (batch_size, seq_len)
  store_cache (bool) – Whether to cache the key/value representations for future reuse.
- Return type:
  CausalLMOutputWithCache[CacheT]
- Returns:
  Causal language model output with key/value cache.
- class curated_transformers.models.TransformerCausalLM(config)
Bases: Generic[ConfigT], CausalLMModule[ConfigT, KeyValueCache]
Transformer causal LM (Vaswani et al., 2017) base class.
This class provides an implementation of the forward method. Subclasses must set the given member attributes.
- property config: ConfigT
Returns the model’s configuration.
- forward(piece_ids, attention_mask, cache=None, positions=None, store_cache=False)
Apply the causal language model to the given piece identifiers.
- Parameters:
  piece_ids (Tensor) – Piece identifiers to apply the decoder to. Shape: (batch_size, seq_len)
  attention_mask (AttentionMask) – Attention mask. Sequence elements for which the corresponding mask element is set to False are ignored during attention calculation.
  cache (Optional[List[KeyValueCache]]) – Key/value cache to avoid recomputing key/value representations for tokens that were previously seen.
  positions (Optional[Tensor]) – Input positions. Positions are needed to look up rotary embeddings. Normally, these positions are calculated automatically, but if the positions deviate for some reason, they can be provided through this argument. Shape: (batch_size, seq_len)
  store_cache (bool) – Whether to cache the key/value representations for future reuse.
- Return type:
  CausalLMOutputWithCache[KeyValueCache]
- Returns:
  Causal language model output with key/value cache.
Architectures
These modules represent the supported causal LM architectures. Generally, every decoder-only architecture has a corresponding causal LM architecture.
- class curated_transformers.models.FalconCausalLM(config, *, device=None)
Bases: TransformerCausalLM[FalconConfig], FromHF[FalconConfig], Quantizable
Falcon (Penedo et al., 2023) causal language model.
Construct a Falcon causal LM.
- Parameters:
  config (FalconConfig) – Causal LM configuration.
  device (Optional[device]) – Device to which the module is to be moved.
- Returns:
The causal LM.
- property config: ConfigT
Returns the model’s configuration.
- classmethod config_from_hf(hf_config)
Convert a Hugging Face model configuration to the module’s configuration.
- classmethod config_to_hf(curated_config)
Convert the module’s configuration to a Hugging Face model configuration.
- Parameters:
  curated_config (FalconConfig) – The Curated Transformer model configuration.
- Returns:
  The converted Hugging Face configuration.
- forward(piece_ids, attention_mask, cache=None, positions=None, store_cache=False)
Apply the causal language model to the given piece identifiers.
- Parameters:
  piece_ids (Tensor) – Piece identifiers to apply the decoder to. Shape: (batch_size, seq_len)
  attention_mask (AttentionMask) – Attention mask. Sequence elements for which the corresponding mask element is set to False are ignored during attention calculation.
  cache (Optional[List[KeyValueCache]]) – Key/value cache to avoid recomputing key/value representations for tokens that were previously seen.
  positions (Optional[Tensor]) – Input positions. Positions are needed to look up rotary embeddings. Normally, these positions are calculated automatically, but if the positions deviate for some reason, they can be provided through this argument. Shape: (batch_size, seq_len)
  store_cache (bool) – Whether to cache the key/value representations for future reuse.
- Return type:
  CausalLMOutputWithCache[KeyValueCache]
- Returns:
  Causal language model output with key/value cache.
- classmethod from_fsspec(*, fs, model_path, fsspec_args=None, device=None, quantization_config=None)
Construct a module and load its parameters from an fsspec filesystem.
- Parameters:
  fs (AbstractFileSystem) – The filesystem to load the model from.
  model_path (str) – The path of the model on the filesystem.
  fsspec_args (Optional[FsspecArgs]) – Implementation-specific keyword arguments to pass to fsspec filesystem operations.
  device (Optional[device]) – Device on which the model is initialized.
  quantization_config (Optional[BitsAndBytesConfig]) – Configuration for loading quantized weights.
- Return type:
  Self
- Returns:
  Module with the parameters loaded.
- from_fsspec_(*, fs, model_path, fsspec_args=None, device=None, quantization_config=None)
Load parameters from an fsspec filesystem in-place into the model.
- Parameters:
  fs (AbstractFileSystem) – The filesystem to load the model from.
  model_path (str) – The path of the model on the filesystem.
  fsspec_args (Optional[FsspecArgs]) – Implementation-specific keyword arguments to pass to fsspec filesystem operations.
  device (Optional[device]) – Device on which the model is initialized.
  quantization_config (Optional[BitsAndBytesConfig]) – Configuration for loading quantized weights.
- Return type:
  Self
- Returns:
  Module with the parameters loaded.
- classmethod from_hf_config(*, hf_config, device=None)
Create the module from a Hugging Face model JSON-deserialized model configuration.
- classmethod from_hf_hub(*, name, revision='main', device=None, quantization_config=None)
Construct a module and load its parameters from Hugging Face Hub.
- Parameters:
  name (str) – Model name.
  revision (str) – Model revision.
  device (Optional[device]) – Device on which the model is initialized.
  quantization_config (Optional[BitsAndBytesConfig]) – Configuration for loading quantized weights.
- Return type:
  Self
- Returns:
  Module with the parameters loaded.
- from_hf_hub_(*, name, revision='main', device=None, quantization_config=None)
Load parameters from Hugging Face Hub in-place into the model.
- Parameters:
  name (str) – Model name.
  revision (str) – Model revision.
  device (Optional[device]) – Device on which the model is initialized.
  quantization_config (Optional[BitsAndBytesConfig]) – Configuration for loading quantized weights.
- Return type:
  Self
- Returns:
  Module with the parameters loaded.
- classmethod from_hf_hub_to_cache(*, name, revision='main')
Download the model’s weights from Hugging Face Hub into the local Hugging Face cache directory. Subsequent loading of the model will read the weights from disk. If the weights are already cached, this is a no-op.
- classmethod from_repo(*, repo, device=None, quantization_config=None)
Construct and load a model from a repository.
- Parameters:
  repository – The repository to load from.
  device (Optional[device]) – Device on which to initialize the model.
  quantization_config (Optional[BitsAndBytesConfig]) – Configuration for loading quantized weights.
- Return type:
  Self
- Returns:
  Loaded model.
- from_repo_(*, repo, device=None, quantization_config=None)
Load parameters from a repository in-place into the model.
- Parameters:
  repository – The repository to load from.
  device (Optional[device]) – Device on which to initialize the model.
  quantization_config (Optional[BitsAndBytesConfig]) – Configuration for loading quantized weights.
- Return type:
  Self
- Returns:
  Loaded model.
- classmethod is_supported(config)
Check if the model with the given configuration is supported by this class.
- classmethod modules_to_not_quantize()
Return a set of prefixes that specify which modules are to be ignored during quantization.
- classmethod state_dict_from_hf(params)
Convert a state dict of a Hugging Face model to a valid state dict for the module.
- class curated_transformers.models.GPTNeoXCausalLM(config, *, device=None)
Bases: TransformerCausalLM[GPTNeoXConfig], FromHF[GPTNeoXConfig], Quantizable
GPT-NeoX (Black et al., 2022) causal language model.
Construct a GPT-NeoX causal LM.
- Parameters:
  config (GPTNeoXConfig) – Causal LM configuration.
  device (Optional[device]) – Device to which the module is to be moved.
- Returns:
The causal LM.
- property config: ConfigT
Returns the model’s configuration.
- classmethod config_from_hf(hf_config)
Convert a Hugging Face model configuration to the module’s configuration.
- classmethod config_to_hf(curated_config)
Convert the module’s configuration to a Hugging Face model configuration.
- Parameters:
  curated_config (GPTNeoXConfig) – The Curated Transformer model configuration.
- Returns:
  The converted Hugging Face configuration.
- forward(piece_ids, attention_mask, cache=None, positions=None, store_cache=False)
Apply the causal language model to the given piece identifiers.
- Parameters:
  piece_ids (Tensor) – Piece identifiers to apply the decoder to. Shape: (batch_size, seq_len)
  attention_mask (AttentionMask) – Attention mask. Sequence elements for which the corresponding mask element is set to False are ignored during attention calculation.
  cache (Optional[List[KeyValueCache]]) – Key/value cache to avoid recomputing key/value representations for tokens that were previously seen.
  positions (Optional[Tensor]) – Input positions. Positions are needed to look up rotary embeddings. Normally, these positions are calculated automatically, but if the positions deviate for some reason, they can be provided through this argument. Shape: (batch_size, seq_len)
  store_cache (bool) – Whether to cache the key/value representations for future reuse.
- Return type:
  CausalLMOutputWithCache[KeyValueCache]
- Returns:
  Causal language model output with key/value cache.
- classmethod from_fsspec(*, fs, model_path, fsspec_args=None, device=None, quantization_config=None)
Construct a module and load its parameters from an fsspec filesystem.
- Parameters:
  fs (AbstractFileSystem) – The filesystem to load the model from.
  model_path (str) – The path of the model on the filesystem.
  fsspec_args (Optional[FsspecArgs]) – Implementation-specific keyword arguments to pass to fsspec filesystem operations.
  device (Optional[device]) – Device on which the model is initialized.
  quantization_config (Optional[BitsAndBytesConfig]) – Configuration for loading quantized weights.
- Return type:
  Self
- Returns:
  Module with the parameters loaded.
- from_fsspec_(*, fs, model_path, fsspec_args=None, device=None, quantization_config=None)
Load parameters from an fsspec filesystem in-place into the model.
- Parameters:
  fs (AbstractFileSystem) – The filesystem to load the model from.
  model_path (str) – The path of the model on the filesystem.
  fsspec_args (Optional[FsspecArgs]) – Implementation-specific keyword arguments to pass to fsspec filesystem operations.
  device (Optional[device]) – Device on which the model is initialized.
  quantization_config (Optional[BitsAndBytesConfig]) – Configuration for loading quantized weights.
- Return type:
  Self
- Returns:
  Module with the parameters loaded.
- classmethod from_hf_config(*, hf_config, device=None)
Create the module from a Hugging Face model JSON-deserialized model configuration.
- classmethod from_hf_hub(*, name, revision='main', device=None, quantization_config=None)
Construct a module and load its parameters from Hugging Face Hub.
- Parameters:
  name (str) – Model name.
  revision (str) – Model revision.
  device (Optional[device]) – Device on which the model is initialized.
  quantization_config (Optional[BitsAndBytesConfig]) – Configuration for loading quantized weights.
- Return type:
  Self
- Returns:
  Module with the parameters loaded.
- from_hf_hub_(*, name, revision='main', device=None, quantization_config=None)
Load parameters from Hugging Face Hub in-place into the model.
- Parameters:
  name (str) – Model name.
  revision (str) – Model revision.
  device (Optional[device]) – Device on which the model is initialized.
  quantization_config (Optional[BitsAndBytesConfig]) – Configuration for loading quantized weights.
- Return type:
  Self
- Returns:
  Module with the parameters loaded.
- classmethod from_hf_hub_to_cache(*, name, revision='main')
Download the model’s weights from Hugging Face Hub into the local Hugging Face cache directory. Subsequent loading of the model will read the weights from disk. If the weights are already cached, this is a no-op.
- classmethod from_repo(*, repo, device=None, quantization_config=None)
Construct and load a model from a repository.
- Parameters:
  repository – The repository to load from.
  device (Optional[device]) – Device on which to initialize the model.
  quantization_config (Optional[BitsAndBytesConfig]) – Configuration for loading quantized weights.
- Return type:
  Self
- Returns:
  Loaded model.
- from_repo_(*, repo, device=None, quantization_config=None)
Load parameters from a repository in-place into the model.
- Parameters:
  repository – The repository to load from.
  device (Optional[device]) – Device on which to initialize the model.
  quantization_config (Optional[BitsAndBytesConfig]) – Configuration for loading quantized weights.
- Return type:
  Self
- Returns:
  Loaded model.
- classmethod is_supported(config)
Check if the model with the given configuration is supported by this class.
- classmethod modules_to_not_quantize()
Return a set of prefixes that specify which modules are to be ignored during quantization.
- classmethod state_dict_from_hf(params)
Convert a state dict of a Hugging Face model to a valid state dict for the module.
- class curated_transformers.models.LlamaCausalLM(config, *, device=None)
Bases: TransformerCausalLM[LlamaConfig], FromHF[LlamaConfig], Quantizable
Llama (Touvron et al., 2023 [a], Touvron et al., 2023 [b]) causal language model.
Construct a Llama causal LM.
- Parameters:
  config (LlamaConfig) – Causal LM configuration.
  device (Optional[device]) – Device to which the module is to be moved.
- Returns:
The causal LM.
- property config: ConfigT
Returns the model’s configuration.
- classmethod config_from_hf(hf_config)
Convert a Hugging Face model configuration to the module’s configuration.
- classmethod config_to_hf(curated_config)
Convert the module’s configuration to a Hugging Face model configuration.
- Parameters:
  curated_config (LlamaConfig) – The Curated Transformer model configuration.
- Returns:
  The converted Hugging Face configuration.
- forward(piece_ids, attention_mask, cache=None, positions=None, store_cache=False)
Apply the causal language model to the given piece identifiers.
- Parameters:
  piece_ids (Tensor) – Piece identifiers to apply the decoder to. Shape: (batch_size, seq_len)
  attention_mask (AttentionMask) – Attention mask. Sequence elements for which the corresponding mask element is set to False are ignored during attention calculation.
  cache (Optional[List[KeyValueCache]]) – Key/value cache to avoid recomputing key/value representations for tokens that were previously seen.
  positions (Optional[Tensor]) – Input positions. Positions are needed to look up rotary embeddings. Normally, these positions are calculated automatically, but if the positions deviate for some reason, they can be provided through this argument. Shape: (batch_size, seq_len)
  store_cache (bool) – Whether to cache the key/value representations for future reuse.
- Return type:
  CausalLMOutputWithCache[KeyValueCache]
- Returns:
  Causal language model output with key/value cache.
- classmethod from_fsspec(*, fs, model_path, fsspec_args=None, device=None, quantization_config=None)
Construct a module and load its parameters from an fsspec filesystem.
- Parameters:
  fs (AbstractFileSystem) – The filesystem to load the model from.
  model_path (str) – The path of the model on the filesystem.
  fsspec_args (Optional[FsspecArgs]) – Implementation-specific keyword arguments to pass to fsspec filesystem operations.
  device (Optional[device]) – Device on which the model is initialized.
  quantization_config (Optional[BitsAndBytesConfig]) – Configuration for loading quantized weights.
- Return type:
  Self
- Returns:
  Module with the parameters loaded.
- from_fsspec_(*, fs, model_path, fsspec_args=None, device=None, quantization_config=None)
Load parameters from an fsspec filesystem in-place into the model.
- Parameters:
  fs (AbstractFileSystem) – The filesystem to load the model from.
  model_path (str) – The path of the model on the filesystem.
  fsspec_args (Optional[FsspecArgs]) – Implementation-specific keyword arguments to pass to fsspec filesystem operations.
  device (Optional[device]) – Device on which the model is initialized.
  quantization_config (Optional[BitsAndBytesConfig]) – Configuration for loading quantized weights.
- Return type:
  Self
- Returns:
  Module with the parameters loaded.
- classmethod from_hf_config(*, hf_config, device=None)
Create the module from a Hugging Face model JSON-deserialized model configuration.
- classmethod from_hf_hub(*, name, revision='main', device=None, quantization_config=None)
Construct a module and load its parameters from Hugging Face Hub.
- Parameters:
  name (str) – Model name.
  revision (str) – Model revision.
  device (Optional[device]) – Device on which the model is initialized.
  quantization_config (Optional[BitsAndBytesConfig]) – Configuration for loading quantized weights.
- Return type:
  Self
- Returns:
  Module with the parameters loaded.
- from_hf_hub_(*, name, revision='main', device=None, quantization_config=None)
Load parameters from Hugging Face Hub in-place into the model.
- Parameters:
  name (str) – Model name.
  revision (str) – Model revision.
  device (Optional[device]) – Device on which the model is initialized.
  quantization_config (Optional[BitsAndBytesConfig]) – Configuration for loading quantized weights.
- Return type:
  Self
- Returns:
  Module with the parameters loaded.
- classmethod from_hf_hub_to_cache(*, name, revision='main')
Download the model’s weights from Hugging Face Hub into the local Hugging Face cache directory. Subsequent loading of the model will read the weights from disk. If the weights are already cached, this is a no-op.
- classmethod from_repo(*, repo, device=None, quantization_config=None)
Construct and load a model from a repository.
- Parameters:
  repository – The repository to load from.
  device (Optional[device]) – Device on which to initialize the model.
  quantization_config (Optional[BitsAndBytesConfig]) – Configuration for loading quantized weights.
- Return type:
  Self
- Returns:
  Loaded model.
- from_repo_(*, repo, device=None, quantization_config=None)
Load parameters from a repository in-place into the model.
- Parameters:
  repository – The repository to load from.
  device (Optional[device]) – Device on which to initialize the model.
  quantization_config (Optional[BitsAndBytesConfig]) – Configuration for loading quantized weights.
- Return type:
  Self
- Returns:
  Loaded model.
- classmethod is_supported(config)
Check if the model with the given configuration is supported by this class.
- classmethod modules_to_not_quantize()
Return a set of prefixes that specify which modules are to be ignored during quantization.
- classmethod state_dict_from_hf(params)
Convert a state dict of a Hugging Face model to a valid state dict for the module.
- class curated_transformers.models.MPTCausalLM(config, *, device=None)
Bases: TransformerCausalLM[MPTConfig], FromHF[MPTConfig], Quantizable
MosaicML MPT causal language model.
Construct an MPT causal LM.
- Parameters:
  config (MPTConfig) – Causal LM configuration.
  device (Optional[device]) – Device to which the module is to be moved.
- Returns:
The causal LM.
- property config: ConfigT
Returns the model’s configuration.
- classmethod config_from_hf(hf_config)
Convert a Hugging Face model configuration to the module’s configuration.
- classmethod config_to_hf(curated_config)
Convert the module’s configuration to a Hugging Face model configuration.
- forward(piece_ids, attention_mask, cache=None, positions=None, store_cache=False)
Apply the causal language model to the given piece identifiers.
- Parameters:
  piece_ids (Tensor) – Piece identifiers to apply the decoder to. Shape: (batch_size, seq_len)
  attention_mask (AttentionMask) – Attention mask. Sequence elements for which the corresponding mask element is set to False are ignored during attention calculation.
  cache (Optional[List[KeyValueCache]]) – Key/value cache to avoid recomputing key/value representations for tokens that were previously seen.
  positions (Optional[Tensor]) – Input positions. Positions are needed to look up rotary embeddings. Normally, these positions are calculated automatically, but if the positions deviate for some reason, they can be provided through this argument. Shape: (batch_size, seq_len)
  store_cache (bool) – Whether to cache the key/value representations for future reuse.
- Return type:
  CausalLMOutputWithCache[KeyValueCache]
- Returns:
  Causal language model output with key/value cache.
- classmethod from_fsspec(*, fs, model_path, fsspec_args=None, device=None, quantization_config=None)
Construct a module and load its parameters from an fsspec filesystem.
- Parameters:
  fs (AbstractFileSystem) – The filesystem to load the model from.
  model_path (str) – The path of the model on the filesystem.
  fsspec_args (Optional[FsspecArgs]) – Implementation-specific keyword arguments to pass to fsspec filesystem operations.
  device (Optional[device]) – Device on which the model is initialized.
  quantization_config (Optional[BitsAndBytesConfig]) – Configuration for loading quantized weights.
- Return type:
  Self
- Returns:
  Module with the parameters loaded.
- from_fsspec_(*, fs, model_path, fsspec_args=None, device=None, quantization_config=None)
Load parameters from an fsspec filesystem in-place into the model.
- Parameters:
  fs (AbstractFileSystem) – The filesystem to load the model from.
  model_path (str) – The path of the model on the filesystem.
  fsspec_args (Optional[FsspecArgs]) – Implementation-specific keyword arguments to pass to fsspec filesystem operations.
  device (Optional[device]) – Device on which the model is initialized.
  quantization_config (Optional[BitsAndBytesConfig]) – Configuration for loading quantized weights.
- Return type:
  Self
- Returns:
  Module with the parameters loaded.
- classmethod from_hf_config(*, hf_config, device=None)
Create the module from a Hugging Face model JSON-deserialized model configuration.
- classmethod from_hf_hub(*, name, revision='main', device=None, quantization_config=None)
Construct a module and load its parameters from Hugging Face Hub.
- Parameters:
  name (str) – Model name.
  revision (str) – Model revision.
  device (Optional[device]) – Device on which the model is initialized.
  quantization_config (Optional[BitsAndBytesConfig]) – Configuration for loading quantized weights.
- Return type:
  Self
- Returns:
  Module with the parameters loaded.
- from_hf_hub_(*, name, revision='main', device=None, quantization_config=None)
Load parameters from Hugging Face Hub in-place into the model.
- Parameters:
  name (str) – Model name.
  revision (str) – Model revision.
  device (Optional[device]) – Device on which the model is initialized.
  quantization_config (Optional[BitsAndBytesConfig]) – Configuration for loading quantized weights.
- Return type:
  Self
- Returns:
  Module with the parameters loaded.
- classmethod from_hf_hub_to_cache(*, name, revision='main')
Download the model’s weights from Hugging Face Hub into the local Hugging Face cache directory. Subsequent loading of the model will read the weights from disk. If the weights are already cached, this is a no-op.
- classmethod from_repo(*, repo, device=None, quantization_config=None)
Construct and load a model from a repository.
- Parameters:
  repository – The repository to load from.
  device (Optional[device]) – Device on which to initialize the model.
  quantization_config (Optional[BitsAndBytesConfig]) – Configuration for loading quantized weights.
- Return type:
  Self
- Returns:
  Loaded model.
- from_repo_(*, repo, device=None, quantization_config=None)
Load parameters from a repository in-place into the model.
- Parameters:
  repository – The repository to load from.
  device (Optional[device]) – Device on which to initialize the model.
  quantization_config (Optional[BitsAndBytesConfig]) – Configuration for loading quantized weights.
- Return type:
  Self
- Returns:
  Loaded model.
- classmethod is_supported(config)
Check if the model with the given configuration is supported by this class.
- classmethod modules_to_not_quantize()
Return a set of prefixes that specify which modules are to be ignored during quantization.
- classmethod state_dict_from_hf(params)
Convert a state dict of a Hugging Face model to a valid state dict for the module.
Downloading
Each causal LM type provides a from_hf_hub function that will load a model from Hugging Face Hub. If you want to load a causal LM without committing to a specific causal LM type, you can use the AutoCausalLM class. This class also provides a from_hf_hub method but will try to infer the correct type automatically.
- class curated_transformers.models.AutoCausalLM
Causal LM model loaded from the Hugging Face Model Hub.
- classmethod from_fsspec(*, fs, model_path, fsspec_args=None, device=None, quantization_config=None)
Construct a module and load its parameters from an fsspec filesystem.
- Parameters:
  fs (AbstractFileSystem) – The filesystem to load the model from.
  model_path (str) – The path of the model on the filesystem.
  fsspec_args (Optional[FsspecArgs]) – Implementation-specific keyword arguments to pass to fsspec filesystem operations.
  device (Optional[device]) – Device on which the model is initialized.
  quantization_config (Optional[BitsAndBytesConfig]) – Configuration for loading quantized weights.
- Return type:
  ModelT
- Returns:
  Module with the parameters loaded.
- classmethod from_hf_hub(*, name, revision='main', device=None, quantization_config=None)
Construct and load a model or a generator from Hugging Face Hub.
- Parameters:
  name (str) – Model name.
  revision (str) – Model revision.
  device (Optional[device]) – Device on which to initialize the model.
  quantization_config (Optional[BitsAndBytesConfig]) – Configuration for loading quantized weights.
- Return type:
  ModelT
- Returns:
  Loaded model or generator.
- classmethod from_hf_hub_to_cache(*, name, revision='main')
Download the model’s weights from Hugging Face Hub into the local Hugging Face cache directory. Subsequent loading of the model will read the weights from disk. If the weights are already cached, this is a no-op.
- classmethod from_repo(*, repo, device=None, quantization_config=None)
Construct and load a model or a generator from a repository.
- Parameters:
  repository – The repository to load from.
  device (Optional[device]) – Device on which to initialize the model.
  quantization_config (Optional[BitsAndBytesConfig]) – Configuration for loading quantized weights.
- Returns:
  Loaded model or generator.
Caching
Causal language models apply causal attention, meaning that the attention mechanism only attends to preceding pieces. So, when the model predicts the next piece, the attention and hidden representations of the pieces before it do not change. This means we can avoid recomputing hidden representations of already-seen pieces by caching them. This allows us to generate text in \(\mathcal{O}(n^2)\) time rather than \(\mathcal{O}(n^3)\).
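The complexity claim above can be made concrete with a small cost model. This is an illustration only, not the library's implementation: it counts how many piece representations are computed during generation, where each such computation is itself \(\mathcal{O}(n)\) because of attention over the preceding pieces.

```python
# Toy cost model for key/value caching during autoregressive generation.
# Without a cache, step t recomputes the representations of all t pieces
# seen so far; with a cache, only the newest piece needs computing.

def generation_cost(n_pieces, use_cache):
    computed = 0
    for step in range(1, n_pieces + 1):
        computed += 1 if use_cache else step
    return computed

print(generation_cost(16, use_cache=False))  # 136 (~n^2/2 computations)
print(generation_cost(16, use_cache=True))   # 16 (exactly n computations)
```

Multiplying each computation's own \(\mathcal{O}(n)\) attention cost by these counts gives the \(\mathcal{O}(n^3)\) versus \(\mathcal{O}(n^2)\) totals stated above.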
Caching works by calling the causal language model with the store_cache argument. The model will then return the cached representations as part of its output. The cached representations can then be passed in the next call to the language model with the cache argument:
cache = None
while not_done:
    ...
    # Reuse representations from previous steps and store the
    # updated cache for the next step.
    output = lm(..., cache=cache, store_cache=True)
    cache = output.cache
    ...