Curated Transformers
v2.0.x

API

  • Building Blocks
    • Embedding Layers
    • Encoder/Decoder Layers
    • Attention
    • Embeddings
    • Feed-forward Layers
    • Activations
    • Normalization
    • Model Outputs
    • Model Configs
  • Encoders
    • Base Classes
    • Architectures
    • Downloading
    • Configuration
  • Decoders
    • Base Classes
    • Architectures
    • Downloading
    • Configuration
  • Causal Language Models
    • Base Classes
    • Architectures
    • Downloading
    • Caching
    • Configuration
  • Generation
    • Models
    • Downloading
    • Configuration
  • Registries
  • Repositories
    • Base Classes
    • Repositories
    • Repository Files
  • Tokenizers
    • Inputs
    • Outputs
    • Downloading
    • Architectures
  • Quantization
    • Quantizable
    • bitsandbytes
  • Utilities
    • Context Managers
    • Hugging Face

© Copyright 2021-2023, ExplosionAI GmbH. Revision 491b4086.
