Quantization
- class curated_transformers.quantization.Quantizable
Mixin class for models that are quantizable.
A module using this mixin exposes the configuration and parameter information needed to quantize it on the fly while the module is being loaded.
bitsandbytes
These classes can be used to specify the configuration for quantizing model parameters using the bitsandbytes library.
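As a sketch of how such a configuration might be passed at load time — the `BitsAndBytesConfig.for_4bit` keyword arguments and the `from_hf_hub` parameters below are assumptions based on this API's naming, not verified signatures:

```python
import torch

from curated_transformers.models import AutoDecoder
from curated_transformers.quantization.bnb import BitsAndBytesConfig, Dtype4Bit

# Assumed constructor: build a 4-bit config selecting the NF4 data type
# and the dtype used for computation on dequantized values.
quantization_config = BitsAndBytesConfig.for_4bit(
    quantization_dtype=Dtype4Bit.NF4,
    compute_dtype=torch.bfloat16,
)

# Parameters are quantized on the fly as the checkpoint is loaded,
# so the full-precision weights never need to fit in memory at once.
model = AutoDecoder.from_hf_hub(
    name="tiiuae/falcon-7b",
    quantization_config=quantization_config,
)
```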
- class curated_transformers.quantization.bnb.Dtype4Bit(value, names=None, *, module=None, qualname=None, type=None, start=1, boundary=None)
Data type to use for 4-bit quantization.
- FP4 = 'fp4'
  Float 4-bit.
- NF4 = 'nf4'
  NormalFloat 4-bit.
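The two data types above differ only in how their 16 representable levels are spaced (FP4 uses a float-like layout, NF4 uses quantiles of a normal distribution). The general mechanic of codebook-based 4-bit quantization can be sketched in plain Python; the uniformly spaced levels below are an illustrative assumption, not the real FP4 or NF4 codebooks:

```python
from bisect import bisect

# Toy 4-bit codebook: 16 evenly spaced levels in [-1, 1]. A real FP4/NF4
# codebook would place these levels non-uniformly.
LEVELS = [i / 7.5 - 1.0 for i in range(16)]


def quantize_4bit(x: float) -> int:
    """Return the 4-bit code (0-15) of the codebook level nearest to x."""
    i = bisect(LEVELS, x)
    candidates = [j for j in (i - 1, i) if 0 <= j < len(LEVELS)]
    return min(candidates, key=lambda j: abs(LEVELS[j] - x))


def dequantize_4bit(code: int) -> float:
    """Map a 4-bit code back to its (approximate) float value."""
    return LEVELS[code]


code = quantize_4bit(0.3)          # nearest level index
approx = dequantize_4bit(code)     # reconstruction error <= half a step
```

Each weight is thus stored as a 4-bit index into the codebook and expanded back to a float (the `compute_dtype`) when the layer runs.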