Development
Branches
We use two branches during regular development. If the current version is 1.0.3, then the active branches are:
main: the development branch that will lead up to 1.1.0.v1.0.x: the bugfix branch that will lead up to 1.0.4.
Following semver, only bug fixes must be pushed to v1.0.x. When applicable,
a bug should first be fixed in the main branch using a PR. After that, a
backport PR should be made for the v1.0.x branch with the backport label.
TorchScript
Tracing
We support TorchScript tracing and test it with all models when using
the --slow flag.
Tracing only accepts a small number of types for the arguments and return values
of a traced module. For our purposes, these types are: Tensor, Dict[str,
Tensor], or tuples of these types. This has ramifications for our models
because they take different argument types (e.g., AttentionMask and
KeyValueCache) and return ModelOutput or one of its subclasses. What
complicates this is that we want to keep strong typing outside TorchScript. We
have addressed these issues as described below.
Module Arguments
Our argument types are dataclasses with only Tensor fields. These types can
be represented as Dict[str, Tensor] without any loss of information. To this
end, we have made a DataclassAsDict base class. Dataclasses that inherit
from this class are also proper dictionaries. This allows us to pass these data
structures to traced models. When such a type is passed to a traced model, the
original type information is erased and inside the model, the argument will be a
regular dictionary. To handle these arguments uniformly and retain access to
utility methods and properties, we rewrap the dictionary as a class. For instance, a
method that uses AttentionMask can rewrap Union[AttentionMask, Dict[str,
Tensor]] as an AttentionMask:
attention_mask = AttentionMask.jit_rewrap(attention_mask)
Module Return Values
The ModelOutput-based return types can contain nested dataclasses. For
instance, ModelOutputWithCache contains an Optional[List[CacheT]] field
where CacheT can be KeyValueCache. Consequently, not every
ModelOutput can be represented as a Dict[str, Tentor]. For that reason,
we represent model outputs as tuples instead. Dataclasses that inherit from
DataclassAsTuple are also a tuple.
Scripting
We do not support TorchScript scripting, since it would require too many
compromises to code quality (e.g., we cannot use torch.finfo).