Repositories

Models and tokenizers can be loaded from repositories using the from_repo method. To add your own type of repository, subclass the curated_transformers.repository.Repository base class and implement its abstract methods.

This is an example repository that opens files on the local filesystem:

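import os.path
from typing import Optional

from curated_transformers.repository import LocalFile, Repository, RepositoryFile

class LocalRepository(Repository):
   def __init__(self, path: str):
      super().__init__()
      self.repo_path = path

   def file(self, path: str) -> RepositoryFile:
      # Resolve the requested file relative to the repository root.
      full_path = os.path.join(self.repo_path, path)
      if not os.path.isfile(full_path):
         raise FileNotFoundError(f"File not found: {full_path}")
      return LocalFile(path=full_path)

   def pretty_path(self, path: Optional[str] = None) -> str:
      # Return the repository path itself when no file path is given.
      if not path:
         return self.repo_path
      return os.path.join(self.repo_path, path)

   def transaction(self):
      # transaction() is abstract in Repository; this read-only example
      # does not support writing.
      raise NotImplementedError("LocalRepository is read-only")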

Base Classes

class curated_transformers.repository.Repository

Bases: ABC

A repository that contains a model or tokenizer.

abstract file(path)

Get a lazily-loaded repository file.

Parameters:

path (str) – The path of the file within the repository.

Return type:

RepositoryFile

Returns:

The file.

json_file(path)

Get and parse a JSON file.

Parameters:

path (str) – The path of the file within the repository.

Return type:

Dict[str, Any]

Returns:

The deserialized JSON.

Raises:

abstract pretty_path(path=None)

Get a user-consumable path representation (e.g. for error messages).

Parameters:

path (Optional[str]) – The path of a file within the repository. The repository path will be returned if path is falsy.

Return type:

str

Returns:

The path representation.
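The falsy-path rule described above can be sketched as follows. This is only an illustration: the function name sketch_pretty_path and the "/" separator are assumptions and not part of curated_transformers, and real repositories may format their paths differently.

```python
from typing import Optional


def sketch_pretty_path(repo_path: str, path: Optional[str] = None) -> str:
    # Hypothetical helper, not part of curated_transformers.
    # A falsy path (None or "") yields the repository path itself.
    if not path:
        return repo_path
    # Otherwise, describe the file relative to the repository.
    return f"{repo_path}/{path}"


print(sketch_pretty_path("/models/bert"))                 # /models/bert
print(sketch_pretty_path("/models/bert", "config.json"))  # /models/bert/config.json
```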

abstract transaction()

Begins a new transaction. File operations performed on the transaction context will be deferred until the transaction completes successfully.

Return type:

TransactionContext

Returns:

The transaction context manager.
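To make the deferred semantics concrete, here is a minimal standard-library sketch of one way such a transaction could behave on a local directory. It is not the library's implementation: SketchTransaction and sketch_transaction are hypothetical names, and staging files in a temporary directory is an assumed strategy. The sketch only covers files opened for writing.

```python
import os
import shutil
import tempfile
from contextlib import contextmanager


class SketchTransaction:
    """Hypothetical stand-in for a transaction context; not the library's class."""

    def __init__(self, root: str):
        self.root = root
        self.staging = tempfile.mkdtemp()
        self.pending = {}  # repository-relative path -> staged file

    def open(self, path: str, mode: str, encoding=None):
        # Stage the file outside the repository; nothing is visible there yet.
        fd, staged = tempfile.mkstemp(dir=self.staging)
        os.close(fd)
        self.pending[path] = staged
        return open(staged, mode, encoding=encoding)


@contextmanager
def sketch_transaction(root: str):
    tx = SketchTransaction(root)
    try:
        yield tx
        # Reached only when the body raised no exception: publish staged files.
        for path, staged in tx.pending.items():
            dest = os.path.join(root, path)
            os.makedirs(os.path.dirname(dest), exist_ok=True)
            shutil.move(staged, dest)
    finally:
        # Anything still staged (for example after an error) is discarded.
        shutil.rmtree(tx.staging, ignore_errors=True)


root = tempfile.mkdtemp()
target = os.path.join(root, "config.json")
with sketch_transaction(root) as tx:
    with tx.open("config.json", "w", encoding="utf-8") as f:
        f.write("{}")
    visible_during = os.path.exists(target)
visible_after = os.path.exists(target)
print(visible_during, visible_after)  # False True
```

If the body of the with block raises, the staged files are removed and the repository directory is left unchanged.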

class curated_transformers.repository.RepositoryFile

Bases: ABC

A repository file.

Repository files can be a local path or a remote path exposed as a file-like object. This is a common base class for such different types of repository files.

abstract exists()

Returns whether the file exists. This can cause the file to be cached locally.

Return type:

bool

abstract open(mode='rb', encoding=None)

Get the file as a file-like object.

Parameters:
  • mode (str) – Mode to open the file with (see Python open).

  • encoding (Optional[str]) – Encoding to use when the file is opened as text.

Return type:

IO

Returns:

An I/O stream.

Raises:

abstract property path: str | None

Get the file as a local path.

Returns:

The local path of the repository file. If the file is not available as a local path, the value of this property is None. In that case, open can be used to get the file as a file-like object.
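The fallback from path to open can be sketched with a small helper. The helper read_repository_file and the InMemoryFile stand-in are hypothetical and exist only for this illustration. The helper relies solely on the path property and the open method documented above, so it should work with any object that follows the RepositoryFile interface.

```python
import io
from typing import Optional


def read_repository_file(file) -> bytes:
    # `file` is any object that follows the RepositoryFile interface.
    if file.path is not None:
        # A local path is available, so read it directly.
        with open(file.path, "rb") as f:
            return f.read()
    # No local path: fall back to the file-like object.
    with file.open(mode="rb") as f:
        return f.read()


class InMemoryFile:
    """Hypothetical stand-in without a local path, for demonstration only."""

    def __init__(self, data: bytes):
        self._data = data

    @property
    def path(self) -> Optional[str]:
        return None

    def exists(self) -> bool:
        return True

    def open(self, mode: str = "rb", encoding: Optional[str] = None):
        return io.BytesIO(self._data)


print(read_repository_file(InMemoryFile(b"hello")))  # b'hello'
```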

class curated_transformers.repository.TransactionContext

Bases: ABC

A context manager that represents an active transaction in a repository.

abstract open(path, mode, encoding=None)

Opens a file as a part of a transaction. Changes to the file are deferred until the transaction has completed successfully.

Parameters:
  • path (str) – The path to the file on the parent repository.

  • mode (str) – Mode to open the file with (see Python open).

  • encoding (Optional[str]) – Encoding to use when the file is opened as text.

Return type:

IO

Returns:

An I/O stream.

Raises:

abstract property repo: Repository

Returns:

The parent repository on which this transaction is performed.

Repositories

class curated_transformers.repository.FsspecRepository(fs, path, fsspec_args=None)

Bases: Repository

Repository backed by a filesystem that implements the fsspec interface.

Parameters:
  • fs (AbstractFileSystem) – The filesystem.

  • path (str) – The path of the repository within the filesystem.

  • fsspec_args (Optional[FsspecArgs]) – Additional arguments that should be passed to the fsspec implementation.

file(path)

Get a lazily-loaded repository file.

Parameters:

path (str) – The path of the file within the repository.

Return type:

RepositoryFile

Returns:

The file.

pretty_path(path=None)

Get a user-consumable path representation (e.g. for error messages).

Parameters:

path (Optional[str]) – The path of a file within the repository. The repository path will be returned if path is falsy.

Return type:

str

Returns:

The path representation.

transaction()

Begins a new transaction. File operations performed on the transaction context will be deferred until the transaction completes successfully.

Return type:

TransactionContext

Returns:

The transaction context manager.

class curated_transformers.repository.HfHubRepository(name, *, revision='main')

Bases: Repository

Hugging Face Hub repository.

Parameters:
  • name (str) – Name of the repository on Hugging Face Hub.

  • revision (str) – Source repository revision. Can be either a branch name or the SHA hash of a commit.

file(path)

Get a lazily-loaded repository file.

Parameters:

path (str) – The path of the file within the repository.

Return type:

RepositoryFile

Returns:

The file.

pretty_path(path=None)

Get a user-consumable path representation (e.g. for error messages).

Parameters:

path (Optional[str]) – The path of a file within the repository. The repository path will be returned if path is falsy.

Return type:

str

Returns:

The path representation.

transaction()

Begins a new transaction. File operations performed on the transaction context will be deferred until the transaction completes successfully.

Return type:

TransactionContext

Returns:

The transaction context manager.

Repository Files

class curated_transformers.repository.FsspecFile(fs, path, fsspec_args=None)

Bases: RepositoryFile

Repository file on an fsspec filesystem.

Construct an fsspec file representation.

Parameters:
  • fs (AbstractFileSystem) – The filesystem.

  • path (str) – The path of the file on the filesystem.

  • fsspec_args (Optional[FsspecArgs]) – Implementation-specific arguments to pass to fsspec filesystem operations.

exists()

Returns whether the file exists. This can cause the file to be cached locally.

Return type:

bool

open(mode='rb', encoding=None)

Get the file as a file-like object.

Parameters:
  • mode (str) – Mode to open the file with (see Python open).

  • encoding (Optional[str]) – Encoding to use when the file is opened as text.

Return type:

IO

Returns:

An I/O stream.

Raises:

property path: str | None

Get the file as a local path.

Returns:

The local path of the repository file. If the file is not available as a local path, the value of this property is None. In that case, open can be used to get the file as a file-like object.

class curated_transformers.repository.LocalFile(path)

Bases: RepositoryFile

Repository file on the local machine.

Construct a local file representation.

Parameters:

path (str) – The path of the file on the local filesystem.

exists()

Returns whether the file exists. This can cause the file to be cached locally.

Return type:

bool

open(mode='rb', encoding=None)

Get the file as a file-like object.

Parameters:
  • mode (str) – Mode to open the file with (see Python open).

  • encoding (Optional[str]) – Encoding to use when the file is opened as text.

Return type:

IO

Returns:

An I/O stream.

Raises:

property path: str | None

Get the file as a local path.

Returns:

The local path of the repository file. If the file is not available as a local path, the value of this property is None. In that case, open can be used to get the file as a file-like object.

class curated_transformers.repository.HfHubFile(repo, path)

Bases: RepositoryFile

Wraps either a remote file on a Hugging Face Hub repository or a local file in the Hugging Face cache.

Construct a Hugging Face file representation.

Parameters:
  • repo (HfHubRepository) – The parent repository.

  • path (str) – The path of the remote file in the repository.

exists()

Returns whether the file exists. This can cause the file to be cached locally.

Return type:

bool

open(mode='rb', encoding=None)

Get the file as a file-like object.

Parameters:
  • mode (str) – Mode to open the file with (see Python open).

  • encoding (Optional[str]) – Encoding to use when the file is opened as text.

Return type:

IO

Returns:

An I/O stream.

Raises:

property path: str | None

Get the file as a local path.

Returns:

The local path of the repository file. If the file is not available as a local path, the value of this property is None. In that case, open can be used to get the file as a file-like object.