Repositories

Important

This API is currently experimental and can change between minor releases.

Models and tokenizers can be loaded from repositories using the from_repo method. You can add your own type of repository by implementing the Repository base class.

This is an example repository that opens files on the local filesystem:

import os.path
from typing import Optional

from curated_transformers.repository import Repository, RepositoryFile, LocalFile

class LocalRepository(Repository):
   def __init__(self, path: str):
      super().__init__()
      self.repo_path = path

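   def file(self, path: str) -> RepositoryFile:
      # Resolve the requested file relative to the repository root.
      full_path = os.path.join(self.repo_path, path)
      if not os.path.isfile(full_path):
         raise FileNotFoundError(f"File not found: {full_path}")
      return LocalFile(path=full_path)

   def pretty_path(self, path: Optional[str] = None) -> str:
      # Return the repository root when no file path is given.
      return os.path.join(self.repo_path, path) if path else self.repo_path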

Base Classes

class curated_transformers.repository.Repository

Bases: ABC

A repository that contains a model or tokenizer.

abstract file(path)

Get a repository file.

Parameters:

path (str) – The path of the file within the repository.

Return type:

RepositoryFile

Returns:

The file.

Raises:
json_file(path)

Get and parse a JSON file.

Parameters:

path (str) – The path of the file within the repository.

Return type:

Dict[str, Any]

Returns:

The deserialized JSON.

Raises:
abstract pretty_path(path=None)

Get a user-consumable path representation (e.g. for error messages).

Parameters:

path (Optional[str]) – The path of a file within the repository. The repository path will be returned if path is falsy.

Return type:

str

Returns:

The path representation.
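To illustrate how file, json_file and pretty_path relate to each other, here is a minimal, self-contained sketch. SketchFile, SketchRepository and DirRepository are hypothetical stand-ins defined in the snippet so that it runs without the library. They mirror the signatures documented above but are not the library's implementation. In particular, layering json_file on top of file(...).open(...) is an assumption about how such a default could work.

```python
import json
import os
import tempfile
from abc import ABC, abstractmethod
from typing import IO, Any, Dict, Optional


class SketchFile:
    """Hypothetical stand-in for a repository file backed by a local path."""

    def __init__(self, path: str):
        self.path = path

    def open(self, mode: str = "rb", encoding: Optional[str] = None) -> IO:
        return open(self.path, mode=mode, encoding=encoding)


class SketchRepository(ABC):
    """Hypothetical stand-in mirroring the documented Repository interface."""

    @abstractmethod
    def file(self, path: str) -> SketchFile:
        ...

    @abstractmethod
    def pretty_path(self, path: Optional[str] = None) -> str:
        ...

    def json_file(self, path: str) -> Dict[str, Any]:
        # Assumption: a JSON loader can be built on top of file().
        with self.file(path).open(mode="r", encoding="utf-8") as f:
            return json.load(f)


class DirRepository(SketchRepository):
    """Hypothetical repository rooted at a local directory."""

    def __init__(self, root: str):
        self.root = root

    def file(self, path: str) -> SketchFile:
        full_path = os.path.join(self.root, path)
        if not os.path.isfile(full_path):
            # pretty_path gives the user-facing form for the error message.
            raise FileNotFoundError(f"File not found: {self.pretty_path(path)}")
        return SketchFile(full_path)

    def pretty_path(self, path: Optional[str] = None) -> str:
        # A falsy path yields the repository path itself.
        return os.path.join(self.root, path) if path else self.root


def demo() -> Dict[str, Any]:
    """Write a small JSON file to a temporary directory and read it back."""
    with tempfile.TemporaryDirectory() as root:
        with open(os.path.join(root, "config.json"), "w", encoding="utf-8") as f:
            json.dump({"hidden_size": 8}, f)
        return DirRepository(root).json_file("config.json")
```

With this layout, a concrete repository only has to implement the two abstract methods, and JSON loading comes along for free.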

class curated_transformers.repository.RepositoryFile

Bases: ABC

A repository file.

A repository file can be backed by a local path or by a remote path exposed as a file-like object. This class is the common base for both kinds of repository file.

abstract open(mode='rb', encoding=None)

Get the file as a file-like object.

Parameters:
  • mode (str) – Mode to open the file with (see Python open).

  • encoding (Optional[str]) – Encoding to use when the file is opened as text.

Return type:

IO

Returns:

An I/O stream.

Raises:

OSError – When the file cannot be opened.

abstract property path: str | None

Get the file as a local path.

Returns:

The local path of the repository file. If the file is not available as a local path, the value of this property is None. In that case, open can be used to get the file as a file-like object.
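Code that consumes a repository file may therefore need to handle both cases. The following sketch shows one way to do that. InMemoryFile and read_bytes are hypothetical names introduced for illustration. InMemoryFile only imitates the documented open and path members and is not part of the library.

```python
import io
from typing import IO, Optional


class InMemoryFile:
    """Hypothetical stand-in for a repository file with no local path."""

    def __init__(self, data: bytes):
        self._data = data

    @property
    def path(self) -> Optional[str]:
        # Not backed by the local filesystem, so there is no path to return.
        return None

    def open(self, mode: str = "rb", encoding: Optional[str] = None) -> IO:
        if mode == "rb":
            return io.BytesIO(self._data)
        if mode == "r":
            # Assumption: fall back to UTF-8 when no encoding is given.
            return io.StringIO(self._data.decode(encoding or "utf-8"))
        raise OSError(f"Unsupported mode: {mode}")


def read_bytes(repo_file) -> bytes:
    """Read a file's contents, preferring the local path when one exists."""
    if repo_file.path is not None:
        with open(repo_file.path, "rb") as f:
            return f.read()
    # No local path: fall back to the file-like object.
    with repo_file.open(mode="rb") as f:
        return f.read()
```

Checking path first lets callers hand the file to tools that require a filename, while the open fallback keeps remote-only files usable.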

Repositories

class curated_transformers.repository.FsspecRepository(fs, path, fsspec_args=None)

Bases: Repository

Repository backed by a filesystem that implements the fsspec interface.

Parameters:
  • fs (AbstractFileSystem) – The filesystem.

  • path (str) – The path of the repository within the filesystem.

  • fsspec_args (Optional[FsspecArgs]) – Additional arguments that should be passed to the fsspec implementation.

file(path)

Get a repository file.

Parameters:

path (str) – The path of the file within the repository.

Return type:

RepositoryFile

Returns:

The file.

Raises:
pretty_path(path=None)

Get a user-consumable path representation (e.g. for error messages).

Parameters:

path (Optional[str]) – The path of a file within the repository. The repository path will be returned if path is falsy.

Return type:

str

Returns:

The path representation.
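A possible way to use this class is sketched below, assuming that both fsspec and curated_transformers are installed. The helper name read_text_via_fsspec is hypothetical, and fsspec's local "file" filesystem is used only to keep the example simple. Any other fsspec filesystem could be passed instead. The sketch relies only on the constructor, file and open signatures documented on this page.

```python
def read_text_via_fsspec(repo_dir: str, filename: str) -> str:
    """Read a text file from a directory exposed through fsspec."""
    # Imported lazily so that defining this helper does not require
    # the optional dependencies to be installed.
    import fsspec
    from curated_transformers.repository import FsspecRepository

    # fsspec's built-in local filesystem implementation.
    fs = fsspec.filesystem("file")
    repo = FsspecRepository(fs, repo_dir)

    # file() returns a RepositoryFile; open it in text mode.
    with repo.file(filename).open(mode="r", encoding="utf-8") as f:
        return f.read()
```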

class curated_transformers.repository.HfHubRepository(name, *, revision='main')

Bases: Repository

Hugging Face Hub repository.

Parameters:
  • name (str) – Name of the repository on Hugging Face Hub.

  • revision (str) – Source repository revision. Can be either a branch name or the SHA hash of a commit.

file(path)

Get a repository file.

Parameters:

path (str) – The path of the file within the repository.

Return type:

RepositoryFile

Returns:

The file.

Raises:
pretty_path(path=None)

Get a user-consumable path representation (e.g. for error messages).

Parameters:

path (Optional[str]) – The path of a file within the repository. The repository path will be returned if path is falsy.

Return type:

str

Returns:

The path representation.
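As a rough usage sketch, a JSON file could be fetched from a Hub repository as shown below. The helper name load_hub_config is hypothetical, and the file name config.json is an assumption about the repository's contents rather than something this API guarantees. Calling the helper requires curated_transformers to be installed and network access to Hugging Face Hub.

```python
from typing import Any, Dict


def load_hub_config(name: str, revision: str = "main") -> Dict[str, Any]:
    """Fetch and parse config.json from a Hugging Face Hub repository."""
    # Imported lazily so that defining this helper does not require
    # the library to be installed.
    from curated_transformers.repository import HfHubRepository

    # revision is keyword-only in the documented signature.
    repo = HfHubRepository(name, revision=revision)

    # Assumption: the repository contains a file named config.json.
    return repo.json_file("config.json")
```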

Repository Files

class curated_transformers.repository.FsspecFile(fs, path, fsspec_args=None)

Bases: RepositoryFile

Repository file on an fsspec filesystem.

Construct an fsspec file representation.

Parameters:
  • fs (AbstractFileSystem) – The filesystem.

  • path (str) – The path of the file on the filesystem.

  • fsspec_args (Optional[FsspecArgs]) – Implementation-specific arguments to pass to fsspec filesystem operations.

open(mode='rb', encoding=None)

Get the file as a file-like object.

Parameters:
  • mode (str) – Mode to open the file with (see Python open).

  • encoding (Optional[str]) – Encoding to use when the file is opened as text.

Return type:

IO

Returns:

An I/O stream.

Raises:

OSError – When the file cannot be opened.

property path: str | None

Get the file as a local path.

Returns:

The local path of the repository file. If the file is not available as a local path, the value of this property is None. In that case, open can be used to get the file as a file-like object.

class curated_transformers.repository.LocalFile(path)

Bases: RepositoryFile

Repository file on the local machine.

Construct a local file representation.

Parameters:

path (str) – The path of the file on the local filesystem.

open(mode='rb', encoding=None)

Get the file as a file-like object.

Parameters:
  • mode (str) – Mode to open the file with (see Python open).

  • encoding (Optional[str]) – Encoding to use when the file is opened as text.

Return type:

IO

Returns:

An I/O stream.

Raises:

OSError – When the file cannot be opened.

property path: str | None

Get the file as a local path.

Returns:

The local path of the repository file. If the file is not available as a local path, the value of this property is None. In that case, open can be used to get the file as a file-like object.
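A short sketch of using LocalFile directly, assuming curated_transformers is installed. The helper name read_local_text is hypothetical. It uses only the LocalFile constructor and the open signature documented above.

```python
def read_local_text(path: str) -> str:
    """Read a local file as UTF-8 text through the LocalFile wrapper."""
    # Imported lazily so that defining this helper does not require
    # the library to be installed.
    from curated_transformers.repository import LocalFile

    repo_file = LocalFile(path=path)

    # Text mode needs an explicit mode; the default mode is 'rb'.
    with repo_file.open(mode="r", encoding="utf-8") as f:
        return f.read()
```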