LightningIRTokenizerClassFactory

class lightning_ir.base.class_factory.LightningIRTokenizerClassFactory(MixinConfig: Type[LightningIRConfig])

Bases: LightningIRClassFactory

Class factory for creating derived LightningIRTokenizer classes from HuggingFace tokenizer classes.

__init__(MixinConfig: Type[LightningIRConfig]) → None

Creates a new LightningIRClassFactory.

Parameters:

MixinConfig (Type[LightningIRConfig]) – LightningIRConfig mixin class

Methods

__init__(MixinConfig)

Creates a new LightningIRClassFactory.

from_backbone_class(BackboneClass)

Creates a derived LightningIRTokenizer from a transformers.PreTrainedTokenizerBase backbone tokenizer.

from_backbone_classes(BackboneClasses[, ...])

Creates derived slow and fast LightningIRTokenizers from a tuple of backbone HuggingFace tokenizer classes.

from_pretrained(model_name_or_path, *args[, ...])

Returns the derived LightningIRTokenizer class for a pretrained HuggingFace tokenizer.

get_backbone_config(model_name_or_path)

Grabs the tokenizer configuration class from a checkpoint of a pretrained HuggingFace tokenizer.

get_backbone_model_type(model_name_or_path, ...)

Grabs the model type from a checkpoint of a pretrained HuggingFace tokenizer.

get_lightning_ir_config(model_name_or_path)

Grabs the LightningIR configuration class from a checkpoint of a pretrained Lightning IR model.

get_lightning_ir_model_type(model_name_or_path)

Grabs the Lightning IR model type from a checkpoint of a pretrained HuggingFace model.

Attributes

cc_lir_model_type

Camel case model type of the LightningIR model.

property cc_lir_model_type: str

Camel case model type of the LightningIR model.
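As an illustration of what "camel case" means here, the conversion can be sketched as follows, assuming the mixin config's model type is a hyphen-separated string such as "bi-encoder". The helper name `to_camel_case` is hypothetical and not part of Lightning IR.

```python
def to_camel_case(model_type: str) -> str:
    """Turn a hyphen-separated model type into CamelCase (hypothetical helper)."""
    # Capitalize each hyphen-separated part and join the parts without separators.
    return "".join(part.capitalize() for part in model_type.split("-"))


print(to_camel_case("bi-encoder"))  # BiEncoder
print(to_camel_case("cross-encoder"))  # CrossEncoder
```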

from_backbone_class(BackboneClass: Type[PreTrainedTokenizerBase]) → Type[LightningIRTokenizer]

Creates a derived LightningIRTokenizer from a transformers.PreTrainedTokenizerBase backbone tokenizer. If the backbone tokenizer is already a LightningIRTokenizer, it is returned as is.

Parameters:

BackboneClass (Type[PreTrainedTokenizerBase]) – Backbone tokenizer class

Returns:

Derived LightningIRTokenizer

Return type:

Type[LightningIRTokenizer]
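Deriving a class at runtime is commonly done with Python's built-in `type()`. The sketch below shows that pattern under stated assumptions: `LightningIRTokenizerMixin` and `DummyBackboneTokenizer` are hypothetical stand-ins, the `"{camel-case model type}{backbone name}"` naming scheme is assumed, and the `issubclass` check merely illustrates the "returned as is" rule. The check Lightning IR actually uses is not documented on this page.

```python
from typing import Type


class LightningIRTokenizerMixin:
    """Hypothetical stand-in for the behavior Lightning IR adds to tokenizers."""


class DummyBackboneTokenizer:
    """Hypothetical stand-in for a HuggingFace backbone tokenizer class."""


def derive_tokenizer_class(BackboneClass: Type, cc_model_type: str) -> Type:
    # A class that already carries the mixin is returned as is.
    if issubclass(BackboneClass, LightningIRTokenizerMixin):
        return BackboneClass
    # Otherwise build a new subclass of both the mixin and the backbone via type().
    name = f"{cc_model_type}{BackboneClass.__name__}"
    return type(name, (LightningIRTokenizerMixin, BackboneClass), {})


Derived = derive_tokenizer_class(DummyBackboneTokenizer, "BiEncoder")
print(Derived.__name__)  # BiEncoderDummyBackboneTokenizer
print(derive_tokenizer_class(Derived, "BiEncoder") is Derived)  # True
```

Placing the mixin first in the bases tuple means its methods take precedence over the backbone's in the method resolution order.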

from_backbone_classes(BackboneClasses: Tuple[Type[PreTrainedTokenizerBase] | None, Type[PreTrainedTokenizerBase] | None], BackboneConfig: Type[PretrainedConfig] | None = None) → Tuple[Type[LightningIRTokenizer] | None, Type[LightningIRTokenizer] | None]

Creates derived slow and fast LightningIRTokenizers from a tuple of backbone HuggingFace tokenizer classes.

Parameters:
  • BackboneClasses (Tuple[Type[PreTrainedTokenizerBase] | None, Type[PreTrainedTokenizerBase] | None]) – Slow and fast backbone tokenizer classes

  • BackboneConfig (Type[PretrainedConfig], optional) – Backbone configuration class, defaults to None

Returns:

Slow and fast derived LightningIRTokenizers

Return type:

Tuple[Type[LightningIRTokenizer] | None, Type[LightningIRTokenizer] | None]
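The (slow, fast) contract, where either slot may be None, can be sketched as below. This is a minimal illustration with hypothetical stand-in classes and a hypothetical `derive` helper. It omits the optional `BackboneConfig` argument, whose handling is not described on this page.

```python
from typing import Optional, Tuple, Type


class LightningIRTokenizerMixin:
    """Hypothetical stand-in for the Lightning IR tokenizer mixin."""


class DummySlowTokenizer:
    """Hypothetical stand-in for a slow (pure-Python) backbone tokenizer."""


def derive(BackboneClass: Type) -> Type:
    # Minimal derivation step: mix the Lightning IR behavior into the backbone class.
    return type(f"BiEncoder{BackboneClass.__name__}", (LightningIRTokenizerMixin, BackboneClass), {})


def derive_pair(
    backbone_classes: Tuple[Optional[Type], Optional[Type]],
) -> Tuple[Optional[Type], Optional[Type]]:
    # Derive each class in the (slow, fast) pair; missing entries stay None.
    slow, fast = (None if cls is None else derive(cls) for cls in backbone_classes)
    return slow, fast


slow, fast = derive_pair((DummySlowTokenizer, None))
print(slow.__name__, fast)  # BiEncoderDummySlowTokenizer None
```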

from_pretrained(model_name_or_path: str | Path, *args, use_fast: bool = True, **kwargs) → Type[LightningIRTokenizer]

Returns the derived LightningIRTokenizer class for a pretrained HuggingFace tokenizer.

Parameters:
  • model_name_or_path (str | Path) – Path to the tokenizer or its name

  • use_fast (bool, optional) – Whether to use the fast or slow tokenizer, defaults to True

Raises:
  • ValueError – If use_fast is True and no fast tokenizer is found

  • ValueError – If use_fast is False and no slow tokenizer is found

Returns:

Derived LightningIRTokenizer

Return type:

Type[LightningIRTokenizer]
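The documented `use_fast` behavior, including both ValueError cases, amounts to picking one slot of a (slow, fast) pair. The sketch below shows only that selection step. The helper name `select_tokenizer_class` and the stand-in classes are hypothetical, and the sketch assumes the pair is ordered like the return value of from_backbone_classes.

```python
from typing import Optional, Tuple, Type


def select_tokenizer_class(
    derived: Tuple[Optional[Type], Optional[Type]], use_fast: bool = True
) -> Type:
    # The pair is assumed to be ordered (slow, fast), like from_backbone_classes.
    slow, fast = derived
    if use_fast:
        if fast is None:
            raise ValueError("No fast tokenizer found.")
        return fast
    if slow is None:
        raise ValueError("No slow tokenizer found.")
    return slow


class SlowTok:
    """Hypothetical stand-in for a derived slow tokenizer class."""


class FastTok:
    """Hypothetical stand-in for a derived fast tokenizer class."""


print(select_tokenizer_class((SlowTok, FastTok)).__name__)  # FastTok
print(select_tokenizer_class((SlowTok, None), use_fast=False).__name__)  # SlowTok
```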

static get_backbone_config(model_name_or_path: str | Path) → PretrainedConfig

Grabs the tokenizer configuration class from a checkpoint of a pretrained HuggingFace tokenizer.

Parameters:

model_name_or_path (str | Path) – Path to the tokenizer or its name

Returns:

Configuration class of the backbone tokenizer

Return type:

PretrainedConfig
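Once the backbone model type is known, finding its configuration is essentially a lookup from the model-type string to a configuration class. The sketch below uses a hypothetical `CONFIG_REGISTRY` and stand-in config classes in place of whatever mapping Lightning IR actually consults. As the description above says, it returns the class itself rather than an instance.

```python
from typing import Dict, Type


class DummyBertConfig:
    """Hypothetical stand-in for a HuggingFace configuration class."""

    model_type = "bert"


class DummyRobertaConfig:
    """Hypothetical stand-in for a second configuration class."""

    model_type = "roberta"


# Hypothetical registry mapping model-type strings to configuration classes.
CONFIG_REGISTRY: Dict[str, Type] = {
    cls.model_type: cls for cls in (DummyBertConfig, DummyRobertaConfig)
}


def config_class_for(model_type: str) -> Type:
    # Return the configuration class (not an instance) for the given model type.
    try:
        return CONFIG_REGISTRY[model_type]
    except KeyError:
        raise ValueError(f"Unknown model type: {model_type}") from None


print(config_class_for("bert").__name__)  # DummyBertConfig
```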

static get_backbone_model_type(model_name_or_path: str | Path, *args, **kwargs) → str

Grabs the model type from a checkpoint of a pretrained HuggingFace tokenizer.

Parameters:

model_name_or_path (str | Path) – Path to the tokenizer or its name

Returns:

Model type of the backbone tokenizer

Return type:

str
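HuggingFace model checkpoints record their architecture in the `model_type` field of `config.json`. The sketch below reads that field from a local checkpoint directory using only the standard library. The helper `read_model_type` is hypothetical. Unlike the documented method, it does not resolve Hub model names, forward `*args`/`**kwargs`, or fall back to tokenizer files.

```python
import json
import tempfile
from pathlib import Path
from typing import Union


def read_model_type(checkpoint_dir: Union[str, Path]) -> str:
    """Return the "model_type" entry of a local checkpoint's config.json (sketch)."""
    config_path = Path(checkpoint_dir) / "config.json"
    with config_path.open(encoding="utf-8") as f:
        config = json.load(f)
    model_type = config.get("model_type")
    if model_type is None:
        raise ValueError(f"No model_type found in {config_path}")
    return model_type


# Usage: build a throwaway checkpoint directory containing a minimal config.json.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "config.json").write_text(json.dumps({"model_type": "bert"}), encoding="utf-8")
    print(read_model_type(tmp))  # bert
```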

static get_lightning_ir_config(model_name_or_path: str | Path) → Type[LightningIRConfig] | None

Grabs the LightningIR configuration class from a checkpoint of a pretrained Lightning IR model.

Parameters:

model_name_or_path (str | Path) – Path to the model or its name

Returns:

Configuration class of the Lightning IR model

Return type:

Type[LightningIRConfig] | None

static get_lightning_ir_model_type(model_name_or_path: str | Path) → str | None

Grabs the Lightning IR model type from a checkpoint of a pretrained HuggingFace model.

Parameters:

model_name_or_path (str | Path) – Path to the model or its name

Returns:

Model type of the Lightning IR model

Return type:

str | None
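Both Lightning IR lookups are typed to return None for checkpoints that do not belong to a Lightning IR model. The sketch below illustrates that contract under assumptions. It takes an already-loaded config dictionary instead of a path and checks membership in a hypothetical `LIR_CONFIGS` registry with a stand-in config class. How the real methods recognize a Lightning IR checkpoint is not stated on this page.

```python
from typing import Dict, Optional, Type


class DummyBiEncoderConfig:
    """Hypothetical stand-in for a Lightning IR configuration class."""

    model_type = "bi-encoder"


# Hypothetical registry of Lightning IR configuration classes keyed by model type.
LIR_CONFIGS: Dict[str, Type] = {DummyBiEncoderConfig.model_type: DummyBiEncoderConfig}


def lightning_ir_model_type(config: dict) -> Optional[str]:
    # Return the model type only if it is a registered Lightning IR type, else None.
    model_type = config.get("model_type")
    return model_type if model_type in LIR_CONFIGS else None


def lightning_ir_config(config: dict) -> Optional[Type]:
    # Map the Lightning IR model type to its configuration class, or None.
    model_type = lightning_ir_model_type(config)
    return None if model_type is None else LIR_CONFIGS[model_type]


print(lightning_ir_model_type({"model_type": "bi-encoder"}))  # bi-encoder
print(lightning_ir_model_type({"model_type": "bert"}))  # None
```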