Embedding models


Base embedding model

class src.models.embedding.base.EmbeddingModel(config: Config, logger: Logger, model_id: str, taxonomy: Taxonomy)[source]

Bases: Model

Base class for embedding-based models. Subclasses customize how labels are prepared, how text is embedded, and how classification is run.

_embed(text: str | list[str]) array[source]

Internal method that creates embeddings for the given text.

Parameters:

text – the text to embed, either a single string or a list of strings

Returns:

the embeddings as an array

_load_model(model_uri: str) None[source]

Hook for custom model preparation before the classification is executed.

Parameters:

model_uri – URI of the model to pull

Returns:

nothing

_prep_labels(taxonomy: Taxonomy | list[str]) None[source]

Prepares the labels by converting them to the format required for further processing with a model.

Parameters:

taxonomy – Taxonomy object (or list of label strings) to take the labels from

Returns:

nothing

add_labels(labels: list[str]) None[source]

Adds extra labels to the model's setup.

Parameters:

labels – list of new labels to add or set in place

Returns:

nothing

classify(text: str, multi_label, **kwargs) dict[str, float][source]

[Adaptation] customized for embedding similarity predictions

Abstract function that executes the text classification.

Parameters:
  • text – the text to classify

  • multi_label – boolean to identify if it is a multilabel problem

  • kwargs – additional keyword arguments

Returns:

the classification results as a dict[str, float]
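
The source does not show how the similarity prediction is computed, so the following is only a minimal sketch of one common approach: embed the text and each label, then score labels by cosine similarity. The names toy_embed, cosine and classify_by_similarity are hypothetical, the letter-count embedding is a stand-in for a real embedding model, and returning only the best label when multi_label is false is an assumption.

```python
import math


def toy_embed(text: str) -> list[float]:
    """Hypothetical stand-in for _embed: a 26-dimensional letter-count vector."""
    vec = [0.0] * 26
    for ch in text.lower():
        if "a" <= ch <= "z":
            vec[ord(ch) - ord("a")] += 1.0
    return vec


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; returns 0.0 when either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def classify_by_similarity(
    text: str, labels: list[str], multi_label: bool = True
) -> dict[str, float]:
    """Score every label against the text.

    Assumption: in single-label mode only the best-scoring label is returned.
    """
    text_vec = toy_embed(text)
    scores = {label: cosine(text_vec, toy_embed(label)) for label in labels}
    if multi_label or not scores:
        return scores
    best = max(scores, key=scores.get)
    return {best: scores[best]}


if __name__ == "__main__":
    print(classify_by_similarity("road works", ["roads", "finance"]))
```

With a real embedding model only toy_embed would change; the scoring loop stays the same.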

Child label embedding model

class src.models.embedding.child_labels.ChildLabelsEmbeddingModel(config: Config, logger: Logger, model_id: str, taxonomy: Taxonomy)[source]

Bases: EmbeddingModel

Child-label implementation of the embedding model.

_prep_labels(taxonomy: Taxonomy | list[str]) None[source]

Prepares the labels by converting them to the format required for further processing with a model.

Parameters:

taxonomy – Taxonomy object (or list of label strings) to take the labels from

Returns:

nothing

_text_formatting(taxonomy_node: Taxonomy) str[source]
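
The source gives no description for _text_formatting, so the following is purely an illustrative guess at what formatting a taxonomy node into a string could look like for a child-label model. The Node class is a hypothetical stand-in for Taxonomy, and the "label: child, child" layout is an assumption, not the project's actual format.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical stand-in for a Taxonomy node."""
    label: str
    children: list["Node"] = field(default_factory=list)


def text_formatting(node: Node) -> str:
    """Join a node's label with the labels of its direct children."""
    if not node.children:
        return node.label
    child_labels = ", ".join(child.label for child in node.children)
    return f"{node.label}: {child_labels}"


if __name__ == "__main__":
    mobility = Node("mobility", [Node("roads"), Node("parking")])
    print(text_formatting(mobility))
```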

Chunked embedding model

class src.models.embedding.chunked.ChunkedEmbeddingModel(config: Config, logger: Logger, model_id: str, taxonomy: Taxonomy)[source]

Bases: EmbeddingModel

Embedding implementation that chunks the text into slices of a certain length

classify(text: str, multi_label, **kwargs) dict[str, float][source]

[Adaptation] customized for embedding similarity predictions

Abstract function that executes the text classification.

Parameters:
  • text – the text to classify

  • multi_label – boolean to identify if it is a multilabel problem

  • kwargs – additional keyword arguments

Returns:

the classification results as a dict[str, float]
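
As a rough illustration of the chunking idea, the sketch below splits the text into fixed-length slices, scores each slice, and merges the per-chunk scores. It assumes the slice length is measured in characters and that chunk scores are merged by taking the maximum per label; the source specifies neither. chunk_text, classify_chunked and keyword_scorer are hypothetical names.

```python
from typing import Callable


def chunk_text(text: str, chunk_size: int) -> list[str]:
    """Split text into consecutive slices of at most chunk_size characters."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]


def classify_chunked(
    text: str,
    chunk_size: int,
    score_chunk: Callable[[str], dict[str, float]],
) -> dict[str, float]:
    """Score each chunk and keep the highest score seen per label (assumed merge rule)."""
    merged: dict[str, float] = {}
    for chunk in chunk_text(text, chunk_size):
        for label, score in score_chunk(chunk).items():
            merged[label] = max(score, merged.get(label, score))
    return merged


def keyword_scorer(chunk: str) -> dict[str, float]:
    """Hypothetical scorer: 1.0 if the label word occurs in the chunk, else 0.0."""
    return {label: float(label in chunk) for label in ("tax", "road")}


if __name__ == "__main__":
    print(chunk_text("abcdefg", 3))
    print(classify_chunked("new road plan and budget", 8, keyword_scorer))
```

Taking the maximum lets a single strongly matching chunk decide a label; averaging over chunks would be an equally plausible choice.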

Ground up embedding model

class src.models.embedding.ground_up.GroundUpRegularEmbeddingModel(config: Config, logger: Logger, model_id: str, taxonomy: Taxonomy)[source]

Bases: EmbeddingModel

This embedding model builds the tree from bottom to top based on confidences

_embed(text: str | list[str]) array[source]

Internal method that creates embeddings for the given text.

Parameters:

text – the text to embed, either a single string or a list of strings

Returns:

the embeddings as an array

_prep_labels(taxonomy: Taxonomy | list[str]) None[source]

Prepares the labels by converting them to the format required for further processing with a model.

Parameters:

taxonomy – Taxonomy object (or list of label strings) to take the labels from

Returns:

nothing

classify(text: str, multi_label, **kwargs) dict[str, float][source]

[Adaptation] customized for embedding similarity predictions

Abstract function that executes the text classification.

Parameters:
  • text – the text to classify

  • multi_label – boolean to identify if it is a multilabel problem

  • kwargs – additional keyword arguments

Returns:

the classification results as a dict[str, float]
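
To make the bottom-to-top idea concrete, here is a small sketch that starts from leaf confidences and derives a score for every parent node. The rule that a parent's score is the mean of its children's scores is an assumption, as are the children mapping and the propagate_up helper; the source only says the tree is built from the bottom up based on confidences.

```python
from statistics import mean


def propagate_up(
    children: dict[str, list[str]],
    leaf_scores: dict[str, float],
    node: str,
) -> dict[str, float]:
    """Return scores for node and all its descendants, computed bottom-up."""
    kids = children.get(node, [])
    if not kids:
        # Leaf: use its confidence directly (0.0 if none was given).
        return {node: leaf_scores.get(node, 0.0)}
    scores: dict[str, float] = {}
    for kid in kids:
        scores.update(propagate_up(children, leaf_scores, kid))
    # Assumed aggregation rule: a parent's score is the mean of its children.
    scores[node] = mean(scores[kid] for kid in kids)
    return scores


if __name__ == "__main__":
    tree = {"root": ["mobility", "finance"], "mobility": ["roads", "parking"]}
    leaves = {"roads": 0.8, "parking": 0.4, "finance": 0.2}
    print(propagate_up(tree, leaves, "root"))
```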

Greedy ground up embedding model

class src.models.embedding.ground_up_greedy.GroundUpGreedyEmbeddingModel(config: Config, logger: Logger, model_id: str, taxonomy: Taxonomy)[source]

Bases: GroundUpRegularEmbeddingModel

The greedy ground-up approach keeps only the values with the highest scores.

classify(text: str, multi_label, **kwargs) dict[str, float][source]

[Adaptation] customized for embedding similarity predictions

Abstract function that executes the text classification.

Parameters:
  • text – the text to classify

  • multi_label – boolean to identify if it is a multilabel problem

  • kwargs – additional keyword arguments

Returns:

the classification results as a dict[str, float]
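
One possible reading of the greedy variant is sketched below: pick the single best-scoring leaf and carry its score up through its ancestors, discarding everything else. This interpretation, the parent mapping, and the greedy_path helper are all assumptions made for illustration; the mapping is assumed to describe a tree without cycles.

```python
from typing import Optional


def greedy_path(parent: dict[str, str], leaf_scores: dict[str, float]) -> dict[str, float]:
    """Keep only the best-scoring leaf and its ancestors, all with the leaf's score."""
    if not leaf_scores:
        return {}
    best_leaf = max(leaf_scores, key=leaf_scores.get)
    best_score = leaf_scores[best_leaf]
    result: dict[str, float] = {}
    node: Optional[str] = best_leaf
    while node is not None:
        result[node] = best_score
        node = parent.get(node)  # None once the root is reached
    return result


if __name__ == "__main__":
    parents = {
        "roads": "mobility",
        "parking": "mobility",
        "mobility": "root",
        "finance": "root",
    }
    leaves = {"roads": 0.8, "parking": 0.4, "finance": 0.2}
    print(greedy_path(parents, leaves))
```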

Sentence based embedding model

class src.models.embedding.sentence.SentenceEmbeddingModel(config: Config, logger: Logger, model_id: str, taxonomy: Taxonomy)[source]

Bases: EmbeddingModel

classify(text: str, multi_label, **kwargs) dict[str, float][source]

[Adaptation] customized for embedding similarity predictions

Abstract function that executes the text classification.

Parameters:
  • text – the text to classify

  • multi_label – boolean to identify if it is a multilabel problem

  • kwargs – additional keyword arguments

Returns:

the classification results as a dict[str, float]