Embedding models

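This page documents the embedding-based classification models in src.models.embedding: the base class and its variants.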

Base embedding model

class src.models.embedding.base.EmbeddingModel(config: Config, logger: Logger, model_id: str, taxonomy: Taxonomy)[source]

Bases: Model

Base implementation of Model for embedding-based classification.

_embed(text: str | list[str]) → array[source]

Internal method that creates embeddings for the given text.

Parameters: text – the text to embed (a single string or a list of strings)
Returns: the embeddings, as an array

_load_model(model_uri: str) → None[source]

This function enables custom model preparation before executing the classification.

Parameters: model_uri – identifier of the model to pull
Returns: nothing

_prep_labels(taxonomy: Taxonomy | list[str]) → None[source]

Prepares the labels, converting them to the required format for further processing with a model.

Parameters: taxonomy – Taxonomy object whose labels will be used
Returns: nothing

add_labels(labels: list[str]) → None[source]

This function adds extra labels to the model's setup.

Parameters: labels – list of new labels to add or set in place
Returns: nothing

classify(text: str, multi_label, **kwargs) → dict[str, float][source]

[Adaptation] Customized for embedding similarity predictions.

Abstract function that executes the text classification.

Parameters:

text – the text to classify
multi_label – boolean indicating whether this is a multi-label problem
kwargs – optional extra arguments

Returns:

the classification results, mapping each label to a score
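To illustrate what "embedding similarity predictions" can look like, here is a minimal sketch, not the project's implementation. It assumes a toy bag-of-words `embed` function in place of a real embedding model, and the free functions `embed`, `cosine` and `classify` are hypothetical names introduced only for this example. The handling of `multi_label` (return all scores, or only the best one) is likewise an assumption.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy stand-in for a real embedding model: a bag-of-words count vector."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def classify(text: str, labels: list[str], multi_label: bool = True) -> dict[str, float]:
    """Score every label by its similarity to the text.

    Assumption: in single-label mode only the best-scoring label is kept.
    """
    text_vec = embed(text)
    scores = {label: cosine(text_vec, embed(label)) for label in labels}
    if multi_label or not scores:
        return scores
    best = max(scores, key=scores.get)
    return {best: scores[best]}


labels = ["public transport", "waste collection", "road maintenance"]
text = "the road needs maintenance after the storm"
scores = classify(text, labels)                  # one score per label
top = classify(text, labels, multi_label=False)  # only the best label
```

The return value has the same shape as the documented `dict[str, float]`: one similarity score per label.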

Child label embedding model

class src.models.embedding.child_labels.ChildLabelsEmbeddingModel(config: Config, logger: Logger, model_id: str, taxonomy: Taxonomy)[source]

Bases: EmbeddingModel

Child-label variant of the embedding model.

_prep_labels(taxonomy: Taxonomy | list[str]) → None[source]

Prepares the labels, converting them to the required format for further processing with a model.

Parameters: taxonomy – Taxonomy object whose labels will be used
Returns: nothing

_text_formatting(taxonomy_node: Taxonomy) → str[source]
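The behaviour of `_text_formatting` is not documented here. One plausible reading, given the class name, is that each taxonomy node is turned into a string combining its own label with the labels of its children before embedding. The sketch below shows that idea under that assumption only; `Node` is a hypothetical stand-in for the project's Taxonomy class, and the output format is invented for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical stand-in for a taxonomy node (not the project's Taxonomy class)."""
    label: str
    children: list["Node"] = field(default_factory=list)


def text_formatting(node: Node) -> str:
    """Build the string to embed for a node: its label plus its direct children's labels."""
    if not node.children:
        return node.label
    child_labels = ", ".join(child.label for child in node.children)
    return f"{node.label}: {child_labels}"


mobility = Node("mobility", [Node("public transport"), Node("cycling")])
formatted = text_formatting(mobility)
```

Enriching a parent label with its children gives the embedding more context than the bare label alone, which is the usual motivation for this kind of formatting.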

Chunked embedding model

class src.models.embedding.chunked.ChunkedEmbeddingModel(config: Config, logger: Logger, model_id: str, taxonomy: Taxonomy)[source]

Bases: EmbeddingModel

Embedding model implementation that splits the text into chunks of a given length.

classify(text: str, multi_label, **kwargs) → dict[str, float][source]

[Adaptation] Customized for embedding similarity predictions.

Abstract function that executes the text classification.

Parameters:

text – the text to classify
multi_label – boolean indicating whether this is a multi-label problem
kwargs – optional extra arguments

Returns:

the classification results, mapping each label to a score
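A chunked classification along these lines could be sketched as follows. This is an illustration under stated assumptions, not the class's actual code: chunks are measured in words, each label keeps its best score over all chunks, and `embed` is a toy bag-of-words function standing in for a real embedding model. The names `chunk_words` and `classify_chunked` are hypothetical.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy stand-in for a real embedding model: a bag-of-words count vector."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def chunk_words(text: str, size: int) -> list[str]:
    """Split the text into consecutive slices of at most `size` words."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]


def classify_chunked(text: str, labels: list[str], chunk_size: int = 4) -> dict[str, float]:
    """Score each chunk against each label; keep the maximum per label (assumed aggregation)."""
    label_vecs = {label: embed(label) for label in labels}
    scores = {label: 0.0 for label in labels}
    for chunk in chunk_words(text, chunk_size):
        chunk_vec = embed(chunk)
        for label, vec in label_vecs.items():
            scores[label] = max(scores[label], cosine(chunk_vec, vec))
    return scores


text = "the bus was late again today and the road maintenance crew blocked the street"
chunks = chunk_words(text, 4)
scores = classify_chunked(text, ["road maintenance", "bus"], chunk_size=4)
```

Taking the maximum lets a single highly relevant chunk decide a label's score; averaging over chunks would be an equally reasonable choice that favours labels relevant to the whole text.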

Ground up embedding model

class src.models.embedding.ground_up.GroundUpRegularEmbeddingModel(config: Config, logger: Logger, model_id: str, taxonomy: Taxonomy)[source]

Bases: EmbeddingModel

This embedding model builds the tree from the bottom up, based on confidence scores.

_embed(text: str | list[str]) → array[source]

Internal method that creates embeddings for the given text.

Parameters: text – the text to embed (a single string or a list of strings)
Returns: the embeddings, as an array

_prep_labels(taxonomy: Taxonomy | list[str]) → None[source]

Prepares the labels, converting them to the required format for further processing with a model.

Parameters: taxonomy – Taxonomy object whose labels will be used
Returns: nothing

classify(text: str, multi_label, **kwargs) → dict[str, float][source]

[Adaptation] Customized for embedding similarity predictions.

Abstract function that executes the text classification.

Parameters:

text – the text to classify
multi_label – boolean indicating whether this is a multi-label problem
kwargs – optional extra arguments

Returns:

the classification results, mapping each label to a score
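One way to picture building the tree from the bottom up is to score only the leaf labels and then propagate those confidences to their ancestors. The sketch below assumes that each ancestor receives the maximum score among its descendants; that rule, the `parent_of` mapping and the function name `ground_up` are assumptions made for illustration and are not taken from the class itself.

```python
def ground_up(leaf_scores: dict[str, float], parent_of: dict[str, str]) -> dict[str, float]:
    """Propagate leaf confidences upward through the tree.

    Assumption: every ancestor gets the highest score found among its descendants.
    """
    scores = dict(leaf_scores)
    for leaf, score in leaf_scores.items():
        node = leaf
        # Walk from the leaf to the root, raising ancestor scores where needed.
        while node in parent_of:
            node = parent_of[node]
            scores[node] = max(scores.get(node, 0.0), score)
    return scores


# Hypothetical taxonomy: child label -> parent label.
parent_of = {
    "bus": "public transport",
    "tram": "public transport",
    "public transport": "mobility",
    "cycling": "mobility",
}
leaf_scores = {"bus": 0.2, "tram": 0.7, "cycling": 0.4}
tree_scores = ground_up(leaf_scores, parent_of)
```

Here "public transport" and "mobility" both end up with 0.7, inherited from "tram", while the leaf scores are left unchanged.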

Greedy ground up embedding model

class src.models.embedding.ground_up_greedy.GroundUpGreedyEmbeddingModel(config: Config, logger: Logger, model_id: str, taxonomy: Taxonomy)[source]

Bases: GroundUpRegularEmbeddingModel

The greedy ground-up approach keeps only the values with the highest scores.

classify(text: str, multi_label, **kwargs) → dict[str, float][source]

[Adaptation] Customized for embedding similarity predictions.

Abstract function that executes the text classification.

Parameters:

text – the text to classify
multi_label – boolean indicating whether this is a multi-label problem
kwargs – optional extra arguments

Returns:

the classification results, mapping each label to a score
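A greedy variant of the bottom-up idea can be sketched as follows: rather than propagating every leaf score, keep only the best-scoring leaf and the path from it to the root. This is one interpretation of "only takes the values with the highest possible scores", offered as an assumption; the `parent_of` mapping and the function name `ground_up_greedy` are hypothetical.

```python
def ground_up_greedy(leaf_scores: dict[str, float], parent_of: dict[str, str]) -> dict[str, float]:
    """Keep only the best-scoring leaf and its ancestors (assumed greedy rule)."""
    if not leaf_scores:
        return {}
    best = max(leaf_scores, key=leaf_scores.get)
    result = {best: leaf_scores[best]}
    node = best
    # Walk from the winning leaf up to the root, carrying its score along.
    while node in parent_of:
        node = parent_of[node]
        result[node] = leaf_scores[best]
    return result


# Hypothetical taxonomy: child label -> parent label.
parent_of = {
    "bus": "public transport",
    "tram": "public transport",
    "public transport": "mobility",
    "cycling": "mobility",
}
leaf_scores = {"bus": 0.2, "tram": 0.7, "cycling": 0.4}
greedy_scores = ground_up_greedy(leaf_scores, parent_of)
```

Compared with propagating all leaves, the result contains a single branch of the tree, which trades recall for a more decisive prediction.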

Sentence based embedding model

class src.models.embedding.sentence.SentenceEmbeddingModel(config: Config, logger: Logger, model_id: str, taxonomy: Taxonomy)[source]

Bases: EmbeddingModel

classify(text: str, multi_label, **kwargs) → dict[str, float][source]

[Adaptation] Customized for embedding similarity predictions.

Abstract function that executes the text classification.

Parameters:

text – the text to classify
multi_label – boolean indicating whether this is a multi-label problem
kwargs – optional extra arguments

Returns:

the classification results, mapping each label to a score
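The class carries no description, so the following is only a guess at what a sentence-based variant might do: split the input into sentences, score each sentence against every label, and keep the best score per label. The regex-based sentence splitter, the max aggregation, the toy `embed` function and the name `classify_by_sentence` are all assumptions introduced for this sketch.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy stand-in for a real embedding model: counts of lowercase word tokens."""
    return Counter(re.findall(r"\w+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def split_sentences(text: str) -> list[str]:
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def classify_by_sentence(text: str, labels: list[str]) -> dict[str, float]:
    """Score every sentence against every label; keep the maximum per label."""
    label_vecs = {label: embed(label) for label in labels}
    scores = {label: 0.0 for label in labels}
    for sentence in split_sentences(text):
        sentence_vec = embed(sentence)
        for label, vec in label_vecs.items():
            scores[label] = max(scores[label], cosine(sentence_vec, vec))
    return scores


text = "The bus was late. Road maintenance blocked the street!"
sentences = split_sentences(text)
scores = classify_by_sentence(text, ["road maintenance", "bus"])
```

A production splitter would need to handle abbreviations and decimal numbers, which this regex does not.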