Embedding models
Base embedding model
- class src.models.embedding.base.EmbeddingModel(config: Config, logger: Logger, model_id: str, taxonomy: Taxonomy)[source]
Bases: Model
Custom class implementation for model.
- _embed(text: str | list[str]) array[source]
This internal function is used to create embeddings.
- Parameters:
text – the text to embed
- Returns:
- _load_model(model_uri: str) None[source]
Enables custom model preparation before the classification is executed.
- Parameters:
model_uri – URI of the model to pull
- Returns:
- _prep_labels(taxonomy: Taxonomy | list[str]) None[source]
Prepares the labels by converting them to the format required for further processing with a model.
- Parameters:
taxonomy – Taxonomy object whose labels will be used
- Returns:
nothing
- add_labels(labels: list[str]) None[source]
Adds extra labels to the model's setup.
- Parameters:
labels – list of new labels to add or set in place
- Returns:
nothing
- classify(text: str, multi_label, **kwargs) dict[str, float][source]
[Adaptation] Customized for embedding similarity predictions.
Abstract function that executes the text classification.
- Parameters:
text – the text to classify
multi_label – boolean indicating whether this is a multi-label problem
kwargs – optional extra keyword arguments
- Returns:
the results
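The idea of an embedding similarity prediction can be sketched as follows. This is a minimal illustration, not the project's implementation: `toy_embed`, `VOCAB`, and the standalone `classify` function are hypothetical stand-ins, and the handling of `multi_label` (normalizing scores so they sum to 1 in the single-label case) is an assumption, since the reference above does not specify it.

```python
import math

# Hypothetical toy vocabulary; a real embedding model would return dense vectors.
VOCAB = ["road", "traffic", "school", "teacher", "budget", "tax"]


def toy_embed(text: str) -> list[float]:
    """Stand-in for an embedding call: count vocabulary words in the text."""
    words = text.lower().split()
    return [float(words.count(term)) for term in VOCAB]


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity, returning 0.0 when either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def classify(text: str, labels: list[str], multi_label: bool = True) -> dict[str, float]:
    """Score every label by the similarity of its embedding to the text embedding."""
    text_vec = toy_embed(text)
    scores = {label: cosine(text_vec, toy_embed(label)) for label in labels}
    if not multi_label:
        # Assumption: single-label mode rescales the scores to sum to 1.
        total = sum(scores.values())
        if total:
            scores = {label: score / total for label, score in scores.items()}
    return scores


labels = ["road traffic", "school teacher", "budget tax"]
result = classify("the road has heavy traffic", labels)
```

Here `result` maps each label to a similarity score, matching the `dict[str, float]` return type documented for `classify`.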
Child label embedding model
- class src.models.embedding.child_labels.ChildLabelsEmbeddingModel(config: Config, logger: Logger, model_id: str, taxonomy: Taxonomy)[source]
Bases: EmbeddingModel
Child label class implementation for embedding model.
Chunked embedding model
- class src.models.embedding.chunked.ChunkedEmbeddingModel(config: Config, logger: Logger, model_id: str, taxonomy: Taxonomy)[source]
Bases: EmbeddingModel
Embedding implementation that chunks the text into slices of a certain length.
- classify(text: str, multi_label, **kwargs) dict[str, float][source]
[Adaptation] Customized for embedding similarity predictions.
Abstract function that executes the text classification.
- Parameters:
text – the text to classify
multi_label – boolean indicating whether this is a multi-label problem
kwargs – optional extra keyword arguments
- Returns:
the results
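Chunked classification can be illustrated with a small sketch. Everything here is an assumption made for illustration: the chunk length is measured in words, `keyword_score` and `LABEL_KEYWORDS` are hypothetical stand-ins for the embedding-based scorer, and per-chunk scores are combined by taking the maximum per label. The actual class may slice and aggregate differently.

```python
# Hypothetical label keywords used by the toy scorer below.
LABEL_KEYWORDS = {
    "mobility": {"road", "traffic"},
    "education": {"school", "teacher"},
}


def keyword_score(chunk: str) -> dict[str, float]:
    """Toy scorer: fraction of words in the chunk that match a label's keywords."""
    words = chunk.lower().split()
    if not words:
        return {label: 0.0 for label in LABEL_KEYWORDS}
    return {
        label: sum(word in keywords for word in words) / len(words)
        for label, keywords in LABEL_KEYWORDS.items()
    }


def chunk_words(text: str, chunk_size: int) -> list[str]:
    """Split the text into consecutive slices of at most chunk_size words."""
    words = text.split()
    return [" ".join(words[i:i + chunk_size]) for i in range(0, len(words), chunk_size)]


def classify_chunked(text: str, score_chunk, chunk_size: int = 2) -> dict[str, float]:
    """Score each chunk separately and keep the best score seen per label."""
    best: dict[str, float] = {}
    for chunk in chunk_words(text, chunk_size):
        for label, score in score_chunk(chunk).items():
            best[label] = max(score, best.get(label, 0.0))
    return best


text = "road traffic near the school"
whole = keyword_score(text)
chunked = classify_chunked(text, keyword_score, chunk_size=2)
```

In this toy setup, scoring the whole text dilutes each topic across unrelated words, while scoring per chunk lets a short, focused slice drive the label score.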
Ground up embedding model
- class src.models.embedding.ground_up.GroundUpRegularEmbeddingModel(config: Config, logger: Logger, model_id: str, taxonomy: Taxonomy)[source]
Bases: EmbeddingModel
This embedding model builds the tree from bottom to top based on confidences.
- _embed(text: str | list[str]) array[source]
This internal function is used to create embeddings.
- Parameters:
text – the text to embed
- Returns:
- _prep_labels(taxonomy: Taxonomy | list[str]) None[source]
Prepares the labels by converting them to the format required for further processing with a model.
- Parameters:
taxonomy – Taxonomy object whose labels will be used
- Returns:
nothing
- classify(text: str, multi_label, **kwargs) dict[str, float][source]
[Adaptation] Customized for embedding similarity predictions.
Abstract function that executes the text classification.
- Parameters:
text – the text to classify
multi_label – boolean indicating whether this is a multi-label problem
kwargs – optional extra keyword arguments
- Returns:
the results
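Building a tree from bottom to top based on confidences could look roughly like the sketch below. It is only an illustration under stated assumptions: the taxonomy is represented by a hypothetical `PARENT_OF` mapping rather than the project's `Taxonomy` class, leaf scores are given directly instead of coming from embeddings, and a parent inherits the highest confidence among its descendants. The real propagation rule is not documented here.

```python
# Hypothetical taxonomy: each label maps to its parent (None marks a root).
PARENT_OF = {
    "cycling": "mobility",
    "parking": "mobility",
    "schools": "education",
    "mobility": None,
    "education": None,
}


def ground_up(leaf_scores: dict[str, float], parent_of: dict) -> dict[str, float]:
    """Propagate leaf confidences upward; each ancestor keeps the best score below it."""
    scores = dict(leaf_scores)
    for leaf, score in leaf_scores.items():
        parent = parent_of.get(leaf)
        while parent is not None:
            # Assumption: a parent takes the maximum confidence of its descendants.
            scores[parent] = max(scores.get(parent, 0.0), score)
            parent = parent_of.get(parent)
    return scores


tree_scores = ground_up({"cycling": 0.8, "parking": 0.3, "schools": 0.1}, PARENT_OF)
```

With these inputs, `mobility` ends up with the confidence of its strongest child, `cycling`, and `education` with that of `schools`.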
Greedy ground up embedding model
- class src.models.embedding.ground_up_greedy.GroundUpGreedyEmbeddingModel(config: Config, logger: Logger, model_id: str, taxonomy: Taxonomy)[source]
Bases: GroundUpRegularEmbeddingModel
Ground up greedy approach only takes the values with the highest possible scores.
- classify(text: str, multi_label, **kwargs) dict[str, float][source]
[Adaptation] Customized for embedding similarity predictions.
Abstract function that executes the text classification.
- Parameters:
text – the text to classify
multi_label – boolean indicating whether this is a multi-label problem
kwargs – optional extra keyword arguments
- Returns:
the results
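One possible reading of the greedy variant is sketched below: keep only the single best-scoring leaf and the ancestors on its path to the root. This interpretation is an assumption, as are the hypothetical `PARENT_OF` mapping and the choice to give ancestors the leaf's score. The class itself may select values differently.

```python
# Hypothetical taxonomy: each label maps to its parent (None marks a root).
PARENT_OF = {
    "cycling": "mobility",
    "parking": "mobility",
    "schools": "education",
    "mobility": None,
    "education": None,
}


def greedy_ground_up(leaf_scores: dict[str, float], parent_of: dict) -> dict[str, float]:
    """Keep the highest-scoring leaf and walk up its ancestors only."""
    if not leaf_scores:
        return {}
    best_leaf = max(leaf_scores, key=leaf_scores.get)
    best_score = leaf_scores[best_leaf]
    result = {best_leaf: best_score}
    parent = parent_of.get(best_leaf)
    while parent is not None:
        # Assumption: ancestors on the winning path inherit the leaf's score.
        result[parent] = best_score
        parent = parent_of.get(parent)
    return result


greedy_scores = greedy_ground_up({"cycling": 0.8, "parking": 0.3, "schools": 0.1}, PARENT_OF)
```

Compared with propagating every leaf, this keeps the result small: lower-scoring branches such as `parking` and `education` are dropped entirely.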
Sentence based embedding model
- class src.models.embedding.sentence.SentenceEmbeddingModel(config: Config, logger: Logger, model_id: str, taxonomy: Taxonomy)[source]
Bases: EmbeddingModel
- classify(text: str, multi_label, **kwargs) dict[str, float][source]
[Adaptation] Customized for embedding similarity predictions.
Abstract function that executes the text classification.
- Parameters:
text – the text to classify
multi_label – boolean indicating whether this is a multi-label problem
kwargs – optional extra keyword arguments
- Returns:
the results
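The class above carries no description beyond its name, so the following is only a guess at what sentence-based classification might involve: split the text into sentences, score each one, and average the scores per label. The regex splitter, the `classify_by_sentence` helper, the toy scorer, and the averaging step are all hypothetical.

```python
import re


def split_sentences(text: str) -> list[str]:
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [part for part in parts if part]


def classify_by_sentence(text: str, score_sentence) -> dict[str, float]:
    """Score each sentence and average the scores per label (an assumed aggregation)."""
    sentences = split_sentences(text)
    totals: dict[str, float] = {}
    for sentence in sentences:
        for label, score in score_sentence(sentence).items():
            totals[label] = totals.get(label, 0.0) + score
    return {label: total / len(sentences) for label, total in totals.items()}


def toy_scorer(sentence: str) -> dict[str, float]:
    """Hypothetical scorer: 1.0 if the sentence mentions roads, else 0.0."""
    return {"mobility": 1.0 if "road" in sentence.lower() else 0.0}


sentence_scores = classify_by_sentence("Roads are busy. Schools are full!", toy_scorer)
```

In practice the scorer passed in would be an embedding similarity function rather than a keyword check.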