Data models

This module’s main purpose is to interface with a sparql data-‘swamp’.

TODO: add more high level info _here_

This module allows us to parse sparql data into structured python objects, and to write python objects back to sparql.

Annotation

class src.data_models.annotation.Annotation(config: DataModelConfig, logger: Logger, taxonomy: Taxonomy, date: datetime | int, user: User = None, labels: list[Label] = None, model: Model = None, uri: str = None)[source]

Bases: Base

This class is used for parsing annotations from the structured input that is received from the sparql into a python object with extended functionality.

Using the config, it is possible to extend this class's functionality for loading and saving custom configs, provided there is some resemblance with the initial sparql schema.

An annotation can come from two separate sources, which are specified through dedicated properties (user or model). The two possible annotation types are:

  1. User annotation (annotation made by a user)
    >>> user_annotation = Annotation(
            date=datetime.now(),
            config=DataModelConfig(),
            logger=logging.logger,
            taxonomy=Taxonomy(...),
            user=User(...),
            labels=[Label(...), ...],
        )
    
  2. Model annotation (annotation made by a model)
    >>> model_annotation = Annotation(
            date=datetime.now(),
            config=DataModelConfig(),
            logger=logging.logger,
            taxonomy=Taxonomy(...),
            model=Model(...),
            labels=[Label(...), ...],
        )
    

For more specific usage, check functions below.

property date: int

This property is used for setting and getting the date value. The setter contains extra functionality to cast it to the specifically required type.

Example usage:
>>> annotation = Annotation(...)
>>> annotation.date = datetime.now()
>>> date = annotation.date
Returns:

The integer epoch time value for the provided timestamp
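
The casting behaviour described for the date setter can be illustrated with a minimal sketch. The class name `DatedSketch` is hypothetical and the exact casting rules of the real setter are an assumption; the sketch only shows how a property setter can accept either a datetime or an int and always store integer epoch seconds.

```python
from datetime import datetime, timezone


class DatedSketch:
    """Hypothetical stand-in for a data model that exposes a ``date`` property."""

    def __init__(self, date):
        self.date = date  # routed through the setter below

    @property
    def date(self) -> int:
        return self._date

    @date.setter
    def date(self, value) -> None:
        if isinstance(value, datetime):
            # datetime.timestamp() returns float seconds since the epoch
            self._date = int(value.timestamp())
        elif isinstance(value, int):
            self._date = value
        else:
            raise TypeError(f"expected datetime or int, got {type(value).__name__}")


item = DatedSketch(datetime(2024, 1, 1, tzinfo=timezone.utc))
print(item.date)  # 1704067200
```

Using a timezone-aware datetime in the example keeps the result independent of the local timezone of the machine running it.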

classmethod from_sparql(*args, **kwargs)

Class method for class initialization from sparql. When provided with an uri from an annotations, it will automatically execute all related queries to populate the object with all necessary information.

Example usage:
>>> annotation = Annotation.from_sparql(
        config = DataModelConfig(),
        logger = logging.logger,
        request_handler = ...,
        annotation_uri = "..."
    )
Parameters:
  • config – the general DataModelConfig

  • logger – logger object that can be used for logs

  • request_handler – the request wrapper used for sparql requests

  • annotation_uri – the uri which is used to find all relevant information

Returns:

an instance of the Annotation Class

property label_uris

This property is used to retrieve the list of taxonomy uris for all the linked labels.

Returns:

The linked label uris as a list of strings

property labels: list[Label | ...]

This property returns the linked label object(s). The setter contains extra logic to check that the input is of type list[Label] or Label.

Example usage:
>>> annotation = Annotation(...)
>>> annotation.labels = [Label(...), ...]
>>> labels = annotation.labels
Returns:

All the linked labels as List[Label] or an empty list if there are no labels linked
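
As a rough illustration of the type check described above, the sketch below normalises a single label, a list of labels, or None into a list. `LabelSketch` and `normalize_labels` are hypothetical names, and the handling of None is an assumption based on the documented "empty list" return value.

```python
class LabelSketch:
    """Hypothetical stand-in for the Label data model."""

    def __init__(self, taxonomy_node_uri: str):
        self.taxonomy_node_uri = taxonomy_node_uri


def normalize_labels(value):
    """Return a list of labels for None, a single label, or a list of labels."""
    if value is None:
        return []
    if isinstance(value, LabelSketch):
        return [value]
    if isinstance(value, list) and all(isinstance(v, LabelSketch) for v in value):
        return list(value)
    raise TypeError("labels must be a Label or a list of Label objects")


single = LabelSketch("t:bestuur")
print(len(normalize_labels(single)))  # 1
print(normalize_labels(None))         # []
```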

property model: Model | None

This property is used for setting and getting the model value. The setter contains extra functionality to cast it to the specific required type.

Example usage:
>>> annotation = Annotation(...)
>>> annotation.model = Model(...)
>>> model = annotation.model
Returns:

The provided model or None if no model is available

property subquery

Property (getter only) to retrieve the subquery for the annotation object. The sub queries are generally used for creation of insert statements, it checks for the user/model annotation status and calls the specific submodules in order to create the complete annotation statement.

Example usage:
>>> annotation = Annotation(...)
>>> subquery = annotation.subquery
Returns:

The formatted subquery as string

property taxonomy

The property is used for setting and getting the taxonomy value. The setter contains extra functionality to cast it to the specific required type.

Example usage:
>>> annotation = Annotation(...)
>>> annotation.taxonomy = Taxonomy(...)
>>> taxonomy = annotation.taxonomy
Returns:

The specific linked taxonomy

property user: User

This property is used for setting and getting the user value. The setter contains extra functionality to cast it to the specific required type.

Example usage:
>>> annotation = Annotation(...)
>>> annotation.user = User(...)
>>> user = annotation.user
Returns:

The provided user or None if no user is available

Article

class src.data_models.article.Article(config: DataModelConfig, logger: Logger, uri: str, number: str, content: str)[source]

Bases: Base

This class is mainly used to parse articles from the structured input format that is received from sparql into an unstructured text representation.

The main goal here is simply to create an instance of an ‘Article’ and enable abstract logic for data processing. It would be possible to extend this class to load from a sparql uri or to write to sparql. This, however, is out of scope for the current project.

Typical usage example:
>>> article = Article(
    config=DataModelConfig(),
    logger=logging.logger,
    uri="https://data_souce/content/some_article_id/",
    number=1,
    content="Confirmation about the deduction of the ..."
)
property content: str

This property is used for setting and getting the content value. The setter contains extra functionality to cast it to the specifically required type

Example usage:
>>> article = Article(...)
>>> article.content = "..."
>>> article_content = article.content
Returns:

The content as string

property formatted_article

This property returns a formatted string for the Article class. The string is formatted as follows: {article_number}: {article_text}

Example usage:
>>> article = Article(...)
>>> formatted_article = article.formatted_article
Returns:

The formatted article as string

property number: int

This property is used for setting and getting the number value. The setter contains extra functionality to cast it to the specifically required type

Example usage:
>>> article = Article(...)
>>> article.number = 10
>>> article_number = article.number
Returns:

The article number as int

property uri: str

This property is used for setting and getting the uri value. The setter contains extra functionality to cast it to the specifically required type

Example usage:
>>> article = Article(...)
>>> article.uri = "..."
>>> uri = article.uri
Returns:

The uri as string

Base

class src.data_models.base.Base(config: DataModelConfig, logger: Logger)[source]

Bases: object

This class implements the basic functionality that is re-used in all data models.

These are:
  1. set commonly used variables (config & logger)

  2. generate_uri

_ensure_encapsulation(str_input)[source]

static generate_uri(base_uri: str) → str[source]

This function generates new randomized uris based on a combination of a prefix and a uuid4. These uris are required when writing new objects to the sparql endpoint.

(During testing, base_uri is overwritten by TESTING environment value)

Parameters:

base_uri – the base_uri to use for the freshly generated uri

Returns:

The string formatted randomized uri
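
A minimal sketch of the prefix plus uuid4 idea follows. The function name `generate_uri_sketch` and the choice to join prefix and identifier with a slash are assumptions for illustration; the TESTING override mentioned above is not modelled.

```python
import uuid


def generate_uri_sketch(base_uri: str) -> str:
    """Return base_uri followed by a freshly generated uuid4 (assumed layout)."""
    # rstrip avoids a double slash when the prefix already ends with one
    return f"{base_uri.rstrip('/')}/{uuid.uuid4()}"


first = generate_uri_sketch("https://example.org/annotations/")
second = generate_uri_sketch("https://example.org/annotations/")
print(first != second)  # True, each call draws a new random uuid4
```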

Decision

class src.data_models.decision.Decision(config: DataModelConfig, logger: Logger, uri: str, annotations: list[Annotation] | Annotation | None = None, articles: list[Article] | Article | None = None, uuid: str = None, description: str = None, short_title: str = None, motivation: str = None, publication_date: str = None, language: str = None, points: str = None)[source]

Bases: Base

This class is used to parse Decisions (and all their submodules) from the structured input that is received from sparql, and also to create insert statements for custom-defined instances of this object.

Once the sparql input is parsed in the Decision Object, this class offers extra functionality.

Typical usage example:
>>> decision = Decision(
    config=DataModelConfig(),
    logger=logging.logger,
    uri="https://data_souce/content/some_article_id/",
    annotations=[Annotation(...), ...],
    articles=[Article(...), ...],
    ...
)
property annotations

This property is used to set and retrieve annotation for the specific decision object.

Example usage:
>>> decision = Decision(...)
>>> decision.annotations = [Annotation(...), ...]
>>> annotations = decision.annotations
Returns:

The currently linked annotations, or None if no annotations are available

property article_list

This property is used for getting the formatted linked articles.

Example usage:
>>> decision = Decision(...)
>>> formatted_articles = decision.article_list
Returns:

A list of string representations for the linked articles

property articles: list[Article] | None

This property is used for setting and retrieving the articles that are linked to the decision. The setter contains extra functionality to cast it to the specifically required type.

Example usage:
>>> decision = Decision(...)
>>> decision.articles = [Article(...), ...]
>>> articles = decision.articles
Returns:

A list of found Articles

property description: str

This property is used for setting and getting the description value. The setter contains extra functionality to cast it to the specifically required type.

Example usage:
>>> decision = Decision(...)
>>> decision.description = "..."
>>> description = decision.description
Returns:

The string representation for description

classmethod from_sparql(*args, **kwargs)

Class method that creates a decision object from sparql. When provided with an uri from a decision, it will automatically execute all related (sub)queries to populate the Decision object.

Example usage:
>>> decision = Decision.from_sparql(
        config = DataModelConfig(),
        logger = logging.logger,
        request_handler = ...,
        annotation_uri = "..."
    )
Parameters:
  • annotation_uri – the uri for which all information is pulled from sparql

  • config – the general config of the project

  • logger – object that can be used to generate logs

  • decision_uri – the string value of the uri that will be used to extract all decision information from

  • request_handler – the request wrapper for sparql interactions

Returns:

an instance of the Decision class

property insert_query

This property is used for getting the insert_query value.

Example usage:
>>> decision = Decision(...)
>>> insert_query = decision.insert_query
Returns:

The string representation for the insert_query

property language: str

This property is used for setting and getting the language value. The setter contains extra functionality to cast it to the specifically required type.

Example usage:
>>> decision = Decision(...)
>>> decision.language = "..."
>>> language = decision.language
Returns:

The string representation for language

property last_human_annotation

This property is used for getting the latest human annotation from all current linked annotations.

Example usage:
>>> decision = Decision(...)
>>> lha = decision.last_human_annotation
Returns:

The last human annotation
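
One way to read "latest human annotation" is: among the annotations that carry a user, pick the one with the highest date. The sketch below follows that reading; `AnnotationSketch` is a hypothetical stand-in, and returning None when no human annotation exists is an assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AnnotationSketch:
    """Hypothetical stand-in: an annotation is 'human' when user is set."""
    date: int
    user: Optional[str] = None
    model: Optional[str] = None


def last_human_annotation(annotations):
    """Return the most recent annotation that has a user, or None."""
    human = [a for a in annotations if a.user is not None]
    return max(human, key=lambda a: a.date, default=None)


annotations = [
    AnnotationSketch(date=1, user="user-1"),
    AnnotationSketch(date=5, model="model-1"),  # newer, but made by a model
    AnnotationSketch(date=3, user="user-2"),
]
print(last_human_annotation(annotations).date)  # 3
```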

property motivation: str

This property is used for setting and getting the motivation value. The setter contains extra functionality to cast it to the specifically required type.

Example usage:
>>> decision = Decision(...)
>>> decision.motivation = "..."
>>> motivation = decision.motivation
Returns:

The string representation for motivation

property points: str

This property is used for setting and getting the points value. The setter contains extra functionality to cast it to the specifically required type.

Example usage:
>>> decision = Decision(...)
>>> decision.points = "..."
>>> points = decision.points
Returns:

The string representation for points

property publication_date: str

This property is used for setting and getting the publication_date value. The setter contains extra functionality to cast it to the specifically required type.

Example usage:
>>> decision = Decision(...)
>>> decision.publication_date = "..."
>>> publication_date = decision.publication_date
Returns:

The string representation for publication_date

property short_title: str

This property is used for setting and getting the short_title value. The setter contains extra functionality to cast it to the specifically required type.

Example usage:
>>> decision = Decision(...)
>>> decision.short_title = "..."
>>> short_title = decision.short_title
Returns:

The string representation for short_title

property train_record: dict[str, str | list[str]]

This property is used for getting the formatted training record. This can be seen as a dictionary with all relevant values that can be used for training for the specific Decision object.

Example usage:
>>> decision = Decision(...)
>>> train_records = decision.train_record
Returns:

A dictionary containing the training values for the decision

property uri: str

This property is used for setting and retrieving the uri for the object. The setter contains extra functionality to cast it to the specifically required type.

Example usage:
>>> decision = Decision(...)
>>> decision.uri = "..."
>>> uri = decision.uri
Returns:

The string value for the uri

property uuid: str

This property is used for setting and getting the uuid value. The setter contains extra functionality to cast it to the specifically required type.

Example usage:
>>> decision = Decision(...)
>>> decision.uuid = "..."
>>> uuid = decision.uuid
Returns:

The string uuid linked to the decision

Label

class src.data_models.label.Label(config: DataModelConfig, logger: Logger, taxonomy_node_uri: str, score: float = 1.0, uri: str = None)[source]

Bases: Base

This class is used for parsing a Label from the structured input that is received from the sparql endpoint, it is parsed into this python object, that enables extended functionality.

Using the config, it is possible to extend this class with custom loading/saving behaviour.

Typical usage example:
>>> label = Label(
        config=DataModelConfig(),
        logger=logging.logger,
        taxonomy_node_uri="...",
        score=1.0,
        uri="..."
    )
classmethod from_sparql(*args, **kwargs)

Class method for class initialization from sparql. When provided with an uri from a Label, it will automatically execute all related queries to populate the object with all necessary information.

Example usage:
>>> label = Label.from_sparql(
        config = DataModelConfig(),
        logger = logging.logger,
        request_handler = ...,
        uri = "..."
    )
Parameters:
  • config – the general DataModelConfig

  • logger – logger object that can be used for logs

  • request_handler – the request wrapper used for sparql requests

  • uri – the uri which is used to find all relevant information

Returns:

an instance of the Label class

property score: float

This property is used for setting and getting the score value. The setter contains extra functionality to cast it to the specifically required type

Example usage:
>>> label = Label(...)
>>> label.score = 0.9
>>> score = label.score
Returns:

The score as float

property subquery

Property (getter only) to retrieve the subquery for the Label object. The sub queries are generally used for creation of insert statements, it checks for the user/model annotation status and calls the specific submodules in order to create the complete label statement.

Example usage:
>>> label = Label(...)
>>> subquery = label.subquery
Returns:

The formatted subquery as string

property taxonomy_node_uri: str

This property is used for setting and getting the taxonomy_node_uri value. The setter contains extra functionality to cast it to the specifically required type

Example usage:
>>> article = Label(...)
>>> article.taxonomy_node_uri = "..."
>>> uri = article.taxonomy_node_uri
Returns:

The string taxonomy_node_uri

Model

class src.data_models.model.Model(config: DataModelConfig, logger: Logger, name: str = None, mlflow_reference: str = None, date: int = datetime.datetime(2024, 5, 13, 13, 17, 45, 270306), category: str = None, registered_model: str = None, uri: str = None, register: bool = False)[source]

Bases: Base

This class is used for parsing Model instances from the structured input that is received from the sparql into a python object with extended functionality.

Using the config, it is possible to extend this class's functionality for loading and saving custom configs, provided there is some resemblance with the initial sparql schema.

Typical usage example:
>>> model = Model(
    config=DataModelConfig(),
    logger=logging.logger,
    mlflow_reference="...",
    category="...",
    register=True
)
property category: str

This property is used for setting and getting the category value. The setter contains extra functionality to cast it to the specifically required type

Example usage:
>>> model = Model(...)
>>> model.category = "..."
>>> category = model.category
Returns:

The category as string

property date: int

This property is used for setting and getting the date value. The setter contains extra functionality to cast it to the specifically required type.

Example usage:
>>> model = Model(...)
>>> model.date = datetime.now()
>>> date = model.date
Returns:

The integer epoch time value for the provided timestamp

classmethod from_sparql(*args, **kwargs)

This function is the classmethod that creates an instance of the model class from a given model uri.

Parameters:
  • config – the general config used in the project

  • logger – the object that can be used for logging

  • request_handler – the request wrapper for sparql

  • uri – the model uri used to populate the model object

Returns:

an instance of the Model class

property mlflow_reference: str

This property is used for setting and getting the mlflow_reference value. The setter contains extra functionality to cast it to the specifically required type

Example usage:
>>> model = Model(...)
>>> model.mlflow_reference = "..."
>>> mlflow_reference = model.mlflow_reference
Returns:

The string mlflow_reference as string

property name: str

This property is used for setting and getting the name value. The setter contains extra functionality to cast it to the specifically required type

Example usage:
>>> model = Model(...)
>>> model.name = "..."
>>> name = model.name
Returns:

The name as string

property register: bool

This property is used for setting and getting the register value. The setter contains extra functionality to cast it to the specifically required type

Example usage:
>>> model = Model(...)
>>> model.register = True
>>> register = model.register
Returns:

The bool value for register

property registered_model: str

This property is used for setting and getting the registered_model value. The setter contains extra functionality to cast it to the specifically required type

Example usage:
>>> model = Model(...)
>>> model.registered_model = "..."
>>> registered_model = model.registered_model
Returns:

The string registered_model as string

property subquery

Property (getter only) to retrieve the subquery for the Model object. The sub queries are generally used for creation of insert statements. It will automatically execute the calls for the submodules in order to create the complete statement.

Example usage:
>>> model = Model(...)
>>> sub_query = model.subquery
Returns:

The formatted subquery as string

write_to_sparql(request_handler: RequestHandler)[source]

Taxonomy

class src.data_models.taxonomy.Taxonomy(config: DataModelConfig, logger: Logger, uri: str, label: str = None, children: list[Taxonomy] = None, level: int = 0)[source]

Bases: Base

This class is used for parsing the taxonomy from the structured input that is received from the sparql. The whole idea is that the taxonomy can be of various depth, which makes searching and other extra functionality a recursive problem.

Using a config enables you to have a custom SPARQL retrieval approach, you can adapt the relations and base query.

Typical usage example (different from all other sparql models -> this one only makes sense to init from sparql):
>>> taxonomy = Taxonomy.from_sparql(
        config=DataModelConfig(),
        logger=logging.logger,
        request_handler= RequestHandler(...),
        endpoint = ...,
        taxonomy_item_uri = "..."
    )
_remap_tree(**kwargs)

An internal function that allows us to recursively create the taxonomy tree, from the flat query response we got from the sparql.

Working this way is highly optimized compared to recursively executing queries to build the taxonomy tree.
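
The idea of building the tree from one flat query response can be sketched as follows. The shape of the flat response (a mapping from parent uri to a list of child rows) and the function name `remap_tree_sketch` are assumptions for illustration; the parameter names mirror the ones documented for this method. The sketch assumes the data contains no cycles.

```python
def remap_tree_sketch(entire_tree, subselector, curr_depth=0):
    """Recursively nest the children of `subselector` from a flat mapping."""
    nodes = []
    for child in entire_tree.get(subselector, []):
        nodes.append({
            "uri": child["uri"],
            "label": child["label"],
            "level": curr_depth + 1,
            # A lookup in the flat mapping replaces an extra query per node.
            "children": remap_tree_sketch(entire_tree, child["uri"], curr_depth + 1),
        })
    return nodes


# Hypothetical flat response: parent uri -> direct children
flat = {
    "t:root": [
        {"uri": "t:bestuur", "label": "Bestuur"},
        {"uri": "t:mobiliteit", "label": "Mobiliteit"},
    ],
    "t:bestuur": [{"uri": "t:raad", "label": "Gemeenteraad"}],
}
nested = remap_tree_sketch(flat, "t:root")
print(nested[0]["children"][0]["label"])  # Gemeenteraad
```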

Example usage:
>>> # internal method, no usage provided
Parameters:
  • entire_tree – Full tree that is pulled from the sparql

  • subselector – Key that sub selects the entire tree

  • curr_depth – Index that identifies what depth we are currently at

Returns:

property all_linked_labels: list[str]

Property that provides getter access to all_linked_labels for the taxonomy instance

Example usage:
>>> taxonomy = Taxonomy(...)
>>> taxonomy_label = taxonomy.all_linked_labels
Returns:

List of all labels linked to taxonomy

property children

Property that provides a getter and setter for the taxonomy children nodes.

Example usage:
>>> taxonomy = Taxonomy(...)
>>> taxonomy_children = taxonomy.children
>>> taxonomy.children = [Taxonomy(...), ...]
Returns:

The children that are linked to the taxonomy

create_blank_config()[source]

This function generates a blank config for a given taxonomy. This config can be used to create the correct model inference graph when working with config-based multi-layer predictions.

Returns:

json object containing the blank configuration

find(**kwargs)

This function allows users to find the exact location of an item in the taxonomy tree. The provided search term is used in combination with search_kind (LABEL or URI) to find the location in the tree. Once the location is found, the function returns a dictionary that contains all parent nodes, structured as dict[<integer for level>: <taxonomy represented as dict>].

Example usage:
>>> taxonomy = Taxonomy(...)
>>> taxonomy.find(
    search_term = "Bestuur",
    search_kind = TaxonomyFindTypes.LABEL,
    max_depth = 2
    )
>>> # if the label is not found before reaching the max depth, nothing is returned for it
Parameters:
  • search_kind – enum of what values to search on

  • search_term – the label or uri of the taxonomy that has to be found

  • max_depth – maximum depth for the tree search

  • kwargs – extra variables

Returns:

a dictionary containing each level up to the found taxonomy
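
The search described above can be sketched as a depth-limited recursive walk that carries the path from the root along. `FindKind` is a hypothetical stand-in for TaxonomyFindTypes, nodes are plain dictionaries rather than Taxonomy objects, and returning an empty dictionary when nothing is found is an assumption.

```python
from enum import Enum


class FindKind(Enum):
    """Hypothetical stand-in for TaxonomyFindTypes."""
    LABEL = "label"
    URI = "uri"


def find_sketch(node, search_term, search_kind, max_depth=10, _path=None):
    # Extend the path with the current node, keyed by its level.
    path = dict(_path or {})
    path[node["level"]] = {"uri": node["uri"], "label": node["label"]}
    if node[search_kind.value] == search_term:
        return path
    if node["level"] >= max_depth:
        return {}  # do not descend below max_depth
    for child in node["children"]:
        found = find_sketch(child, search_term, search_kind, max_depth, path)
        if found:
            return found
    return {}


tree = {"uri": "t:root", "label": "Root", "level": 0, "children": [
    {"uri": "t:bestuur", "label": "Bestuur", "level": 1, "children": [
        {"uri": "t:raad", "label": "Gemeenteraad", "level": 2, "children": []},
    ]},
    {"uri": "t:mobiliteit", "label": "Mobiliteit", "level": 1, "children": []},
]}
result = find_sketch(tree, "Gemeenteraad", FindKind.LABEL, max_depth=2)
print(sorted(result))  # [0, 1, 2]
```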

classmethod from_checkpoint(config: DataModelConfig, logger: Logger, checkpoint_folder: str) → Taxonomy[source]

Class method for class initialization from a created checkpoint.

Example usage:
>>> taxonomy = Taxonomy.from_checkpoint(
        config = DataModelConfig(),
        logger = logging.logger,
        checkpoint_folder = "..."
    )
Parameters:
  • checkpoint_folder – string indication of where the checkpoint is located

  • config – the general DataModelConfig

  • logger – logger object that can be used for logs

Returns:

an instance of the Taxonomy class

classmethod from_dict(config: DataModelConfig, logger: Logger, dictionary: dict[Any]) → Taxonomy[source]

Class method for class initialization from a dictionary.

Example usage:
>>> taxonomy = Taxonomy.from_dict(
        config = DataModelConfig(),
        logger = logging.logger,
        dictionary = {...: ...}
    )
Parameters:
  • dictionary – dictionary containing the parsed taxonomy

  • config – the general DataModelConfig

  • logger – logger object that can be used for logs

Returns:

an instance of the Taxonomy class

classmethod from_sparql(*args, **kwargs)

Class method for class initialization from sparql. This function creates the taxonomy tree from sparql, the taxonomy tree is created from taxonomy objects that are nested in the children property.

Example usage:
>>> taxonomy = Taxonomy.from_sparql(
        config = DataModelConfig(),
        logger = logging.logger,
        request_handler = ...,
        endpoint = EndpointType.TAXONOMY,
        taxonomy_item_uri = "..."
    )
Parameters:
  • config – the general config used in the project

  • logger – the object used for logging

  • request_handler – the request wrapper for sparql

  • endpoint – the endpoint enum for endpoint reference

  • taxonomy_item_uri – the taxonomy uri

Returns:

an instance of the Taxonomy object

get_labels(**kwargs)

This function generates a flat list of labels for all the nodes in the taxonomy tree.

Example usage:
>>> taxonomy = Taxonomy(...)
>>> labels_up_to_2 = taxonomy.get_labels(max_depth=2)
Parameters:
  • include_tree_indication – whether to include the level each label is based upon

  • max_depth – The maximum depth level to extract from the label tree

Returns:

a flat list of labels
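
Flattening the tree into a label list can be sketched as a pre-order traversal with a depth cut-off. The dictionary-based nodes, the function name `get_labels_sketch`, the inclusion of the root label, and the "level:label" format used for the tree indication are all assumptions for illustration.

```python
def get_labels_sketch(node, max_depth=10, include_tree_indication=False):
    """Collect labels in pre-order, skipping nodes deeper than max_depth."""
    if node["level"] > max_depth:
        return []
    label = node["label"]
    if include_tree_indication:
        label = f"{node['level']}:{label}"  # assumed indication format
    labels = [label]
    for child in node["children"]:
        labels.extend(get_labels_sketch(child, max_depth, include_tree_indication))
    return labels


tree = {"uri": "t:root", "label": "Root", "level": 0, "children": [
    {"uri": "t:bestuur", "label": "Bestuur", "level": 1, "children": [
        {"uri": "t:raad", "label": "Gemeenteraad", "level": 2, "children": []},
    ]},
    {"uri": "t:mobiliteit", "label": "Mobiliteit", "level": 1, "children": []},
]}
print(get_labels_sketch(tree, max_depth=1))
# ['Root', 'Bestuur', 'Mobiliteit']
```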

get_labels_for_node(search_term: str, search_kind: TaxonomyFindTypes)[source]

This function provides the child labels for the given input search term. It calls the find function to retrieve the relevant information from the taxonomy tree.

Parameters:
  • search_term

  • search_kind

Returns:

get_level_specific_labels(level: int)[source]

This function provides functionality to retrieve ONLY a specific level of the taxonomy.

Example usage:
>>> taxonomy = Taxonomy(...)
>>> level_2_labels = taxonomy.get_level_specific_labels(level=2)
Parameters:

level – The depth you want to retrieve the labels from

Returns:

List of found labels
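
Selecting a single level can be sketched as a recursive walk that stops as soon as the requested level is reached. The dictionary-based nodes and the name `level_labels_sketch` are hypothetical; the sketch assumes every node stores its own level.

```python
def level_labels_sketch(node, level):
    """Return the labels of all nodes located exactly at `level`."""
    if node["level"] == level:
        return [node["label"]]  # deeper nodes cannot match, so stop here
    found = []
    for child in node["children"]:
        found.extend(level_labels_sketch(child, level))
    return found


tree = {"uri": "t:root", "label": "Root", "level": 0, "children": [
    {"uri": "t:bestuur", "label": "Bestuur", "level": 1, "children": [
        {"uri": "t:raad", "label": "Gemeenteraad", "level": 2, "children": []},
    ]},
    {"uri": "t:mobiliteit", "label": "Mobiliteit", "level": 1, "children": []},
]}
print(level_labels_sketch(tree, level=1))  # ['Bestuur', 'Mobiliteit']
```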

property label: str

Property that provides access to the label for a given taxonomy

Example usage:
>>> taxonomy = Taxonomy(...)
>>> taxonomy_label = taxonomy.label
>>> taxonomy.label = "..."
Returns:

The label for the taxonomy item

property label2uri

Property (getter only) to retrieve the label2uri for the Taxonomy object.

Example usage:
>>> taxonomy = Taxonomy(...)
>>> label2uri = taxonomy.label2uri
Returns:

The label2uri dictionary

todict(with_children: bool = False, max_depth: int = 10, **kwargs) → dict[str, str | list[...]][source]

This function parses the current taxonomy tree to a dictionary

Example usage:
>>> taxonomy = Taxonomy(...)
>>> full_taxonomy_dictionary = taxonomy.todict(with_children=True)
Parameters:
  • max_depth – maximum depth to retrieve child nodes from

  • with_children – flag that allows you to go for full depth

Returns:

dictionary for the given object
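
The interplay between with_children and max_depth can be sketched as below. `TaxonomySketch` is a hypothetical stand-in, and the exact keys of the resulting dictionary as well as the cut-off rule (children are only serialised for nodes above max_depth) are assumptions.

```python
class TaxonomySketch:
    """Hypothetical stand-in for a taxonomy node."""

    def __init__(self, uri, label=None, children=None, level=0):
        self.uri = uri
        self.label = label
        self.children = children or []
        self.level = level

    def todict(self, with_children=False, max_depth=10):
        result = {"uri": self.uri, "label": self.label, "level": self.level}
        # Only descend when requested and while above the depth limit.
        if with_children and self.level < max_depth:
            result["children"] = [
                child.todict(with_children=True, max_depth=max_depth)
                for child in self.children
            ]
        return result


leaf = TaxonomySketch("t:raad", "Gemeenteraad", level=2)
mid = TaxonomySketch("t:bestuur", "Bestuur", children=[leaf], level=1)
root = TaxonomySketch("t:root", "Root", children=[mid], level=0)
print(root.todict())  # {'uri': 't:root', 'label': 'Root', 'level': 0}
```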

property uri: str

Property that provides access to the uri for a given taxonomy

Example usage:
>>> taxonomy = Taxonomy(...)
>>> taxonomy_uri = taxonomy.uri
>>> taxonomy.uri = "..."
Returns:

The uri for the taxonomy item

property uri2label

Property (getter only) to retrieve the uri2label for the Taxonomy object.

Example usage:
>>> taxonomy = Taxonomy(...)
>>> uri2label = taxonomy.uri2label
Returns:

The uri2label dictionary
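
The two lookup dictionaries can be thought of as each other's inverse. The sketch below builds them from a dictionary-based tree; the function names and node layout are hypothetical, and it assumes labels are unique (duplicate labels would collapse to one entry in the label-to-uri mapping).

```python
def uri2label_sketch(node):
    """Map every uri in the tree to its label."""
    mapping = {node["uri"]: node["label"]}
    for child in node["children"]:
        mapping.update(uri2label_sketch(child))
    return mapping


def label2uri_sketch(node):
    """Invert the uri -> label mapping (assumes unique labels)."""
    return {label: uri for uri, label in uri2label_sketch(node).items()}


tree = {"uri": "t:root", "label": "Root", "children": [
    {"uri": "t:bestuur", "label": "Bestuur", "children": []},
]}
print(uri2label_sketch(tree)["t:bestuur"])  # Bestuur
print(label2uri_sketch(tree)["Bestuur"])    # t:bestuur
```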

TaxonomyType

class src.data_models.taxonomy_type.TaxonomyTypes(config: DataModelConfig, logger: Logger, taxonomies: list[Taxonomy])[source]

Bases: Base

This class is used for providing access to all different taxonomy nodes under the taxonomy masternode, this masternode is a default value that can be overwritten with environment variables. (see config)

Using a config enables you to have a custom SPARQL retrieval approach, you can adapt the relations and base query.

Typical usage example (different from all other sparql models -> this one only makes sense to init from sparql):
>>> taxonomies = TaxonomyTypes.from_sparql(
        config=DataModelConfig(),
        logger=logging.logger,
        request_handler= RequestHandler(...),
        endpoint = ...
    )
classmethod from_sparql(*args, **kwargs)

Class method for class initialization from sparql. This function loads all taxonomies linked to the parent taxonomy node

Example usage:
>>> taxonomy = TaxonomyTypes.from_sparql(
        config = DataModelConfig(),
        logger = logging.logger,
        request_handler = ...,
        endpoint = EndpointType.TAXONOMY
    )
Parameters:
  • config – the general config used in the project

  • logger – object used for logging

  • request_handler – the request wrapper for sparql

  • endpoint – endpoint enum to use for requests

Returns:

an instance of the TaxonomyTypes object

get(**kwargs)

This function checks the list of existing taxonomies and returns the matching taxonomy

Example usage:
>>> taxonomies = TaxonomyTypes(...)
>>> taxonomy = taxonomies.get(taxonomy_uri="...")
Parameters:

taxonomy_uri – taxonomy_uri to check for

Returns:

taxonomy object when it exists
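
A lookup by uri over the loaded taxonomies can be sketched as a simple linear search. The dictionary-based taxonomies and the name `get_taxonomy_sketch` are hypothetical, and returning None when no taxonomy matches is an assumption, since the behaviour for a missing uri is not documented here.

```python
def get_taxonomy_sketch(taxonomies, taxonomy_uri):
    """Return the first taxonomy whose uri matches, or None (assumed)."""
    return next((t for t in taxonomies if t["uri"] == taxonomy_uri), None)


taxonomies = [
    {"uri": "t:themes", "label": "Themes"},
    {"uri": "t:regions", "label": "Regions"},
]
print(get_taxonomy_sketch(taxonomies, "t:regions")["label"])  # Regions
```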

User

class src.data_models.user.User(config: DataModelConfig, logger: Logger, username: str = None, email: str = None, uri: str = None)[source]

Bases: Base

This class is used for parsing the linked user(s) for a given decision.

Currently, this class is not linked to the data infrastructure (anonymity issue)

Typical usage example:
>>> user = User(
        config=DataModelConfig(),
        logger=logging.logger,
        username="...",
        email="...@...",
        uri="..."
    )
classmethod from_sparql(*args, **kwargs)

Class method for class initialization from sparql uri

=== This function is currently not used and not fully implemented ===

Parameters:
  • config – the general config used in the project

  • logger – object used for logging

  • request_handler – the request wrapper for sparql

  • uri – the user uri to use

Returns:

instance of a user