Data models
This module’s main purpose is to interface with a sparql data-‘swamp’.
TODO: add more high level info _here_
This module allows us to parse SPARQL data into structured Python objects, and to write Python objects to SPARQL.
Annotation
- class src.data_models.annotation.Annotation(config: DataModelConfig, logger: Logger, taxonomy: Taxonomy, date: datetime | int, user: User = None, labels: list[Label] = None, model: Model = None, uri: str = None)
Bases: Base
This class parses annotations from the structured input received from SPARQL into a Python object with extended functionality.
Using the config, it is possible to extend this class's functionality for loading and saving custom configs, provided they resemble the initial SPARQL schema.
An annotation can come from two separate sources, each specified through its own property. The two possible annotation types are:
- User annotation (annotation made by a user)
>>> user_annotation = Annotation(date=datetime.now(), config=DataModelConfig(), logger=logging.getLogger(__name__), taxonomy=Taxonomy(...), user=User(...), labels=[Label(...), ...])
- Model annotation (annotation made by a model)
>>> model_annotation = Annotation(date=datetime.now(), config=DataModelConfig(), logger=logging.getLogger(__name__), taxonomy=Taxonomy(...), model=Model(...), labels=[Label(...), ...])
For more specific usage, see the members below.
- property date: int
This property is used for setting and getting the date value. The setter contains extra functionality to cast it to the specifically required type.
- Example usage:
>>> annotation = Annotation(...)
>>> annotation.date = datetime.now()
>>> date = annotation.date
- Returns:
The integer epoch time value for the provided timestamp
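A minimal sketch of how such a casting setter could work, assuming the stored value is Unix epoch seconds and that datetime inputs are converted with datetime.timestamp(). The class DatedObject is hypothetical and not the module's implementation.

```python
from __future__ import annotations

from datetime import datetime, timezone


class DatedObject:
    """Hypothetical stand-in showing a date property that stores epoch seconds."""

    def __init__(self, date: datetime | int) -> None:
        self.date = date  # routed through the setter below

    @property
    def date(self) -> int:
        return self._date

    @date.setter
    def date(self, value: datetime | int) -> None:
        # Assumption: datetimes are converted to whole epoch seconds
        if isinstance(value, datetime):
            self._date = int(value.timestamp())
        elif isinstance(value, int):
            self._date = value
        else:
            raise TypeError(f"date must be datetime or int, got {type(value).__name__}")
```

For example, DatedObject(datetime(2024, 5, 13, tzinfo=timezone.utc)).date gives 1715558400. Note that timestamp() interprets naive datetimes as local time.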
- classmethod from_sparql(*args, **kwargs)
Class method for class initialization from SPARQL. When provided with the URI of an annotation, it automatically executes all related queries to populate the object with all necessary information.
- Example usage:
>>> annotation = Annotation.from_sparql(config=DataModelConfig(), logger=logging.getLogger(__name__), request_handler=..., annotation_uri="...")
- Parameters:
config – the general DataModelConfig
logger – logger object that can be used for logs
request_handler – the request wrapper used for sparql requests
annotation_uri – the uri which is used to find all relevant information
- Returns:
an instance of the Annotation Class
- property label_uris
This property is used to retrieve the list of taxonomy uris for all the linked labels.
- Returns:
The linked label uris as a list of strings
- property labels: list[Label | ...]
This property returns the linked label object(s). The setter contains extra logic to check that the input is of type list[Label] or Label, forcing it into a list.
- Example usage:
>>> annotation = Annotation(...)
>>> annotation.labels = [Label(...), ...]
>>> labels = annotation.labels
- Returns:
All the linked labels as List[Label] or an empty list if there are no labels linked
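The list-forcing behaviour of the labels setter, together with the label_uris getter, can be sketched as follows. This assumes a single Label is wrapped into a one-element list and None becomes an empty list. Label and LabelledObject here are hypothetical minimal stand-ins, not the module's classes.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Label:
    """Hypothetical minimal label: only the fields needed for this sketch."""
    taxonomy_node_uri: str
    score: float = 1.0


class LabelledObject:
    """Hypothetical stand-in showing a labels property accepting Label or list[Label]."""

    def __init__(self, labels: list[Label] | Label | None = None) -> None:
        self.labels = labels

    @property
    def labels(self) -> list[Label]:
        return self._labels

    @labels.setter
    def labels(self, value: list[Label] | Label | None) -> None:
        if value is None:
            self._labels = []  # no labels linked -> empty list
        elif isinstance(value, Label):
            self._labels = [value]  # force a single label into a list
        elif isinstance(value, list) and all(isinstance(v, Label) for v in value):
            self._labels = list(value)
        else:
            raise TypeError("labels must be a Label or a list[Label]")

    @property
    def label_uris(self) -> list[str]:
        # Taxonomy URIs of all linked labels
        return [label.taxonomy_node_uri for label in self._labels]
```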
- property model: Model | None
This property is used for setting and getting the model value. The setter contains extra functionality to cast it to the specific required type.
- Example usage:
>>> annotation = Annotation(...)
>>> annotation.model = Model(...)
>>> model = annotation.model
- Returns:
The provided model or None if no model is available
- property subquery
Getter-only property that retrieves the subquery for the annotation object. Subqueries are generally used to create insert statements: the property checks whether this is a user or a model annotation and calls the corresponding submodules to create the complete annotation statement.
- Example usage:
>>> annotation = Annotation(...)
>>> subquery = annotation.subquery
- Returns:
The formatted subquery as string
- property taxonomy
The property is used for setting and getting the taxonomy value. The setter contains extra functionality to cast it to the specific required type.
- Example usage:
>>> annotation = Annotation(...)
>>> annotation.taxonomy = Taxonomy(...)
>>> taxonomy = annotation.taxonomy
- Returns:
The specific linked taxonomy
- property user: User
This property is used for setting and getting the user value. The setter contains extra functionality to cast it to the specific required type.
- Example usage:
>>> annotation = Annotation(...)
>>> annotation.user = User(...)
>>> user = annotation.user
- Returns:
The provided user or None if no user is available
Article
- class src.data_models.article.Article(config: DataModelConfig, logger: Logger, uri: str, number: str, content: str)
Bases: Base
This class is mainly used to parse articles from the structured input format received from SPARQL into an unstructured text representation.
The main goal is simply to create an instance of an ‘Article’ and enable abstract logic for data processing. This class could be extended to load from a SPARQL URI or write to SPARQL; however, this is out of scope for the current project.
- Typical usage example:
>>> article = Article(config=DataModelConfig(), logger=logging.getLogger(__name__), uri="https://data_souce/content/some_article_id/", number=1, content="Confirmation about the deduction of the ...")
- property content: str
This property is used for setting and getting the content value. The setter contains extra functionality to cast it to the specifically required type.
- Example usage:
>>> article = Article(...)
>>> article.content = "..."
>>> article_content = article.content
- Returns:
The content as string
- property formatted_article
This property returns a formatted string for the Article, using the format {article_number}: {article_text}.
- Example usage:
>>> article = Article(...)
>>> formatted_article = article.formatted_article
- Returns:
The formatted article as string
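The format above is simple enough to show directly. This sketch also shows how a list such as Decision.article_list could be assembled from it, assuming one formatted string per linked article. ArticleSketch and article_list are hypothetical names used only for illustration.

```python
from dataclasses import dataclass


@dataclass
class ArticleSketch:
    """Hypothetical stand-in for Article, reduced to the fields used for formatting."""
    number: int
    content: str

    @property
    def formatted_article(self) -> str:
        # Format described above: {article_number}: {article_text}
        return f"{self.number}: {self.content}"


def article_list(articles: list) -> list:
    """Sketch of a decision-level list: one formatted string per linked article."""
    return [article.formatted_article for article in articles]
```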
- property number: int
This property is used for setting and getting the number value. The setter contains extra functionality to cast it to the specifically required type.
- Example usage:
>>> article = Article(...)
>>> article.number = 10
>>> article_number = article.number
- Returns:
The article number as int
- property uri: str
This property is used for setting and getting the uri value. The setter contains extra functionality to cast it to the specifically required type.
- Example usage:
>>> article = Article(...)
>>> article.uri = "..."
>>> uri = article.uri
- Returns:
The URI as a string
Base
- class src.data_models.base.Base(config: DataModelConfig, logger: Logger)
Bases: object
This class implements the basic functionality that is reused in all data models.
- These are:
setting the commonly used variables (config & logger)
generate_uri
- static generate_uri(base_uri: str) → str
This function generates a new randomized URI by combining a prefix (base_uri) with a uuid4 value. These URIs are required when writing new objects to the SPARQL endpoint.
(During testing, base_uri is overwritten by the TESTING environment value.)
- Parameters:
base_uri – the base_uri to use for the freshly generated uri
- Returns:
The string formatted randomized uri
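A minimal sketch of the prefix + uuid4 combination, assuming the two parts are joined with a single "/". The real separator is not documented here, and the TESTING override is not modelled. This standalone generate_uri is illustrative, not the method itself.

```python
import uuid


def generate_uri(base_uri: str) -> str:
    """Hypothetical sketch: append a random uuid4 to the given prefix."""
    # Assumption: prefix and uuid are joined with a single "/"; the real separator may differ.
    # The TESTING environment override mentioned above is not modelled here.
    return f"{base_uri.rstrip('/')}/{uuid.uuid4()}"
```

Because uuid4 is random, two calls with the same prefix yield different URIs. This is what makes freshly written objects distinguishable at the endpoint.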
Decision
- class src.data_models.decision.Decision(config: DataModelConfig, logger: Logger, uri: str, annotations: list[Annotation] | Annotation | None = None, articles: list[Article] | Article | None = None, uuid: str = None, description: str = None, short_title: str = None, motivation: str = None, publication_date: str = None, language: str = None, points: str = None)
Bases: Base
This class is used to parse Decisions (and all their submodules) from the structured input received from SPARQL, and also to create insert statements for custom-defined instances of this object.
Once the SPARQL input is parsed into the Decision object, this class offers extra functionality.
- Typical usage example:
>>> decision = Decision(config=DataModelConfig(), logger=logging.getLogger(__name__), uri="https://data_souce/content/some_article_id/", annotations=[Annotation(...), ...], articles=[Article(...), ...], ...)
- property annotations
This property is used to set and retrieve the annotations for the specific decision object.
- Example usage:
>>> decision = Decision(...)
>>> decision.annotations = [Annotation(...), ...]
>>> annotations = decision.annotations
- Returns:
The currently linked annotations, or None if no annotations are available
- property article_list
This property is used for getting the formatted linked articles.
- Example usage:
>>> decision = Decision(...)
>>> formatted_articles = decision.article_list
- Returns:
A list of string representations for the linked articles
- property articles: list[Article] | None
This property is used for setting and retrieving the articles that are linked to the decision. The setter contains extra functionality to cast it to the specifically required type.
- Example usage:
>>> decision = Decision(...)
>>> decision.articles = [Article(...), ...]
>>> articles = decision.articles
- Returns:
A list of found Articles
- property description: str
This property is used for setting and getting the description value. The setter contains extra functionality to cast it to the specifically required type.
- Example usage:
>>> decision = Decision(...)
>>> decision.description = "..."
>>> description = decision.description
- Returns:
The string representation for description
- classmethod from_sparql(*args, **kwargs)
Class method that creates a Decision object from SPARQL. When provided with the URI of a decision, it automatically executes all related (sub)queries to populate the Decision object.
- Example usage:
>>> decision = Decision.from_sparql(config=DataModelConfig(), logger=logging.getLogger(__name__), request_handler=..., decision_uri="...")
- Parameters:
annotation_uri – the URI used to pull all information from SPARQL
config – the general config of the project
logger – object that can be used to generate logs
decision_uri – the string value of the uri that will be used to extract all decision information from
request_handler – the request wrapper for sparql interactions
- Returns:
an instance of the Decision class
- property insert_query
This property is used for getting the insert query (the insert statement) for the decision.
- Example usage:
>>> decision = Decision(...)
>>> insert_query = decision.insert_query
- Returns:
The string representation for the insert_query
- property language: str
This property is used for setting and getting the language value. The setter contains extra functionality to cast it to the specifically required type.
- Example usage:
>>> decision = Decision(...)
>>> decision.language = "..."
>>> language = decision.language
- Returns:
The string representation for language
- property last_human_annotation
This property is used for getting the latest human annotation from all currently linked annotations.
- Example usage:
>>> decision = Decision(...)
>>> lha = decision.last_human_annotation
- Returns:
The last human annotation
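One plausible reading of "latest human annotation" is the user annotation with the highest epoch date. The sketch below implements that reading, assuming a human annotation is one with a user set (as in the user/model split described for Annotation). AnnotationSketch and this free function are hypothetical stand-ins.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class AnnotationSketch:
    """Hypothetical minimal annotation: epoch date plus optional user/model markers."""
    date: int
    user: str | None = None
    model: str | None = None


def last_human_annotation(annotations: list[AnnotationSketch]) -> AnnotationSketch | None:
    # Assumption: "human" means a user is set, and "last" means the highest epoch date
    human = [a for a in annotations if a.user is not None]
    return max(human, key=lambda a: a.date, default=None)
```

The default=None argument to max covers decisions that only carry model annotations.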
- property motivation: str
This property is used for setting and getting the motivation value. The setter contains extra functionality to cast it to the specifically required type.
- Example usage:
>>> decision = Decision(...)
>>> decision.motivation = "..."
>>> motivation = decision.motivation
- Returns:
The string representation for motivation
- property points: str
This property is used for setting and getting the points value. The setter contains extra functionality to cast it to the specifically required type.
- Example usage:
>>> decision = Decision(...)
>>> decision.points = "..."
>>> points = decision.points
- Returns:
The string representation for points
- property publication_date: str
This property is used for setting and getting the publication_date value. The setter contains extra functionality to cast it to the specifically required type.
- Example usage:
>>> decision = Decision(...)
>>> decision.publication_date = "..."
>>> publication_date = decision.publication_date
- Returns:
The string representation for publication_date
- property short_title: str
This property is used for setting and getting the short_title value. The setter contains extra functionality to cast it to the specifically required type.
- Example usage:
>>> decision = Decision(...)
>>> decision.short_title = "..."
>>> short_title = decision.short_title
- Returns:
The string representation for short_title
- property train_record: dict[str, str | list[str]]
This property is used for getting the formatted training record: a dictionary with all relevant values that can be used for training on the specific Decision object.
- Example usage:
>>> decision = Decision(...)
>>> train_record = decision.train_record
- Returns:
A dictionary with the training values for the decision
- property uri: str
This property is used for setting and retrieving the uri for the object. The setter contains extra functionality to cast it to the specifically required type.
- Example usage:
>>> decision = Decision(...)
>>> decision.uri = "..."
>>> uri = decision.uri
- Returns:
The string value for the uri
- property uuid: str
This property is used for setting and getting the uuid value. The setter contains extra functionality to cast it to the specifically required type.
- Example usage:
>>> decision = Decision(...)
>>> decision.uuid = "..."
>>> uuid = decision.uuid
- Returns:
The string uuid linked to the decision
Label
- class src.data_models.label.Label(config: DataModelConfig, logger: Logger, taxonomy_node_uri: str, score: float = 1.0, uri: str = None)
Bases: Base
This class parses a Label from the structured input received from the SPARQL endpoint into a Python object with extended functionality.
Using the config, it is possible to extend this class with custom loading/saving behaviour.
- Typical usage example:
>>> label = Label(config=DataModelConfig(), logger=logging.getLogger(__name__), taxonomy_node_uri="...", score=1.0, uri="...")
- classmethod from_sparql(*args, **kwargs)
Class method for class initialization from SPARQL. When provided with the URI of a label, it automatically executes all related queries to populate the object with all necessary information.
- Example usage:
>>> label = Label.from_sparql(config=DataModelConfig(), logger=logging.getLogger(__name__), request_handler=..., uri="...")
- Parameters:
config – the general DataModelConfig
logger – logger object that can be used for logs
request_handler – the request wrapper used for sparql requests
uri – the uri which is used to find all relevant information
- Returns:
an instance of the Label class
- property score: float
This property is used for setting and getting the score value. The setter contains extra functionality to cast it to the specifically required type.
- Example usage:
>>> label = Label(...)
>>> label.score = 0.8
>>> score = label.score
- Returns:
The score as float
- property subquery
Getter-only property that retrieves the subquery for the Label object. Subqueries are generally used to create insert statements: the property checks the user/model annotation status and calls the specific submodules to create the complete label statement.
- Example usage:
>>> label = Label(...)
>>> subquery = label.subquery
- Returns:
The formatted subquery as string
- property taxonomy_node_uri: str
This property is used for setting and getting the taxonomy_node_uri value. The setter contains extra functionality to cast it to the specifically required type.
- Example usage:
>>> label = Label(...)
>>> label.taxonomy_node_uri = "..."
>>> uri = label.taxonomy_node_uri
- Returns:
The string taxonomy_node_uri
Model
- class src.data_models.model.Model(config: DataModelConfig, logger: Logger, name: str = None, mlflow_reference: str = None, date: int = datetime.datetime(2024, 5, 13, 13, 17, 45, 270306), category: str = None, registered_model: str = None, uri: str = None, register: bool = False)
Bases: Base
This class parses Model instances from the structured input received from SPARQL into a Python object with extended functionality.
Using the config, it is possible to extend this class's functionality for loading and saving custom configs, provided they resemble the initial SPARQL schema.
- Typical usage example:
>>> model = Model(config=DataModelConfig(), logger=logging.getLogger(__name__), mlflow_reference="...", category="...", register=True)
- property category: str
This property is used for setting and getting the category value. The setter contains extra functionality to cast it to the specifically required type.
- Example usage:
>>> model = Model(...)
>>> model.category = "..."
>>> category = model.category
- Returns:
The category as a string
- property date: int
This property is used for setting and getting the date value. The setter contains extra functionality to cast it to the specifically required type.
- Example usage:
>>> model = Model(...)
>>> model.date = datetime.now()
>>> date = model.date
- Returns:
The integer epoch time value for the provided timestamp
- classmethod from_sparql(*args, **kwargs)
Class method that creates an instance of the Model class from a given model URI.
- Parameters:
config – the general config used in the project
logger – the object that can be used for logging
request_handler – the request wrapper for sparql
uri – the model uri used to populate the model object
- Returns:
an instance of the Model class
- property mlflow_reference: str
This property is used for setting and getting the mlflow_reference value. The setter contains extra functionality to cast it to the specifically required type.
- Example usage:
>>> model = Model(...)
>>> model.mlflow_reference = "..."
>>> mlflow_reference = model.mlflow_reference
- Returns:
The mlflow_reference as a string
- property name: str
This property is used for setting and getting the name value. The setter contains extra functionality to cast it to the specifically required type.
- Example usage:
>>> model = Model(...)
>>> model.name = "..."
>>> name = model.name
- Returns:
The name as a string
- property register: bool
This property is used for setting and getting the register value. The setter contains extra functionality to cast it to the specifically required type.
- Example usage:
>>> model = Model(...)
>>> model.register = True
>>> register = model.register
- Returns:
The bool value for register
- property registered_model: str
This property is used for setting and getting the registered_model value. The setter contains extra functionality to cast it to the specifically required type.
- Example usage:
>>> model = Model(...)
>>> model.registered_model = "..."
>>> registered_model = model.registered_model
- Returns:
The registered_model as a string
- property subquery
Getter-only property that retrieves the subquery for the Model object. Subqueries are generally used to create insert statements; the property automatically calls the submodules needed to create the complete annotation statement.
- Example usage:
>>> model = Model(...)
>>> sub_query = model.subquery
- Returns:
The formatted subquery as string
- write_to_sparql(request_handler: RequestHandler)
Taxonomy
- class src.data_models.taxonomy.Taxonomy(config: DataModelConfig, logger: Logger, uri: str, label: str = None, children: list[Taxonomy] = None, level: int = 0)
Bases: Base
This class is used for parsing the taxonomy from the structured input received from SPARQL. The taxonomy can be of varying depth, which makes searching and other extra functionality a recursive problem.
Using a config enables a custom SPARQL retrieval approach: you can adapt the relations and the base query.
- Typical usage example (different from all other sparql models -> this one only makes sense to init from sparql):
>>> taxonomy = Taxonomy.from_sparql(config=DataModelConfig(), logger=logging.getLogger(__name__), request_handler=RequestHandler(...), endpoint=..., taxonomy_item_uri="...")
- _remap_tree(**kwargs)
An internal function that recursively creates the taxonomy tree from the flat query response received from SPARQL.
This is considerably more efficient than recursively executing one query per node to build the taxonomy tree.
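The flat-to-nested remapping described above can be sketched as follows. It assumes each row of the flat response is a (parent_uri, child_uri, child_label) tuple and that the hierarchy is acyclic; the actual row layout of the SPARQL response is not documented here. Node and remap_tree are hypothetical names.

```python
from __future__ import annotations

from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical taxonomy node: uri, label, depth level and nested children."""
    uri: str
    label: str | None = None
    level: int = 0
    children: list[Node] = field(default_factory=list)


def remap_tree(rows: list[tuple[str, str, str]], root_uri: str) -> Node:
    # Assumption: each row is (parent_uri, child_uri, child_label) from one flat query
    by_parent: dict[str, list[tuple[str, str]]] = defaultdict(list)
    for parent, child, label in rows:
        by_parent[parent].append((child, label))

    def build(uri: str, label: str | None, depth: int) -> Node:
        node = Node(uri=uri, label=label, level=depth)
        # Recurse in memory instead of issuing a new query per level
        for child_uri, child_label in by_parent.get(uri, []):
            node.children.append(build(child_uri, child_label, depth + 1))
        return node

    return build(root_uri, None, 0)
```

Grouping the rows by parent first means the recursion only does dictionary lookups. This is the reason a single flat query beats one query per level.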
- Example usage:
>>> # internal method, no usage example provided
- Parameters:
entire_tree – Full tree that is pulled from the sparql
subselector – Key that sub selects the entire tree
curr_depth – Index that identifies what depth we are currently at
- Returns:
- property all_linked_labels: list[str]
Property that provides getter access to all_linked_labels for the taxonomy instance
- Example usage:
>>> taxonomy = Taxonomy(...)
>>> all_linked_labels = taxonomy.all_linked_labels
- Returns:
List of all labels linked to taxonomy
- property children
Property that provides a getter and setter for the taxonomy children nodes.
- Example usage:
>>> taxonomy = Taxonomy(...)
>>> taxonomy_children = taxonomy.children
>>> taxonomy.children = [Taxonomy(...), ...]
- Returns:
The children that are linked to the taxonomy
- create_blank_config()
This function generates a blank config for a given taxonomy. This config can be used to create the correct model inference graph when working with config-based multi-layer predictions.
- Returns:
json object containing the blank configuration
- find(**kwargs)
This function finds the exact location of an item in the taxonomy tree. The provided search term is used in combination with search_kind (LABEL or URI) to find the location in the tree. Once the location is found, it returns a dictionary that contains all parent nodes, structured as dict[<integer for level>: <taxonomy represented as dict>].
- Example usage:
>>> taxonomy = Taxonomy(...)
>>> taxonomy.find(search_term="Bestuur", search_kind=TaxonomyFindTypes.LABEL, max_depth=2)
>>> # if the label is not found before reaching max_depth, it will not be returned
- Parameters:
search_kind – enum of what values to search on
search_term – the label or URI (depending on search_kind) of the taxonomy item to find
max_depth – maximum depth for the tree search
kwargs – extra variables
- Returns:
a dictionary containing each level up to the found taxonomy
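The level-indexed result of find can be sketched as a depth-first search that carries the path from the root. This assumes nodes are plain dicts with "uri", "label" and "children" keys, the root sits at level 0, and an empty dict means "not found". FindKind stands in for TaxonomyFindTypes; both it and this find function are hypothetical.

```python
from __future__ import annotations

from enum import Enum


class FindKind(Enum):
    """Hypothetical stand-in for TaxonomyFindTypes; values double as node dict keys."""
    LABEL = "label"
    URI = "uri"


def find(node: dict, search_term: str, search_kind: FindKind, max_depth: int = 10,
         _level: int = 0, _path: dict[int, dict] | None = None) -> dict[int, dict]:
    # Nodes are assumed to be dicts: {"uri": ..., "label": ..., "children": [...]}
    path = {**(_path or {}), _level: {"uri": node.get("uri"), "label": node.get("label")}}
    if node.get(search_kind.value) == search_term:
        return path  # every level from the root down to the match
    if _level >= max_depth:
        return {}  # do not descend past max_depth
    for child in node.get("children", []):
        found = find(child, search_term, search_kind, max_depth, _level + 1, path)
        if found:
            return found
    return {}
```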
- classmethod from_checkpoint(config: DataModelConfig, logger: Logger, checkpoint_folder: str) → Taxonomy
Class method for class initialization from a created checkpoint.
- Example usage:
>>> taxonomy = Taxonomy.from_checkpoint(config=DataModelConfig(), logger=logging.getLogger(__name__), checkpoint_folder="...")
- Parameters:
checkpoint_folder – string indication of where the checkpoint is located
config – the general DataModelConfig
logger – logger object that can be used for logs
- Returns:
an instance of the Taxonomy class
- classmethod from_dict(config: DataModelConfig, logger: Logger, dictionary: dict[Any]) → Taxonomy
Class method for class initialization from a dictionary.
- Example usage:
>>> taxonomy = Taxonomy.from_dict(config=DataModelConfig(), logger=logging.getLogger(__name__), dictionary={...: ...})
- Parameters:
dictionary – dictionary containing the parsed taxonomy
config – the general DataModelConfig
logger – logger object that can be used for logs
- Returns:
an instance of the Taxonomy class
- classmethod from_sparql(*args, **kwargs)
Class method for class initialization from SPARQL. This function creates the taxonomy tree from SPARQL; the tree is built from Taxonomy objects nested in the children property.
- Example usage:
>>> taxonomy = Taxonomy.from_sparql(config=DataModelConfig(), logger=logging.getLogger(__name__), request_handler=..., endpoint=EndpointType.TAXONOMY, taxonomy_item_uri="...")
- Parameters:
config – the general config used in the project
logger – the object used for logging
request_handler – the request wrapper for sparql
endpoint – the endpoint enum for endpoint reference
taxonomy_item_uri – the taxonomy uri
- Returns:
an instance of the Taxonomy object
- get_labels(**kwargs)
This function generates a flat list of labels for all the nodes in the taxonomy tree.
- Example usage:
>>> taxonomy = Taxonomy(...)
>>> labels_up_to_2 = taxonomy.get_labels(max_depth=2)
- Parameters:
include_tree_indication – whether to include the tree level each label belongs to
max_depth – The maximum depth level to extract from the label tree
- Returns:
a flat list of labels
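A depth-limited flattening like get_labels(max_depth=...) can be sketched as a recursive walk. This assumes dict nodes with "label" and "children" keys and that the root (level 0) is included; include_tree_indication is not modelled. This free function is hypothetical.

```python
def get_labels(node: dict, max_depth: int = 10, _level: int = 0) -> list[str]:
    # Depth-first walk; nodes are assumed to be dicts with "label" and "children" keys
    labels = [node["label"]] if node.get("label") is not None else []
    if _level < max_depth:
        for child in node.get("children", []):
            labels.extend(get_labels(child, max_depth, _level + 1))
    return labels
```

With max_depth=2, labels on levels 0, 1 and 2 are returned in depth-first order, and anything deeper is skipped.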
- get_labels_for_node(search_term: str, search_kind: TaxonomyFindTypes)
This function provides the child labels for the given input search term. It calls the find function to retrieve the relevant information from the taxonomy tree.
- Parameters:
search_term – the label or URI (depending on search_kind) of the node to look up
search_kind – enum of what values to search on
- Returns:
the child labels of the matching node
- get_level_specific_labels(level: int)
This function retrieves ONLY the labels of a specific level of the taxonomy.
- Example usage:
>>> taxonomy = Taxonomy(...)
>>> level_2_labels = taxonomy.get_level_specific_labels(level=2)
- Parameters:
level – The depth you want to retrieve the labels from
- Returns:
List of found labels
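Unlike a depth-limited flattening, a level-specific lookup keeps only the nodes at exactly one depth. Below is a minimal sketch, assuming dict nodes and the root at level 0 (matching Taxonomy's default level=0). The function is a hypothetical stand-in for the method.

```python
def get_level_specific_labels(node: dict, level: int, _current: int = 0) -> list[str]:
    # Assumption: the root sits at level 0, matching Taxonomy's default level=0
    if _current == level:
        return [node["label"]] if node.get("label") is not None else []
    labels: list[str] = []
    for child in node.get("children", []):
        labels.extend(get_level_specific_labels(child, level, _current + 1))
    return labels
```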
- property label: str
Property that provides access to the label for a given taxonomy
- Example usage:
>>> taxonomy = Taxonomy(...)
>>> taxonomy_label = taxonomy.label
>>> taxonomy.label = "..."
- Returns:
The label for the taxonomy item
- property label2uri
Property (getter only) to retrieve the label2uri for the Taxonomy object.
- Example usage:
>>> taxonomy = Taxonomy(...)
>>> label2uri = taxonomy.label2uri
- Returns:
The label2uri dictionary
- todict(with_children: bool = False, max_depth: int = 10, **kwargs) → dict[str, str | list[...]]
This function parses the current taxonomy tree to a dictionary
- Example usage:
>>> taxonomy = Taxonomy(...)
>>> full_taxonomy_dictionary = taxonomy.todict(with_children=True)
- Parameters:
max_depth – maximum depth to retrieve child nodes from
with_children – flag to include the child nodes recursively (up to max_depth)
- Returns:
dictionary for the given object
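The interplay of with_children and max_depth can be sketched on a small stand-in class. The output keys "uri", "label", "level" and "children" are assumptions; the real dictionary layout may differ. TaxonomyNode is hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class TaxonomyNode:
    """Hypothetical stand-in for Taxonomy with only uri, label, level and children."""
    uri: str
    label: str | None = None
    level: int = 0
    children: list[TaxonomyNode] = field(default_factory=list)

    def todict(self, with_children: bool = False, max_depth: int = 10) -> dict:
        # Assumed output keys; the real dictionary layout may differ
        result: dict = {"uri": self.uri, "label": self.label, "level": self.level}
        if with_children and self.level < max_depth:
            # Children are only serialized while the node's level is below max_depth
            result["children"] = [c.todict(with_children, max_depth) for c in self.children]
        return result
```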
- property uri: str
Property that provides access to the uri for a given taxonomy
- Example usage:
>>> taxonomy = Taxonomy(...)
>>> taxonomy_uri = taxonomy.uri
>>> taxonomy.uri = "..."
- Returns:
The uri for the taxonomy item
- property uri2label
Property (getter only) to retrieve the uri2label for the Taxonomy object.
- Example usage:
>>> taxonomy = Taxonomy(...)
>>> uri2label = taxonomy.uri2label
- Returns:
The uri2label dictionary
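The uri2label and label2uri lookups can be sketched as a single walk over the tree followed by an inversion. This assumes dict nodes and unique labels; with duplicate labels, the inverted mapping keeps only one URI per label. build_mappings is a hypothetical helper.

```python
def build_mappings(node: dict) -> tuple[dict[str, str], dict[str, str]]:
    # Iterative walk over the whole tree, collecting uri -> label pairs
    uri2label: dict[str, str] = {}
    stack = [node]
    while stack:
        current = stack.pop()
        if current.get("label") is not None:
            uri2label[current["uri"]] = current["label"]
        stack.extend(current.get("children", []))
    # Assumption: labels are unique; otherwise later entries overwrite earlier ones
    label2uri = {label: uri for uri, label in uri2label.items()}
    return uri2label, label2uri
```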
TaxonomyType
- class src.data_models.taxonomy_type.TaxonomyTypes(config: DataModelConfig, logger: Logger, taxonomies: list[Taxonomy])
Bases: Base
This class provides access to all the different taxonomy nodes under the taxonomy master node. This master node is a default value that can be overwritten with environment variables (see config).
Using a config enables a custom SPARQL retrieval approach: you can adapt the relations and the base query.
- Typical usage example (different from all other sparql models -> this one only makes sense to init from sparql):
>>> taxonomies = TaxonomyTypes.from_sparql(config=DataModelConfig(), logger=logging.getLogger(__name__), request_handler=RequestHandler(...), endpoint=...)
- classmethod from_sparql(*args, **kwargs)
Class method for class initialization from SPARQL. This function loads all taxonomies linked to the parent taxonomy node.
- Example usage:
>>> taxonomies = TaxonomyTypes.from_sparql(config=DataModelConfig(), logger=logging.getLogger(__name__), request_handler=..., endpoint=EndpointType.TAXONOMY)
- Parameters:
config – the general config used in the project
logger – object used for logging
request_handler – the request wrapper for sparql
endpoint – endpoint enum to use for requests
- Returns:
an instance of the TaxonomyTypes class
- get(**kwargs)
This function checks the list of existing taxonomies and returns the matching taxonomy.
- Example usage:
>>> taxonomies = TaxonomyTypes(...)
>>> taxonomy = taxonomies.get(taxonomy_uri="...")
- Parameters:
taxonomy_uri – taxonomy_uri to check for
- Returns:
taxonomy object when it exists
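A lookup like get(taxonomy_uri=...) can be sketched as a linear scan over the loaded taxonomies. This assumes None is returned when nothing matches, since the behaviour for a missing URI is not documented above. TaxonomyStub and TaxonomyTypesSketch are hypothetical stand-ins.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class TaxonomyStub:
    """Hypothetical stand-in for Taxonomy; only the uri matters here."""
    uri: str
    label: str | None = None


class TaxonomyTypesSketch:
    def __init__(self, taxonomies: list[TaxonomyStub]) -> None:
        self.taxonomies = taxonomies

    def get(self, taxonomy_uri: str) -> TaxonomyStub | None:
        # Linear scan; assumption: returns None when no taxonomy matches
        return next((t for t in self.taxonomies if t.uri == taxonomy_uri), None)
```

A linear scan is adequate because the number of taxonomy types under the master node is expected to be small. A dict keyed by URI would be the obvious alternative if that changes.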
User
- class src.data_models.user.User(config: DataModelConfig, logger: Logger, username: str = None, email: str = None, uri: str = None)
Bases: Base
This class is used for parsing the linked user(s) for a given decision.
Currently, this class is not linked to the data infrastructure (anonymity issue).
- Typical usage example:
>>> user = User(config=DataModelConfig(), logger=logging.getLogger(__name__), username="...", email="...@...", uri="...")
- classmethod from_sparql(*args, **kwargs)
Class method for class initialization from a SPARQL URI.
=== This function is currently not used and not fully implemented ===
- Parameters:
config – the general config used in the project
logger – object used for logging
request_handler – the request wrapper for sparql
uri – the user uri to use
- Returns:
instance of a user