INDRA WM service modules

exception indra_wm_service.InvalidCorpusError
class indra_wm_service.curator.LiveCurator(scorer=None, corpora=None, eidos_url=None, ont_manager=None, cache=PosixPath('/home/docs/checkouts/readthedocs.org/user_builds/indra-wm-service/checkouts/latest/indra_wm_service/_local_cache'))

Class coordinating the real-time curation of a corpus of Statements.

Parameters:
  • scorer (indra.belief.BeliefScorer) – A scorer object to use for the curation
  • corpora (dict[str, Corpus]) – A dictionary mapping corpus IDs to Corpus objects.
get_corpus(corpus_id, check_s3=True, use_cache=True)

Return a corpus given an ID.

If the corpus ID cannot be found, an InvalidCorpusError is raised.

Parameters:
  • corpus_id (str) – The ID of the corpus to return.
  • check_s3 (bool) – If True, look on S3 for the corpus if it’s not currently loaded. Default: True
  • use_cache (bool) – If True, look in the local cache before trying to find the corpus on S3. If True while check_s3 is False, this option is ignored. Default: True.
Returns:

The corpus with the given ID.

Return type:

Corpus
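
The lookup order implied by check_s3 and use_cache can be sketched as follows. This is a minimal illustration under stated assumptions, not the service's implementation: find_corpus and the loaded, local_cache and s3_store dictionaries are hypothetical stand-ins, and InvalidCorpusError is redefined locally so that the snippet runs without the package installed.

```python
class InvalidCorpusError(Exception):
    """Local stand-in for indra_wm_service.InvalidCorpusError."""


def find_corpus(corpus_id, loaded, local_cache, s3_store,
                check_s3=True, use_cache=True):
    """Hypothetical lookup that follows the documented parameter semantics."""
    # A corpus that is already loaded is returned directly.
    if corpus_id in loaded:
        return loaded[corpus_id]
    # Only when check_s3 is True do we look further; use_cache is
    # ignored otherwise.
    if check_s3:
        # The local cache is consulted before S3 when use_cache is True.
        if use_cache and corpus_id in local_cache:
            return local_cache[corpus_id]
        if corpus_id in s3_store:
            return s3_store[corpus_id]
    # Nothing was found: report an invalid corpus ID.
    raise InvalidCorpusError(corpus_id)


# Made-up stand-ins for the loaded corpora, the local cache and S3.
loaded = {"a": "corpus-a (loaded)"}
local_cache = {"b": "corpus-b (cache)"}
s3_store = {"b": "corpus-b (S3)", "c": "corpus-c (S3)"}

print(find_corpus("b", loaded, local_cache, s3_store))
```

With use_cache=False the same call would skip the cache entry and return the S3 entry; with check_s3=False any ID that is not already loaded raises InvalidCorpusError.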

get_curations(corpus_id, reader=None)

Download the curations for a corpus ID, filtered by reader.

Parameters:
  • corpus_id (str) – The ID of the corpus to download curations from
  • reader (str) – The name of the reader to filter by. Must be one of the valid reader names or ‘all’.
Returns:

A dict containing the requested curations

Return type:

dict

reset_scorer()

Reset the scorer used for curation.

save_curations(corpus_id, save_to_cache=True)

Save the current state of curations for a corpus given its ID.

If the corpus ID cannot be found, an InvalidCorpusError is raised.

Parameters:
  • corpus_id (str) – The ID of the corpus whose curations are to be saved.
  • save_to_cache (bool) – If True, also save the current curation to the local cache. Default: True.
submit_curations(curations, save=True)

Submit correct/incorrect curations for a given corpus.

Parameters:
  • curations (list of dict) – A list of curations.
  • save (bool) – If True, save the updated curations to the local cache. Default: True
update_beliefs(corpus_id, project_id=None)

Return updated belief scores for a given corpus.

Parameters: corpus_id (str) – The ID of the corpus for which beliefs are to be updated.
Returns: A dictionary of belief scores with keys corresponding to Statement UUIDs and values to new belief scores.
Return type: dict
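
Because the return value maps Statement UUIDs to belief scores, it can be filtered and ranked with plain dictionary operations. The sketch below is illustrative only: rank_by_belief is a hypothetical helper, and the UUIDs and scores are made-up values in the documented shape, not output from the service.

```python
def rank_by_belief(beliefs, threshold=0.0):
    """Return (uuid, belief) pairs at or above threshold, highest first."""
    kept = [(uuid, score) for uuid, score in beliefs.items()
            if score >= threshold]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)


# Made-up example in the documented shape: Statement UUID -> belief score.
example_beliefs = {"uuid-1": 0.91, "uuid-2": 0.35, "uuid-3": 0.67}

ranked = rank_by_belief(example_beliefs, threshold=0.5)
for uuid, score in ranked:
    print(uuid, score)
```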
update_metadata(corpus_id, meta_data, save_to_cache=True)

Update the metadata for a given corpus.

Parameters:
  • corpus_id (str) – The ID of the corpus whose metadata is to be updated.
  • meta_data (dict) – A JSON-compatible dict containing the metadata.
  • save_to_cache (bool) – If True, also update the local cache of the metadata dict. Default: True.
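
Since meta_data has to be JSON compatible, it can be worth checking a dict before passing it in. The following is a minimal sketch using only the standard library; is_json_compatible is a hypothetical helper and is not part of indra_wm_service.

```python
import json


def is_json_compatible(meta_data):
    """Return True if meta_data can be serialized with json.dumps."""
    try:
        json.dumps(meta_data)
    except (TypeError, ValueError):
        # TypeError: unsupported value or key type (e.g. a set).
        # ValueError: circular reference.
        return False
    return True


# Made-up metadata dicts for illustration.
good = {"readers": ["eidos"], "num_statements": 10}
bad = {"readers": {"eidos"}}  # a set is not JSON serializable

print(is_json_compatible(good), is_json_compatible(bad))
```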
class indra_wm_service.corpus.Corpus(corpus_id, statements=None, raw_statements=None, meta_data=None, aws_name='wm', cache=PosixPath('/home/docs/checkouts/readthedocs.org/user_builds/indra-wm-service/checkouts/latest/indra_wm_service/_local_cache'))

Represent a corpus of statements with curation.

Parameters:
  • corpus_id (str) – The key by which the corpus is identified.
  • statements (list[indra.statements.Statement]) – A list of INDRA Statements to embed in the corpus.
  • raw_statements (list[indra.statements.Statement]) – A list of raw statements forming the basis of the statements in ‘statements’.
  • meta_data (dict) – A dict with metadata associated with the corpus.
  • aws_name (str) – The name of the profile in the AWS credential file to use. Default: ‘wm’.
  • cache (pathlib.Path) – A pathlib.Path object representing the path to a cache folder.
statements

A dict of INDRA Statements keyed by UUID.

Type: dict
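
The constructor takes statements as a list, while the attribute is a dict keyed by UUID. How the class performs this conversion is not described here; one plausible way, assuming each statement exposes a uuid attribute as INDRA Statements do, is sketched below. FakeStatement and key_by_uuid are hypothetical names used only so the snippet runs without INDRA installed.

```python
from dataclasses import dataclass, field
from uuid import uuid4


@dataclass
class FakeStatement:
    """Hypothetical stand-in for an INDRA Statement with a uuid attribute."""
    text: str
    uuid: str = field(default_factory=lambda: str(uuid4()))


def key_by_uuid(statements):
    """Build a dict of statements keyed by their UUID."""
    return {stmt.uuid: stmt for stmt in statements}


stmts = [FakeStatement("rainfall influences crop yield"),
         FakeStatement("conflict influences migration")]
by_uuid = key_by_uuid(stmts)

# Look up a statement by its UUID.
print(by_uuid[stmts[0].uuid].text)
```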
raw_statements

A list of the raw statements

Type: list
curations

A list keeping track of the curations submitted so far for Statements in the corpus.

Type: list
get_curations(look_in_cache=False)

Get the curations for the corpus.

Parameters: look_in_cache (bool) – If True, look in the local cache if there are no curations loaded.
Returns: The curations for this corpus, if any.
Return type: dict
s3_get(bucket='world-modelers', cache=True, raise_exc=False)

Fetch a corpus object from S3 in the form of three JSON files.

The JSON files representing the object have S3 keys of the format <s3key>/statements.json and <s3key>/raw_statements.json.

Parameters:
  • bucket (str) – The S3 bucket to fetch the Corpus from. Default: ‘world-modelers’.
  • cache (bool) – If True, look for the corpus in the local cache instead of loading it from S3. Default: True.
  • raise_exc (bool) – If True, raise InvalidCorpusError if the corpus fails to load.
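
The key format above can be illustrated with a small helper. This is only a sketch: corpus_s3_keys is a hypothetical function, the prefix is a made-up value, and only the two file names given in the text are covered.

```python
def corpus_s3_keys(s3key):
    """Build the two documented S3 keys for a corpus key prefix."""
    base = s3key.rstrip("/")  # tolerate a trailing slash in the prefix
    return (f"{base}/statements.json", f"{base}/raw_statements.json")


# "some/prefix/my_corpus" is a made-up prefix for illustration.
for key in corpus_s3_keys("some/prefix/my_corpus"):
    print(key)
```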
s3_put(bucket='world-modelers', cache=True)

Push a corpus object to S3 in the form of three JSON files.

The JSON files representing the object have S3 keys of the format <key_base_name>/<name>/<file>.json.

Parameters:
  • bucket (str) – The S3 bucket to upload the Corpus to. Default: ‘world-modelers’.
  • cache (bool) – If True, also create a local cache of the corpus. Default: True.
Returns:

keys – A tuple of three strings giving the S3 keys of the pushed objects.

Return type:

tuple(str)

save_curations_to_cache()

Save the current curations to the cache.

upload_curations(look_in_cache=False, save_to_cache=False, bucket='world-modelers')

Upload the current state of curations for the corpus.

Parameters:
  • look_in_cache (bool) – If True, when no curations are available check if there are curations cached locally. Default: False
  • save_to_cache (bool) – If True, also save current curation state to cache. If look_in_cache is True, this option will have no effect. Default: False.
  • bucket (str) – The bucket to upload to. Default: ‘world-modelers’.
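
The documented interaction between look_in_cache and save_to_cache can be summarized as a small truth table. The helper below is a hypothetical illustration of that rule only; it does not reproduce how upload_curations is implemented.

```python
def effective_save_to_cache(look_in_cache, save_to_cache):
    """save_to_cache only takes effect when look_in_cache is False."""
    return bool(save_to_cache and not look_in_cache)


# Print the effective setting for every flag combination.
for look in (False, True):
    for save in (False, True):
        print(f"look_in_cache={look}, save_to_cache={save} -> "
              f"{effective_save_to_cache(look, save)}")
```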