The Knowledge Middleware (KM) is designed to provide an intermediate job queue for management of long-running extraction and profiling tasks. It enables the Terarium HMI to request large knowledge discovery and curation tasks to be performed asynchronously, with robust error handling, and with customized ETL of backend service responses into Terarium-specific schemas and specifications. It currently supports the following functions:
- Equation to AMR: both LaTeX and MathML
- Code to AMR: code snippets only
- PDF Extraction: via Cosmos
- Variable Extraction: via SKEMA
- Data Card: via MIT
- Model Card: via MIT
- Model/Paper Linking: via SKEMA
- Run `make init`, which will create a stub `.env` file.
- Ensure that `.env` contains the correct endpoint and other information.
- You can now run the tests or reports using the information below.
- You can also run KM with `make up`.
Important Note: Running `make up` targets the regular docker-compose.yaml file. This file expects to run alongside the Terarium Data Service (TDS) on the same machine, because it looks for a Docker network set up by TDS. To run this stack standalone (pointing to a remote TDS installation), use the command `make up-prod`.
KM provides a TA1 integration test harness that powers the ASKEM Integration Dashboard. It makes it easy to add new test cases and scenarios, which will automatically be evaluated and surfaced in the dashboard. Additionally, the KM test harness can be run offline for development purposes. Running the KM test harness requires Docker Compose. Please see reporting/README.md for more information on how to run the test harness locally.
Scenarios should be added in the reporting/scenarios directory. When you create a new directory in reporting/scenarios, the directory name becomes the new scenario's name.
In the scenarios directory, you'll find multiple example scenarios. To add a new scenario, start by creating a directory with the name of your scenario. Within this directory, include a file named description.txt containing a detailed scenario description. Additionally, each scenario must have at least one of the following assets:
- `paper.pdf`: the paper to extract
- `code.zip`: the zipfile of a code repo
- `dynamics.*`: a file that contains only code representing the core dynamics of the model. The file should have the correct programming language extension; e.g. `dynamics.py`
- `equations.latex.txt`: a set of equations representing the model dynamics
- `dataset.csv`: a dataset to profile

Note: each scenario should ONLY have one of `[code.zip, dynamics.*, equations.latex.txt]`. This is what will be used to generate a model.
You can use the existing scenarios as examples while following these guidelines to prepare your new scenario for inclusion in the system.
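The rules above can be checked mechanically before a scenario is committed. The following is a minimal sketch, not part of KM: the function name `check_scenario` and its return format are hypothetical, and it assumes the "ONLY have one of" note means at most one model source per scenario.

```python
import tempfile
from pathlib import Path

# Asset names come from the guidelines above. Everything else here
# (function name, messages, return format) is illustrative, not KM code.
MODEL_SOURCES = ("code.zip", "dynamics.*", "equations.latex.txt")
OTHER_ASSETS = ("paper.pdf", "dataset.csv")


def check_scenario(scenario_dir):
    """Return a list of problems found in a scenario directory (empty if none)."""
    root = Path(scenario_dir)
    problems = []

    if not (root / "description.txt").is_file():
        problems.append("missing description.txt")

    # Path.glob handles both literal names and the dynamics.* pattern.
    def present(patterns):
        return [p for p in patterns if any(f.is_file() for f in root.glob(p))]

    model_sources = present(MODEL_SOURCES)
    assets = model_sources + present(OTHER_ASSETS)

    if not assets:
        problems.append("no assets found")
    # Assumption: "ONLY one of" is read as "at most one model source".
    if len(model_sources) > 1:
        problems.append("more than one model source: " + ", ".join(model_sources))
    return problems


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        scenario = Path(tmp) / "my-scenario"
        scenario.mkdir()
        (scenario / "description.txt").write_text("An example scenario.")
        (scenario / "dynamics.py").write_text("def dydt(y, t): return -y\n")
        print(check_scenario(scenario))  # prints []
```

Returning a list of problems rather than raising on the first one makes it easier to fix a new scenario in a single pass.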
KM also has an extensive unit testing suite. You can run the tests by initializing the environment:
poetry install
poetry shell
Then from the top of the repo run:
pytest tests
You can generate a coverage report with:
pytest --cov . tests
You can add the flag `--cov-report html` to generate an HTML report.
Note: the live option will be removed soon.
Setting the environment variable MOCK_TA1 to False and adding the correct endpoints for TA1 will send real payloads to TA1 services and validate the results.
To add additional scenarios, create a new directory in tests/scenarios. The directory must contain a config.yaml in which each test you wish to run is specified under enabled.
The .env file specifies the MOCK_TA1 setting as well as the appropriate endpoints, and can be passed into the test suite with:
poetry shell && export $(cat .env | xargs) && pytest -s
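Before running against live services, it can help to confirm what the `.env` file actually sets. The sketch below is an illustration only: `load_env` and `is_mocked` are hypothetical helpers, the parser handles only simple `KEY=VALUE` lines, and the rule for interpreting `MOCK_TA1` (anything other than an explicit false-like value counts as mocked) is an assumption, not KM's documented behavior.

```python
import tempfile
from pathlib import Path


def load_env(path):
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for line in Path(path).read_text().splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding quotes, if any, around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def is_mocked(env):
    """Assumption: only an explicit false-like MOCK_TA1 disables mocking."""
    return env.get("MOCK_TA1", "True").strip().lower() not in ("false", "0", "no")


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        env_file = Path(tmp) / ".env"
        env_file.write_text("# live run\nMOCK_TA1=False\n")
        env = load_env(env_file)
        print("mocked" if is_mocked(env) else "live")  # prints live
```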
Run `poetry run poe report` to generate tests/output/report.json, which contains the status of each scenario and operation.
Once the report has been generated, run `poetry run streamlit run tests/Home.py` to launch the web interface for the test suite, which will be available at http://localhost:8501.
Note: if the tests fail, `poetry poe` will exit and not generate a report. To work around this, run `pytest --json-report --json-report-file=tests/output/tests.json` then `python tests/report.py` manually.
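The manual workaround in the note above can be wrapped in a small script so the report is built even when some tests fail. This is a sketch under assumptions: `build_report` is a hypothetical helper, the two commands are copied from the note, and it is meant to be run from the top of the repo inside the Poetry environment.

```python
import subprocess

# Both commands are taken verbatim from the note above.
PYTEST_CMD = ["pytest", "--json-report", "--json-report-file=tests/output/tests.json"]
REPORT_CMD = ["python", "tests/report.py"]


def build_report(run=subprocess.run):
    """Run the tests, then build the report even if some tests failed.

    `run` is injectable so the sequence can be exercised without
    actually invoking pytest.
    """
    # No check=True here: a non-zero pytest exit code must not stop the report step.
    test_result = run(PYTEST_CMD)
    report_result = run(REPORT_CMD)
    return test_result.returncode, report_result.returncode
```

Calling `build_report()` returns the exit codes of both steps, so a caller can still tell that tests failed while keeping the generated report.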
Test scenarios can be added to tests/scenarios. Each scenario should have its own directory and must contain a config.yaml file, which provides a name for the scenario and indicates which test(s) should be run. See scenarios/basic for a boilerplate that runs all test cases.
The files required to run each scenario are defined in tests/resources.yaml. Note that some files are considered optional: e.g. ground_truth_model_card.