Skip to content

eghuro/dcat-dry

Repository files navigation

DCAT DRY

github licence

DCAT-AP Dataset Relationship Indexer. Indexing linked data and relationships between datasets.

Features:
  • index a distribution or a SPARQL endpoint
  • extract and index distributions from a DCAT catalog
  • extract a DCAT catalog from SPARQL endpoint and index distributions from it
  • generate a dataset profile
  • show related datasets based mainly on DataCube and SKOS vocabularies
  • indexing sameAs identities and related concepts

Build & run with Docker

For DCAT-DRY service only:

docker build . -t dcat-dry
docker run -p 80:8000 --name dcat-dry dcat-dry

For the full environment use docker-compose:

docker-compose up --build

Build & run manually

CPython 3.8+ is supported.

Install redis server first. In following example we will assume it runs on localhost, port 6379 and DB 0 is used.

Setup postgresql server as well. In the following example we will assume it runs on localhost, port 5432, DB is postgres and user/password is postgres:example

You will need some libraries installed: libxml2-dev libxslt-dev libleveldb-dev libsqlite3-dev and sqlite3

Run the following commands to bootstrap your environment

git clone https://github.com/eghuro/dcat-dry
cd dcat-dry
poetry install --with robots,gevent --without dev
# Start redis and postgres servers

# Export environment variables
export REDIS_CELERY=redis://localhost:6379/1
export REDIS=redis://localhost:6379/0
export DB=postgresql+psycopg2://postgres:example@localhost:5432/postgres

# Setup the database
alembic upgrade head

# Run concurrently
celery -A tsa.celery worker -l debug -Q high_priority,default,query,low_priority -c 4
gunicorn -w 4 -b 0.0.0.0:8000 --log-level debug app:app
nice -n 10 celery -l info -A tsa.celery beat

In general, before running shell commands, set the FLASK_APP and FLASK_DEBUG environment variables

export FLASK_APP=autoapp.py
export FLASK_DEBUG=1

Deployment

To deploy:

export FLASK_DEBUG=0
# Follow commands above to bootstrap the environment

In your production environment, make sure the FLASK_DEBUG environment variable is unset or is set to 0, so that ProdConfig is used.

Shell

To open the interactive shell, run

flask shell

By default, you will have access to the flask app.

Running Tests

To run all tests, run

flask test

Before execution

# Prepare couchdb

curl -X PUT http://admin:[email protected]:5984/_users
curl -X PUT http://admin:[email protected]:5984/_replicator
curl -X PUT http://admin:[email protected]:5984/_global_changes

# Migrate database

alembic upgrade head

API

To start batch scan, run

flask batch -g /tmp/graphs.txt -s http://10.114.0.2:8890/sparql

Get a full result

/api/v1/query/analysis

Query a dataset

/api/v1/query/dataset?iri=http://abc