.. _thesauri: Thesauri ======== Introduction ------------ A **thesaurus** is a structured vocabulary used to manage and standardize keywords (also known as tags) that describe resources. It helps improve metadata quality, searchability, and interoperability by enforcing controlled vocabularies. Key Functions ------------- Controlled Vocabulary Instead of free-text keywords, thesauri offer predefined and **standardized terms** organized thematically (e.g., ISO 19115 topics, GEMET, INSPIRE themes). Semantic Consistency Users tagging datasets can choose from a consistent list of terms, **reducing redundancy** and ambiguity (e.g., avoiding both "roads" and "road" as separate tags). Improved search and filtering Thesauri enable structured tagging of datasets, allowing more accurate searches and the use of **faceted filters** to easily narrow down results. Localization Each concept in a thesaurus can have **translations for different languages**, allowing localized display based on the user’s interface language. Metadata Standards Integration Thesauri can align with **international standards** (like ISO, INSPIRE, GEMET), which is especially important when GeoNode is used in institutional or governmental contexts. Data model ---------- The *GeoNode thesaurus model* is designed to support multilingual, structured vocabularies. It consists of the following key components: Thesaurus: * Represents a full controlled vocabulary (e.g., GEMET, INSPIRE themes). * In SKOS terms, it's a ``skos:ConceptScheme``. ThesaurusLabel: * Stores the localized names (titles/descriptions) of a thesaurus for different languages. * In SKOS terms, it's a ``skos:preflabel`` within the ``skos:ConceptScheme``. ThesaurusKeyword: * Represents a single concept or term within a thesaurus (e.g., "Land Cover", "Transport"), also storing the default label (used where the translation for a given requested language is not defined) and its identifying URI. * In SKOS terms, it's a ``skos:Concept``. ThesaurusKeywordLabel: * Stores the multilingual labels for each keyword. * In SKOS terms, it's a ``skos:preflabel`` within the ``skos:Concept``. .. _thesaurus_add: Adding a Thesaurus ================== A thesaurus can be added in Geonode by: * creating a new thesaurus instance within the GeoNode admin pages. As a minumum, you need to: * add a thesaurus in admin / base / Thesaurus * add one or more instances of Keywords in admin / base / ThesaurusKeywords * uploading a RDF file (either xml, ttl, jsonld or any other format recognized by `RDFlib `__). When uploading a file, the behaviour is the same as running the command ``thesaurus load --action update`` (see :ref:`load_thesaurus`) * loading a RDF file using the `thesaurus load` management command (see :ref:`load_thesaurus`). Upload an RDF file via the thesaurus admin page ----------------------------------------------- Navigate to the thesaurus page in the admin panel ``http:///admin/base/thesaurus``. On the top-right of the page a button named :guilabel:`Upload thesaurus` will be available: .. figure:: img/thesaurus_admin_1.png :align: center After clicking on it, a simple form for the upload will be shown which will allow you to select your desired RDF file: .. figure:: img/thesaurus_admin_2.png :align: center By clicking on `Upload RDF`, the system will load the thesaurus and assign it a "slugified" name based on the file name. The name can be easily changed later in the edit page. If everything goes fine, a success message will be shown: .. figure:: img/thesaurus_admin_success.png :align: center Otherwise the UI will show the error message: .. figure:: img/thesaurus_admin_fail.png :align: center Management commands =================== GeoNode provides a single command (``thesaurus``) with multiple actions: * ``list``: list existing thesauri * ``load``: load a RDF file * ``dump``: dump a thesaurus into a file .. code-block:: python manage.py thesaurus --help usage: manage.py thesaurus [-h] [-i [IDENTIFIER]] [-f [FILE]] [--action {create,update,append,parse}] [-o [OUT]] [--include INCLUDE] [--exclude EXCLUDE] [--format {json-ld,n3,nt,pretty-xml,sorted-xml,trig,ttl,xml}] [--default-lang LANG] [--version] [-v {0,1,2,3}] [--settings SETTINGS] [--pythonpath PYTHONPATH] [--traceback] [--no-color] [--force-color] [--skip-checks] [{list,load,dump}] Handles thesaurus commands ['list', 'load', 'dump'] positional arguments: {list,load,dump} thesaurus operation to run options: -h, --help show this help message and exit --version Show program's version number and exit. -v {0,1,2,3}, --verbosity {0,1,2,3} Verbosity level; 0=minimal output, 1=normal output, 2=verbose output, 3=very verbose output --settings SETTINGS The Python path to a settings module, e.g. "myproject.settings.main". If this isn't provided, the DJANGO_SETTINGS_MODULE environment variable will be used. --pythonpath PYTHONPATH A directory to add to the Python path, e.g. "/home/djangoprojects/myproject". --traceback Raise on CommandError exceptions. --no-color Don't colorize the command output. --force-color Force colorization of the command output. --skip-checks Skip system checks. Common params: -i [IDENTIFIER], --identifier [IDENTIFIER] Thesaurus identifier. Dump: required. Load: optional - if omitted will be created out of the filename Params for "load" subcommand: -f [FILE], --file [FILE] Full path to a thesaurus in RDF format --action {create,update,append,parse} Actions to run upon data loading (default: create) Params for "dump" subcommand: -o [OUT], --out [OUT] Full path to the output file to be created --include INCLUDE Inclusion filter (wildcard is * as suffix or prefix); can be repeated --exclude EXCLUDE Exclusion filter (wildcard is * as suffix or prefix); can be repeated --format {json-ld,n3,nt,pretty-xml,sorted-xml,trig,ttl,xml} Format string supported by rdflib, or sorted-xml (default: sorted-xml) --default-lang LANG Default language code for untagged string literals (default: None) List thesauri: ``thesaurus list`` --------------------------------- Get a list of the thesauri in GeoNode. Useful to find out the id of the thesauri when you need to export one of them. .. _load_thesaurus: Importing a thesaurus: ``thesaurus load`` ----------------------------------------- The ``load`` command may create an entire Thesaurus, or just update part of it. Allowed params: * ``file``: file to load; required * ``action``: ``create``, ``update``, ``append``, ``parse``; optional, default ``create``; * ``identifier``: the id of the thesaurus; optional, defaults to a name created using the filename. The **automatic identifier creation** skips all the chars after the first dot in order to allow a thesaurus partitioning. For instance we may have different rdf files containing the labels for multiple projects, e.g.: ``labels-i18n.proj1.rdf``, ``labels-i18n.proj2.rdf``... We may simply loop on the filenames and run the ``load`` subcommand on each of them, and all the keywords will be added to the same Thesaurus having id ``labels-i18n``. The ``load`` command has different behaviours according to the ``action`` parameter: Actions: * ``parse``: parse the file and loop on all the concepts without writng anything in the db. Is equivalent to the classic ``dryrun`` option; * ``create`` (default action): tries and create a thesaurus. If the thesaurus already exists, raises an exception. * ``append``: creates entries if they do not exist; pk are the ones listed in update action. If the entry already exists, it is not changed in any way. * ``update``: creates and updates entries: * *Thesaurus*: creates it if it doesn't exist, pk is "identifier". If it exists updates "date", "description", "title", "about" * *ThesaurusLabel*: creates it if it doesn't exist, pk is "thesaurus", "lang" If it exists updates "value" * *ThesaurusKeyword*: creates it if it doesn't exist, pk is "thesaurus", "about" If it exists updates "alt_label" * *ThesaurusKeywordlabel*: creates it if it doesn't exist, pk is "thesauruskeyword", "lang" If it exists updates "label" .. _dump_thesaurus: Exporting a thesaurus: ``thesaurus dump`` ----------------------------------------- The ``dump`` command may export an entire Thesaurus or just a part of it. Allowed params: - ``identifier``: the id of the thesaurus; required. - ``include``: Optional; filter ThesaurusKeywords to be dumped. Can be repeated. Filtering is applied on the `about` field. Filters are in the format either `*string` or `string*` - ``exclude``: Optional; like `include` but filter out ThesaurusKeywords from being dumped. - ``format``: optional, RDF format for the output (``json-ld``, ``n3``, ``nt``, ``pretty-xml``, ``sorted-xml``, ``trig``, ``ttl``, ``xml``). Default ``sorted-xml`` - ``default-lang``: Default language code for untagged string literals; default is from ``settings.THESAURUS_DEFAULT_LANG`` - ``out``: Full path to the output file to be created. Optional; if omitted the RDF content is sent to stderr. Format ^^^^^^ All the formats, except for ``sorted-xml``, use the *RDFlib* library to serialize the thesaurus data. Since RDFlib handles the concepts as a graph, there is no ordering in the output data. This means that two consecutive ``dump`` of the same thesaurus may create two different files. When importing and exporting thesauri as a file, it may be useful to perform diff on them to find out what has changed. The format ``sorted-xml`` creates a predictable output, where the ConceptScheme is at the start of the file, and the Concepts are sorted by their ``about`` field. Furthermore, the ``prefLabel``'s are sorted by their ``lang`` attribute. Partial export ^^^^^^^^^^^^^^ The ``dump`` command also allows to export a subset of the keywords (concepts) in a Thesaurus. As an example, let's say we have the ``labels-i18n`` thesaurus, which contains some GeoNode official labels. In our project we added some keywords prefixed with "proj1_", since they belong to project1. Also in our GeoNode instance, we added some labels which override the standard ones, and are postfixed with ``_ovr``. In order to only export the entries we edited, we'll issue the command:: python manage.py thesaurus dump -i labels-i18n --include "proj1_*" --include "*_ovr" -f labels-i18n.proj1.rdf Configuring a Thesaurus ======================= After a thesaurus is loaded o created in GeoNode, it should be configured in the :guilabel:`Admin` panel. The panel can be reached from :guilabel:`Admin` link of the *User Menu* in the navigation bar or through this URL: ``http:///admin/base/thesaurus``. Once you are on the Thesaurus lists, select one thesaurus to open the Edit page .. figure:: img/thesaurus_edit_page.png :align: center *The GeoNode Thesaurus edit Interface* These are the thesaurus main attributes: - ``identifier``: (mandatory) the thesaurus identifier (set by the ``--identifier`` parameter in the ``thesaurus load`` command, or automatically generated using the file name). - ``title``: (mandatory) The default title of the thesaurus (may be set from the loaded RDF file). - ``date``: (mandatory) The Date of the thesaurus (may be set from the loaded RDF file). - ``description``: (mandatory) The description of the thesaurus (may be set from the loaded RDF file). - ``slug``: (deprecated, use ``identifier`` instead) The slug of the thesaurus. - ``about``: (optional) The ``rdf:about`` URI of the thesaurus (may be set from the loaded RDF file). Next attributes define **how the thesaurus shall be used** within GeoNode. - ``card min``: (optional) The minimum cardinality, default = 0 - ``card max``: (optional) The maximum cardinality, default = -1 (no limit) - ``facet``: (boolean) Decide if the thesaurus will be shown in the facet list, default: True -- To be set to `true` only when ``card_max != 0`` - ``order``: (integer) Set the listing order of the thesaurus in the facet list and in the metadata editor, default: 0, asc order from 0 to N If ``card max`` is not zero, the metadata editor will automatically display the Thesaurus in the list of the controlled terms. More precisely these are the cases according to the two cardinality fields: - `card_max=0` --> Disabled, The Thesaurus will not appear in the GUI - `card_max=1` & `card_min = 0` --> Single choice, optional. - `card_max=1` & `card_min = 1` --> Single choice, required - `card_max=-1` & `card_min = 0` --> [0..N] Multiple choices, optional - `card_max=-1` & `card_min = 1` --> [1..N] Multiple choices, required The metadata editor will show all the thesauri with ``card_max != 0``, each one with its own title, like in the following image: .. figure:: img/thesaurus_choices.png :align: center *The metadata interface with the Thesaurus enabled* The Thesauri having ``card_max == 0`` are used as **codelists**: it means that they will be referred within GeoNode via their identifier for specific purposes. There will be ad-hoc documentation for each of such codelists. For instance, the thesaurus with identifier ``labels-i18n`` is used for the metadata labels translations. Using keywords from a thesaurus =============================== After you've finished the setup you should find a new input widget in the metadata editor allowing you to choose keywords from the thesaurus for your resource. Also, if you set the ``facet`` attribute to ``true``, the thesaurus should be listed in the filter section in GeoNode's resource list views. For instance, if we have these thesauri: .. figure:: img/thesaurus_admin_list.png :align: center :width: 450px *List of configured sample thesauri* both set with ``card max != 0`` and ``facet = true``, we'll have in the editor: .. figure:: img/thesaurus_edit_sample.png :align: center :width: 450px *Keyword selectors for sample thesauri* and we'll also have them in the filtering panel as facets: .. figure:: img/thesaurus_facet_sample.png :align: center *Facets selectors for sample thesauri*