Tutorial

XML Catalog configuration

An XML Catalog is a lookup mechanism which can be used to prevent network requests from being performed while loading external DTDs.

For performance and safety, instances of stylechecker.XMLValidator do not perform network connections, so we strongly recommend that you set up an XML catalog, which translates public ids to local file URIs.

packtools is shipped with a standard catalog, and can be used basically in 2 ways:

  1. Registering packtools’ catalog in the super catalog with the appropriate delegates, which can be done by adding the following lines to make the file /etc/xml/catalog looks like (this is preferred for production):
<?xml version="1.0"?>
<!DOCTYPE catalog PUBLIC "-//OASIS//DTD Entity Resolution XML Catalog V1.0//EN" "http://www.oasis-open.org/committees/entity/release/1.0/catalog.dtd">
<catalog xmlns="urn:oasis:names:tc:entity:xmlns:xml:catalog">
    <delegatePublic publicIdStartString="-//NLM//DTD JATS"
                    catalog="file://<packtools_dir>/packtools/catalogs/scielo-publishing-schema.xml"/>
    <delegatePublic publicIdStartString="-//NLM//DTD Journal"
                    catalog="file://<packtools_dir>/packtools/catalogs/scielo-publishing-schema.xml"/>
    <delegateSystem systemIdStartString="JATS-journalpublishing1.dtd"
                    catalog="file://<packtools_dir>/packtools/catalogs/scielo-publishing-schema.xml"/>
    <delegateSystem systemIdStartString="journalpublishing3.dtd"
                    catalog="file://<packtools_dir>/packtools/catalogs/scielo-publishing-schema.xml"/>
    <delegateSystem systemIdStartString="http://jats.nlm.nih.gov/publishing/"
                    catalog="file://<packtools_dir>/packtools/catalogs/scielo-publishing-schema.xml"/>
</catalog>

This shell script can help you with the task.

2. Setting the environment variable XML_CATALOG_FILES with the absolute path to <packtools_dir>/packtools/catalogs/scielo-publishing-schema.xml. This setup can also be made by the main Python program, so for these cases a constant pointing to the catalog file is also provided.

import os
from packtools.catalogs import XML_CATALOG
os.environ['XML_CATALOG_FILES'] = XML_CATALOG

In some cases where the system’s entry-point is a single function, for instance the main function, a special helper decorator can be used, as follows:

from packtools.utils import config_xml_catalog
@config_xml_catalog
def main():
    """At this point the XML Catalog is configured"""

More information at http://xmlsoft.org/catalog.html#Simple

Settings up the logger handler

It is expected that the application using packtools defines a logger for packtools, e.g.:

import logging
logging.getLogger('packtools').addHandler(logging.StreamHandler())

See the official docs for more info.

Validation basics

The validation of an XML document is performed through instances of packtools.XMLValidator. The easiest way to get an instance is by running packtools.XMLValidator.parse(), which in addition to accepting absolute or relative path to file in the local filesystem, URL, etree objects, or file-objects, it also loads the most appropriate validation schemas to the document according to its version.

import packtools
xmlvalidator = packtools.XMLValidator.parse('path/to/file.xml')

The validation can be performed in two levels: DTD and SciELO Style. To do this, the packtools.XMLValidator.validate() and packtools.XMLValidator.validate_style() methods are available, respectively. Full validation can be performed with the packtools.XMLValidator.validate_all() method. All these methods return a tuple comprising the validation status and the errors list.

import packtools
xmlvalidator = packtools.XMLValidator.parse('path/to/file.xml')
is_valid, errors = xmlvalidator.validate_all()