Main interface

Domain-level classes

These are the classes users will interact with most frequently.

class packtools.XMLValidator(file, dtd=None, style_validators=None)

Adapter that performs SPS validations.

SPS validation stages are:
  • JATS 1.0 or PMC 3.0 (as bound by the doctype declaration or passed explicitly)
  • SciELO Style - ISO Schematron
  • SciELO Style - Python based pipeline
Parameters:
  • file – etree._ElementTree instance.
  • sps_version – the version of the SPS that will be the basis for validation.
  • dtd – (optional) etree.DTD instance. If not provided, we try the external DTD.
  • style_validators – (optional) list of packtools.domain.SchematronValidator objects.
annotate_errors(fail_fast=False)

Add notes on all elements that have errors.

The errors list is generated as the result of calling validate_all().

assets

Lists all static assets referenced by the XML.

lookup_assets(base)

Looks for each asset in base and returns a list of tuples with the asset name and its presence status.

Parameters: base – any container that implements membership tests, i.e. it must support the in operator.
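
The membership-test contract described above can be sketched in a few lines. This is an illustration only, not the packtools implementation: the function name lookup_assets_sketch and the sample file names are hypothetical.

```python
def lookup_assets_sketch(assets, base):
    """Pair each asset name with True/False depending on `name in base`."""
    return [(name, name in base) for name in assets]


# Hypothetical asset names referenced by an XML document.
assets = ["fig01.tif", "table01.gif"]

# Any container supporting the `in` operator works: a set, a list, a dict...
package_members = {"article.xml", "fig01.tif"}

status = lookup_assets_sketch(assets, package_members)
```

Because only the in operator is required, the list of member names of a zip package can serve as base just as well as a set.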
meta

Article metadata.

classmethod parse(file, no_doctype=False, sps_version=None, supported_sps_versions=None, extra_sch_schemas=None, **kwargs)

Factory of XMLValidator instances.

If file is not an etree instance, it will be parsed using packtools.utils.XML().

If the DOCTYPE is declared, its public id is validated against a white list, declared by the ALLOWED_PUBLIC_IDS module variable. The system id is ignored. By default, the allowed values are:

  • SciELO PS >= 1.2:
      • -//NLM//DTD JATS (Z39.96) Journal Publishing DTD v1.0 20120330//EN
  • SciELO PS 1.1:
      • -//NLM//DTD JATS (Z39.96) Journal Publishing DTD v1.0 20120330//EN
      • -//NLM//DTD Journal Publishing DTD v3.0 20080202//EN
Parameters:
  • file – Path to the XML file, URL, etree or file-object.
  • no_doctype – (optional) whether a missing DOCTYPE declaration is accepted.
  • sps_version – (optional) force the style validation against a specific SPS version.
  • supported_sps_versions – (optional) list of supported versions. The only way to bypass this restriction is by passing the sps_version argument.
  • extra_sch_schemas – (optional) list of extra Schematron schemas.
validate(*args, **kwargs)

Validate the source XML against JATS DTD.

Returns a tuple comprising the validation status and the errors list.

validate_all(fail_fast=False)

Runs all validations.

First, the XML is validated against the DTD (calling validate()). If no DTD is provided and the argument fail_fast == True, a TypeError is raised. After that, the XML is validated against the SciELO style (calling validate_style()).

Parameters: fail_fast – (optional) raise TypeError if the DTD has not been loaded.
validate_style(*args, **kwargs)

Validate the source XML against SPS-Style Tagging guidelines.

Returns a tuple comprising the validation status and the errors list.
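
A minimal usage sketch of the two validation stages, relying only on the signatures documented above: parse() as the factory, and validate() and validate_style() each returning a (status, errors) tuple. The helper name validate_sps and the shape of the returned dictionary are hypothetical, and the structure of the individual error objects is not assumed here.

```python
def validate_sps(xml_path):
    """Run the DTD and the style validation separately and collect the results."""
    # Imported lazily so this sketch can be loaded without packtools installed.
    from packtools import XMLValidator

    validator = XMLValidator.parse(xml_path)

    # Both methods are documented to return (status, errors).
    dtd_ok, dtd_errors = validator.validate()
    style_ok, style_errors = validator.validate_style()

    return {
        "dtd": (dtd_ok, list(dtd_errors)),
        "style": (style_ok, list(style_errors)),
    }
```

Calling validate_sps('article.xml') with a hypothetical file path would return both results, so a caller can report DTD problems and style problems independently.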

class packtools.HTMLGenerator(file, xslt=None, css=None, print_css=None, js=None, permlink=None, url_article_page=None, url_download_ris=None)

Adapter that generates HTML from SPS XML.

Basic usage:

from lxml import etree

xml = etree.parse('valid-sps-file.xml')
generator = HTMLGenerator(xml)

html = generator.generate('pt')
html_string = etree.tostring(html, encoding='unicode', method='html')
Parameters:
  • file – etree._ElementTree instance.
  • xslt – (optional) etree.XSLT instance. If not provided, the default XSLT is used.
  • css – (optional) URI for a CSS file.
generate(lang)

Generates the HTML in the language lang.

Parameters: lang – two-letter ISO 639-1 language code, as a text string.
language

The language of the main document.

languages

The language of the main document plus all translations.

classmethod parse(file, valid_only=True, **kwargs)

Factory of HTMLGenerator instances.

If file is not an etree instance, it will be parsed using XML().

Parameters:
  • file – Path to the XML file, URL, etree or file-object.
  • valid_only – (optional) prevents the generation of HTML for invalid XMLs.
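
Building on the basic usage shown earlier, the sketch below renders one HTML string per available language. It uses only the documented parse(), languages and generate() members; it assumes that languages is iterable and yields values accepted by generate(). The helper name render_all_languages is hypothetical.

```python
def render_all_languages(xml_path):
    """Return a dict mapping each language code to its HTML string."""
    # Imported lazily so the sketch loads without lxml or packtools installed.
    from lxml import etree
    from packtools import HTMLGenerator

    generator = HTMLGenerator.parse(xml_path)

    pages = {}
    for lang in generator.languages:
        html = generator.generate(lang)
        pages[lang] = etree.tostring(html, encoding="unicode", method="html")
    return pages
```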

Utils

packtools.utils.XML(file, no_network=True, load_dtd=True)

Parses file to produce an etree instance.

The XML can be retrieved given its filesystem path, a URL, or a file-object.

Parameters:
  • file – Path to the XML file, URL or file-object.
  • no_network – (optional) prevent network access for external DTD.
  • load_dtd – (optional) load DTD during parse-time.
class packtools.utils.Xray(zip_file)

Zip-file introspector.

Parameters: zip_file – instance of zipfile.ZipFile.
close()

Close the archive file.

get_file(member, mode=u'r')

Get file object for member.

A complete list of members can be obtained by calling show_members().

Parameters: member – a zip member, e.g. 'foo.xml'
show_members()

Shows the package members.

show_sorted_members()

Shows the package members sorted by their file extensions.
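
To illustrate what grouping package members by extension can look like, here is a sketch built on the standard zipfile module alone. It does not use Xray, and the return shape (a dict mapping each extension to a list of member names) is an assumption, not the documented return value of show_sorted_members(). The function name sorted_members_sketch and the member names are hypothetical.

```python
import io
import os
import zipfile
from collections import defaultdict


def sorted_members_sketch(zip_file):
    """Group the member names of an open ZipFile by file extension."""
    groups = defaultdict(list)
    for name in zip_file.namelist():
        _, ext = os.path.splitext(name)
        groups[ext.lstrip(".").lower()].append(name)
    return dict(groups)


# Build a small in-memory package for demonstration.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("article.xml", "<article/>")
    archive.writestr("fig01.tif", b"")
    archive.writestr("fig02.tif", b"")

with zipfile.ZipFile(buffer) as archive:
    members = sorted_members_sketch(archive)
```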

packtools.utils.cachedmethod(wrappee)

Caches the results of method calls, keyed by their arguments.
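
One way such a decorator could work is sketched below. Where packtools actually stores its cache is not stated here, so the per-instance dictionary, the attribute name _cachedmethod_sketch and the decorator name cachedmethod_sketch are all assumptions made for illustration. Arguments must be hashable for this approach to work.

```python
import functools


def cachedmethod_sketch(wrappee):
    """Cache results per instance, keyed by method name and arguments."""
    @functools.wraps(wrappee)
    def wrapper(self, *args, **kwargs):
        # The cache lives on the instance, so it is discarded with it.
        cache = self.__dict__.setdefault("_cachedmethod_sketch", {})
        key = (wrappee.__name__, args, tuple(sorted(kwargs.items())))
        if key not in cache:
            cache[key] = wrappee(self, *args, **kwargs)
        return cache[key]
    return wrapper


class Counter:
    def __init__(self):
        self.calls = 0

    @cachedmethod_sketch
    def double(self, n):
        self.calls += 1
        return n * 2


counter = Counter()
first = counter.double(21)
second = counter.double(21)  # served from the cache, double() is not re-run
```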

packtools.utils.config_xml_catalog(wrapped)

Decorator that wraps the execution of a function, setting-up and tearing-down the XML_CATALOG_FILES environment variable for the current process.

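from packtools import XMLValidator
from packtools.utils import config_xml_catalog

@config_xml_catalog
def main(xml_filepath):
    # parse() accepts a path; the constructor expects an etree instance
    xml = XMLValidator.parse(xml_filepath)
    # do some work here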
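
The set-up and tear-down pattern such a decorator follows can be sketched with the standard library. This is not the packtools implementation: the decorator name config_xml_catalog_sketch and the catalog path are hypothetical, and the real decorator may combine the variable with an existing value differently.

```python
import functools
import os

# Hypothetical catalog location; the real path is chosen by packtools.
CATALOG_PATH = "/tmp/example-catalog.xml"


def config_xml_catalog_sketch(wrapped):
    """Set XML_CATALOG_FILES for the call, then restore the previous state."""
    @functools.wraps(wrapped)
    def wrapper(*args, **kwargs):
        previous = os.environ.get("XML_CATALOG_FILES")
        os.environ["XML_CATALOG_FILES"] = CATALOG_PATH
        try:
            return wrapped(*args, **kwargs)
        finally:
            # Tear down even if the wrapped function raised.
            if previous is None:
                os.environ.pop("XML_CATALOG_FILES", None)
            else:
                os.environ["XML_CATALOG_FILES"] = previous
    return wrapper


@config_xml_catalog_sketch
def current_catalog():
    return os.environ.get("XML_CATALOG_FILES")


seen_inside = current_catalog()
```

The try/finally block is the important design choice: the environment is restored whether the wrapped function returns normally or raises.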
packtools.utils.flatten(paths)

Produces absolute path for each path in paths.

Glob expansions are allowed.

Parameters: paths – Collection of paths. A path can be relative, absolute or a glob expression.
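
A rough equivalent of this behavior using glob and os.path is sketched below. The name flatten_sketch is hypothetical, and passing a path through unchanged when the glob matches nothing is a choice made for this sketch, not documented behavior.

```python
import glob
import os
import tempfile


def flatten_sketch(paths):
    """Yield an absolute path for every path or glob match in `paths`."""
    for path in paths:
        # glob.glob() returns [path] for an existing plain path and the
        # matches for a glob expression; fall back to the path itself.
        for match in glob.glob(path) or [path]:
            yield os.path.abspath(match)


# Demonstration in a throwaway directory.
with tempfile.TemporaryDirectory() as workdir:
    for name in ("a.xml", "b.xml", "notes.txt"):
        open(os.path.join(workdir, name), "w").close()

    found = sorted(flatten_sketch([os.path.join(workdir, "*.xml")]))
    names = [os.path.basename(p) for p in found]
```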
packtools.utils.get_schematron_from_buffer(buff, parser=<XMLParser object>)

Returns an isoschematron.Schematron for buff.

The default parser doesn’t collect ids on a hash table, i.e. it is created with collect_ids=False.

packtools.utils.get_static_assets(xml_et)

Returns an iterable with all static assets referenced by xml_et.

packtools.utils.normalize_string(unistr)

Return the NFKC form for the Unicode string unistr.

The normal form KC (NFKC) first applies the compatibility decomposition, i.e. replaces all compatibility characters with their equivalents, and then applies the canonical composition.
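
The effect of NFKC normalization can be observed directly with the standard unicodedata module. Whether normalize_string delegates to it is an assumption; the example only demonstrates the normal form itself, with sample strings chosen for illustration.

```python
import unicodedata

# "ﬁle" written with the single-character ligature U+FB01 (compatibility character).
ligature = "\ufb01le"

# "e" followed by U+0301 COMBINING ACUTE ACCENT (canonically decomposed "é").
decomposed = "e\u0301"

# NFKC replaces the ligature with "f" + "i" ...
normalized_ligature = unicodedata.normalize("NFKC", ligature)

# ... and composes the base letter and the accent into the single code point U+00E9.
normalized_accent = unicodedata.normalize("NFKC", decomposed)

# NFKD, by contrast, leaves the accent decomposed.
nfkd_accent = unicodedata.normalize("NFKD", "\u00e9")
```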

packtools.utils.prettify(jsonobj, colorize=True)

Serialize and prettify a Python object as JSON.

On Windows, pygments colorization is bypassed.

Function copied from Circus process manager: https://github.com/circus-tent/circus/blob/master/circus/circusctl.py

packtools.utils.resolve_schematron_filepath(value)

Determine the filepath for value.

The lookup is run against all known schemas from packtools.catalog.SCH_SCHEMAS. If value is already a filepath, then it is returned as is.
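
The lookup-with-fallback idea can be sketched as a dictionary lookup. The mapping below is a hypothetical stand-in for packtools.catalog.SCH_SCHEMAS: its keys and paths are invented, and the real function may validate the value further.

```python
# Hypothetical stand-in for packtools.catalog.SCH_SCHEMAS; keys and paths are invented.
KNOWN_SCHEMAS = {
    "sps-1.1": "/opt/schemas/sps-1.1.sch",
    "sps-1.2": "/opt/schemas/sps-1.2.sch",
}


def resolve_schematron_filepath_sketch(value):
    """Return the catalogued path for a known schema name, else `value` unchanged."""
    return KNOWN_SCHEMAS.get(value, value)


by_name = resolve_schematron_filepath_sketch("sps-1.2")
by_path = resolve_schematron_filepath_sketch("/home/user/custom.sch")
```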

packtools.utils.setdefault(object, attribute, producer)

Like dict().setdefault but for object attributes.
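
A sketch of the attribute-level analogue of dict.setdefault follows. It assumes that producer is a zero-argument callable invoked only when the attribute is missing; the name setdefault_sketch and the Article class are hypothetical.

```python
def setdefault_sketch(obj, attribute, producer):
    """Return obj.<attribute>, creating it from producer() if it is missing."""
    if not hasattr(obj, attribute):
        setattr(obj, attribute, producer())
    return getattr(obj, attribute)


class Article:
    pass


article = Article()

# First call: the attribute does not exist, so producer() builds it.
first = setdefault_sketch(article, "errors", list)
first.append("missing <journal-id>")

# Second call: the existing value is returned and producer is not called.
second = setdefault_sketch(article, "errors", list)
```

Taking a callable rather than a ready-made value avoids building the default when the attribute already exists.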