Main interface
Domain-level classes
These are the classes users will interact with most frequently.
- class packtools.XMLValidator(file, dtd=None, style_validators=None)
Adapter that performs SPS validations.
- SPS validation stages are:
- JATS 1.0 or PMC 3.0 (as bound by the doctype declaration or passed explicitly)
- SciELO Style - ISO Schematron
- SciELO Style - Python based pipeline
Parameters:
- file – etree._ElementTree instance.
- sps_version – the version of the SPS that will be the basis for validation.
- dtd – (optional) etree.DTD instance. If not provided, we try the external DTD.
- style_validators – (optional) list of packtools.domain.SchematronValidator objects.
- annotate_errors(fail_fast=False)
Add notes on all elements that have errors.
The errors list is generated as the result of calling validate_all().
- assets
Lists all static assets referenced by the XML.
- lookup_assets(base)
Looks for each asset in base and returns a list of tuples with the asset name and its presence status.
Parameters: base – any container that implements membership tests, i.e. it must support the in operator.
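The membership-test contract described above can be illustrated with a minimal sketch. The function name `lookup_assets_sketch` and the sample file names are hypothetical; this is not the packtools implementation, only the behaviour the description implies.

```python
def lookup_assets_sketch(assets, base):
    """Pair each asset name with whether it is present in ``base``.

    ``base`` only needs to support the ``in`` operator.
    """
    return [(asset, asset in base) for asset in assets]


# Hypothetical data: assets referenced by an XML and the files actually available.
referenced = ["fig01.tif", "table02.png"]
available = {"fig01.tif", "article.xml"}

result = lookup_assets_sketch(referenced, available)
```

Because only `in` is required, a set, a list, a dict or a custom object defining `__contains__` could all serve as `base`.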
- meta
Article metadata.
- classmethod parse(file, no_doctype=False, sps_version=None, supported_sps_versions=None, extra_sch_schemas=None, **kwargs)
Factory of XMLValidator instances.
If file is not an etree instance, it will be parsed using packtools.utils.XML().
If the DOCTYPE is declared, its public id is validated against a white list, declared by the ALLOWED_PUBLIC_IDS module variable. The system id is ignored. By default, the allowed values are:
- SciELO PS >= 1.2:
  - -//NLM//DTD JATS (Z39.96) Journal Publishing DTD v1.0 20120330//EN
- SciELO PS 1.1:
  - -//NLM//DTD JATS (Z39.96) Journal Publishing DTD v1.0 20120330//EN
  - -//NLM//DTD Journal Publishing DTD v3.0 20080202//EN
Parameters:
- file – Path to the XML file, URL, etree or file-object.
- no_doctype – (optional) whether a missing DOCTYPE declaration is accepted.
- sps_version – (optional) force the style validation against an SPS version.
- supported_sps_versions – (optional) list of supported versions. The only way to bypass this restriction is by using the arg sps_version.
- extra_sch_schemas – (optional) list of extra Schematron schemas.
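The public id check described for parse can be sketched as a plain membership test. The constant names and the helper `is_public_id_allowed` are hypothetical, and the real shape of ALLOWED_PUBLIC_IDS is not shown in this page; only the two public ids are taken from the list of allowed values.

```python
# Public ids copied from the allowed values listed for parse().
JATS_1_0 = "-//NLM//DTD JATS (Z39.96) Journal Publishing DTD v1.0 20120330//EN"
PMC_3_0 = "-//NLM//DTD Journal Publishing DTD v3.0 20080202//EN"


def is_public_id_allowed(public_id, allowed=(JATS_1_0, PMC_3_0), no_doctype=False):
    """Return True when the DOCTYPE public id passes the white list.

    ``public_id`` is None when the document has no DOCTYPE declaration;
    in that case the outcome depends only on ``no_doctype``.
    The system id plays no role, matching the description above.
    """
    if public_id is None:
        return no_doctype
    return public_id in allowed


accepted = is_public_id_allowed(JATS_1_0)
rejected = is_public_id_allowed("-//EXAMPLE//DTD Unknown//EN")
missing_tolerated = is_public_id_allowed(None, no_doctype=True)
```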
- validate(*args, **kwargs)
Validate the source XML against JATS DTD.
Returns a tuple comprising the validation status and the errors list.
- validate_all(fail_fast=False)
Runs all validations.
First, the XML is validated against the DTD (calling validate()). If no DTD is provided and the argument fail_fast == True, a TypeError is raised. After that, the XML is validated against the SciELO style (calling validate_style()).
Parameters: fail_fast – (optional) raise TypeError if the DTD has not been loaded.
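The order of the stages can be sketched as follows. This is a hypothetical outline, not the packtools code: the function `validate_all_sketch` and its arguments are invented for illustration, and two points are assumptions not stated above, namely that the DTD stage is skipped when no DTD is loaded and fail_fast is false, and that the errors of both stages are merged into one list.

```python
def validate_all_sketch(validate, validate_style, has_dtd, fail_fast=False):
    """Run a DTD stage and then a style stage, merging their errors.

    ``validate`` and ``validate_style`` are callables returning a
    (status, errors) tuple, as documented for the two methods.
    """
    errors = []
    if has_dtd:
        _, dtd_errors = validate()
        errors.extend(dtd_errors)
    elif fail_fast:
        raise TypeError("The DTD has not been loaded.")
    # Assumption: without a DTD and without fail_fast, go straight to style checks.
    _, style_errors = validate_style()
    errors.extend(style_errors)
    return (not errors, errors)


# Stand-ins for the two validation stages.
def dtd_ok():
    return (True, [])


def style_with_error():
    return (False, ["missing element"])


status, errors = validate_all_sketch(dtd_ok, style_with_error, has_dtd=True)
```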
- validate_style(*args, **kwargs)
Validate the source XML against SPS-Style Tagging guidelines.
Returns a tuple comprising the validation status and the errors list.
- class packtools.HTMLGenerator(file, xslt=None, css=None, print_css=None, js=None, permlink=None, url_article_page=None, url_download_ris=None)
Adapter that generates HTML from SPS XML.
Basic usage:
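    from lxml import etree
    from packtools import HTMLGenerator

    xml = etree.parse('valid-sps-file.xml')
    generator = HTMLGenerator(xml)
    # Generate the HTML for the Portuguese version of the document.
    html = generator.generate('pt')
    html_string = etree.tostring(html, encoding='unicode', method='html')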
Parameters:
- file – etree._ElementTree instance.
- xslt – (optional) etree.XSLT instance. If not provided, the default XSLT is used.
- css – (optional) URI for a CSS file.
- generate(lang)
Generates the HTML in the language lang.
Parameters: lang – 2-letter ISO 639-1 language code.
- language
The language of the main document.
- languages
The language of the main document plus the languages of all translations.
- classmethod parse(file, valid_only=True, **kwargs)
Factory of HTMLGenerator instances.
If file is not an etree instance, it will be parsed using XML().
Parameters:
- file – Path to the XML file, URL, etree or file-object.
- valid_only – (optional) prevents the generation of HTML for invalid XMLs.
Utils
- packtools.utils.XML(file, no_network=True, load_dtd=True)
Parses file to produce an etree instance.
The XML can be retrieved given its filesystem path, a URL or a file-object.
Parameters:
- file – Path to the XML file, URL or file-object.
- no_network – (optional) prevent network access for external DTD.
- load_dtd – (optional) load DTD during parse-time.
- class packtools.utils.Xray(zip_file)
Zip-file introspector.
Parameters: zip_file – instance of zipfile.ZipFile.
- close()
Close the archive file.
- get_file(member, mode=u'r')
Get file object for member.
A complete list of members can be obtained by calling show_members().
Parameters: member – a zip member, e.g. ‘foo.xml’
- show_members()
Shows the package members.
- show_sorted_members()
Shows the package members sorted by their file extensions.
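The kind of introspection Xray offers can be approximated with the standard zipfile module alone. The sketch below is hypothetical: `members_by_extension` is an invented helper, and grouping members into a dict keyed by extension is an assumption about a convenient result shape, not the documented return value of show_sorted_members().

```python
import io
import os
import zipfile
from collections import defaultdict


def members_by_extension(archive):
    """Group the member names of a zipfile.ZipFile by file extension."""
    grouped = defaultdict(list)
    for name in archive.namelist():
        _, ext = os.path.splitext(name)
        grouped[ext].append(name)
    return dict(grouped)


# Build a small in-memory package so the example needs no files on disk.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("article.xml", "<article/>")
    archive.writestr("fig01.tif", b"")
    archive.writestr("fig02.tif", b"")

with zipfile.ZipFile(buffer) as archive:
    grouped = members_by_extension(archive)
    # Reading one member, comparable in spirit to get_file('article.xml').
    with archive.open("article.xml") as member:
        content = member.read()
```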
- packtools.utils.cachedmethod(wrappee)
Caches method calls for known arguments.
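A method-level cache of this kind is commonly written as a decorator that stores results on the instance, keyed by the call arguments. The sketch below shows that general technique under a hypothetical name, `cachedmethod_sketch`; where packtools keeps its cache and how it builds keys are not described here, so both are assumptions.

```python
import functools


def cachedmethod_sketch(wrappee):
    """Cache the results of a method per instance, keyed by its arguments."""
    @functools.wraps(wrappee)
    def wrapper(self, *args, **kwargs):
        # Assumption: the cache lives in a private attribute of the instance.
        cache = self.__dict__.setdefault("_call_cache", {})
        key = (wrappee.__name__, args, tuple(sorted(kwargs.items())))
        if key not in cache:
            cache[key] = wrappee(self, *args, **kwargs)
        return cache[key]
    return wrapper


class Squarer:
    def __init__(self):
        self.calls = 0

    @cachedmethod_sketch
    def square(self, n):
        self.calls += 1
        return n * n


squarer = Squarer()
first = squarer.square(4)
second = squarer.square(4)  # served from the cache, square() body not run again
```

Arguments must be hashable for this keying scheme to work.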
- packtools.utils.config_xml_catalog(wrapped)
Decorator that wraps the execution of a function, setting up and tearing down the XML_CATALOG_FILES environment variable for the current process.
    @config_xml_catalog
    def main(xml_filepath):
        xml = XMLValidator(xml_filepath)
        # do some work here
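The set-up and tear-down pattern such a decorator relies on can be sketched with os.environ and try/finally. Everything named here is hypothetical: `config_xml_catalog_sketch`, the `CATALOG_PATH` value, and the choice to restore the previous value afterwards are assumptions for illustration, not the packtools implementation.

```python
import functools
import os

# Hypothetical catalog location, used only for this example.
CATALOG_PATH = "/hypothetical/path/catalog.xml"


def config_xml_catalog_sketch(wrapped):
    """Set XML_CATALOG_FILES while ``wrapped`` runs, then restore the old state."""
    @functools.wraps(wrapped)
    def wrapper(*args, **kwargs):
        previous = os.environ.get("XML_CATALOG_FILES")
        os.environ["XML_CATALOG_FILES"] = CATALOG_PATH
        try:
            return wrapped(*args, **kwargs)
        finally:
            # Tear down even if the wrapped function raised.
            if previous is None:
                os.environ.pop("XML_CATALOG_FILES", None)
            else:
                os.environ["XML_CATALOG_FILES"] = previous
    return wrapper


@config_xml_catalog_sketch
def read_catalog_setting():
    return os.environ["XML_CATALOG_FILES"]


before = os.environ.get("XML_CATALOG_FILES")
seen_inside = read_catalog_setting()
after = os.environ.get("XML_CATALOG_FILES")
```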
- packtools.utils.flatten(paths)
Produces an absolute path for each path in paths.
Glob expansions are allowed.
Parameters: paths – Collection of paths. A path can be relative, absolute or a glob expression.
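Glob expansion followed by absolute-path resolution can be sketched with the standard glob and os.path modules. `flatten_sketch` is a hypothetical name; in this sketch an entry that matches nothing on disk yields nothing, and matches are sorted for a stable order, both of which are assumptions rather than documented behaviour.

```python
import glob
import os
import tempfile


def flatten_sketch(paths):
    """Yield an absolute path for every match of every entry in ``paths``."""
    for path in paths:
        for match in sorted(glob.glob(path)):
            yield os.path.abspath(match)


with tempfile.TemporaryDirectory() as workdir:
    for name in ("a.xml", "b.xml", "notes.txt"):
        open(os.path.join(workdir, name), "w").close()

    # A glob expression expands to every matching file.
    found = list(flatten_sketch([os.path.join(workdir, "*.xml")]))
    expected = [
        os.path.abspath(os.path.join(workdir, name)) for name in ("a.xml", "b.xml")
    ]
```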
- packtools.utils.get_schematron_from_buffer(buff, parser=<XMLParser object>)
Returns an isoschematron.Schematron for buff.
The default parser doesn’t collect ids on a hash table, i.e.: collect_ids=False.
- packtools.utils.get_static_assets(xml_et)
Returns an iterable with all static assets referenced by xml_et.
- packtools.utils.normalize_string(unistr)
Return the NFKC form for the unicode string unistr.
The normal form KC (NFKC) applies the compatibility decomposition, i.e. replaces all compatibility characters with their equivalents, followed by the canonical composition.
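The effect of NFKC normalization can be seen with the standard unicodedata module. The wrapper name `normalize_string_sketch` is hypothetical; the assumption is only that the function amounts to an NFKC normalization, as its description says.

```python
import unicodedata


def normalize_string_sketch(unistr):
    """Return the NFKC normal form of ``unistr``."""
    return unicodedata.normalize("NFKC", unistr)


# Compatibility decomposition: the single-character ligature U+FB01 becomes "fi".
ligature = normalize_string_sketch("\ufb01le")

# Canonical composition: "e" plus a combining acute accent becomes one code point.
accented = normalize_string_sketch("e\u0301")
```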
- packtools.utils.prettify(jsonobj, colorize=True)
Serialize and prettify a Python object as JSON.
On Windows, bypass pygments colorization.
Function copied from Circus process manager: https://github.com/circus-tent/circus/blob/master/circus/circusctl.py
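The serialization half of this function can be sketched with the standard json module. `prettify_sketch` is a hypothetical name, the indent width and key sorting are assumptions, and the pygments colorization step is left out entirely.

```python
import json


def prettify_sketch(jsonobj):
    """Serialize ``jsonobj`` as indented JSON with sorted keys (no colorization)."""
    return json.dumps(jsonobj, indent=2, sort_keys=True)


# Hypothetical report structure, used only for this example.
report = {"status": "invalid", "errors": ["missing element"]}
pretty = prettify_sketch(report)
```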
- packtools.utils.resolve_schematron_filepath(value)
Determine the filepath for value.
The lookup is run against all known schemas from packtools.catalog.SCH_SCHEMAS. If value is already a filepath, then it is returned as is.
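The lookup-with-fallback behaviour can be sketched with a plain dictionary. The mapping `KNOWN_SCHEMAS_SKETCH`, its key and path, and the helper name are all hypothetical; the actual contents of packtools.catalog.SCH_SCHEMAS are not shown here, and treating every unknown value as a filepath is an assumption of this sketch.

```python
# Hypothetical stand-in for the catalog of known schemas.
KNOWN_SCHEMAS_SKETCH = {
    "example-schema": "/hypothetical/path/example-schema.sch",
}


def resolve_schematron_filepath_sketch(value, known=KNOWN_SCHEMAS_SKETCH):
    """Return the catalogued filepath for ``value``, or ``value`` itself."""
    return known.get(value, value)


from_catalog = resolve_schematron_filepath_sketch("example-schema")
passthrough = resolve_schematron_filepath_sketch("/some/other/custom.sch")
```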
- packtools.utils.setdefault(object, attribute, producer)
Like dict().setdefault but for object attributes.
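The analogy with dict.setdefault can be sketched using hasattr, setattr and getattr. `setdefault_sketch` is a hypothetical name, and treating producer as a zero-argument callable that is invoked only when the attribute is missing is an assumption based on the parameter name.

```python
from types import SimpleNamespace


def setdefault_sketch(obj, attribute, producer):
    """Return ``obj.attribute``, first setting it to ``producer()`` if missing."""
    if not hasattr(obj, attribute):
        setattr(obj, attribute, producer())
    return getattr(obj, attribute)


article = SimpleNamespace(title="Sample")

# The attribute is missing, so the producer runs and its result is stored.
created = setdefault_sketch(article, "assets", list)

# The attribute exists, so the producer is ignored.
existing = setdefault_sketch(article, "title", lambda: "ignored")
```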