The resources directory contains data describing the Gold Book: a list of known terms in different formats and a default list of forbidden terms. The following files are currently available:
- goldbook_terms.xml: a dictionary of GoldBook terms and IDs in XML format.
- goldbook_terms.js: a dictionary of GoldBook terms and IDs in JavaScript format. It also contains a default list of forbidden terms.
- exclude.xml: the default list of forbidden terms.
Dictionary of terms
The dictionary of terms used by Goldify maps the text terms that should be marked up in the source documents to the IDs of those terms. When a hit is found in the text, it is converted to a link using a URL created from its ID. How the URL is built depends on the implementation: in JavaScript it is hardcoded, in Java it is configurable.
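The mapping and the hit-to-link step can be illustrated with a minimal Python sketch. This is not Goldify's own code (Goldify is implemented in JavaScript and Java). The names `TERMS`, `URL_TEMPLATE` and `link_terms` are hypothetical, the URL pattern is an assumption, and the matching is a plain case-sensitive substring search with no word-boundary handling.

```python
import re

# Small sample of the term -> ID mapping (entries taken from the XML sample
# in this document).
TERMS = {
    "(chain) repeating distance": "R05294",
    "χ-parameter": "C00767",
}

# Assumed URL pattern; the real one depends on the Goldify implementation.
URL_TEMPLATE = "http://goldbook.iupac.org/{id}.html"


def link_terms(text, terms=TERMS, url_template=URL_TEMPLATE):
    """Wrap every known term found in `text` in an HTML link."""
    if not terms:
        return text
    # Try longer terms first so a long term wins over a shorter one it contains.
    ordered = sorted(terms, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(term) for term in ordered))

    def to_link(match):
        hit = match.group(0)
        url = url_template.format(id=terms[hit])
        return '<a href="{}">{}</a>'.format(url, hit)

    return pattern.sub(to_link, text)


linked = link_terms("The χ-parameter describes polymer-solvent interaction.")
```

Keeping the URL template as a parameter mirrors the configurable behaviour described for the Java implementation; a hardcoded template would correspond to the JavaScript one.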
The most recent version of the XML file can always be found at http://goldbook.iupac.org/goldbook_terms.xml.
The format of the file
The format of the dictionary file depends on the implementation. In JavaScript it is a native dictionary (hash); in Java it is stored in XML format. Because the latter is more portable, we describe the XML version here.
The XML format is very simple and needs little explanation. Of interest are the two attributes of the top-level element: source (a free-form description of the source of the terms) and version (a free-form description of the version of the source document this list corresponds to). These attributes are not currently used internally, but they may help identify the source and version of the terms at hand.
<terms source="IUPAC GoldBook" version="2.1.2">
  <term id="C00950">(chain) conformational repeating unit a polymer</term>
  <term id="C00950">(chain) conformational repeating unit of a polymer</term>
  <term id="I00954">(chain) identity period a polymer</term>
  <term id="I00954">(chain) identity period of a polymer</term>
  <term id="R05294">(chain) repeating distance</term>
  <term id="R05401">(vertical) rise velocity in flame emission and absorption spectrometry</term>
  <term id="R05401">(vertical) rise velocity in flame emission and absorption spectrometry, vf</term>
  ⋮
  <term id="C00767">χ-parameter</term>
</terms>
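As an illustration of how simple the format is to consume, the following Python sketch reads such a file into a dictionary using the standard library's `xml.etree.ElementTree`. It is not Goldify's own loader, and the names `SAMPLE` and `load_terms` are hypothetical. Because several term texts can share one ID (as with C00950 in the sample), the term text is used as the key.

```python
import xml.etree.ElementTree as ET

# A shortened copy of the sample, used here in place of the real file.
SAMPLE = """<terms source="IUPAC GoldBook" version="2.1.2">
  <term id="C00950">(chain) conformational repeating unit a polymer</term>
  <term id="C00950">(chain) conformational repeating unit of a polymer</term>
  <term id="R05294">(chain) repeating distance</term>
  <term id="C00767">χ-parameter</term>
</terms>"""


def load_terms(xml_text):
    """Return (source, version, {term text: term ID}) for a terms document."""
    root = ET.fromstring(xml_text)
    terms = {
        term.text: term.get("id")
        for term in root.iter("term")
        if term.text  # skip empty <term/> elements
    }
    return root.get("source"), root.get("version"), terms


source, version, terms = load_terms(SAMPLE)
```

To read the file from disk instead, `ET.parse(path).getroot()` can replace the `ET.fromstring` call.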
Forbidden terms
In addition to the dictionary of terms, Goldify uses a list of forbidden terms: terms that are part of GoldBook but should not be marked up and linked. Such terms fall into several categories:
- Ambiguous terms - such as second, host, net, normal, degree, base.
- Basic terms - terms so basic that linking them to their definitions serves no purpose, such as atom, molecule, chemical reaction.
- Common terms - terms so frequent that linking them would pollute the text with links. Among these are carbon, reactant, ion.
Of course these categories are completely artificial and some terms would fall into more than one category.
The user is welcome to modify the default list of terms - either by addition or removal of items.
The format of the file
The structure of this format is even simpler than that of the dictionary of terms: a top-level exclude element containing one term element per forbidden term.
<exclude>
  <term>minus</term>
  <term>error</term>
  <term>shift</term>
  <term>net</term>
  ⋮
  <term>bond</term>
  <term>activity</term>
</exclude>
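The sketch below shows, in Python, one way such a list could be read, adjusted by the user, and applied to a dictionary of terms. It is an illustration only: the names `load_forbidden` and `drop_forbidden` are hypothetical, the IDs marked as placeholders are invented for the example, and filtering the dictionary before matching is an assumption rather than a description of how Goldify applies the list.

```python
import xml.etree.ElementTree as ET

# A shortened copy of the sample, used here in place of exclude.xml.
EXCLUDE_SAMPLE = """<exclude>
  <term>minus</term>
  <term>net</term>
  <term>bond</term>
</exclude>"""


def load_forbidden(xml_text):
    """Return the set of forbidden term texts from an <exclude> document."""
    root = ET.fromstring(xml_text)
    return {term.text for term in root.iter("term") if term.text}


def drop_forbidden(terms, forbidden):
    """Return a copy of the term -> ID mapping without forbidden terms."""
    return {text: term_id for text, term_id in terms.items()
            if text not in forbidden}


forbidden = load_forbidden(EXCLUDE_SAMPLE)
forbidden.add("carbon")     # user addition to the default list
forbidden.discard("bond")   # user removal from the default list

terms = {
    "net": "X00001",        # placeholder ID, not a real GoldBook ID
    "bond": "X00002",       # placeholder ID, not a real GoldBook ID
    "χ-parameter": "C00767",
}
allowed = drop_forbidden(terms, forbidden)
```

After this, `allowed` keeps "bond" and "χ-parameter" but no longer contains "net".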