The resources directory contains data describing the Gold Book: a list of known terms in different formats and a default list of forbidden words. The following files are currently available:
- goldbook_terms.xml - a dictionary of Gold Book terms and IDs in XML format.
- exclude.xml - the default list of forbidden terms.
Dictionary of terms
The most recent version of the XML file can always be found at http://goldbook.iupac.org/goldbook_terms.xml.
The format of the file
The XML format is simple and needs little explanation. The top-level element carries two attributes of interest:
- source - a free-form description of the source of the terms.
- version - a free-form description of the version of the source document that the list corresponds to.
These attributes are not currently used internally, but they help identify the source and version of the terms at hand. Each term element holds the text of one term and carries its Gold Book ID in the id attribute; as the excerpt below shows, several spelling variants of a term can share one ID.
<terms source="IUPAC GoldBook" version="2.1.2">
  <term id="C00950">(chain) conformational repeating unit a polymer</term>
  <term id="C00950">(chain) conformational repeating unit of a polymer</term>
  <term id="I00954">(chain) identity period a polymer</term>
  <term id="I00954">(chain) identity period of a polymer</term>
  <term id="R05294">(chain) repeating distance</term>
  <term id="R05401">(vertical) rise velocity in flame emission and absorption spectrometry</term>
  <term id="R05401">(vertical) rise velocity in flame emission and absorption spectrometry, vf</term>
  ⋮
  <term id="C00767">χ-parameter</term>
</terms>
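As an illustration of how such a file could be read, here is a minimal sketch using Python's standard xml.etree.ElementTree module. It is not goldify's own code; the function name load_terms and the choice to key the dictionary by term text (because several variants can share one ID) are assumptions made for this example.

```python
import xml.etree.ElementTree as ET


def load_terms(xml_text):
    """Parse a terms document and return (source, version, terms).

    `terms` maps the term text to its Gold Book ID. The text is used as
    the key because several variants of a term may share the same ID.
    """
    root = ET.fromstring(xml_text)
    source = root.get("source")    # free-form description of the source
    version = root.get("version")  # free-form version of the source
    terms = {}
    for element in root.iter("term"):
        text = (element.text or "").strip()
        if text:
            terms[text] = element.get("id")
    return source, version, terms


# A small excerpt in the same format as goldbook_terms.xml.
sample = """<terms source="IUPAC GoldBook" version="2.1.2">
  <term id="C00950">(chain) conformational repeating unit of a polymer</term>
  <term id="R05294">(chain) repeating distance</term>
  <term id="C00767">χ-parameter</term>
</terms>"""

source, version, terms = load_terms(sample)
print(source, version, len(terms))
```

For a file on disk, ET.parse(path).getroot() could be used in place of ET.fromstring.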
List of forbidden terms
In addition to the dictionary of terms, goldify uses a list of forbidden terms: terms that are part of the Gold Book but should not be marked and linked. Such terms fall into several categories:
- Ambiguous terms - such as second, host, net, normal, degree, base.
- Basic terms - terms so basic that linking to their definitions adds nothing, such as atom, molecule, chemical reaction.
- Common terms - terms so frequent that linking them would pollute the text, such as carbon, reactant, ion.
Of course, these categories are artificial, and some terms fall into more than one of them.
The user is welcome to modify the default list by adding or removing items.
The format of the file
The structure of this file is even simpler than that of the dictionary of terms and should be self-explanatory.
<exclude>
  <term>minus</term>
  <term>error</term>
  <term>shift</term>
  <term>net</term>
  ⋮
  <term>bond</term>
  <term>activity</term>
</exclude>
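The following sketch shows one way the exclusion list could be read, extended, and applied to a dictionary of terms. It is an illustration only: the helper names load_excluded and add_excluded, the exact (case-sensitive) comparison, and the placeholder IDs are assumptions for this example and do not describe how goldify itself matches terms.

```python
import xml.etree.ElementTree as ET


def load_excluded(xml_text):
    """Return the set of forbidden terms found in an exclude document."""
    root = ET.fromstring(xml_text)
    return {
        (element.text or "").strip()
        for element in root.iter("term")
        if (element.text or "").strip()
    }


def add_excluded(xml_text, word):
    """Return the exclude document as a string with `word` appended if missing."""
    root = ET.fromstring(xml_text)
    existing = {(element.text or "").strip() for element in root.iter("term")}
    if word not in existing:
        ET.SubElement(root, "term").text = word
    return ET.tostring(root, encoding="unicode")


sample_exclude = "<exclude><term>net</term><term>bond</term></exclude>"

# Extend the default list with one more forbidden term.
updated_exclude = add_excluded(sample_exclude, "ion")
excluded = load_excluded(updated_exclude)

# Placeholder IDs, for illustration only (not real Gold Book IDs).
terms = {"net": "ID-1", "ion": "ID-2", "ylide": "ID-3"}

# Keep only the terms that are not forbidden (exact match assumed).
linkable = {text: term_id for text, term_id in terms.items() if text not in excluded}
print(sorted(linkable))
```

Removing an item works the same way in reverse: find the matching term element and drop it with root.remove(element) before serializing.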