FIELD MARKS FOR WEBSTER 1913 and CIDE
=====================================
Tagset.web:
Explanations of the tags used to mark the Webster 1913 dictionary
and the CIDE (Collaborative International Dictionary of English).
Note that the list of tags used to mark the public domain version
of this dictionary is shorter than the full set described here.
If any tag is not listed here, it is either (1) one of the
"point" (font size) or "type" (font style) tags, which should be self-explanatory; or
(2) a functional field with no effect on the typography.
Last modified March 12, 1999.
For questions, contact:
Patrick Cassidy cassidy@micra.com
735 Belvidere Ave.
Plainfield, NJ 07062
(908) 561-3416 or (908) 668-5252
-------------------------------------------------------------
A separate file, webfont.asc, contains the list of the individual
non-ASCII characters represented by either higher-order hexadecimal
character marks (e.g., \'94, for o-umlaut) or by entity tags
(e.g.,
to indicate a line break is symbolized
here as an entity,
has a corresponding
An HTML tag indicating that the enclosed text is
of teletype form, preformatted in a uniform-spaced
font.
small caps (used mostly for "a. d.", "b. c.")
This is the same font a , but has no functional
or semantic significance
group of table data elements in a table
subscript, like
subscript
superscript
superscript
Sans-serif font
Bold (collocation font) and also a subtype.
HTML tag -- teletype font
A squared bold font without serifs approximating the
"universe bold" font on the HP Laserjet4, slightly
larger than the capitals in a definition body. Used
in expositions describing shapes, such as
"Y", "T", "U", "X", "V", "F".
Vertically organized column.
Vertically organized column -- only part of a table
which needs to be completed. Used once.
<...type> A series of tags, many unique, designating certain
unusual fonts, such as "bourgeoistype" for
"bourgeois type", in the section on typography.
Most of these occur only once, in the section on fonts.
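
The hexadecimal character marks mentioned at the top of this file
(e.g., \'94 for o-umlaut) can be decoded with a small lookup table.
The sketch below is illustrative only: the table holds just the three
marks named in this file, the Unicode targets chosen for them are an
assumption, and the names HEX_MARKS and decode_marks are hypothetical.
The complete list is in webfont.asc and is not reproduced here.

```python
import re

# Partial table built only from marks named in this file (assumption:
# these Unicode targets are the intended characters). The full list is
# in webfont.asc.
HEX_MARKS = {
    "94": "\u00f6",  # o-umlaut
    "bd": "\u201c",  # open double quote
    "b8": "\u201d",  # close double quote
}

# A mark is a backslash, an apostrophe, and two hexadecimal digits.
_MARK = re.compile(r"\\'([0-9a-fA-F]{2})")


def decode_marks(text, table=HEX_MARKS):
    """Replace marks found in `table`; leave unknown marks untouched."""
    def repl(match):
        return table.get(match.group(1).lower(), match.group(0))
    return _MARK.sub(repl, text)
```

Marks that are missing from the table are passed through unchanged,
so a partial table never silently corrupts the text.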
=============================================================
Tags with semantic content:
. . . . . . . . . . . . . . . . . . . . . . . . . . .
* Alternative spelling segment. Almost always
contained within square brackets after the main
definition segment. Expository words
such as "Spelled also" are in plain font;
the actual alternative spelling is marked by
... tags within this segment.
italic Antonym.
italic Alternative spelling. The actual word which is an
alternative spelling to the headword. These
are functionally synonyms of the headword. In
most cases these also occur as headwords, with
reference to the word where the actual definition
is found, but not all such words are listed
separately, particularly if the spelling is
close enough to the headword to be found at the
same point in the dictionary. Whether listed
separately or not, these words should
be indexed at this location, also.
italic Authority or author. Used where an authority is
(may be right- given for a definition, and also used for the
justified. See author, where a quotation within double quotes
in the section is given in the same paragraph as the
on formatting). definition. The double quotes are indicated
by the open-quote (\'bd) and close-quote
(\'b8). In both cases, it is typically
right-justified, almost always fitting on
the same line with the last line of the
definition or quotation.
Within collocation segments, it is usually
used only after quotations, and is not right-
justified, except occasionally where it
would be close to the right margin, and then
apparently is right-justified. We have
not explicitly marked those which are
right-justified, but they can be
recognized because they are on a line by
themselves, preceded by two carriage returns.
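
As a rough illustration of the two conventions just described, the
sketch below pulls out quotations delimited by \'bd and \'b8 and lists
lines that stand alone after a blank line, which are candidates for
right-justified authorities. It assumes that "two carriage returns"
shows up as one blank line once line endings are normalized, and the
function names are hypothetical. Any one-line paragraph will also be
reported, so the result is a list of candidates, not a definitive
identification.

```python
import re

# Text between the open-quote mark \'bd and the close-quote mark \'b8.
QUOTE = re.compile(r"\\'bd(.*?)\\'b8", re.DOTALL)


def quotations(text):
    """Return each quoted span with internal whitespace collapsed."""
    return [" ".join(span.split()) for span in QUOTE.findall(text)]


def lone_lines_after_blank(text):
    """Return non-empty lines that have a blank line before them and a
    blank line (or end of text) after them."""
    # Assumption: treat CR LF, CR, and LF all as one line break.
    lines = text.replace("\r\n", "\n").replace("\r", "\n").split("\n")
    found = []
    for i, line in enumerate(lines):
        if not line.strip():
            continue
        blank_before = i >= 1 and not lines[i - 1].strip()
        blank_after = i + 1 >= len(lines) or not lines[i + 1].strip()
        if blank_before and blank_after:
            found.append(line.strip())
    return found
```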
* Marks a biography. Should be longer than
a short mention of who a person was, which
is typically included as a definition.
* Same as
italic Marks the name of a book, pamphlet, or similar
document.
* A field of knowledge of which the headword
is a division.
* Caption of a figure or table.
* tags the CAS (Chemical Abstracts Service) registry
number for a chemical substance.
italic tags the infectious disease caused by the headword.
Implied type of the agent is a microorganism, and
the tag must mark a disease.
* Same as without the italic type.
* Same as without the italic type.
italic inverse of causes: tags the causative agent of an
infectious disease, which is the headword.
The tag must mark a microorganism, virus, or
prion, and the implied type of the headword is
a disease.
Used only for The single letter in the headers to each
letter of the alphabet.
* marks the proper name of a city. Used only
occasionally and not consistently at this stage.
italic Converted to: used to tag substances which are
products prepared by conversion from the
headword. Usually chemicals or complex
products from natural materials. Rarely used
up to 1998.
* List of heads for the columns of a table.
* Title of a column in a table.
* Comment -- differs from in being in-line with
the definition paragraph. Provides a little
additional information.
* Name of a company (commercial firm). Compare
italic Composed of. Tags a substance of which the
headword is at least partly composed. The
substance may be particulate, such as
diatoms composing diatomaceous earth.
* marks an object contained within the headword.
italic Contrasting word. Not exactly an antonym, which
is marked , but a contrasting word which is
often introduced as "opposite to" or "contrasts
with".
* Name of a country (nation) of the world.
italic Collocation reference. A reference to a collocation.
Each such collocation should have its own entry,
marked by ... tags, and these
references should function as hypertext buttons
to access that entry.
* A date of any type, e.g., Dec. 25.
* Date-with-year tags a date containing a year.
* definition. The definition may have subfields,
particularly (an illustrative phrase
starting with "as" or "thus" and containing
the headword or a morphological derivative).
The , \'bd...\'b8 quotations (left and
right double quotes) and fields may be
found within a definition field, but should be,
and usually are, located outside the definition
proper. The marking macro was
inconsistent in this placement, and the
exclusion of the , and quotations
needs to be completed by the proof-readers.
Certain definitions contain
fields within them, where the headword is
an irregular derivative of another headword.
In these cases, the field follows
immediately after the tag, and these
entries do not have a separate field.
In such cases, the field is italic, as
usual.
* Division of the headword, usually an organization.
E.g., a faculty or department of a university,
or a United Nations agency.
* Marks an education institution, a subtype of
organization.
* tags a physical object or form of radiation
emitted by the headword.
Just a place-holder for illustrations, but seldom used.
italic Marks the name of a movie film.
italic Field of specialization. Most often used for
Zoology and Botany, but many "fields of
specialization" are marked for technical
terms. The parentheses are usually within this
field, but are not themselves in italics.
* Name of a geographical region of any size;
if applicable, the more specific ,
, or are preferred.
* Hypernym. Points to the hypernym from WordNet 1.5.
Initially, used only for entries extracted
from WordNet 1.5. Not present in the original
1913 version.
* Illustrative usage -- mostly from WordNet, and placed
outside the definition, in contrast to usage.
These should be converted to ... illustrative
usage format for consistency.
* Illustration place-holder. Seldom used.
* HTML usage -- points to an image file, usually
.gif or .jpg. These have no closing tag, and
will appear as errors in parsing.
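
Because image tags have no closing tag, a simple balance check will
report them as errors unless they are exempted. The sketch below is a
minimal stack-based checker with such an exemption list. It assumes
the image tag is named "img" as in standard HTML (the tag name is not
shown in this copy of the file); the function name and the "entry"
tag used in the usage example are placeholders, not tags defined here.

```python
import re

# Matches an opening or closing tag and captures its name.
TAG = re.compile(r"<(/?)([A-Za-z][A-Za-z0-9]*)[^<>]*>")


def unbalanced_tags(text, void_tags=frozenset({"img"})):
    """Return a list of (problem, tag_name) pairs.

    Tags named in `void_tags` are skipped, since they never close.
    """
    stack = []
    problems = []
    for match in TAG.finditer(text):
        is_close = match.group(1) == "/"
        name = match.group(2).lower()
        if name in void_tags:
            continue
        if not is_close:
            stack.append(name)
        elif stack and stack[-1] == name:
            stack.pop()
        else:
            problems.append(("unexpected-close", name))
    # Anything still open at the end was never closed.
    problems.extend(("unclosed", name) for name in stack)
    return problems
```

For example, unbalanced_tags('<entry>See <img src="a.gif"> here</entry>')
returns an empty list, while passing void_tags=frozenset() makes the
same text report the image tag as unclosed.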
* Points to a word whose meaning is an intensified
form of the headword. Taken from WordNet
tags, used with some adjectives from WordNet.
- * Designates one item in a row of a table. Used only when
intervening spaces do not serve properly as natural
field separators.
italic Translation into a foreign (non-English) language
of the previous word in the text -- italic font.
( is a translation into English)
italic Same as
* Title of a journal (periodical).
* Always a filled rectangular array.
* A 2x5 matrix (2 rows by 5 columns).
* Multiple synonymous subtypes -- used in
def. of "grass".
* Multiple table, encloses figures.
* Music figure. Used only in a note under the entry "Figure".
The two numbers of each such field
are bold, 20 point type, stacked as in a fraction with
a bar between them, but also having a horizontal stroke
midway through each numeral. Unique to this entry.