Image to Text: Mary Hamilton papers

Editorial policies

Screen presentation

In the diplomatic version, blue text shows a clarifying tooltip when the pointer hovers over it; in the normalised version, blue marks the corresponding text (if present). Elsewhere, red text in both screen versions marks where the two versions differ, apart from ſ ~ s. Hover over annotations in the diplomatic version for information on the annotator.

Conventions in screen and file versions are summarised in the table below.

| handwritten original | TEI/XML file (filename.xml) | diplomatic text on-screen | normalised text on-screen | diplomatic plain text file (filename.txt) | normalised plain text file (filename-n.txt) |
|---|---|---|---|---|---|
| long s | ſ (UTF-8 long s) | ʃ (UTF-8 esh) | s (normal s) | ſ (long s) | s (normal s) |
| any symbol for 'and' | & | & | & | & | & |
| other characters | normal print equivalent | normal print equivalent | normal print equivalent | normal print equivalent | normal print equivalent |
| &c. = 'et cetera', Dr = 'Doctor' (as title), Mr, Mrs, Messrs, PS, St = 'Saint' | as written,¹ untagged | as written¹ | as written¹ (inline) | as written¹ (inline) | as written¹ (inline) |
| other abbreviations, incl. Dr = 'Dear', St = 'Street' | tagged¹ <abbr> ~ <expan> | as written¹ | expanded (if known) | as written¹ (inline) | expanded (if known) |
| initial for name | tagged <abbr> ~ <expan> | as written | expanded (if known) | as written | expanded (if known) |
| dash as punctuation² | -- | -- | -- | -- | -- |
| obsolete spelling known at the period (acc. to OED) | tagged <orig> ~ <reg> | as written | normalised | as written | normalised |
| idiosyncratic spelling or error | tagged <sic> ~ <corr> | as written | normalised | as written | normalised |
| initial capital³ | as written, untagged | as written | as written | as written | as written |
| 'd for -ed | as written, (currently) untagged | as written | as written | as written | as written |
| (non)use of possessive apostrophe, incl. e.g. gen. sg. any bodies | as written, untagged | as written | as written | as written | as written |
| foreign word or phrase | tagged | unmarked | unmarked | unmarked | unmarked |
| obsolete morphology (e.g. had wrote, She eat some chicken) | untagged (a few tagged <orig> ~ <reg>) | as written / as written | as written / normalised | as written | as written / normalised |
| text supplied by editors | tagged | [supplied text] + tooltip | supplied text | unmarked | unmarked |
| text added by writer | tagged | unmarked | unmarked | unmarked | unmarked |
| substitution by writer | tagged | cancelled text + substitute + tooltip | substitute only | substitute only, unmarked | substitute only, unmarked |
| cancelled text | tagged | cancelled text + tooltip | text absent | text absent | text absent |
| cancelled text, unreadable or uncertain | tagged <del> + <gap> | ------ + tooltip | text absent | text absent | text absent |
| unreadable or uncertain text | tagged <gap> | ------ + tooltip | ------ | <GAP: nn units> (characters, words, lines) | <GAP: nn units> (characters, words, lines) |
| unclear or damaged but reasonably certain text | tagged | unclear text + tooltip | unclear text | unmarked | unmarked |
| underline, superscript, subscript, position above/below line | tagged | formatting displayed | formatting absent | formatting absent | formatting absent |
| boundary stroke or line (≠ word underline) [edits in progress] | tagged <milestone> | thin horizontal line | thin horizontal line | ignored | ignored |
| new line | tagged | as written | as written | as written | as written |
| word split across lines⁴ | tagged <orig> ~ <reg> | as written⁴ | reassembled on first line without internal punctuation | as written⁴ | reassembled on first line without internal punctuation |
| new para at linebreak ± indent | tagged | as written | as written | no indent | no indent |
| new semantic para in midline ± spacing | tagged | as written | as written | ignored | ignored |
| right-aligned text⁵ | tagged | right-aligned, on new line | right-aligned, on new line | left-aligned, on new line | left-aligned, on new line |
| new column or page | tagged | ruled line | ruled line | blank line | blank line |
| catchword | tagged <fw> | catchword + tooltip | text absent | <CATCHWORD: xyz> | text absent |
| surplus word | tagged | surplus word + tooltip | word absent | <SURPLUS: xyz> | word absent |
| editorial footnote | tagged <note/@resp> | lemma[numeral] + tooltip | note absent | note absent | note absent |
| quotation | tagged <cit/quote + bibl> | quoted text + tooltip | quoted text | quoted text | quoted text |
| line of verse | tagged <l> | unmarked | unmarked | unmarked | unmarked |
| change of hand in letter as sent | tagged <handShift> | unmarked unless footnote needed | unmarked | <HANDSHIFT> | <HANDSHIFT> |
| annotation not present in letter as sent | tagged <note/@hand> | annotation + tooltip | annotation absent | <ANNOTATION: annotation> | annotation absent |
| moved section⁶ | original and destination locations tagged <anchor>, <ref> | at original location + tooltip, footnote at destination | at original location | no indication at original location, <MOVED> at destination | no indication at original location, <MOVED> at destination |
Notes to table

1 Any punctuation under superscripted letter(s) in abbreviations is placed last, regardless of its left-right position relative to those letters in the original. Thus Mr. Mr: Mr– Mr (with the r superscripted) may occur (inline versions Mr. Mr: Mr- Mr), but M.r M:r M-r will not. A letter+macron abbreviation (ac̄ept, com̄and, etc.) is generally expanded by doubling the letter (accept, command); note, however, thrō and wc̄h (through, which).

2 The dash (represented by two hyphens) always has a space on either side. By contrast, a single hyphen character is used for a normal hyphen (well-known), for suppressed letters (K-g = ‘King’, D-d = ‘Damned’) and for a horizontal stroke under a superscript abbreviation (Mrs).

3 In some hands it can be difficult to distinguish upper and lower case in word-initial position. Decisions are based on close comparison with other letter-forms in the same hand, but some arbitrariness is inevitable.

4 Words split across two lines (| marks the line break) may have a hyphen on the first fragment, the second or both (reco-|ver, imperfect|-ly, satisfacti-|-on); a double hyphen (pur=|port, dan|=ger, qua=|=litys); or none (respect|ing).

5 Currently, only left alignment and right alignment are distinguished. Centred text is usually treated as right-aligned on-screen.

6 Insertions that interrupt the text are moved to their logical point or to the start or end of a letter; address panels are placed at the end.
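Three of the conventions in the notes are regular enough to sketch in code. The Python below (standard library only) is a minimal illustration, not the project's own tooling, and the function names `expand_macron`, `tokenize` and `reassemble` are hypothetical. `expand_macron` applies the letter-doubling rule of note 1 with the two exceptions given there; `tokenize` relies on the spacing rule of note 2 to keep the dash ` -- ` apart from word-internal hyphens; `reassemble` joins the split-word fragments of note 4, using `|` for the line break as the note does. In the edition itself the reading of each split word is fixed by its `<orig> ~ <reg>` tagging, so this surface rule is only an approximation.

```python
import re
import unicodedata

MACRON = "\u0304"  # combining macron

# The two exceptions named in note 1, keyed by their NFC form.
MACRON_EXCEPTIONS = {"thr\u014d": "through", "wc\u0304h": "which"}


def expand_macron(word: str) -> str:
    """Note 1: expand a letter+macron abbreviation by doubling the letter."""
    word = unicodedata.normalize("NFC", word)
    if word in MACRON_EXCEPTIONS:
        return MACRON_EXCEPTIONS[word]
    # Decompose so that precomposed letters (e.g. ō) expose their macron.
    decomposed = unicodedata.normalize("NFD", word)
    doubled = re.sub(r"(\w)" + MACRON, r"\1\1", decomposed)
    return unicodedata.normalize("NFC", doubled)


def tokenize(line: str) -> list[str]:
    """Note 2: a spaced '--' is its own token; single hyphens stay in words."""
    # Alternatives are tried in order: dash, word (with ' and -), other punctuation.
    return re.findall(r"--|[\w'-]+|[^\w\s]", line)


def reassemble(split: str) -> str:
    """Note 4: join 'first|second', dropping '-' or '=' at the break."""
    first, second = split.split("|")
    return first.rstrip("-=") + second.lstrip("-=")
```

For example, `reassemble("qua=|=litys")` gives `qualitys` and `expand_macron("com\u0304and")` gives `command`.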

Project files

The master copy of each document in the project is an XML file conforming to TEI P5. Line endings are LF only.

Two TXT files are derived from each XML file: a plain (diplomatic) version and a (partially) normalised version. The main purpose of normalisation is to facilitate research and to improve future part-of-speech tagging; its coverage is subject to change. Line endings in the TXT files are CR + LF.
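The derivation of the two TXT versions from one XML master can be sketched as follows. This is a minimal sketch, not the project's conversion pipeline: it assumes the `<abbr> ~ <expan>`, `<orig> ~ <reg>` and `<sic> ~ <corr>` pairs from the conventions table sit inside standard TEI `<choice>` elements in the TEI namespace, and the names `render`, `to_txt` and `SAMPLE` (including the sample sentence) are hypothetical. It keeps the first member of each pair for the diplomatic file and the second for the normalised file, drops cancelled text (`<del>`), maps long s to s only in the normalised version, and converts LF to CR + LF. Elements the table treats specially in the TXT files (`<gap>`, `<fw>`, `<note>`, `<handShift>` and so on) are not handled here and would need their own rules.

```python
import xml.etree.ElementTree as ET

TEI_NS = "{http://www.tei-c.org/ns/1.0}"

# Which member of each <choice> pair each TXT version keeps.
DIPLOMATIC = {"abbr", "orig", "sic"}
NORMALISED = {"expan", "reg", "corr"}


def local(tag):
    """Strip the TEI namespace from an element tag."""
    return tag.replace(TEI_NS, "")


def render(elem, keep):
    """Flatten elem to text, resolving <choice> and omitting <del>."""
    parts = [elem.text or ""]
    for child in elem:
        name = local(child.tag)
        if name == "choice":
            # Keep only the alternative that belongs to this version.
            parts += [render(alt, keep) for alt in child if local(alt.tag) in keep]
        elif name == "del":
            pass  # cancelled text is absent from both TXT versions
        else:
            parts.append(render(child, keep))  # simplification: keep the text
        parts.append(child.tail or "")
    return "".join(parts)


def to_txt(xml_text, normalised=False):
    """Derive the diplomatic or normalised plain text of a TEI fragment."""
    root = ET.fromstring(xml_text)
    text = render(root, NORMALISED if normalised else DIPLOMATIC)
    if normalised:
        text = text.replace("ſ", "s")  # long s -> normal s
    # The XML uses LF line endings; the TXT files use CR + LF.
    return "\r\n".join(text.split("\n"))


SAMPLE = """<p xmlns="http://www.tei-c.org/ns/1.0">My <choice><abbr>Dr</abbr><expan>Dear</expan></choice> Miſs,
I <choice><sic>recieved</sic><corr>received</corr></choice> your <del>note</del>letter.</p>"""
```

Here `to_txt(SAMPLE)` keeps "Dr", "Miſs" and "recieved", while `to_txt(SAMPLE, normalised=True)` yields "Dear", "Miss" and "received"; "note" is absent from both.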

The whole corpus edited to date, with each TXT format in a separate zip file, is freely available for non-profit use to anyone who registers. Registration is via our simple online form.
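For users of the downloaded corpus, the following minimal sketch (Python, standard library only) reads the TXT files from a zip archive and separates the bracketed editorial markers listed in the plain-text columns of the conventions table (`<GAP: …>`, `<CATCHWORD: …>`, `<SURPLUS: …>`, `<ANNOTATION: …>`, `<HANDSHIFT>`, `<MOVED>`) from the letter text. It assumes the files are UTF-8 with CR + LF line endings and that the archive's members end in `.txt`; the helper names `iter_corpus`, `read_txt`, `find_markers` and `strip_markers` are hypothetical.

```python
import re
import zipfile

# Marker forms from the plain-text columns of the conventions table;
# the value after the colon (e.g. "nn units", "xyz") is optional.
MARKER = re.compile(
    r"<(GAP|CATCHWORD|SURPLUS|ANNOTATION|HANDSHIFT|MOVED)(?::\s*([^>]*))?>"
)


def read_txt(raw: bytes) -> list[str]:
    """Decode one TXT file (assumed UTF-8) and split its CR+LF lines."""
    # 'utf-8-sig' also drops a byte-order mark, should a file have one.
    return raw.decode("utf-8-sig").splitlines()


def find_markers(line):
    """List (marker, value) pairs; value is None for bare markers."""
    return [(kind, value or None) for kind, value in MARKER.findall(line)]


def strip_markers(line):
    """Drop editorial markers and collapse the spaces they leave behind."""
    return " ".join(MARKER.sub(" ", line).split())


def iter_corpus(zip_path):
    """Yield (member name, lines) for every .txt member of a corpus zip."""
    with zipfile.ZipFile(zip_path) as archive:
        for name in archive.namelist():
            if name.endswith(".txt"):
                yield name, read_txt(archive.read(name))
```

Usage would be along the lines of `for name, lines in iter_corpus("corpus.zip"): ...`, where the archive name is a placeholder. Stripping markers suits word counts; keeping them (via `find_markers`) preserves where text was unreadable or moved.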