Hovering the pointer over blue text in the diplomatic version displays a tooltip with a clarification; blue text in the normalised version marks the corresponding text (if present). In both screen versions, red text marks the other places where the two versions differ, apart from ſ ~ s. Hover over annotations in the diplomatic version for information on the annotator.

Conventions in screen and file versions are summarised in the table below. (Transcriptions on the MDC site are under development and do not follow these conventions.)

| handwritten original | TEI/XML file | diplomatic text | normalised text | diplomatic plain text file | normalised plain text file |
| long s | ſ (UTF-8 long s) | (UTF-8 esh) | (normal s) | (long s) | (normal s) |
| any symbol for and | & | & | & | & | & |
| other characters | normal print equivalent | normal print equivalent | normal print equivalent | normal print equivalent | normal print equivalent |
| &c. = 'et cetera', Dr = 'Doctor' (as title), Mr, Mrs, Messrs, PS, St = 'Saint' | as written,¹ untagged | as written¹ | as written¹ (inline) | as written¹ (inline) | as written¹ (inline) |
| other abbreviations, incl. Dr = 'Dear', St = 'Street' | tagged¹ <abbr> ~ <expan> | as written¹ | expanded (if known) | as written¹ (inline) | expanded (if known) |
| initial for name | tagged <abbr> ~ <expan> | as written | expanded (if known) | as written | expanded (if known) |
| dash as punctuation² |  --  |  --  |  --  |  --  |  --  |
| dash to shorten or suppress a word² | —— | —— | —— | —— | —— |
| obsolete spelling known at the period (acc. to OED) | tagged <orig> ~ <reg> | as written | normalised | as written | normalised |
| idiosyncratic spelling or error | tagged <sic> ~ <corr> | as written | normalised | as written | normalised |
| initial capital³ | as written, untagged | as written | as written | as written | as written |
| 'd for -ed | as written, (currently) untagged | as written | as written | as written | as written |
| (non)use of possessive apostrophe, incl. e.g. gen. sg. any bodies | as written, untagged | as written | as written | as written | as written |
| foreign word or phrase⁴ | tagged | unmarked | unmarked | unmarked | unmarked |
| obsolete morphology (e.g. had wrote, She eat some chicken) | (some) tagged <orig> ~ <reg> [in progress] | as written / as written | as written / normalised | as written | as written / normalised |
| text supplied by editors | tagged | [supplied text] + tooltip | supplied text | supplied text, unmarked | supplied text, unmarked |
| text added by writer | tagged | added text | added text | added text, unmarked | added text, unmarked |
| substitution by writer | tagged | cancelled text + substitute + tooltip | substitute only | substitute only, unmarked | substitute only, unmarked |
| cancelled text | tagged | cancelled text + tooltip | text absent | text absent | text absent |
| cancelled text, unreadable or uncertain | tagged <del> + <gap> | + tooltip | text absent | text absent | text absent |
| unreadable or uncertain text | tagged | + tooltip | ------ | <GAP: nn units> (characters, words, lines) | <GAP: nn units> (characters, words, lines) |
| unclear or damaged but reasonably certain text | tagged | unclear text + tooltip | unclear text | unmarked | unmarked |
| underline, superscript, subscript, position above/below line | tagged | formatting displayed | formatting absent | formatting absent | formatting absent |
| boundary stroke or line (≠ word underline) | (some) tagged [in progress] | thin horizontal line | thin horizontal line | ignored | ignored |
| new line | tagged | as written | as written | as written | as written |
| word split across lines⁵ | tagged <orig> ~ <reg> | | reassembled on first line without internal punctuation | as written⁵ | reassembled on first line without internal punctuation |
| new paragraph at linebreak ± indent | tagged | as written | as written | no indent | no indent |
| centred text⁶ | (most) tagged [in progress] | centred, on new line | centred, on new line | left-aligned, on new line | left-aligned, on new line |
| right-aligned text⁶ | tagged | right-aligned, on new line | right-aligned, on new line | left-aligned, on new line | left-aligned, on new line |
| new column or page | tagged | ruled line | ruled line | blank line | blank line |
| catchword | | + tooltip | text absent | <CATCHWORD: word> | text absent |
| surplus word | tagged | surplus word + tooltip | word absent | <SURPLUS: word> | word absent |
| editorial footnote | tagged | + tooltip | note absent | note absent | note absent |
| quoted speech | tagged | | | | |
| literary or biblical quotation | tagged <cit/quote + bibl> | quoted text + tooltip | quoted text | quoted text | quoted text |
| line of verse | tagged | | | | |
| change of hand in letter as sent | tagged | unmarked unless footnote needed | unmarked | <HANDSHIFT> | <HANDSHIFT> |
| annotation not present in letter as sent | tagged | + tooltip | annotation absent | <ANNOTATION: annotation> | annotation absent |
| moved section⁷ | original and destination locations tagged <anchor>, <ref> | at original location + tooltip, footnote at destination | at original location | no indication at original location, <MOVED> at destination | no indication at original location, <MOVED> at destination |

Notes to table

1 Any punctuation under superscripted letter(s) in abbreviations is placed last, regardless of relative left-right orientation in the original. Thus, Mr. Mr: Mr– Mr may occur (inline versions Mr. Mr: Mr- Mr), but M.r M:r M-r will not. A letter+macron abbreviation (ac̄ept, com̄and, etc.) is generally expanded as doubling of the letter (accept, command), but note thrō, wc̄h (through, which).
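
The macron rule in note 1 can be sketched in a few lines of Python. This is only an illustration, not the project's own normalisation code: the function name `expand_macron` and the small exception table are hypothetical, and the sketch assumes the macron is encoded as the combining character U+0304 (or as a precomposed letter that decomposes to it) directly after its base letter.

```python
import unicodedata

MACRON = "\u0304"  # combining macron

# The two forms note 1 singles out as not following the doubling rule.
EXCEPTIONS = {"thro": "through", "wch": "which"}

def expand_macron(word):
    """Expand a letter+macron abbreviation; other words pass through unchanged."""
    decomposed = unicodedata.normalize("NFD", word)
    if MACRON not in decomposed:
        return word
    bare = decomposed.replace(MACRON, "")
    if bare.lower() in EXCEPTIONS:
        # Assumption: the exception is returned in lower case.
        return EXCEPTIONS[bare.lower()]
    out = []
    for ch in decomposed:
        # A macron repeats the letter it sits on.
        out.append(out[-1] if ch == MACRON and out else ch)
    return "".join(out).replace(MACRON, "")

for form in ("ac\u0304ept", "com\u0304and", "thr\u014d", "wc\u0304h"):
    print(form, "->", expand_macron(form))
```

A real expansion list would be longer than two entries; the exception table here only covers the examples the note gives.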

2 The dash as punctuation, represented by two hyphens, always has a space on either side. By contrast, a single unspaced hyphen character is used for normal hyphen (well-known) and horizontal stroke under superscript abbreviation (Mrs). Unspaced double em-dash is used for a dash that suppresses all or part of a name or place (Miſs —— = ‘Miss Goldsworthy’, their —— = ‘their Majesties’, to —— = ‘to Windsor’, Mr. H—— = ‘Mr. Hodges’, Ly– S.—— = ‘Lady Stormont’, the K——g = ‘the King’), shortens a word (by T——w = ‘by Tomorrow’) or euphemistically blanks all or part of a profanity (D——d = ‘Damned’).
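
Because the three dash conventions in note 2 are typographically distinct, they can be told apart mechanically in the plain text files. The following Python sketch shows one way to do so; the function names and the sample sentence are invented for illustration, and the patterns assume the files follow the conventions exactly as described.

```python
import re

EM2 = "\u2014\u2014"  # unspaced double em-dash

PUNCT_DASH = re.compile(r"(?<= )--(?= )")          # double hyphen with a space either side
HYPHEN = re.compile(r"(?<![-\s])-(?![-\s])")       # single unspaced hyphen
SUPPRESSED = re.compile(r"\w*" + EM2 + r"\w*")     # word shortened or blanked by a dash

def dash_report(text):
    """Count each kind of dash described in note 2."""
    return {
        "punctuation": len(PUNCT_DASH.findall(text)),
        "suppression": text.count(EM2),
        "hyphen": len(HYPHEN.findall(text)),
    }

def suppressed_tokens(text):
    """List the tokens that contain a suppressing dash."""
    return SUPPRESSED.findall(text)

# Invented sample, not a quotation from the corpus.
sample = "I saw the K\u2014\u2014g to-day -- he was well -- and Mr. H\u2014\u2014 too"
print(dash_report(sample))
print(suppressed_tokens(sample))
```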

3 In some hands it can be difficult to distinguish upper and lower case in word-initial position. Decisions are based on close comparison with other letter-forms in the same hand, but some arbitrariness is inevitable.

4 French and other foreign languages are not normalised – neither corrected nor regularised to present-day grammar and orthography. Place-names are not generally normalised either.

5 Words split across two lines may have a hyphen on the first, the second or both fragments (reco-|ver, imperfect|-ly, satisfacti-|-on); or a double hyphen (pur=|port, dan|=ger, qua=|=litys); or none (respect|ing).
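
The reassembly described in the table ("reassembled on first line without internal punctuation") can be illustrated with the examples from note 5. This is a minimal sketch rather than the project's procedure: `reassemble` is a hypothetical helper, and the `|` in the sample strings is the note's own notation for the line break, not a character in the files.

```python
def reassemble(first, second):
    """Join two fragments of a split word, dropping hyphens or double hyphens at the break."""
    return first.rstrip("-=") + second.lstrip("-=")

# Examples from note 5; "|" marks the line break.
examples = ["reco-|ver", "imperfect|-ly", "satisfacti-|-on",
            "pur=|port", "dan|=ger", "qua=|=litys", "respect|ing"]

for example in examples:
    first, second = example.split("|")
    print(example, "->", reassemble(first, second))
```

Note that the helper cannot tell a line-break hyphen from the hyphen of a compound that happens to break at that point, so it relies on the split already having been identified, as the tagging in the XML files does.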

6 Centred text and right alignment are simulated on-screen by extra indentation.

7 Insertions that interrupt the text are moved to their logical point or to the start or end of a letter; address panels are placed at the end.
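
Users of the plain text files may want to locate or remove the angle-bracket markers listed in the table (<GAP: nn units>, <CATCHWORD: word>, <SURPLUS: word>, <ANNOTATION: annotation>, <HANDSHIFT>, <MOVED>). The Python sketch below is one possible approach. The marker names come from the table, but the exact spacing inside real files, the helper names and the sample sentence are assumptions.

```python
import re

# Marker names as listed in the table; payload syntax is assumed to be "NAME: text".
MARKER = re.compile(
    r"<(GAP|CATCHWORD|SURPLUS|ANNOTATION|HANDSHIFT|MOVED)(?::\s*([^<>]*))?>"
)

def find_markers(text):
    """Return (name, payload) pairs for every editorial marker in the text."""
    return [(m.group(1), (m.group(2) or "").strip()) for m in MARKER.finditer(text)]

def strip_markers(text):
    """Remove the markers and collapse the doubled spaces they leave behind."""
    return re.sub(r" {2,}", " ", MARKER.sub("", text)).strip()

# Invented sample, not a quotation from the corpus.
sample = "I hope to see you <GAP: 2 words> on Monday <HANDSHIFT> Adieu"
print(find_markers(sample))
print(strip_markers(sample))
```

Stripping markers discards information (for example, that two words are unreadable), so it is probably better suited to word counts and searches than to close reading.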

Project files

The master-copy of each document in the project is an XML file conforming to TEI P5. End-of-line is LF only.

Two different TXT files are derived from each XML file: plain and (partially) normalised. The main purpose of normalisation is to facilitate research and improve future part-of-speech tagging; coverage is subject to change. EOL is CR + LF.
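
To make the relationship between the XML master and the two TXT files more concrete, here is a rough Python sketch of how a diplomatic and a normalised text might be derived from the tag pairs listed in the table. It is not the project's actual derivation. It assumes the pairs are wrapped in TEI <choice> elements and that line breaks are encoded as <lb/>, and the sample fragment and function names are invented. Only the tag pairs, the omission of cancelled text and the CR + LF line ending are taken from the description above.

```python
import xml.etree.ElementTree as ET

TEI = "{http://www.tei-c.org/ns/1.0}"
DIPLOMATIC = {"abbr", "orig", "sic"}    # "as written" side of each pair
NORMALISED = {"expan", "reg", "corr"}   # expanded / normalised side

def render(elem, normalised):
    """Collect text, keeping one side of each pair and omitting cancelled text."""
    skip = DIPLOMATIC if normalised else NORMALISED
    parts = [elem.text or ""]
    for child in elem:
        name = child.tag.replace(TEI, "")
        if name == "lb":
            parts.append("\n")  # new line kept as written
        elif name not in skip and name != "del":
            parts.append(render(child, normalised))
        parts.append(child.tail or "")
    return "".join(parts)

def plain_text(xml_source, normalised):
    """Return the text with CR + LF line endings, as in the TXT files."""
    return render(ET.fromstring(xml_source), normalised).replace("\n", "\r\n")

# Invented fragment for illustration only.
SAMPLE = (
    '<p xmlns="http://www.tei-c.org/ns/1.0">I <choice><sic>recieved</sic>'
    '<corr>received</corr></choice> <del>your</del>the letter<lb/>'
    'from <choice><abbr>Ly</abbr><expan>Lady</expan></choice> S.</p>'
)

print(repr(plain_text(SAMPLE, normalised=False)))
print(repr(plain_text(SAMPLE, normalised=True)))
```

A full derivation would also need to handle the other rows of the table, such as gaps, supplied text, catchwords and moved sections, which this sketch ignores.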

The corpus edited and released to date, with each TXT format in a separate zip file, is freely available for non-profit use to anyone who registers.