sort | uniq
Has anyone done this before? I separated the Corpus of Old English into genres and sub-genres, which enabled me to find words unique to poetry. The poetic texts are largely from the Anglo-Saxon Poetic Records (ASPR), but also include the Chronicle poems, the Meters of Boethius, and others.
First, I sorted the words of the poetic texts into alphabetical order and removed duplicates. Second, I did the same for all prose texts. I also removed all foreign words from the prose texts, that is, the words that the Dictionary of Old English designated as foreign by placing them within <foreign> tags. Third, I compared the prose words with the poetic words and kept only those that never appear in prose. The resulting list is the set of all words used only in the poetic texts. Here is the file (right-click to download): PoeticWords
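One way to sketch those three steps in the shell, under some loud assumptions: the file names and the toy sentences below are made up for illustration, words are taken to be separated by whitespace, and each <foreign> span is assumed to sit on a single line. The comparison uses `comm -23`, which prints the lines found only in the first of two sorted files.

```shell
#!/bin/sh
# Byte-wise collation, so that sort and comm agree on ordering
# (this also keeps æ, þ and ð consistent between the two tools).
export LC_ALL=C

workdir=$(mktemp -d)

# Toy stand-ins for the two sub-corpora (hypothetical file names and contents).
printf '%s\n' 'hwæt we gardena in geardagum' 'cwen ides cyning' > "$workdir/poetry.txt"
printf '%s\n' 'se cyning and seo cwen in <foreign>regina</foreign> we' > "$workdir/prose.txt"

# Step 1: poetry, one word per line, sorted, duplicates removed.
tr -s '[:space:]' '\n' < "$workdir/poetry.txt" \
  | sed '/^$/d' | sort -u > "$workdir/poetry.words"

# Step 2: the same for prose, after dropping <foreign>...</foreign> spans.
sed 's/<foreign>[^<]*<\/foreign>//g' "$workdir/prose.txt" \
  | tr -s '[:space:]' '\n' | sed '/^$/d' | sort -u > "$workdir/prose.words"

# Step 3: keep only the words that occur in poetry and never in prose.
poetic_only=$(comm -23 "$workdir/poetry.words" "$workdir/prose.words")
printf '%s\n' "$poetic_only"

rm -rf "$workdir"
```

On a real corpus the tokenizing step would need more care (punctuation, capitalization, spelling variants), so treat this as an outline of the idea rather than a finished pipeline.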
The next step is to classify each word by word class. That will allow me to differentiate verb phrases from noun phrases in the poetry. Once noun phrases are isolated, I can begin to build a semantic map of poetic discourse in Old English. Afterwards, I’ll add verb phrases. We’ll then be able to see how OE poets described queens (adjectives) and what sort of acts queens performed (verbs), and compare that to descriptions of kings and the acts they performed. We can then further differentiate dryhten from cyning, and cwen from ides. But there’s a big caveat.
Because Old English poets wrote alliterative verse, adjectives and verbs may have been chosen simply on account of their initial sound. So cwen may have attracted /k/-initial words. That is why it is essential to build a map of cwen in prose as well. Since the formal structure of prose was not governed by alliteration (with the possible exception of Ælfric), the prose map and the poetic map of any given noun might well be distinct.