Texts & Resources

This section of our site contains various texts. We're starting with a selection of Shakespeare plays and American Civil War documents, and we plan to add other texts such as the Iliad. All of the texts here are complete and unabridged. They come from sources such as Project Gutenberg and are provided here free of charge, within the terms of the original source. If you have any suggestions for other texts to add, please let us know via our suggestions page.

We recommend that you read our FAQs page to make the best use of the features in the Search Visualizer. It covers issues such as how the Search Visualizer handles partial word matches and case sensitivity.

Shakespeare plays

Our "Symmetry in Shakespeare" document contains some examples of how you can use the Search Visualizer to gain new insights into the structure of his plays.

The Shakespeare plays on this site are from Project Gutenberg, and are subject to the following copyright statement. They are available without charge on this site, for non-commercial use.

THIS ELECTRONIC VERSION OF THE COMPLETE WORKS OF WILLIAM SHAKESPEARE IS COPYRIGHT 1990-1993 BY WORLD LIBRARY, INC., AND IS PROVIDED BY PROJECT GUTENBERG ETEXT OF ILLINOIS BENEDICTINE COLLEGE WITH PERMISSION. ELECTRONIC AND MACHINE READABLE COPIES MAY BE DISTRIBUTED SO LONG AS SUCH COPIES (1) ARE FOR YOUR OR OTHERS PERSONAL USE ONLY, AND (2) ARE NOT DISTRIBUTED OR USED COMMERCIALLY. PROHIBITED COMMERCIAL DISTRIBUTION INCLUDES BY ANY SERVICE THAT CHARGES FOR DOWNLOAD TIME OR FOR MEMBERSHIP.

Shakespeare hints and tips

In the Shakespeare plays on this site, the stage directions, including character names when used as stage directions, are in uppercase (e.g. ROMEO). When one character refers to another by name, the name is written with only its first letter in uppercase (e.g. Romeo). At the time of writing, the Search Visualizer is not case-sensitive, so a search for "Romeo", "ROMEO" or "romeo" will show both ROMEO and Romeo as hits.
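As an illustration of what case-insensitive matching means in practice, here is a minimal Python sketch. It is our own example, not the Search Visualizer's code: the function `case_insensitive_hits` and the sample line are hypothetical, and the sketch uses simple substring matching.

```python
import re

def case_insensitive_hits(text: str, term: str) -> list[str]:
    """Return every occurrence of term in text, ignoring case.

    Hypothetical helper for illustration only; not the Search
    Visualizer's own implementation.
    """
    # re.escape treats the search term literally; re.IGNORECASE makes
    # "Romeo", "ROMEO" and "romeo" equivalent when matching.
    return re.findall(re.escape(term), text, flags=re.IGNORECASE)

# Made-up sample mixing a stage direction (uppercase) with dialogue.
sample = "Enter ROMEO. JULIET: O Romeo, Romeo! wherefore art thou Romeo?"

for query in ("Romeo", "ROMEO", "romeo"):
    # Every spelling of the query finds the same four hits.
    print(query, case_insensitive_hits(sample, query))
```

Because case is ignored, all three queries return the same hits, which is why stage directions and spoken references to a character appear together in the results.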

When searching for short words, check whether the Search Visualizer is set to accept partial matches. If it is, a search for "sing" will also show matches within "singer" and "singing". If it is set to whole word search, it will only show matches for the word "sing" itself. If you're comparing, for example, how often "he" is mentioned with how often "she" is mentioned, you need to use whole word search; otherwise you will get false positives, because "he" is a partial match within "she", "the" and many other words.
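The difference between partial and whole word matching can be sketched in a few lines of Python. This is a hypothetical illustration, not the Search Visualizer's implementation: the function `count_matches` and the sample sentence are our own, and we assume that "whole word" means a match bounded by regular-expression word boundaries (`\b`).

```python
import re

def count_matches(text: str, term: str, whole_word: bool) -> int:
    """Count case-insensitive matches of term in text.

    Hypothetical helper: with whole_word=False any substring counts
    (partial matching); with whole_word=True the match must be bounded
    by word boundaries, an assumed definition of whole word search.
    """
    escaped = re.escape(term)
    pattern = rf"\b{escaped}\b" if whole_word else escaped
    return len(re.findall(pattern, text, flags=re.IGNORECASE))

sample = "She said the singer would sing, and he agreed."

for term in ("he", "she", "sing"):
    partial = count_matches(sample, term, whole_word=False)
    whole = count_matches(sample, term, whole_word=True)
    print(f"{term!r}: partial={partial}, whole word={whole}")
```

With partial matching, "he" is found three times (inside "She", inside "the", and the word "he" itself), while whole word search finds it only once; this is the kind of false positive that can distort a "he" versus "she" comparison.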

American Civil War documents

Our document "Mentions of War: Analyzing large quantities of historical text visually" contains some examples of how you can use the Search Visualizer to search and study large historical documents. The examples include finding an individual with a common name within a large document, and identifying similarities, differences and patterns in content and structure across two or more documents.

The American Civil War official records on this part of the Search Visualizer site are taken from the Cornell University archive at:
http://digital.library.cornell.edu/m/moawar/waro.html

That archive contains the complete collected official war records for the armies and navies on both sides.

The records contain large quantities of text, typically about a thousand pages per volume. The text was digitized using OCR (optical character recognition) and, as a result, contains a moderate proportion of typographic errors. We have not edited the text in the files on this site, apart from splitting them into more manageable sizes, so these errors remain. They will lead to some false negatives, where a word is corrupted by a typographic error and the Search Visualizer therefore misses it. For instance, one sentence states that "gnus" were entering a city, where "guns" was presumably intended, so a search for "guns" will not find that sentence. For a given search, the false-negative rate due to typographic errors will probably be around 1%; in other words, the Search Visualizer will detect about 99% of the words that it would have detected if there were no typos.
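The arithmetic behind that estimate can be sketched as follows. This is a hypothetical Python illustration, not a measurement on the Cornell texts: the helpers `count_word` and `detection_rate` are our own, and the 100-mention text is artificial, built only to mirror the rough 1% figure above.

```python
import re

def count_word(text: str, term: str) -> int:
    """Count whole-word, case-insensitive occurrences of term in text."""
    return len(re.findall(rf"\b{re.escape(term)}\b", text, flags=re.IGNORECASE))

def detection_rate(clean_text: str, ocr_text: str, term: str) -> float:
    """Fraction of the hits in clean_text that are still found in ocr_text.

    Hypothetical helper: an exact search cannot find a word that OCR has
    corrupted, so each corrupted occurrence becomes a false negative.
    """
    expected = count_word(clean_text, term)
    return count_word(ocr_text, term) / expected if expected else 1.0

# Artificial example: 100 mentions of "guns", one of which OCR has
# corrupted to "gnus".
line = "The guns were entering the city. "
clean = line * 100
ocr = "The gnus were entering the city. " + line * 99

rate = detection_rate(clean, ocr, "guns")
print(f"detected {rate:.0%}, false negatives {1 - rate:.0%}")
```

In this artificial case the search finds 99 of the 100 intended mentions; the real rate for any particular search will depend on the word and on the quality of the scanned pages.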

The copyright statement for Cornell is here:
http://cdl.library.cornell.edu/guidelines.html

The copyright statement for their use on the Search Visualizer site is here:
Pages/ACWcopyright.aspx

We have selected three volumes from the army archives, from different stages in the war – the beginning, the turning point at the Battle of Gettysburg, and the end of the war.

The first volume in this selection is Series 1, Volume 1, which we have split into three files. The Search Visualizer can handle the full-sized volume, but most readers will want to compare text from before and after key points, so we have divided the volume at those points.

The first file, Volume 1a, runs from the start of the official war records, which cover the period before the war, to the end of the chapter dealing with the siege and surrender of Fort Sumter.

The second file, Volume 1b, contains the remaining chapters of Volume 1, apart from the index. These chapters deal with the secession of several states, and with operations in the South.

The third file, Volume 1 Index, contains only the index for Volume 1. We have separated out the index so that the Search Visualizer results for the main text aren't complicated by hits from the index.

The second volume we have selected deals with the period around the Battle of Gettysburg: Series 1, Volume 27, part 1.

We have divided it into two files.

The first file contains all of the body text of this volume, excluding only the index. Unlike the other volumes in our selection, this volume deals with a single central event over a relatively short period of time, with no logical dividing point within it, so we have kept the body text in one file.

The second file contains the index for this volume.

The third volume we have selected deals with the end of the war: Series 1, Volume 49, part 1.

We have divided it into four files as follows.

The first file contains the opening section of this volume, consisting of Union records, and ends with the capture of Jefferson Davis.

The second file contains the next section of the volume, consisting of Union records from after the capture of Jefferson Davis, and ends where the section of Confederate records begins.

The third file contains the third section of this volume, consisting of the Confederate records.

The fourth file contains the index for this volume.

Comparison of 'love' (highlighted in red) in Romeo and Juliet (left) and Antony and Cleopatra (right)

Mentions of the word "cavalry" in the official war records volume for Gettysburg. The cavalry play a prominent role in scouting before the battle and in its early stages, and again in its aftermath. This example illustrates how the Search Visualizer can show underlying structure in large documents: this text is over half a million words long, but it fits into a single Search Visualizer image.