Introduction to Text Data Mining: Armstrong Browning Library's Victorian Collection

Workshop Materials

Victorian Collection Datasets
Login Required
Introduction Slideshow
Jupyter Notebook on Google Colab
Voyant Tools
AntConc

Director of the Liaison Program
Research & Engagement

Ellen Hampton Filgo

Contact:

Ellen_Filgo@baylor.edu
Jones 121

710-2968

Subjects: Communication, Film & Digital Media, Honors College, Journalism, PR & New Media, Women's & Gender Studies

Curator, Armstrong Browning Library

Laura French

Contact:

Armstrong Browning Library
710 Speight Avenue
Waco, Texas 76798-7152

254-710-4959

Subjects: Special Collections

Workshop Procedures

The Armstrong Browning Library is home to the world's largest collection of Robert Browning and Elizabeth Barrett Browning research resources. Robert Browning (May 7, 1812 – December 12, 1889) is the British poet credited with creating and popularizing the dramatic monologue form of poetry. He was so popular that Browning Societies, dedicated to gathering to read and discuss his work, began during his lifetime and continue to this day. Robert Browning was married to Elizabeth Barrett Browning (March 6, 1806 – June 29, 1861), one of the foremost British poets of the 19th century.

A. J. Armstrong was a Robert Browning scholar and Chair of Baylor's English Department from 1912 to 1952. In 1918, Armstrong donated his personal library of books and periodicals by and about Robert Browning to the Baylor University Library. He continued to gather into Baylor's Browning Collection every item of interest connected with Robert Browning that could support an intensive or extensive study of the poet. When the collection outgrew its home in Carroll Library, Armstrong undertook fundraising to build a library specifically for Baylor's Browning Collection. Construction of the Armstrong Browning Library was completed in 1951.

The Victorian Collection includes more than 8,000 letters and manuscripts by or to Browning family members or other British and American figures, both prominent and lesser known. The Armstrong Browning Library acquired some of these items because of the author's or the recipient's (intended audience's) connection to the Brownings. In many instances a single Browning resource was included as part of a group of 19th-century items. The collection includes letters and manuscripts from many notable nineteenth-century authors such as Charles Dickens, William Wordsworth, Samuel Taylor Coleridge, Thomas Carlyle, John Henry Newman, George MacDonald, and John Ruskin. The collection also includes letters and manuscripts from political figures, religious leaders, scientists, artists, art collectors, and explorers. To increase awareness of the Victorian Collection, the Armstrong Browning Library has digitized more than 3,000 of the Victorian Collection's letters and manuscripts.

Victorian Collection in ABL

Download Victorian Collection Workshop Data Here

What is Metadata? Simply put, metadata is information about a dataset. The Victorian Collection metadata contains descriptive, structural, and administrative metadata. It also includes the full text, where digitized.
The victorian_table_raw.csv file contains the data extracted from the Baylor University Libraries Digital Collections. Each row represents a document page.
Descriptive Fields: Title, First Line, Date, Author, Recipient, Location, Envelope Address, Physical Description, Format, Language, Notes, Books Mentioned, People Mentioned, Places Mentioned, * Transcript (full text of page)
Structural Fields: DI, ABLID, Physical Location
Administrative Fields: Custodian, Rights, Resource Type
How documents were transcribed: Student workers manually transcribed the pages.
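
Because each row of the extract is a single page, a document's text has to be reassembled from several rows before it can be analyzed. The Python sketch below shows one way this might be done with the standard library only. The header names "Title" and "Transcript" come from the field list above, but their exact spelling in the CSV, and the choice of Title as the grouping key, are assumptions. The sample rows are invented for illustration.

```python
import csv
import io
from collections import defaultdict

def group_pages(csv_file, key="Title", text_field="Transcript"):
    """Join the page transcripts of each document into one string.

    Assumes one row per page and that `key` identifies the document.
    Both column names are assumptions about the CSV header.
    """
    pages = defaultdict(list)
    for row in csv.DictReader(csv_file):
        text = (row.get(text_field) or "").strip()
        if text:  # skip pages that have no transcript
            pages[row[key]].append(text)
    return {doc: " ".join(parts) for doc, parts in pages.items()}

# Invented sample rows standing in for the real extract.
sample = io.StringIO(
    "Title,Author,Transcript\n"
    "Letter A,Author One,Dear Sir\n"
    "Letter A,Author One,Yours truly\n"
    "Letter B,Author Two,\n"
)
documents = group_pages(sample)
print(documents)  # {'Letter A': 'Dear Sir Yours truly'}
```

With the real file, the same function could be given an open file handle instead of the in-memory sample, for example one opened with open("victorian_table_raw.csv", newline="", encoding="utf-8"); the encoding is also an assumption.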

Click to Launch PowerPoint

Seven Broad Text Data Mining Workflow Procedures

* Workshop focuses on highlighted items

Identify Sources for Corpus
Prepare for Reading and Parsing
Enrich Corpus
Preprocess Corpus
Term Frequencies & Keyword Extraction
Transformations
Visualization & Analysis

This step is optional: Follow along or just watch

Text Data Mining Procedures Covered in this Section:

Prepare for Reading and Parsing
Enrich Corpus
Preprocess Corpus
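
To make the preprocessing step concrete, here is a minimal Python sketch of what preprocessing a text often involves: lowercasing, splitting into word tokens, and dropping stop words. It is not the workshop's notebook code. The function name, the tokenizing pattern, and the tiny stop word list are all illustrative assumptions.

```python
import re

# A deliberately tiny stop word list, for illustration only.
STOP_WORDS = {"the", "a", "and", "of", "to", "in"}

def preprocess(text, stop_words=STOP_WORDS):
    """Lowercase the text, split it into word tokens, and drop stop words."""
    # Words are runs of letters, optionally with one internal apostrophe.
    tokens = re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())
    return [t for t in tokens if t not in stop_words]

print(preprocess("The poet wrote to a friend in Florence."))
# ['poet', 'wrote', 'friend', 'florence']
```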

Python Script Using the Following Libraries:

Click the image below to launch Google Colaboratory

When this segment is completed, you will be able to:

Identify the best uses of Voyant as a TDM tool
Work within the various "skins" of Voyant and change them out as needed
Edit the Voyant stop word list
Understand the difference between a corpus and corpora and its importance to your research methodology

The Voyant home screen accepts uploads in a variety of languages (Arabic, Bosnian, Croatian, Czech, English, French, Hebrew, Italian, Japanese, Portuguese, and Serbian; Auto-Detect is the default) and a variety of formats (TXT, HTML, XML, PDF, RTF, MS Word, ZIP).
The 5 default Voyant "Skins" are:
Cirrus - word cloud
Reading - text being analyzed
Trends - top keywords visualized across 10 equal segments of text
Summary - key points
Contexts - keyword plus 5 words to either side
Available options appear when you mouse over the upper right of each skin: Visualization URL / Change Tool in this Skin / Options / Help
Available skin view / Available skin view / Current skin view / Help
Editing the Stop Word List:
Review the words showing in the word cloud to identify any you want to eliminate (you may repeat this several times during your analysis process)
Choose the Options button in the Cirrus skin
Check that the language is either Auto-Detect or the language you are analyzing
Add your chosen stop words, one per line
Corpus: One or several texts saved as a continuous document in a single file will be analyzed as one continuous document.
Corpora: Several texts saved individually in a single file will be analyzed as individual texts. The guide's example is an analysis of Tom Sawyer, Huck Finn, and The Prince and the Pauper as corpora.
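
The practical effect of the corpus and corpora distinction can be seen in a small Python sketch. It only illustrates the idea and is not how Voyant works internally. The two miniature "letters" are invented, and the helper name is hypothetical.

```python
import re
from collections import Counter

def tokens(text):
    """Split text into lowercase word tokens."""
    return re.findall(r"[a-z]+", text.lower())

# Invented miniature texts standing in for separate documents.
texts = {
    "letter_1": "the book arrived; the book is lovely",
    "letter_2": "the opera was long",
}

# One continuous document: counts are pooled across all texts.
corpus_counts = Counter(tokens(" ".join(texts.values())))

# Individual documents: counts are kept per text and can be compared.
per_text_counts = {name: Counter(tokens(t)) for name, t in texts.items()}

print(corpus_counts["the"])                  # 3
print(per_text_counts["letter_1"]["book"])   # 2
print(per_text_counts["letter_2"]["book"])   # 0
```

When the texts are pooled, it is no longer possible to tell which letter a word came from, which is why the choice matters for any analysis that compares documents.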

Visualizations in Voyant

In this segment, you will learn to:

Create and explain the visualization formats available in Voyant
Export visualizations for use in presentations, websites, etc.

Go to voyant-tools.org and upload the file

victorian_transcribed_no_metadata

Adding to the Stop Word List

In the Cirrus skin, click on the Options button
Click on the Edit List button next to Stopwords
Let's add the following to the list: dear, mr, mrs, miss, dowden
Click Save and then Confirm; the word cloud will recompose
Note that some of the other skins are interactive and have changed as well: the Summary and the Trends skins
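
The effect of the stop word edit above can be sketched in Python: a word cloud is essentially a picture of term frequencies, so removing stop words changes which terms rise to the top. This is an illustration under assumptions, not Voyant's implementation. The sample sentence and the base stop list are invented; the five added words are the ones from the steps above.

```python
import re
from collections import Counter

def top_terms(text, stop_words, n=3):
    """Return the n most frequent words that are not stop words."""
    words = re.findall(r"[a-z]+", text.lower())
    kept = [w for w in words if w not in stop_words]
    return Counter(kept).most_common(n)

# Invented sample text.
text = ("Dear Mr Dowden, the poems arrived. "
        "Dear Mrs Dowden, the poems are fine poems.")

base_stops = {"the", "are"}
custom_stops = base_stops | {"dear", "mr", "mrs", "miss", "dowden"}

print(top_terms(text, base_stops))    # salutations crowd the top of the list
print(top_terms(text, custom_stops))  # content words remain
```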

Let's make a static version of the Cirrus word cloud:

Click on the Export URL tool icon
Choose "export a PNG image of this visualization"
Follow the instructions on the Export PNG window to save the image or to capture it for embedding in a web page.

Changing Skins: Identifying Collocates

Collocates are word pairs that frequently occur together
In the Summary skin click on the Tools option
Scroll to Corpus Tools and choose Collocates
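
For readers who want to see the idea behind collocates outside of Voyant, the following Python sketch counts the words that fall within a fixed window around a keyword. The window size, the sample sentence, and the function name are assumptions, and Voyant's own calculation may differ.

```python
import re
from collections import Counter

def collocates(text, keyword, window=5):
    """Count words appearing within `window` tokens of `keyword`."""
    words = re.findall(r"[a-z]+", text.lower())
    counts = Counter()
    for i, w in enumerate(words):
        if w == keyword:
            start = max(0, i - window)
            # Words before and after the keyword, excluding the keyword itself.
            neighbours = words[start:i] + words[i + 1:i + 1 + window]
            counts.update(neighbours)
    return counts

# Invented sample text.
text = "my dear friend, your kind letter reached me; a kind friend is rare"
result = collocates(text, "friend", window=2)
print(result["kind"])    # 2
print(result["letter"])  # 0
```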

Examine Trends for Entertainment

Use the search bar in the Trends skin to search for specific terms (review the terms to see which you want to truncate). N.B.: All terms must be in lower case.
1. book
2. theat*
3. opera
4. music*
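
The Trends view divides the text into equal segments and charts how often each term occurs in each one. A rough Python equivalent is sketched below, with a trailing * treated as a truncation wildcard as in the search terms above. It reports raw counts rather than the relative frequencies Voyant plots, and the function name and sample text are hypothetical.

```python
import re

def trend(text, pattern, segments=10):
    """Count matches of `pattern` in each of `segments` equal slices.

    A trailing * matches any word ending, e.g. theat* matches
    theatre and theatrical. Matching is done in lower case.
    """
    words = re.findall(r"[a-z]+", text.lower())
    stem = pattern.rstrip("*")
    truncated = pattern.endswith("*")

    def matches(word):
        return word.startswith(stem) if truncated else word == stem

    size = max(1, -(-len(words) // segments))  # ceiling division
    counts = [0] * segments
    for i, word in enumerate(words):
        if matches(word):
            counts[min(i // size, segments - 1)] += 1
    return counts

# Invented text: 18 filler words followed by two theatre-related words.
text = " ".join(["word"] * 18 + ["theatre", "theatrical"])
print(trend(text, "theat*"))   # [0, 0, 0, 0, 0, 0, 0, 0, 0, 2]
print(trend(text, "theatre"))  # [0, 0, 0, 0, 0, 0, 0, 0, 0, 1]
```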

An Interactive Visualization of the Trends for Entertainment

To create an interactive visualization in Voyant:

Click on Export URL
Choose Export View (Tools and Data)
Choose HTML snippet and follow the instructions

Explore Laurence Anthony's Ant Tools and download AntConc: https://www.laurenceanthony.net/software.html
Phrase Concordances: Allows you to search your corpus for a word or phrase you are interested in and shows the kinds of patterns it appears in. Example: identify expressions of love of written works. Search for love. Search for love*. Use the advanced search for love* within 10 words of book(s), essay(s), poem(s).
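
A concordance, sometimes called keyword in context (KWIC), can be approximated in a few lines of Python. The sketch below is not AntConc's implementation. It assumes a simple word tokenizer, treats a trailing * as a wildcard, and uses an invented sample sentence.

```python
import re

def concordance(text, pattern, width=5):
    """Return keyword-in-context lines for words matching `pattern`.

    A trailing * is treated as a wildcard, so love* also matches
    loved and lovely. `width` is the number of words shown per side.
    """
    words = re.findall(r"[A-Za-z']+", text)
    stem = pattern.rstrip("*").lower()
    truncated = pattern.endswith("*")
    lines = []
    for i, word in enumerate(words):
        lower = word.lower()
        hit = lower.startswith(stem) if truncated else lower == stem
        if hit:
            left = " ".join(words[max(0, i - width):i])
            right = " ".join(words[i + 1:i + 1 + width])
            lines.append(f"{left} [{word}] {right}".strip())
    return lines

# Invented sample text.
text = "I love this poem. She loved the book I sent and the lovely essay."
for line in concordance(text, "love*", width=2):
    print(line)
# I [love] this poem
# poem She [loved] the book
# and the [lovely] essay
```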
Concordance Plots Number of, and location of, search results within each document.
Clusters & N-Grams: N-Grams are sequences of a particular number (N) of consecutive words; no search terms are accepted. Clusters are based on search terms and show the words that cluster around the search term.
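
In corpus tools, an n-gram is generally understood as a run of N consecutive words. The short Python sketch below counts them for an invented sentence; the function name and tokenizer are illustrative assumptions, and no attempt is made to mirror AntConc's settings.

```python
import re
from collections import Counter

def ngrams(text, n=2):
    """Count every run of n consecutive words in the text."""
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(
        " ".join(words[i:i + n]) for i in range(len(words) - n + 1)
    )

# Invented sample text.
text = "my dear friend and my dear sister"
print(ngrams(text, 2).most_common(1))  # [('my dear', 2)]
```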
Word Lists & Collocates: Set the Word List preferences before running Collocates.
Lemma = Base word (token) and its inflections
Stopword = Word to exclude from text data mining
Collocate = Word associated with the search word, ranked by strength of association
Keywords: Lists the top keywords in each document using variations of TF-IDF (Term Frequency/Inverse Document Frequency). Basically, which words in each document define that document as unique when compared to the words in the entire corpus? Keyness = Keyword strength.
First, reset AntConc using File\Clear All Tools and Files.
Second, load the letters by Elizabeth Dickinson West Dowden: choose File\Clear All Files, then File\Open File(s), and select all files beginning with DowdenElizabethDickinsonWest. Click Start.
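
The idea behind the Keywords tool described above, finding the words that set one document apart from the rest of the corpus, can be illustrated with a textbook TF-IDF calculation in Python. This is a sketch under assumptions: AntConc's own keyness statistics are not reproduced here, and the two miniature documents and the function name are invented.

```python
import math
import re
from collections import Counter

def tf_idf(documents):
    """Return {doc_name: {term: score}} using a plain TF-IDF weighting."""
    counts = {
        name: Counter(re.findall(r"[a-z]+", text.lower()))
        for name, text in documents.items()
    }
    n_docs = len(counts)
    # Number of documents in which each term appears at least once.
    doc_freq = Counter(term for c in counts.values() for term in c)
    scores = {}
    for name, c in counts.items():
        total = sum(c.values())
        scores[name] = {
            term: (freq / total) * math.log(n_docs / doc_freq[term])
            for term, freq in c.items()
        }
    return scores

# Invented miniature documents.
docs = {
    "letter_1": "the poem the poem the book",
    "letter_2": "the opera the music",
}
scores = tf_idf(docs)
top = max(scores["letter_1"], key=scores["letter_1"].get)
print(top)  # poem
```

A word such as "the" that appears in every document scores zero, while a word concentrated in one document scores highest there.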
