Data & Digital Scholarship Tutorials

Workshop Description

Data Viz: Network Visualizations Using Gephi

This workshop covers the fundamentals of creating network visualizations using Gephi:

1. Introduction to networks

2. Importing network data

3. Preparing spreadsheet data for Gephi

4. Modifying nodes and edges to visualize networks

5. Measuring network attributes, such as degree, diameter, betweenness, and modularity

6. Symbolizing the visualization using labels and colors

7. Exporting to a shareable image

Gephi SNA Workshop

Tyler Prochnow MEd
HHPR Doctoral Student
Public Health Research Assistant
tprochnow.com

Joshua Been
Digital Scholarship Librarian
https://bit.ly/baylords

(1) Take Workshops, (2) Pass Quizzes, (3) Become a Data Scholar

Interested in becoming a Data Scholar?

Takes only six workshops!

Pick any two categories below, and take at least two workshops from each of those categories (total of 4):

  • Data Visualization
  • Text Data Mining
  • Python Data Scripting

AND

Pick any one category below, and take at least two workshops from that category (total of 2):

  • Research Data Management
  • Finding Secondary Data

 

* Workshops are offered every semester. No need to fit all 6 in one semester. Become a Data Scholar at your own pace.

* Becoming a Data Scholar is not mandatory. Take any workshop you like.


https://gephi.org/


If you receive a "Cannot find Java 1.8 or higher" error, download Java from https://java.com/en/download/manual.jsp. On Windows computers, a common cause of this error is having the 32-bit version of Java installed instead of the 64-bit version. Windows users, make sure to download and install the 64-bit version.

Contents

  • les-mis.gexf
    • Co-occurrence network of characters in Les Misérables
  • stateofunion.gexf
    • State of the Union Addresses containing keyword similarity scores
  • attrBGC.xlsx
    • Nodes (attributes) of social network dataset
  • edgelistBGC.xlsx
    • Edges of social network dataset

Launch Gephi and Open les-mis.gexf

 

  • 77 characters (nodes)
  • 254 pairs of characters who appear in at least one scene together (edges)

File/Open and select les-mis.gexf

Explore Overview Tab

Overview

  • Pan and Zoom
  • Appearance Window
  • Layout Window
  • Bottom Tools
  • Context Window
  • Filter & Statistics Window

 

 

Explore Data Laboratory Tab

 

  • Copy Character field into Label

Data Laboratory

Adjust Node Color to Represent Gender Attribute
Label Nodes by Attribute

Arrange Nodes

  • Force Atlas 2
    • Edges approximately same length
    • Minimize edge crossings
  • Noverlap
  • Label Adjust

Gephi Statistics:

  • Degree (number of edge connections per node)
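
As a rough illustration of what the Degree statistic counts, here is a minimal Python sketch. The toy edge list and the function name `degree` are made up for this example and are not part of the workshop files. The last line only reuses the two figures quoted earlier for les-mis.gexf (77 characters, 254 pairs).

```python
from collections import defaultdict

# Hypothetical toy edge list (not the Les Mis data): each pair is one undirected edge.
edges = [("A", "B"), ("A", "C"), ("B", "C"), ("C", "D")]

def degree(edge_list):
    """Count how many edges touch each node."""
    counts = defaultdict(int)
    for source, target in edge_list:
        counts[source] += 1
        counts[target] += 1
    return dict(counts)

degrees = degree(edges)
average_degree = sum(degrees.values()) / len(degrees)
print(degrees)
print(average_degree)

# Each undirected edge adds 1 to the degree of two nodes, so the
# average degree is 2 * edges / nodes. For 77 nodes and 254 edges:
les_mis_average_degree = 2 * 254 / 77  # about 6.6
```

In this toy graph node C has the highest degree because three edges touch it.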

Gephi Statistics:

  • Average Weighted Degree: A node's weighted degree is the sum of the weights (scenes together) of all its edges; this statistic is the average of that value across all nodes
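
A weighted degree adds up edge weights instead of counting edges. The sketch below shows the idea on a hypothetical three-node network; the node names, weights, and the function name `weighted_degree` are assumptions for illustration only.

```python
from collections import defaultdict

# Hypothetical weighted edges: (character, character, scenes together).
weighted_edges = [("A", "B", 3), ("A", "C", 1), ("B", "C", 2)]

def weighted_degree(edge_list):
    """Sum the weights of all edges touching each node."""
    totals = defaultdict(int)
    for source, target, weight in edge_list:
        totals[source] += weight
        totals[target] += weight
    return dict(totals)

weighted = weighted_degree(weighted_edges)
average_weighted_degree = sum(weighted.values()) / len(weighted)
print(weighted)
print(average_weighted_degree)
```

Every node here has a plain degree of 2, yet the weighted degrees differ because the pairs share different numbers of scenes.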

Gephi Statistics:

  • Network Diameter: The longest shortest path (graph distance) between any two nodes in the network
  • Betweenness Centrality: How often a node appears on the shortest paths between other pairs of nodes. (Signifies how important a node is as a bridge within the network.)
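
To make these two definitions concrete, the following sketch computes both on a small hypothetical graph. It assumes an unweighted, undirected, connected network, uses breadth-first search for distances, and uses Brandes' algorithm for betweenness. The scores are raw counts of node pairs; Gephi's reported values may be scaled differently depending on its settings.

```python
from collections import deque

# Hypothetical toy graph: A - B - C - D, plus C - E (adjacency lists).
graph = {
    "A": ["B"],
    "B": ["A", "C"],
    "C": ["B", "D", "E"],
    "D": ["C"],
    "E": ["C"],
}

def bfs_distances(adj, start):
    """Shortest-path length (in edges) from start to every reachable node."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbor in adj[node]:
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist

def diameter(adj):
    """Longest shortest path between any two nodes (assumes a connected graph)."""
    return max(max(bfs_distances(adj, node).values()) for node in adj)

def betweenness(adj):
    """Brandes' algorithm: unnormalized betweenness for an undirected graph."""
    score = {node: 0.0 for node in adj}
    for s in adj:
        order = []                           # nodes in order of discovery
        preds = {node: [] for node in adj}   # predecessors on shortest paths
        sigma = {node: 0 for node in adj}    # number of shortest paths from s
        sigma[s] = 1
        dist = {s: 0}
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = {node: 0.0 for node in adj}
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                score[w] += delta[w]
    # Each undirected pair was counted from both endpoints, so halve the totals.
    return {node: value / 2 for node, value in score.items()}

network_diameter = diameter(graph)
centrality = betweenness(graph)
print(network_diameter)
print(centrality)
```

Node C scores highest because every path from D or E to the rest of the graph must pass through it, while the end nodes A, D, and E score 0.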

Head to Data Laboratory (new fields)

  • In Data Laboratory
  • Select Nodes Table
Adjust Node Size Proportionately by Betweenness Centrality
Rerun Noverlap and Label Adjust
Take a quick screenshot

 

Document Similarity

  • Cosine Similarity was calculated for all pairs of State of the Union Addresses from 1790 to 2006.
  • Scores range from 0.0 (no weighted terms in common) to 1.0 (identical term profiles).

 

Source Document: State of the Union Addresses (1790-2006) by United States. Presidents

 

Python Script via Google Colab

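TF-IDF: Term Frequency-Inverse Document Frequency. A term's weight rises with how often it appears in a document and falls with how many documents contain it.

Cosine Similarity: The cosine of the angle between two documents' TF-IDF vectors, computed over all terms in the documents.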
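
The workshop's Colab script is not reproduced here, so the following is only a minimal sketch of the two ideas, using the Python standard library. The three short documents are invented, and the weighting (term count divided by document length, times the natural log of total documents over documents containing the term) is one common TF-IDF variant chosen as an assumption. Libraries such as scikit-learn apply slightly different smoothing, so their numbers will not match exactly.

```python
import math
from collections import Counter

# Hypothetical mini-corpus standing in for the Addresses.
docs = {
    "doc1": "peace and prosperity for the nation",
    "doc2": "war and peace for the nation",
    "doc3": "taxes and tariffs",
}

def tfidf_vectors(documents):
    """Return {doc name: {term: tf-idf weight}} using tf * ln(N / df)."""
    tokenized = {name: text.lower().split() for name, text in documents.items()}
    n_docs = len(tokenized)
    doc_freq = Counter()
    for tokens in tokenized.values():
        doc_freq.update(set(tokens))  # count each term once per document
    vectors = {}
    for name, tokens in tokenized.items():
        counts = Counter(tokens)
        vectors[name] = {
            term: (count / len(tokens)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        }
    return vectors

def cosine_similarity(a, b):
    """Cosine of the angle between two sparse term-weight vectors."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(weight * weight for weight in a.values()))
    norm_b = math.sqrt(sum(weight * weight for weight in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

vectors = tfidf_vectors(docs)
sim_12 = cosine_similarity(vectors["doc1"], vectors["doc2"])
sim_13 = cosine_similarity(vectors["doc1"], vectors["doc3"])
print(sim_12, sim_13)
```

In this toy corpus doc1 and doc3 share only the word "and", which appears in every document and therefore gets a weight of zero, so their similarity is 0.0. doc1 and doc2 share several less common terms and score higher.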

By default, every node is connected to every other node because a similarity score was calculated for every pair of Addresses.
Filter to keep only pairs of Addresses that have a similarity score of at least 0.3.
Minimize edge thickness
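
The same thresholding can be done before the data ever reaches Gephi. The sketch below keeps only pairs scoring at least 0.3 and writes them as an edge table with Source, Target, and Weight columns, which are the headers Gephi's spreadsheet importer looks for in an edge list. The pair labels and scores are hypothetical, not values from stateofunion.gexf.

```python
import csv
import io

# Hypothetical similarity scores for three pairs of Addresses (not real values).
pair_scores = {
    ("1801", "1802"): 0.45,
    ("1801", "1803"): 0.12,
    ("1802", "1803"): 0.30,
}

THRESHOLD = 0.3  # the cutoff used in this workshop

def filter_pairs(scores, threshold):
    """Keep only pairs whose similarity is at least the threshold."""
    return [(a, b, w) for (a, b), w in scores.items() if w >= threshold]

def to_edge_csv(edges):
    """Return CSV text with Source, Target, Weight columns."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["Source", "Target", "Weight"])
    writer.writerows(edges)
    return buffer.getvalue()

kept = filter_pairs(pair_scores, THRESHOLD)
edge_csv = to_edge_csv(kept)
print(edge_csv)
```

Filtering in Gephi instead has the advantage that the threshold can be adjusted interactively while watching the network change.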

Filter by Degree

  1. Run Average Degree statistic
  2. Expand Edge Weight under Queries in the Filters tab
  3. Drag Attributes/Range/Degree onto "Drag subfilter here"
  4. Set minimum degree to 2
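
The effect of a minimum-degree filter can be sketched in a few lines of Python. This is a single-pass illustration on hypothetical edges, assuming degree is counted once over the edges supplied; it does not claim to reproduce how Gephi orders or nests its filters internally.

```python
from collections import Counter

# Hypothetical edges that survived the similarity filter.
edges = [("1801", "1802"), ("1802", "1803"), ("1803", "1801"), ("1803", "1804")]

def filter_by_min_degree(edge_list, min_degree):
    """Keep nodes with at least min_degree edges, and the edges among them."""
    degree = Counter()
    for source, target in edge_list:
        degree[source] += 1
        degree[target] += 1
    keep = {node for node, d in degree.items() if d >= min_degree}
    kept_edges = [(s, t) for s, t in edge_list if s in keep and t in keep]
    return keep, kept_edges

nodes, remaining = filter_by_min_degree(edges, 2)
print(sorted(nodes))
print(remaining)
```

Node 1804 has only one connection, so it and its single edge drop out of the view.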

Size nodes by Betweenness Centrality

 


Layout:

  • Force Atlas 2
  • Noverlap
 

Run the Modularity statistic to identify communities within our data.

  • A community is a group of nodes that are more densely linked to each other than to the rest of the network.
  • Modularity ranges from -0.5 to 1. Values well above 0 indicate that edges are concentrated within communities more than expected by chance, i.e., well-defined communities.
  • You should see 4 communities
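
To show what the modularity score measures, the sketch below scores a given grouping of nodes on a hypothetical unweighted network of two triangles joined by one edge. It only evaluates a partition that is handed to it. Gephi's Modularity tool additionally searches for a good partition (it is based on the Louvain method), which this example does not attempt. The function and variable names are assumptions for illustration.

```python
# Hypothetical toy network: two triangles joined by a single bridge edge.
edges = [
    ("A", "B"), ("B", "C"), ("A", "C"),   # triangle 1
    ("D", "E"), ("E", "F"), ("D", "F"),   # triangle 2
    ("C", "D"),                           # bridge
]

def modularity(edge_list, community_of):
    """Q = sum over communities of (internal edges / m) - (degree sum / 2m)^2."""
    m = len(edge_list)
    internal = {}     # edges with both ends in the same community
    degree_sum = {}   # total degree of the nodes in each community
    for source, target in edge_list:
        cs, ct = community_of[source], community_of[target]
        degree_sum[cs] = degree_sum.get(cs, 0) + 1
        degree_sum[ct] = degree_sum.get(ct, 0) + 1
        if cs == ct:
            internal[cs] = internal.get(cs, 0) + 1
    return sum(
        internal.get(c, 0) / m - (degree_sum[c] / (2 * m)) ** 2
        for c in degree_sum
    )

two_groups = {"A": 0, "B": 0, "C": 0, "D": 1, "E": 1, "F": 1}
one_group = {node: 0 for node in two_groups}

q_two = modularity(edges, two_groups)
q_one = modularity(edges, one_group)
print(q_two, q_one)
```

Splitting the network into its two triangles gives a clearly positive score (5/14, about 0.36), while lumping every node into one community gives exactly 0.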
Set Node Partition color to Modularity Class
Click Preview tab
Click Refresh  

Labels

  • Turn labels on
  • Turn off proportional labels
  • Font size to 18

Click Refresh

Click Export for image

 

 
