There are plenty of tutorials on the web that are useful for learning Gephi, but I've encountered a much steeper learning curve for the prior steps, such as extracting and preparing data from literary texts to be used with Gephi. In my example, I am interested in performing a network analysis of the social networks (both real and imaginary) in the works of Enrique Vila-Matas (I'd eventually like to expand this corpus to other authors). Acquiring the digital text I can do, and working with Gephi I can learn, but it's the very important intermediary steps of preparing the data (from extracting names and places to getting those into a form, such as a database or spreadsheet, that Gephi can analyze) that I need help with. Any good reading, tutorials, or other resources out there? Other recommendations? I'm working on a Mac and have a limited knowledge of Python, so any Mac-friendly software would be a big help. Thanks!
How does one prepare and use data for network analysis with Gephi?
(5 posts) (3 voices)
Posted 4 years ago Permalink
Hey Josh,
Just talking to a friend of mine (Justin Joque at the spatial and numeric data library here at Michigan). One thing he mentioned was that Cytoscape might be a little more user- and spreadsheet-friendly...though one strength of Gephi is its ability to animate networks over time, if that's something you're looking for.
More generally, he thought it sounded like you were needing to do some named entity extraction, and recommended the Python-based nltk.org for that. They also host a great resource--Natural Language Processing with Python--at nltk.org/book.
Hope that helps!
Posted 4 years ago Permalink -
Oh great, another Python book to read! Just kidding. Thanks, Korey (and Justin)! NLTK looks like the way to go. From glancing over the chapter on extracting information from text, and from others who have suggested TEI, it's clear that I need to, in some way, go from unstructured to structured text before I do anything else. And now back to my irregularly scheduled Python reading/learning.
Posted 4 years ago Permalink -
Replying to @Josh Honn's post:
Depending on what you are doing, and depending on how much data we are talking about, you may not need to go the TEI route. If you are on a Mac, I've written about setting up NLTK here: http://johnlaudun.org/20121230-macports-the-key-to-python-happiness/. The TL;DR version is here: http://johnlaudun.org/20121230-macports-for-nltk/.
Write back with more info and I'm sure more people will kick in with help.
Posted 4 years ago Permalink -
John: Thanks for the links! As for more on what I want to do, I'll try my best to summarize it (though the idea is still in early formation, and I don't yet know everything I want to do, look for, etc.). Also, the corpus, as it stands, is only 4 novels at around 800 pages, but this will increase as this author's novels are translated into English (and, in the future, I'd also like to add works from similar authors to the corpus). My ideas:
1. Extract names, places, and titles of works from the text
2. Perform some kind of frequency rankings (within works and across corpus)
3. Visualize connections: e.g. authors & their works mentioned in relation to each other, which authors are most often mentioned in relation to each other, groupings of real and imaginary authors (something important and often ambiguous in these texts), and, eventually, overlaps between this network and similar networks created by other authors.
In other words, these novels, especially taken as a whole, embody a fairly vast historical (and sometimes fictional) network of literary figures and their works. I'd like to extract these from the corpus in order to better access and analyze them, while at the same time exposing this labyrinthine network, which might get lost (in its overwhelming totality) within each narrative and even more easily across multiple "distinct" works. Other questions: can narrative arise just from, say, a network of authors? What makes metafiction more than that + annotation? etc.
Again, these are just my initial ideas, and this is an intellectual side project, but one that has implications for my daily work as a librarian working in the digital humanities; oftentimes we need a project of our own to really acquire and retain substantive new skills. For more insight into Vila-Matas' metafictions, here's a good essay.
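For anyone following along, here is a rough Python sketch of steps 2 and 3 in the list above, assuming step 1 (name extraction, e.g. with NLTK) has already produced one list of names per passage. The placeholder names, the `passages` structure, and the function names are all hypothetical and not from any particular library. It counts how often each name appears, counts how often two names share a passage, and writes an edge list with `Source`, `Target`, `Type`, and `Weight` columns, which is the kind of spreadsheet Gephi's CSV importer should be able to read as an edges table.

```python
import csv
import io
from collections import Counter
from itertools import combinations

# Hypothetical input: one list of already-extracted names per passage
# (for example, per paragraph). These names are illustrative placeholders.
passages = [
    ["Author A", "Author B", "Work X"],
    ["Author A", "Author B"],
    ["Author B", "Author C"],
]


def name_frequencies(passages):
    """Count how many passages mention each name (step 2)."""
    counts = Counter()
    for names in passages:
        counts.update(set(names))  # count a name once per passage
    return counts


def cooccurrence_edges(passages):
    """Count how often each unordered pair of names shares a passage (step 3)."""
    edges = Counter()
    for names in passages:
        # Sorting makes ("Author A", "Author B") and the reverse the same key.
        for a, b in combinations(sorted(set(names)), 2):
            edges[(a, b)] += 1
    return edges


def write_edge_csv(edges, handle):
    """Write the edges as CSV rows with a Source,Target,Type,Weight header."""
    writer = csv.writer(handle)
    writer.writerow(["Source", "Target", "Type", "Weight"])
    for (a, b), weight in sorted(edges.items()):
        writer.writerow([a, b, "Undirected", weight])


freqs = name_frequencies(passages)
edges = cooccurrence_edges(passages)

# Written to an in-memory buffer here; swap in open("edges.csv", "w", newline="")
# to produce a file for import.
buffer = io.StringIO()
write_edge_csv(edges, buffer)
edge_csv = buffer.getvalue()
print(edge_csv)
```

Treating "mentioned in the same passage" as a connection is only one possible definition of an edge; a window of sentences or a chapter would work the same way by changing how `passages` is built.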
Posted 4 years ago Permalink