I am starting a new project in which I want to take a closer look at narratives, both in terms of topics and in terms of morphologies. For the moment, I have narrowed my initial exploration to two collections of treasure legends, one drawn from oral sources (and transcribed by folklorists) and one drawn from materials found on online forums. Both collections are purposefully small, since I am taking baby steps here: each has only sixteen texts. The texts range in length from 153 to 1025 words in the oral collection and from 155 to 3081 words in the web collection.
I am looking to create a bimodal (two-mode, or bipartite) graph for each collection that represents the relationship between the texts and the words used in them, so that I can examine the relationships either between texts, based on words in common, or between words, based on texts in common.
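To make the two views concrete, here is a minimal sketch of how a text-to-words mapping could be collapsed into either a text-by-text or a word-by-word set of weighted links. The names `project`, `invert`, and the toy legends are hypothetical and only for illustration; the weight used here (a simple count of shared neighbours) is one possible choice among several.

```python
from collections import Counter
from itertools import combinations


def project(membership):
    """Count shared neighbours for every pair of keys.

    `membership` maps a node of one mode (say, a text) to the set of
    nodes it touches in the other mode (say, its words).
    Pairs with nothing in common are left out.
    """
    shared = Counter()
    for a, b in combinations(sorted(membership), 2):
        overlap = len(membership[a] & membership[b])
        if overlap:
            shared[(a, b)] = overlap
    return shared


def invert(membership):
    """Flip a text -> words mapping into a word -> texts mapping."""
    inverted = {}
    for key, values in membership.items():
        for value in values:
            inverted.setdefault(value, set()).add(key)
    return inverted


# Toy data, invented for illustration only.
text_words = {
    "legend_a": {"gold", "cave", "pirate"},
    "legend_b": {"gold", "cave", "ghost"},
    "legend_c": {"ghost", "well"},
}

text_links = project(text_words)          # texts linked by words in common
word_links = project(invert(text_words))  # words linked by texts in common
```

In this toy case `legend_a` and `legend_b` are linked with weight 2 (they share "gold" and "cave"), while "cave" and "gold" are linked with weight 2 because both appear in the same two texts.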
What I need, I think, is a Python script (or some other kind of code or application) that will work through each collection of plain-text files and generate a CSV or network file of some kind that I can then work with in Gephi or Sci2. I would especially like the script to let me feed in a stopword list of my choosing.
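A rough sketch of what such a script might look like, using only the Python standard library. Everything here is an assumption rather than a finished tool: the function names are hypothetical, the tokenizer is a deliberately crude regular expression that only handles unaccented lowercase letters and simple apostrophes, the stopword file is assumed to hold one word per line, and the `Source`/`Target`/`Weight` header is chosen because, as far as I know, Gephi's spreadsheet importer looks for those column names in an edge list.

```python
import csv
import re
from collections import Counter
from pathlib import Path

# Crude tokenizer: runs of letters, optionally with one inner apostrophe.
TOKEN = re.compile(r"[a-z]+(?:'[a-z]+)?")


def load_stopwords(path):
    """Read one stopword per line; blank lines are ignored."""
    lines = Path(path).read_text(encoding="utf-8").splitlines()
    return {line.strip().lower() for line in lines if line.strip()}


def read_collection(folder):
    """Map each .txt file's name (without extension) to its contents."""
    files = sorted(Path(folder).glob("*.txt"))
    return {p.stem: p.read_text(encoding="utf-8") for p in files}


def build_edges(texts, stopwords=frozenset()):
    """Return (text, word, count) rows, skipping any stopwords."""
    edges = []
    for name, body in texts.items():
        words = (w for w in TOKEN.findall(body.lower()) if w not in stopwords)
        for word, count in sorted(Counter(words).items()):
            edges.append((name, word, count))
    return edges


def write_edge_csv(edges, out_path):
    """Write a text-to-word edge list with a header row."""
    with open(out_path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        writer.writerow(["Source", "Target", "Weight"])
        writer.writerows(edges)
```

With a folder of plain-text files called, say, `oral/` and a stopword file `stopwords.txt` (both names made up), one run would be `write_edge_csv(build_edges(read_collection("oral"), load_stopwords("stopwords.txt")), "oral_edges.csv")`. Each row links one text to one word, weighted by how often the word occurs in that text, so the result is a two-mode edge list rather than a text-by-text or word-by-word network.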