I'm looking for a tool that can generate a network graph that creates nodes for documents and topics where the edge pull is determined by topic weight. In other words, if I have a topic model of 40 topics run on a dataset of several thousand documents, I want to be able to display how strongly each document is pulled toward each topic in the network. I've seen it done in several instances, but I'm wondering what tools people are using to do this and to what degree of satisfaction... I've tried SNA visualization tools (Gephi, NodeXL, and yEd), but those generally require that the edge weight be internally computed.
What tools can be used to create topic model network graphs?
(7 posts) (5 voices)
Posted 4 years ago Permalink
I think you could do this with networkx? As you run the topic model, you can build the graph directly with weighted edges, or any other number of edge attributes.
Posted 4 years ago Permalink -
Replying to @parezcoydigo's post:
That tool looks fantastic because of its flexibility and because it can be worked right into the running of the model. Unfortunately, at this point I don't have the Python scripting ability to use it right away. Do you know of something with a GUI that offers the same flexibility?
Posted 4 years ago Permalink -
Hi Lisa,
I've written about this sort of thing on my blog a few times - http://electricarchaeologist.wordpress.com/
Take your topic modeling composition data. Create a spreadsheet where you have three columns, source, target, and weight. Put your docs and topics under source and target as appropriate, and then the percentage composition under weight. Save as a csv file.
Then, in Gephi, create a new project. Click on 'data laboratory'. Click on 'edges' under 'data table'. Click 'import spreadsheet'. Navigate to your csv file. Make sure 'as table' is set to 'edges table'. Click 'next', then click 'finish'.
Then, go back to the 'overview' pane, and down the left hand side under layout you can select different algorithms that'll take the edge weight into account.
...is that the kind of thing you had in mind? You can also include a 'type' column in your csv file, with 'directed' or 'undirected' as appropriate.
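For anyone who would rather generate that spreadsheet than type it by hand, here is a minimal sketch of the source/target/weight recipe above, using only Python's standard library. The `composition` dictionary, the `topicN` node names, and the `min_weight` cutoff are all hypothetical; real proportions would come from your own topic model's document-topic output.

```python
import csv
import io

# Hypothetical composition data: document -> proportion of each topic.
# Real values would come from your topic model's document-topic output.
composition = {
    "doc1": [0.70, 0.25, 0.05],
    "doc2": [0.10, 0.10, 0.80],
}


def edge_rows(composition, min_weight=0.0):
    """Yield one (source, target, weight, type) row per document-topic pair."""
    for doc, proportions in composition.items():
        for index, weight in enumerate(proportions):
            # Skip negligible topics so the graph is not flooded with edges.
            if weight >= min_weight:
                yield (doc, f"topic{index}", weight, "Undirected")


def write_edges(rows, handle):
    """Write the rows as an edge table with a header line."""
    writer = csv.writer(handle)
    writer.writerow(["Source", "Target", "Weight", "Type"])
    writer.writerows(rows)


# Written to an in-memory buffer here; to save a file instead, pass
# open("edges.csv", "w", newline="") as the handle.
buffer = io.StringIO()
write_edges(edge_rows(composition, min_weight=0.1), buffer)
edges_csv = buffer.getvalue()
print(edges_csv)
```

With several thousand documents and 40 topics, keeping every pair means well over a hundred thousand edges, so a cutoff like `min_weight` may be worth experimenting with before importing.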
Posted 4 years ago Permalink -
Shawn,
Yes! That's what I was looking for. I'm sorry that I somehow missed it on your blog, but I'm grateful that you took the time to explain it here. For some reason I couldn't wrap my head around how the .csv file needed to be formatted to get it the way I wanted it in Gephi. I haven't tried it yet, but I'm about to. Thank you for the generous reply!
-Lisa
Posted 4 years ago Permalink -
Do you really mean that you want nodes for both documents and topics, a bimodal graph? In that case your graph would have a small number of nodes (the topic nodes) with high centrality, and then thousands of small nodes (the document nodes) with low centrality. If this is the case, how are you calculating the topic weight for a document?
It seems to make more sense to me to have nodes for just documents, and edges between documents that share a topic: a multigraph. Then the greater the number of edges between two nodes, the closer they are in topic. Alternatively, you could define the edge weight to be a function of the number of topics two documents have in common, which basically amounts to the same thing but removes the need to represent multigraphs.
As for tools to visualise this, here's some Perl which creates a GraphML from a list of documents titled A, B, C, D, E, F, G, and H which each cover one or more topics, 1, 2, 3, 4, 5, 6, or 7:
    #!/usr/bin/perl
    use strict;
    use Graph::Easy;

    my $graph  = Graph::Easy->new;
    my $topics = {};    # topic => list of document titles seen so far

    for (<DATA>) {
        chomp;    # drop the newline so it does not become part of the topic key
        my ($title, $topic) = split /,/;
        my $document = $graph->add_node($title);
        # Link this document to every earlier document with the same topic.
        $graph->add_edge_once($document, $_) foreach (@{ $topics->{$topic} });
        # push autovivifies the list the first time a topic is seen.
        push @{ $topics->{$topic} }, $title;
    }

    print $graph->as_graphml;

    __END__
    A,2
    A,4
    B,1
    B,2
    B,6
    C,2
    C,3
    D,1
    D,2
    D,5
    D,7
    E,2
    E,3
    F,1
    F,2
    F,6
    G,1
    G,5
    G,6
    G,7
    H,1
    H,2
    H,4
    H,7
I tried importing the output of this in Gephi and it looked basically correct.
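The weighted alternative mentioned above, where an edge's weight is the number of topics two documents share, can be sketched in a few lines of Python. This is only an illustration using the same A to H sample data as the Perl script; the function name `shared_topic_weights` is made up for this example.

```python
from collections import defaultdict
from itertools import combinations

# Same sample data as the Perl script: (document title, topic) pairs.
pairs = [
    ("A", 2), ("A", 4),
    ("B", 1), ("B", 2), ("B", 6),
    ("C", 2), ("C", 3),
    ("D", 1), ("D", 2), ("D", 5), ("D", 7),
    ("E", 2), ("E", 3),
    ("F", 1), ("F", 2), ("F", 6),
    ("G", 1), ("G", 5), ("G", 6), ("G", 7),
    ("H", 1), ("H", 2), ("H", 4), ("H", 7),
]


def shared_topic_weights(pairs):
    """Map each document pair to the number of topics the two share."""
    docs_by_topic = defaultdict(set)
    for doc, topic in pairs:
        docs_by_topic[topic].add(doc)

    weights = defaultdict(int)
    for docs in docs_by_topic.values():
        # Sorting keeps each pair in a fixed (a, b) order with a < b.
        for a, b in combinations(sorted(docs), 2):
            weights[(a, b)] += 1
    return dict(weights)


weights = shared_topic_weights(pairs)
for (a, b), weight in sorted(weights.items()):
    print(f"{a},{b},{weight}")
```

Each printed line is a source,target,weight triple, so the output can be pasted under a header row and imported into Gephi as an edges table. Pairs that share no topic simply do not appear.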
By the way, when you say "topic model", are your topics just keywords? Or are you talking about vectors of word frequencies?
Posted 4 years ago Permalink -
Hi boys,
I have many topic models created with the Mallet library, of this type:
TOPIC 1
school 0.3
teacher 0.2
science 0.08
mathematics 0.07
matter 0.05
student 0.03
I want to generate a network where each topic is a node.
I tried to use Gephi, but I do not know how to import all the topics into a csv file.
Could you kindly help?
Thanks in advance...
Gaetano
Posted 4 years ago Permalink