I've installed MALLET 2.0.7 on my Windows 7 machine, installed Java and Ant, and set the classpath variables. I built MALLET with Ant and got the build successful message. Then I imported my data into MALLET format using:

    bin\mallet import-file --input data/mytext.txt --keep-sequence --stoplist-file stoplists/en.txt --output data/mytext.mallet
The problem is that training topics doesn't appear to work. When I enter:

    bin\mallet train-topics --input data/mytext.mallet --output-topic-keys data/mytext/mytext.keys

I usually get 1000 iterations of 10 groups of words. I'd hesitate to call them "topics" because, intuitively, they don't reflect any real relationships in my text, and they don't repeat when I re-run it (I get completely new groupings).
Also, the second number on each line, which I believe is the Dirichlet parameter, doesn't change at all: at every iteration that gets printed, the parameter for each topic is 5. The exception is that every now and then, when I add more options (--num-topics 25, --doc-topics-threshold 0.1, and others), the parameter starts at 5, climbs into the 80s (and it varies among the topics, so some are in the 10s, some in the 20s, etc.), and then the run crashes with a Java exception. Other times it stays stable at 2, and the topics still don't make sense as topics!
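For what it's worth, here is the arithmetic I suspect might be behind the 5 and the 2. This is only a sketch under assumptions I haven't verified: that MALLET's --alpha value is a total split evenly across topics when hyperparameter optimization is off, that the 2.0.7 command-line default for that total is 50, and that the default topic count is 10. The helper name per_topic_alpha is my own and is not part of MALLET.

```python
# Hypothetical helper, not a MALLET API.
# ASSUMPTION: --alpha is a sum over all topics, divided evenly per topic
# when hyperparameter optimization is not running.
def per_topic_alpha(alpha_sum: float, num_topics: int) -> float:
    """Return the per-topic Dirichlet parameter for an evenly split alpha sum."""
    return alpha_sum / num_topics

# ASSUMPTION: default alpha sum of 50.0 in MALLET 2.0.7 (unconfirmed).
print(per_topic_alpha(50.0, 10))  # default 10 topics -> 5.0
print(per_topic_alpha(50.0, 25))  # --num-topics 25   -> 2.0
```

If those assumptions hold, a constant 5 (or 2) would just be the unoptimized starting value, which still wouldn't explain why the topics themselves look random.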
A colleague of mine runs this on the same data. He installed it on a Mac and gets topics that make sense, but he installed it a while ago and has no idea what's going wrong with my setup.
Any idea what I'm doing wrong?
Edited to add: I also unpacked and installed version 2.0.6, built it with Ant, and re-imported my data from scratch, and I'm seeing the same thing.