Hi everyone, I am using the TopicModeling tool / Mallet to process a large corpus (~40,000 articles), and I am getting the errors below in the output. The end result is that the CSV and HTML output files are *not* created, i.e., the output_csv and output_html directories are empty.
Has anybody else run into this with large amounts of data in Mallet, or could anyone give me an idea of what is happening? It looks to me as though the index array is simply growing too large for memory and causing the exception to be thrown, but I am not sure, as I am new to Mallet and the TopicModeling tool.
--------
Total time: 26 minutes 52 seconds
java.lang.ArrayIndexOutOfBoundsException: 1
at cc.mallet.topics.gui.CsvBuilder.dtLine2Csv(CsvBuilder.java:159)
at cc.mallet.topics.gui.CsvBuilder.buildCsv2(CsvBuilder.java:199)
at cc.mallet.topics.gui.CsvBuilder.createCsvFiles(CsvBuilder.java:233)
at cc.mallet.topics.gui.TopicModelingTool$TrainButtonListener.outputCsvFiles(TopicModelingTool.java:625)
at cc.mallet.topics.gui.TopicModelingTool$TrainButtonListener.runMallet(TopicModelingTool.java:581)
at cc.mallet.topics.gui.TopicModelingTool$TrainButtonListener$1.run(TopicModelingTool.java:446)
java.lang.ArrayIndexOutOfBoundsException: 1
at cc.mallet.topics.gui.HtmlBuilder.buildHtml2(HtmlBuilder.java:178)
at cc.mallet.topics.gui.HtmlBuilder.createHtmlFiles(HtmlBuilder.java:293)
at cc.mallet.topics.gui.TopicModelingTool$TrainButtonListener.outputCsvFiles(TopicModelingTool.java:629)
at cc.mallet.topics.gui.TopicModelingTool$TrainButtonListener.runMallet(TopicModelingTool.java:581)
at cc.mallet.topics.gui.TopicModelingTool$TrainButtonListener$1.run(TopicModelingTool.java:446)
Mallet Output files written in C:\Users\sjturner1\Desktop\Cuba Output ---> C:\Users\sjturner1\Desktop\Cuba Output\output_state.gz , C:\Users\sjturner1\Desktop\Cuba Output\output_topic_keys
Csv Output files written in C:\Users\sjturner1\Desktop\Cuba Output\output_csv
Html Output files written in C:\Users\sjturner1\Desktop\Cuba Output\output_html
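
To sanity-check my reading of the trace, here is a minimal Java sketch of one other way an ArrayIndexOutOfBoundsException reporting index 1 can come about: a line is split on a delimiter, yields only one field, and the code then reads the second field. This is only an illustration under my own assumptions. The class and method names are hypothetical, the tab delimiter is a guess, and none of this is the tool's actual CsvBuilder or HtmlBuilder code, which I have not inspected.

```java
// Hypothetical illustration only; NOT the Topic Modeling Tool's source.
class SplitIndexSketch {

    // Assumed pattern: split a line on tabs and read the second field.
    // If the line has no tab, the array has length 1 and fields[1] throws
    // ArrayIndexOutOfBoundsException.
    static String secondField(String line) {
        String[] fields = line.split("\t");
        return fields[1];
    }

    // Defensive variant: check the length before indexing and return
    // null when the second field is missing.
    static String secondFieldOrNull(String line) {
        String[] fields = line.split("\t");
        return fields.length > 1 ? fields[1] : null;
    }

    public static void main(String[] args) {
        // A well-formed line with two tab-separated fields.
        System.out.println(secondField("0\tdoc1.txt"));

        // A line without a tab triggers the exception.
        try {
            secondField("malformed");
        } catch (ArrayIndexOutOfBoundsException e) {
            // The exact message text depends on the Java version.
            System.out.println("caught: " + e.getMessage());
        }

        // The guarded variant returns null instead of throwing.
        System.out.println(secondFieldOrNull("malformed"));
    }
}
```

If something like this is what is happening, an empty or oddly formatted line in one of the intermediate output files might be the trigger rather than memory, but that is a guess on my part.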