Digital Humanities Questions & Answers » Topic: Text mining tools that work with RTL texts?

sinai.rusinek@gmail.com on "Text mining tools that work with RTL texts?" (Thu, 07 Mar 2013 09:05:15 +0000):

I tried AntConc with Hebrew texts in Unicode format. It works, but the results come out left-to-right. Any recommendations on how to solve this, or tools better suited to RTL text?

slh@ens-lyon.fr on "Text mining tools that work with RTL texts?" (Sat, 03 May 2014 07:52:10 +0000):

Replying to @sinai.rusinek@gmail.com's <a href="http://digitalhumanities.org/answers/topic/text-mining-tools-that-work-with-rtl-texts#post-1912">post</a>:

Hi Sinai,

I suggest you give <a href="http://textometrie.ens-lyon.fr/?lang=en">TXM</a> a try. We didn't design the GUI with right-to-left (RTL) writing systems in mind, but UTF-8-encoded RTL text appears to be well supported overall by the underlying technology. The notable exception is the concordance view, where the left and right contexts are swapped. The current state of the software and possible developments concerning writing systems are described here (in French): <a href="https://groupes.renater.fr/wiki/txm-info/public/specs_langues?s=%C3%A9criture" rel="nofollow">https://groupes.renater.fr/wiki/txm-info/public/specs_langues?s=%C3%A9criture</a>. If there is sufficient interest, we could speed up work on RTL support.

Keep in mind that how the GUI displays RTL text is independent of how the raw text is segmented into words (tokenization), and tokenization can also have a deep impact on the usability of text analysis software. You can always run such software on raw character strings, but it works much better on words or lexical items. In TXM we have begun to address word tokenization for Semitic languages, starting with Arabic. The current state of that work is described here: <a href="https://groupes.renater.fr/wiki/txm-info/public/specs_import_annotation_lexicale_auto#etat_de_l_art_pour_l_arabe" rel="nofollow">https://groupes.renater.fr/wiki/txm-info/public/specs_import_annotation_lexicale_auto#etat_de_l_art_pour_l_arabe</a>. If there is sufficient interest, we could add Hebrew to our roadmap, for example with the MorphTagger software: <a href="http://www.cs.technion.ac.il/~barhaim/MorphTagger" rel="nofollow">http://www.cs.technion.ac.il/~barhaim/MorphTagger</a>.
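The concordance problem raised in this thread comes down to column order. In an RTL script the preceding context should sit to the right of the keyword, not to its left.

The Python sketch that follows is a minimal illustration of that idea, not how TXM or AntConc work internally. It rests on these assumptions:

- The function names (`is_rtl`, `kwic`, `visual_columns`, `render`) are hypothetical.
- Tokenization is naive whitespace splitting.
- The output is shown by a bidi-aware renderer that honours the Unicode isolate characters RLI (U+2067) and PDI (U+2069).

```python
import unicodedata

# Unicode directional isolates (UAX #9): RLI opens a right-to-left
# isolate, PDI closes it.
RLI, PDI = "\u2067", "\u2069"


def is_rtl(text: str) -> bool:
    """Guess direction from the first strong character.

    Hebrew letters have bidi class "R", Arabic letters "AL", Latin letters "L".
    """
    for ch in text:
        bidi = unicodedata.bidirectional(ch)
        if bidi in ("R", "AL"):
            return True
        if bidi == "L":
            return False
    return False  # no strong character found: default to left-to-right


def kwic(tokens, keyword, window=3):
    """Return (preceding, keyword, following) triples in logical reading order."""
    rows = []
    for i, tok in enumerate(tokens):
        if tok == keyword:
            preceding = " ".join(tokens[max(0, i - window):i])
            following = " ".join(tokens[i + 1:i + 1 + window])
            rows.append((preceding, tok, following))
    return rows


def visual_columns(row, rtl):
    """Order the three cells as they should appear on screen, left to right.

    In RTL text the preceding context belongs on the RIGHT of the keyword,
    so the outer columns are swapped compared with an LTR concordance.
    """
    preceding, kw, following = row
    return (following, kw, preceding) if rtl else (preceding, kw, following)


def render(row, rtl):
    """Join the cells; for RTL, wrap each cell in an isolate so its words
    are laid out right-to-left while the cells stay in column order."""
    cells = visual_columns(row, rtl)
    if rtl:
        cells = tuple(f"{RLI}{c}{PDI}" for c in cells)
    return " | ".join(cells)


# Example: Genesis 1:1 (unpointed Hebrew), keyword "אלהים".
sample = "בראשית ברא אלהים את השמים ואת הארץ"
lines = [render(r, is_rtl(sample)) for r in kwic(sample.split(), "אלהים", window=2)]
```

With `window=2`, the single line produced for the sample verse behaves as follows:

- The following context (את השמים) goes in the leftmost column.
- The preceding context (בראשית ברא) goes in the rightmost column.

That is the layout an RTL reader expects. First-strong-character detection mirrors how the Unicode Bidirectional Algorithm picks a paragraph's direction. A corpus mixing Hebrew and Latin lines would need this check per line rather than once per text.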
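The tokenization point matters especially for Hebrew and Arabic. In these languages the conjunction, the article, and several prepositions are written attached to the following word. A whitespace token such as ובבית (roughly "and in the house") is therefore not a single lexical item.

As a hedged illustration only, the Python sketch that follows enumerates candidate (prefix, stem) splits. It uses the Hebrew letters that can act as proclitics (ו, ש, ה, ב, כ, ל, מ). The function name and the `max_prefix`/`min_stem` limits are hypothetical choices. This is not MorphTagger's method: it deliberately overgenerates, to show the ambiguity a real morphological analyzer has to resolve from context.

```python
# Hebrew letters that can act as proclitics: ו (and), ש (that/which),
# ה (the), ב (in), כ (as), ל (to), מ (from).
PREFIX_LETTERS = set("ושהבכלמ")


def prefix_candidates(token, max_prefix=3, min_stem=2):
    """List every (prefix, stem) split whose prefix uses only proclitic letters.

    Deliberately naive: it ignores the grammatical order of prefixes and
    cannot tell a real prefix from a root letter, so it overgenerates.
    """
    candidates = [("", token)]  # the unsegmented token is always a candidate
    for k in range(1, max_prefix + 1):
        prefix, stem = token[:k], token[k:]
        # Longer prefixes only shorten the stem or keep a non-prefix letter,
        # so the first failure ends the search.
        if len(stem) < min_stem or not set(prefix) <= PREFIX_LETTERS:
            break
        candidates.append((prefix, stem))
    return candidates
```

`prefix_candidates("ובבית")` yields four splits:

- ("", "ובבית")
- ("ו", "בבית")
- ("וב", "בית"), the correct one
- ("ובב", "ית"), which is spurious

Even the plain word בית ("house") is offered the bogus split ("ב", "ית"). Choosing among such candidates requires context. That is why a trained tagger is needed before counting lexical items rather than raw strings.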