<?xml version="1.0" encoding="UTF-8"?>
<!-- generator="bbPress/1.0.2" -->
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom">
	<channel>
		<title>Digital Humanities Questions &#38; Answers &#187; Topic: Text mining tools that work with RTL texts?</title>
		<link>http://digitalhumanities.org/answers/topic/text-mining-tools-that-work-with-rtl-texts</link>
		<description>Digital Humanities Questions &amp; Answers &#187; Topic: Text mining tools that work with RTL texts?</description>
		<language>en-US</language>
		<pubDate>Sun, 08 Nov 2015 21:45:51 +0000</pubDate>
		<generator>http://bbpress.org/?v=1.0.2</generator>
		<textInput>
			<title><![CDATA[Search]]></title>
			<description><![CDATA[Search all topics from these forums.]]></description>
			<name>q</name>
			<link>http://digitalhumanities.org/answers/search.php</link>
		</textInput>
		<atom:link href="/rss/topic/text-mining-tools-that-work-with-rtl-texts/index.xml" rel="self" type="application/rss+xml" />

		<item>
			 
				<title>slh@ens-lyon.fr on "Text mining tools that work with RTL texts?"</title>
						<link>http://digitalhumanities.org/answers/topic/text-mining-tools-that-work-with-rtl-texts#post-2172</link>
			<pubDate>Sat, 03 May 2014 07:52:10 +0000</pubDate>
			<dc:creator>slh@ens-lyon.fr</dc:creator>
			<guid isPermaLink="false">2172@http://digitalhumanities.org/answers/</guid>
			<description>&#60;p&#62;&#60;em&#62;Replying to @sinai.rusinek@gmail.com's &#60;a href=&#34;http://digitalhumanities.org/answers/topic/text-mining-tools-that-work-with-rtl-texts#post-1912&#34;&#62;post&#60;/a&#62;:&#60;/em&#62;&#60;/p&#62;
&#60;p&#62;Hi Sinai,&#60;/p&#62;
&#60;p&#62;I suggest you try &#60;a href=&#34;http://textometrie.ens-lyon.fr/?lang=en&#34;&#62;TXM&#60;/a&#62;.&#60;/p&#62;
&#60;p&#62;We haven't designed the GUI with RTL writing systems in mind, but UTF-8 encoded RTL text appears to be well supported overall by the underlying technology, with one notable exception: in concordances, the left and right contexts are swapped.&#60;/p&#62;
&#60;p&#62;The current state of the software and possible future developments concerning writing systems are described here (in French): &#60;a href=&#34;https://groupes.renater.fr/wiki/txm-info/public/specs_langues?s=%C3%A9criture&#34; rel=&#34;nofollow&#34;&#62;https://groupes.renater.fr/wiki/txm-info/public/specs_langues?s=%C3%A9criture&#60;/a&#62;.&#60;/p&#62;
&#60;p&#62;If there is sufficient interest, we could speed up our work on RTL support.&#60;/p&#62;
&#60;p&#62;Note that how the GUI handles RTL display is independent of the word segmentation (tokenization) of the raw text, which can also have a deep impact on the usability of text analysis software. Even though one can always run such software on character strings, it is much better to run it on words or lexical items. For TXM, we have begun to address word tokenization for Semitic languages, starting with Arabic. The current state is described here: &#60;a href=&#34;https://groupes.renater.fr/wiki/txm-info/public/specs_import_annotation_lexicale_auto#etat_de_l_art_pour_l_arabe&#34; rel=&#34;nofollow&#34;&#62;https://groupes.renater.fr/wiki/txm-info/public/specs_import_annotation_lexicale_auto#etat_de_l_art_pour_l_arabe&#60;/a&#62;&#60;/p&#62;
&#60;p&#62;If there is sufficient interest, we could include Hebrew in our roadmap, for example by using the MorphTagger software: &#60;a href=&#34;http://www.cs.technion.ac.il/~barhaim/MorphTagger&#34; rel=&#34;nofollow&#34;&#62;http://www.cs.technion.ac.il/~barhaim/MorphTagger&#60;/a&#62;.
&#60;/p&#62;</description>
		</item>
		<item>
			 
				<title>sinai.rusinek@gmail.com on "Text mining tools that work with RTL texts?"</title>
						<link>http://digitalhumanities.org/answers/topic/text-mining-tools-that-work-with-rtl-texts#post-1912</link>
			<pubDate>Thu, 07 Mar 2013 09:05:15 +0000</pubDate>
			<dc:creator>sinai.rusinek@gmail.com</dc:creator>
			<guid isPermaLink="false">1912@http://digitalhumanities.org/answers/</guid>
			<description>&#60;p&#62;I tried AntConc with Unicode-encoded Hebrew texts. It works, but the results are displayed left-to-right. Any recommendations on how to solve this, or on tools better suited to RTL texts?
&#60;/p&#62;</description>
		</item>

	</channel>
</rss>
