This question has been floating on my mind this week during conversations about the future of Juxta. The traditional diff algorithm would look at strings of text linearly. So if you have two series, A) a b c d f g h j q z, and B) a b c d e f g i j k r x y z, the diff algorithm recognizes that they share C) a b c d f g j z. In a series of versions of a given text, where an author and his collaborators have moved/transposed several blocks of text from one version to the next, the diff algorithm fails to recognize these moves, even though internal to those blocks the diff still applies. So the question is: Given a series X) a b c [d f g h j] q ... n, and a series Y) a [d f i h j] b c q ... n, how can we get the computer to recognize that Z) [d f h j] corresponds to each other and has moved? If I could solve this puzzle, we could teach Juxta to recognize moves automatically, or at least, suggest possible matches to make the scholar's work easier... I think.
One of the problems is that I don't know what kind of problem this is in terms of computer science or math: Is it a Matching problem? A String searching algorithm problem? I'm randomly walking in the Math and Computer Science department next week, but I figure I'd run it by the DH community just in case.