Hacker News

It seems they did some comprehensive testing to make sure there are no bugs. What else should they do?


Understand their code.


They understood it enough to:

  - implement it, although it is extremely complicated
  - test it
  - fix a bug in the algorithm
  - use it to make part of their software 100x faster


I think I came away with almost the opposite impression from reading the article.

From my perspective, they did not implement it. Mark Miller and Robert Muir were able to implement the algorithm for the N=1 case, but were stuck until they found the existing Moman code. They did not implement their own code using Moman as a reference implementation, but just used Moman's code to generate the required tables.

From the article, "Not really understanding the Python code, and also neither the paper, we desperately tried to write our own Python code to tap into the various functions embedded in Moman's code". This sounds to me like they did not have a good understanding of the algorithms they were trying to implement.

They did do a fair bit of testing and did uncover a bug in the Moman code base, but, again, they did not fix that bug themselves; they appealed to Jean-Philippe, who then quickly fixed his code -- in effect, they were relying on a third party.
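For what it's worth, the black-box style of testing described here doesn't require understanding the Schulz-Mihov construction at all: you can check an acceptor exhaustively against brute-force edit distance over a small alphabet. A minimal sketch (this is not Lucene's or Moman's table-driven code; the acceptor below is just the row-at-a-time NFA/DP view of a Levenshtein automaton):

```python
from itertools import product

def levenshtein(a, b):
    # classic dynamic-programming edit distance
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1,
                           prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]

class LevAcceptor:
    """Accepts words within edit distance n of query.
    The 'state' is the current DP row; stepping one character
    updates the row -- the determinized-NFA view of a
    Levenshtein automaton, not the paper's table construction."""
    def __init__(self, query, n):
        self.query, self.n = query, n
    def accepts(self, word):
        row = list(range(len(self.query) + 1))
        for ch in word:
            new = [row[0] + 1]
            for j, qc in enumerate(self.query, 1):
                new.append(min(row[j] + 1, new[j - 1] + 1,
                               row[j - 1] + (ch != qc)))
            row = new
        return row[-1] <= self.n

# exhaustive check over a tiny alphabet: acceptor vs. brute force
query, n = "ab", 1
acc = LevAcceptor(query, n)
for length in range(4):
    for word in map("".join, product("ab", repeat=length)):
        assert acc.accepts(word) == (levenshtein(word, query) <= n)
```

A test like this tells you the two implementations agree on every short string, which is exactly the kind of confidence-without-comprehension the thread is debating.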

And, yes, they did apply the end result to make fuzzy searching a lot faster, which is a good and practical outcome. It took a lot of effort on the Lucene team's part to get this feature implemented, but that does not mean that anyone has a good understanding of the end result.

In short, I don't get the impression that anyone on the Lucene team could give a one-hour talk on implementing the Klaus Schulz and Stoyan Mihov paper to an audience versed in formal languages and automata.





