
Computer scientists admit 40-year-old algorithm is perfect, cannot be improved upon

Decades-old formula proves to be as good as it gets in terms of efficiency

At the upcoming ACM Symposium on Theory of Computing, MIT researchers are expected to report that the Wagner-Fischer algorithm very likely cannot be improved upon; in terms of efficiency, it is as good as it gets. 

This is pretty big news because computer scientists are constantly improving upon existing algorithms and formulas to find more efficient methods. The Wagner-Fischer algorithm, which was published some 40 years ago, has been worked on time and again by some of the best scientists, researchers, and mathematicians the world over, but to no avail. That’s because over and over, the algorithm has proven to be as good as it gets. 

The Wagner-Fischer algorithm is a formula used to determine how much two sequences of symbols have in common; this measure is also referred to as the “edit distance” between the sequences. Should a widely held theory about computational complexity prove correct, then the problem of measuring things like the difference between two genomes, texts, or speech samples (basically anything that can be represented as a string of symbols) cannot be solved more efficiently than the way in which this algorithm does. 

“This edit distance is something that I've been trying to get better algorithms for since I was a graduate student, in the mid-'90s,” says Piotr Indyk, a professor of computer science and engineering at MIT and a co-author of the STOC paper. “I certainly spent lots of late nights on that—without any progress whatsoever. So at least now there's a feeling of closure. The problem can be put to sleep.”

What’s interesting, Indyk notes, is that even though the paper proving the algorithm’s optimality has not even been published yet, the announcement of its impending publication has already spawned two follow-up papers that apply its approach to related problems. 

“There is a technical aspect of this paper, a certain gadget construction, that turns out to be very useful for other purposes as well,” Indyk says.

So, the obvious question is—what makes this algorithm so perfect? To begin, it’s important to note that edit distance is the minimum number of edits (deletions, insertions, and substitutions) necessary to turn one string into another. The Wagner-Fischer algorithm assigns each symbol of one string to a column in a giant grid, and each symbol of the other string to a row. Then, beginning in the upper left-hand corner and moving diagonally across the grid, it fills in each square with the number of edits required to turn the string that ends with the corresponding column into the string ending with the corresponding row. 
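The grid-filling procedure described above can be sketched in a few lines of Python. This is a minimal illustration of the classic dynamic program, not the researchers' code:

```python
def edit_distance(a: str, b: str) -> int:
    """Wagner-Fischer: cell [i][j] of the grid holds the edit distance
    between the first i symbols of a and the first j symbols of b."""
    rows, cols = len(a) + 1, len(b) + 1
    grid = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        grid[i][0] = i                  # delete all i symbols of a
    for j in range(cols):
        grid[0][j] = j                  # insert all j symbols of b
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1   # substitution needed?
            grid[i][j] = min(
                grid[i - 1][j] + 1,         # deletion
                grid[i][j - 1] + 1,         # insertion
                grid[i - 1][j - 1] + cost,  # substitution (or free match)
            )
    return grid[-1][-1]

print(edit_distance("kitten", "sitting"))  # 3: k->s, e->i, insert g
```

Because every square of the grid must be filled, the work done is the product of the two string lengths, which is exactly the quadratic behavior discussed next.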

Computer scientists measure algorithmic efficiency as computation time relative to the number of elements the algorithm manipulates. Because the Wagner-Fischer algorithm must fill in every square of its grid, its running time is in direct proportion to the product of the lengths of the two strings it’s considering; that is, double the lengths of the strings, and the running time quadruples. In computer science jargon, the algorithm runs in quadratic time. 

A layman reading this explanation could point out that, on paper, it doesn’t sound like a very efficient algorithm. The benefit, though, is that quadratic time, in which running time is proportional to N² (where N is the number of elements the algorithm manipulates), is far better than exponential time, in which running time is proportional to 2^N. So, if a quadratic-time-algorithm-based machine were to take, say, a hundredth of a second to process 100 elements, an exponential-time-algorithm-based machine would take 100 quintillion years. 
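To make that contrast concrete, here is a back-of-the-envelope sketch. The one-microsecond-per-step cost is purely an illustrative assumption, and the exact year count depends on that constant, but the gulf between the two growth rates does not:

```python
# Contrast quadratic (N**2) with exponential (2**N) growth at N = 100.
N = 100
quadratic_steps = N ** 2        # 10,000 steps
exponential_steps = 2 ** N      # roughly 1.27e30 steps

# Assume (purely for illustration) one microsecond per step.
STEP_SECONDS = 1e-6
quadratic_seconds = quadratic_steps * STEP_SECONDS
exponential_years = exponential_steps * STEP_SECONDS / (60 * 60 * 24 * 365)

print(f"quadratic:   {quadratic_seconds:.2f} seconds")   # a hundredth of a second
print(f"exponential: {exponential_years:.2e} years")     # astronomically long
```

Whatever per-step cost you assume, doubling N merely quadruples the quadratic figure, while it squares the astronomical exponential one.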

Turning for a moment to theoretical computer science, experts in this field are particularly concerned with a class of problems referred to as NP-complete. Specifically, they believe that NP-complete problems take exponential time to solve; the catch is that this is a conjecture, and has yet to be proven. In the aforementioned paper, Indyk and his student, Artūrs Bačkurs, demonstrate that if it’s possible to solve the edit-distance problem in less-than-quadratic time, then it’s possible to solve an NP-complete problem in less-than-exponential time. As a result, most experts in the computational-complexity community will take it as strong evidence (certainly the strongest to date) that no subquadratic solution to the edit-distance problem actually exists; therefore, the Wagner-Fischer algorithm cannot be improved upon. 

“This is really nice work,” says Barna Saha, an assistant professor of computer science at the University of Massachusetts at Amherst. “There are lots of people who have been working on this problem, because it has a big practical impact. But they won't keep trying to develop a subquadratic algorithm, because that seems very unlikely to happen, given the result of this paper.”

And in regard to the conjecture that Indyk and Bačkurs’s paper rests upon, that NP-complete problems cannot be solved in subexponential time: “It's a very widely believed conjecture,” Saha explains. “And there are many other results in this low-polynomial-time complexity domain that rely on this conjecture.”

Read their paper: “Edit Distance Cannot Be Computed in Strongly Subquadratic Time (unless SETH is false)”

Via MIT
