Researchers from MIT have developed what can be thought of as the precursor to an artificial intelligence's immune system. In findings presented this month at the Association for Computing Machinery's Programming Language Design and Implementation (PLDI) conference, researchers from MIT's Computer Science and Artificial Intelligence Laboratory (CSAIL) describe a system that autonomously repairs software bugs by importing functionality from other programs, a built-in "digital organ transfer," if you will.
To find the snippet of code needed to repair the recipient, the system, dubbed CodePhage, reaches across multiple programming languages without requiring access to the prospective donor program's source code. That ability could save developers thousands of hours of debugging. Manually proofreading code to identify exactly which line needs to be edited is very time-consuming, and it often ties up company resources that could otherwise go toward future R&D.
By contrast, CodePhage evaluates the execution of the application it's borrowing from and categorizes the security checks that application performs. CodePhage can then import those checks into the recipient, adding a layer of protection designed to ensure the bug is repaired. The result is, in effect, a patchwork of solutions drawn from many sources.
"We have tons of source code available in open-source repositories, millions of projects, and a lot of these projects implement similar specifications," explains Stelios Sidiroglou-Douskos, a CSAIL research scientist and lead developer of CodePhage. "Even though that might not be the core functionality of the program, they frequently have subcomponents that share functionality across a large number of projects."
How does it work?
CodePhage begins by searching its database of applications for a donor that accepts the same kind of input that triggers the bug. It then feeds the donor safe (non-crashing) inputs and tracks the sequence of operations the donor performs on each one. As it does so, CodePhage records those operations as a symbolic expression: a string of symbols that defines the logical constraints each operation imposes on the input.
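The tracing step can be illustrated with a toy model. This is a minimal sketch under loud assumptions: the donor is a hypothetical Python function (`donor_parse`) that validates an image header, and a "symbolic expression" is simply a string naming each condition, stored with the branch outcome. The real system instruments donor executables without their source code, so none of these names come from CodePhage.

```python
from dataclasses import dataclass, field


@dataclass
class Trace:
    """Records each branch condition the donor evaluates, as a symbolic string."""
    constraints: list = field(default_factory=list)

    def check(self, expr: str, outcome: bool) -> bool:
        # Keep the condition in symbolic form together with the branch actually taken.
        self.constraints.append((expr, outcome))
        return outcome


def donor_parse(header: dict, trace: Trace) -> str:
    """Hypothetical donor: validates an image header before allocating a pixel buffer."""
    width, height = header["width"], header["height"]
    if not trace.check("width > 0", width > 0):
        return "rejected"
    if not trace.check("height > 0", height > 0):
        return "rejected"
    if not trace.check("width * height <= 1000000", width * height <= 1_000_000):
        return "rejected"
    return "ok"


safe_trace = Trace()
print(donor_parse({"width": 640, "height": 480}, safe_trace))  # ok
print(safe_trace.constraints)
# [('width > 0', True), ('height > 0', True), ('width * height <= 1000000', True)]
```

Recording the condition text rather than concrete values is what lets the constraint be compared across runs and, later, re-expressed in another program.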
For example, the donor may compare the input's size to some threshold; if the size falls below it, CodePhage adds a symbolic expression representing that condition to its growing bank of constraints. Next, CodePhage feeds the donor the crash-inducing input, again recording the associated symbolic expressions. The divergence between the constraints recorded for the safe input and those recorded for the crash-inducing input is interpreted as a possible security check missing from the recipient software.
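Spotting that divergence amounts to comparing the two recorded traces and picking out the first condition whose outcome flips. The sketch below assumes traces are lists of `(condition, outcome)` pairs in evaluation order; the traces and the `find_divergence` helper are hypothetical illustrations, not CodePhage's actual data structures.

```python
def find_divergence(safe_trace, crash_trace):
    """Return the first condition whose outcome differs between the two runs, or None."""
    for (expr_s, ok_s), (expr_c, ok_c) in zip(safe_trace, crash_trace):
        if expr_s == expr_c and ok_s != ok_c:
            # The safe input passed this check and the crash-inducing input failed it,
            # so it is a candidate security check the recipient may be missing.
            return expr_s
    return None


# Hypothetical traces: (symbolic condition, branch outcome) in the order evaluated.
safe = [("width > 0", True), ("height > 0", True), ("width * height <= 1000000", True)]
crash = [("width > 0", True), ("height > 0", True), ("width * height <= 1000000", False)]

print(find_divergence(safe, crash))  # width * height <= 1000000
```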
CodePhage then analyzes the recipient, looking for locations at which the input meets most of the constraints described by the new symbolic expression. The order of operations performed by the donor has no bearing on this analysis, because the symbolic expression describes the state of the data after processing, not the processing itself; in other words, only the end result matters. Any residual discrepancy in how the input is handled, that is, the check the recipient lacks, is then translated into the recipient's programming language and inserted into its source code.
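A rough sketch of the insertion step follows. It makes a simplifying assumption that differs from the article's "meets most of the constraints" criterion: a recipient location qualifies if every variable the check mentions is available there. The location names, the variable mapping (`img->w`, `img->h`), and the `exit(-1)` guard action are all hypothetical; C is used only as an example recipient language.

```python
import re

IDENT = re.compile(r"[A-Za-z_]\w*")


def variables_in(expr: str) -> set:
    """Identifiers mentioned by a symbolic check such as 'width * height <= 1000000'."""
    return set(IDENT.findall(expr))


def candidate_points(expr: str, points: list) -> list:
    """Keep recipient locations where every variable in the check is available."""
    needed = variables_in(expr)
    return [name for name, available in points if needed <= available]


def translate_to_c(expr: str, names: dict) -> str:
    """Rename symbolic variables to the recipient's C expressions and emit a guard."""
    rewritten = IDENT.sub(lambda m: names.get(m.group(0), m.group(0)), expr)
    # Hypothetical guard action: stop before the unsafe operation runs.
    return f"if (!({rewritten})) exit(-1);"


check = "width * height <= 1000000"
# Hypothetical recipient locations and the symbolic values available at each one.
points = [
    ("read_header", {"width"}),
    ("alloc_buffer", {"width", "height"}),
    ("decode_rows", {"width", "height", "row"}),
]
print(candidate_points(check, points))  # ['alloc_buffer', 'decode_rows']
print(translate_to_c(check, {"width": "img->w", "height": "img->h"}))
# if (!(img->w * img->h <= 1000000)) exit(-1);
```

Because the check is expressed over values rather than over the donor's code, only the variable mapping changes when it moves into the recipient.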
Next, the system re-runs the recipient on the same crash-inducing input. If no crash occurs, the bug is resolved. If the program still crashes, CodePhage moves on to the next potential location and builds a new symbolic expression, repeating the process until the bug is removed.
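The validate-and-retry loop can be sketched as follows. The recipient here is a hypothetical toy whose "crash" is a raised `MemoryError`, and the two candidate locations are invented for illustration: a check placed after the allocation comes too late, so the loop moves on to the next location.

```python
def make_recipient(guard_at):
    """Toy recipient whose buffer allocation 'crashes' on oversized images.

    guard_at names the hypothetical location where the imported check was inserted.
    """
    def run(width, height):
        too_big = width * height > 1_000_000
        if guard_at == "before_alloc" and too_big:
            return "rejected"
        if too_big:
            raise MemoryError("oversized buffer")  # stands in for the crash
        if guard_at == "after_alloc" and too_big:
            return "rejected"  # never taken: an oversized input has already crashed
        return "ok"
    return run


def crashes(program, inp) -> bool:
    """Re-run the patched recipient on the crash-inducing input."""
    try:
        program(*inp)
        return False
    except MemoryError:
        return True


def repair(candidates, crash_input):
    """Try each insertion location in turn until the patched program no longer crashes."""
    for point in candidates:
        if not crashes(make_recipient(point), crash_input):
            return point
    return None  # no candidate location removed the bug


crash_input = (5000, 5000)
print(repair(["after_alloc", "before_alloc"], crash_input))  # before_alloc
```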
Its legacy
CodePhage may be the forerunner of self-repairing, bug-free software or, from a futurist's perspective, the foundation on which self-learning AI software is built. Either way, it points toward tools that could drastically reduce the time software developers spend testing code.
"The longer-term vision is that you never have to write a piece of code that somebody else has written before," says Martin Rinard, MIT professor of computer science and engineering and a coauthor of the research. "The system finds that piece of code and automatically puts it together with whatever pieces of code you need to make your program work."