Led by Seny Kamara, Microsoft experts have extracted a substantial amount of data from health records stored in CryptDB, a database technology that uses multiple layers of encryption so that users can search the data without exposing its content.
Developed at MIT, CryptDB works as an add-on to a standard Structured Query Language (SQL) database and lets applications interact with encrypted data. Like an onion, it wraps the data in multiple layers of existing encryption schemes, some of which allow calculations to be carried out on the encrypted values. Outer layers can be peeled away to reach an inner layer whose encryption supports the operation a query needs, such as the mathematical operations used by analysis algorithms. Different pieces of the data are encrypted with different keys.
While CryptDB was designed to protect against compromises of the database server application, it was not created to shield against an attack on the clients used to access the information. It does partially constrain this type of attack by limiting a breach to the data accessible with the keys that were compromised. CryptDB must also expose some information to the SQL server so that the server can process queries, and intercepting queries sent to the server could reveal data, depending on how they are structured.
To crack the encrypted database, Microsoft experts went after the weakest links in CryptDB: the Order Preserving Encryption (OPE) and Deterministic Encryption (DET or DTE) schemes. OPE makes it possible for SQL operations such as “ORDER BY” to run on encrypted columns, while DTE allows the database to search for matching values “by deterministically generating the same cipher text for the same plaintext,” as described by the developers in their research paper. “This encryption layer (DTE) allows the server to perform equality checks, which means it can perform selects with equality predicates, equality joins, GROUP BY, COUNT, DISTINCT, etc.” These two schemes are the most prone to data leakage in CryptDB.
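The leakage described above can be illustrated with a toy example. The sketch below is not CryptDB's actual scheme: it uses a keyed HMAC as a stand-in for deterministic encryption, and the key, the `det_token` helper, and the sample column are all hypothetical. It only shows that when equal plaintexts map to equal ciphertexts, the frequency histogram of a column survives encryption.

```python
import hashlib
import hmac
from collections import Counter

# Hypothetical key, for illustration only.
KEY = b"demo-key-not-a-real-secret"

def det_token(value: str) -> str:
    """Toy deterministic 'encryption': the same input always gives the same output.

    HMAC-SHA256 is a stand-in; it cannot be decrypted and is not what CryptDB uses.
    """
    return hmac.new(KEY, value.encode(), hashlib.sha256).hexdigest()[:12]

# A made-up low-entropy column, e.g. an admission type.
column = ["routine", "emergency", "routine", "urgent", "routine", "emergency"]
encrypted = [det_token(v) for v in column]

# The server can test equality without knowing the plaintext...
same_value = encrypted[0] == encrypted[2]

# ...but the same property exposes how often each value occurs.
plain_hist = sorted(Counter(column).values(), reverse=True)
cipher_hist = sorted(Counter(encrypted).values(), reverse=True)

print(same_value)
print(plain_hist, cipher_hist)
```

An order-preserving scheme leaks more still, since the sorted order of the ciphertexts matches the sorted order of the plaintexts.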
Kamara, Muhammad Naveed of the University of Illinois at Urbana-Champaign, and Charles Wright of Portland State University used one simple, old trick to crack the encryption: frequency analysis. Using a data source similar to the targeted content, the experts measured how often each value occurred in it and matched those frequencies against how often each ciphertext occurred in the DTE-encrypted columns. They also used three further attacks, two of them new, that build on centuries-old frequency analysis, as described in the research paper:
Lp-optimization: is a new family of attacks we introduce that decrypts DTE-encrypted columns. The family is parameterized by the lp norms [an analysis of the expected difference between values] and is based on combinatorial optimization techniques.
Sorting attack: is an attack that decrypts OPE-encrypted columns. This folklore attack is very simple but, as we show, very powerful in practice. It is applicable to columns that are “dense” in the sense that every element of the message space appears in the encrypted column. While this may seem like a relatively strong assumption, we show that it holds for many real-world datasets.
Cumulative attack: is a new attack we introduce that decrypts OPE-encrypted columns. This attack is applicable even to low-density columns and also makes use of combinatorial optimization techniques.
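Two of the simpler attacks above, plain frequency analysis against a DTE column and the sorting attack against a dense OPE column, can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the researchers' code: the function names, the toy ciphertexts, the auxiliary data, and the linear stand-in for OPE are all hypothetical.

```python
from collections import Counter

def frequency_attack(ciphertexts, auxiliary):
    """Guess plaintexts by rank: the i-th most frequent ciphertext is
    matched to the i-th most frequent value in the auxiliary data."""
    c_ranked = [c for c, _ in Counter(ciphertexts).most_common()]
    p_ranked = [p for p, _ in Counter(auxiliary).most_common()]
    return dict(zip(c_ranked, p_ranked))

def sorting_attack(ciphertexts, message_space):
    """For a dense column, sorted distinct ciphertexts line up one-to-one
    with the sorted message space."""
    distinct = sorted(set(ciphertexts))
    space = sorted(message_space)
    if len(distinct) != len(space):
        raise ValueError("column is not dense; the sorting attack does not apply")
    return dict(zip(distinct, space))

# DTE column: opaque tokens, with public data of a similar distribution as auxiliary.
det_column = ["x9", "a3", "x9", "k1", "x9", "a3"]
aux = ["routine"] * 50 + ["emergency"] * 30 + ["urgent"] * 20
det_guess = frequency_attack(det_column, aux)

# OPE column: any strictly increasing map stands in for OPE here.
toy_ope = {m: 37 * m + 5 for m in range(1, 13)}
months = [3, 1, 12, 7, 5, 2, 9, 4, 11, 6, 8, 10, 3, 7]
ope_column = [toy_ope[m] for m in months]
ope_guess = sorting_attack(ope_column, range(1, 13))

print(det_guess)
print([ope_guess[c] for c in ope_column] == months)
```

The sorting attack needs no auxiliary data at all, only knowledge of the message space (here, the twelve months), which is why low-entropy columns such as admission month are poor candidates for OPE.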
To test these attacks, the researchers used real patient data from the National Inpatient Sample (NIS) database of the Healthcare Cost and Utilization Project (HCUP), encrypting the data with OPE and DTE. The frequency analysis and Lp attacks revealed “mortality risk and patient death” attributes “for 100 percent of the patients for at least 99 percent of the 200 largest hospitals,” and 100 percent of disease severity data for 51 percent of the 200 hospitals in the data set. Aside from this information, Microsoft experts were able to obtain the admission month, mortality risk, and admission type for the majority of patients in the 200 hospitals.
Former CryptDB developer Raluca Ada Popa responded by saying that the OPE and DTE schemes were intended for “high entropy” values, where the data wouldn’t reveal much, rather than for information such as percentages of mortality in large sets of patients. “Everyone I was in touch with that used CryptDB was careful about the use of OPE,” she noted.
Source: Ars Technica