This is one piece in our series based on our ediscovery chapter of a legal informatics textbook. In this series, we’re covering ediscovery basics, including the history of the EDRM, core technical ediscovery concepts like cloud computing, the technologies powering ediscovery, and the future of ediscovery.
Today we’re diving into a modern technology powering ediscovery: encryption.
Encryption is the process of converting information into a form that is unreadable to anyone not authorized to read it. Unsurprisingly, its origins go back about as far as secrets do: historians have traced the first recorded use to ancient Mesopotamia nearly 3,500 years ago, where it protected the recipe for a particular pottery glaze. Only in the last few decades, however, has truly secure encryption technology become accessible to the public, not just to government and military organizations.
Modern ediscovery would be impossibly burdensome without encryption. Encryption allows organizations to securely share highly sensitive internal information with outside counsel for review, even from remote locations. It also makes it possible to send a relevant subset of that sensitive information (indeed, often the most sensitive information, given its relevance in litigation) to opposing counsel with reasonable confidence that it will not be intercepted or stolen along the way.
When it comes to applications hosted remotely, it is common to distinguish data that is encrypted in transit from data encrypted at rest. The distinction matters because—unlike, say, paper—digital information is often not transmitted on the same medium in which it is stored. Data may therefore be encrypted for transmission between two machines (e.g., from a website on a cloud-based server to a web browser on a remote client computer), regardless of whether it is encrypted when stored at either the source or destination. Ideally, you want your data encrypted both where it is stored and while it is being transmitted; in other words, both at rest and in transit.
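To make the at-rest/in-transit distinction concrete, here is a minimal Python sketch in which the same document passes through two independent encryption layers: one for storage and one for transmission. The cipher is a toy stream cipher built from SHA-256 purely for illustration (real systems use vetted ciphers such as AES-GCM), and the names `storage_key` and `transport_key` are hypothetical.

```python
import hashlib
import secrets

NONCE_SIZE = 16

def keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    """Toy keystream: SHA-256(key || nonce || counter) blocks. Illustration only."""
    out = bytearray()
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + nonce + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(out[:length])

def encrypt(key: bytes, plaintext: bytes) -> bytes:
    """XOR the plaintext with a fresh keystream and prepend the random nonce."""
    nonce = secrets.token_bytes(NONCE_SIZE)
    ks = keystream(key, nonce, len(plaintext))
    return nonce + bytes(p ^ k for p, k in zip(plaintext, ks))

def decrypt(key: bytes, blob: bytes) -> bytes:
    """Split off the nonce, regenerate the same keystream, and XOR again."""
    nonce, body = blob[:NONCE_SIZE], blob[NONCE_SIZE:]
    ks = keystream(key, nonce, len(body))
    return bytes(c ^ k for c, k in zip(body, ks))

# Hypothetical keys: one protects the file on disk, the other one network connection.
storage_key = secrets.token_bytes(32)
transport_key = secrets.token_bytes(32)

document = b"Privileged memo: settlement strategy"

# At rest: only ciphertext ever sits on the server's disk.
stored = encrypt(storage_key, document)

# In transit: the server unwraps the stored copy and re-encrypts it for the wire,
# so an eavesdropper on the network sees only `on_the_wire`.
on_the_wire = encrypt(transport_key, decrypt(storage_key, stored))

# The client strips the transport layer; it could then re-encrypt under its own at-rest key.
received = decrypt(transport_key, on_the_wire)
```

Because the two layers use different keys, either one can exist without the other, which is why the two questions are asked separately. Note that this toy cipher provides confidentiality only; real schemes also authenticate the ciphertext so that tampering is detected.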
Encryption in transit is essential to protect against eavesdropping and so-called “man-in-the-middle” attacks, in which an attacker secretly intercepts (and may even alter) traffic between two parties. Without encryption in transit, someone on the same Wi-Fi network in a cafe, or on the same wired network in an office, could in theory listen in on your communication and expose sensitive documents. When the data is encrypted, however, the interloper sees only ciphertext, which is of little use without the proper decryption keys.
Encryption at rest is just as important. Even though your data may be stored “in the cloud,” it still physically resides somewhere, usually in a hosting provider’s data center. Encryption at rest can be seen as the last line of physical security: even if an attacker managed to get past the surveillance equipment, armed guards, authentication mechanisms, and secured environments to reach the server hosting your cloud data, she would be unable to make sense of it without the decryption keys, which are kept separately from the data. Encryption at rest also protects against accidental data exposure, for instance if your provider’s virtualization layer does not adequately isolate your data from other tenants.
There are two primary types of encryption: symmetric (also known as secret-key or private-key) and asymmetric (also known as public-key). In both types, the original data, or plaintext, is encoded by a specific algorithm into ciphertext, which is then extremely difficult to decode without the right key. In symmetric encryption, the same key is used both to encrypt and to decrypt the data, so sender and recipient must both hold it: the ciphertext and the key each reach the recipient (through separate channels), and the recipient uses the key to convert the ciphertext back to plaintext. In asymmetric encryption, the encryption and decryption keys are different: the public encryption key is published for anyone to use to create ciphertext, which can then be turned back into plaintext only by the holder of the matching private decryption key.
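The following Python sketch contrasts the two types. The symmetric half uses one shared key in both directions, again with a toy SHA-256-based stream cipher; the asymmetric half is “textbook” RSA with deliberately tiny primes (61 and 53), a classic classroom example. Both are illustrations only: production systems use vetted algorithms such as AES, and RSA keys of 2048 bits or more with padding such as OAEP. All function names here are our own, not a library API.

```python
import hashlib
import secrets

# ---- Symmetric: the SAME secret key encrypts and decrypts ----

def keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    """Toy keystream from SHA-256(key || nonce || counter); not for real use."""
    out = bytearray()
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + nonce + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(out[:length])

def sym_encrypt(key: bytes, plaintext: bytes) -> bytes:
    nonce = secrets.token_bytes(16)
    ks = keystream(key, nonce, len(plaintext))
    return nonce + bytes(p ^ k for p, k in zip(plaintext, ks))

def sym_decrypt(key: bytes, blob: bytes) -> bytes:
    nonce, body = blob[:16], blob[16:]
    ks = keystream(key, nonce, len(body))
    return bytes(c ^ k for c, k in zip(body, ks))

# ---- Asymmetric: textbook RSA with toy primes ----

def toy_rsa_keypair():
    p, q = 61, 53                 # real keys use primes hundreds of digits long
    n = p * q                     # public modulus (3233)
    phi = (p - 1) * (q - 1)       # 3120
    e = 17                        # public exponent, coprime with phi
    d = pow(e, -1, phi)           # private exponent: modular inverse of e (2753)
    return (n, e), (n, d)         # (public key, private key)

def rsa_encrypt(public_key, m: int) -> int:
    n, e = public_key
    if not 0 <= m < n:
        raise ValueError("message must be an integer in [0, n)")
    return pow(m, e, n)

def rsa_decrypt(private_key, c: int) -> int:
    n, d = private_key
    return pow(c, d, n)

# Symmetric: sender and recipient must both already hold shared_key.
shared_key = secrets.token_bytes(32)
sym_ct = sym_encrypt(shared_key, b"Produce the custodian files by Friday")
sym_pt = sym_decrypt(shared_key, sym_ct)

# Asymmetric: anyone may encrypt with public_key; only private_key decrypts.
public_key, private_key = toy_rsa_keypair()
rsa_ct = rsa_encrypt(public_key, 65)        # 2790
rsa_pt = rsa_decrypt(private_key, rsa_ct)   # 65
```

The symmetric half works only because `shared_key` was somehow distributed in advance; the RSA half needs no prior secret, since `public_key` can be published. Textbook RSA without padding is deterministic and insecure; it is shown here only to expose the underlying arithmetic.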
Asymmetric encryption has the advantage of not requiring the sender and recipient to exchange a secret key in advance, but symmetric encryption is much faster at both encrypting and decrypting data and achieves comparable security with far shorter keys. That is why, on the internet, the two are used together to get the convenience of one and the speed of the other. When you point your browser to a secure website (i.e., one whose URL begins with HTTPS), your browser and the website’s server first use asymmetric cryptography to authenticate that the website is indeed the one you intended to visit and to agree on secret keys for the rest of the session. After this handshake, the remainder of the session is encrypted with the more efficient symmetric encryption.
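A minimal Python sketch of this hybrid pattern follows. Rather than encrypting a session key with RSA, it uses a Diffie-Hellman key agreement, the public-key technique TLS 1.3 relies on to establish session keys; certificate-based authentication of the server, the other half of the handshake, is omitted. The modulus 2^127 − 1, the base 3, and all function names are toy assumptions chosen for readability; real TLS uses standardized elliptic-curve or 2048-bit-plus groups and ciphers such as AES-GCM.

```python
import hashlib
import secrets

# Toy public parameters (assumption): a Mersenne prime modulus and a small base.
P = 2**127 - 1
G = 3

def dh_keypair() -> tuple[int, int]:
    """Pick a secret exponent and derive the matching public value G^x mod P."""
    private = secrets.randbelow(P - 3) + 2   # secret exponent in [2, P - 2]
    return private, pow(G, private, P)

def derive_session_key(own_private: int, peer_public: int) -> bytes:
    """Both sides compute G^(ab) mod P, then hash it into a 32-byte symmetric key."""
    shared = pow(peer_public, own_private, P)
    return hashlib.sha256(shared.to_bytes(16, "big")).digest()

def keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    """Toy SHA-256 counter-mode keystream standing in for a real symmetric cipher."""
    out = bytearray()
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + nonce + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(out[:length])

def sym_encrypt(key: bytes, plaintext: bytes) -> bytes:
    nonce = secrets.token_bytes(16)
    ks = keystream(key, nonce, len(plaintext))
    return nonce + bytes(p ^ k for p, k in zip(plaintext, ks))

def sym_decrypt(key: bytes, blob: bytes) -> bytes:
    nonce, body = blob[:16], blob[16:]
    ks = keystream(key, nonce, len(body))
    return bytes(c ^ k for c, k in zip(body, ks))

# 1. Handshake (public-key): only the public values cross the network.
browser_private, browser_public = dh_keypair()
server_private, server_public = dh_keypair()
browser_key = derive_session_key(browser_private, server_public)
server_key = derive_session_key(server_private, browser_public)

# 2. Session (symmetric): bulk traffic uses the fast shared key.
request = sym_encrypt(browser_key, b"GET /review/batch-17")
received = sym_decrypt(server_key, request)
```

Both sides arrive at the same key without ever sending it; with properly sized groups, an eavesdropper who sees both public values cannot feasibly compute it. The request path `/review/batch-17` is a hypothetical placeholder.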
In our next post, we’ll cover machine learning and its application in ediscovery.