Near Duplicates and Email Threading – So Simple They Should Be Standard

Of the many tools in our eDiscovery analytics arsenal, each more complex than the last, near duplicates identification and email threading are two of the simplest to implement. They are so simple that they should be standard.

Near Duplicates Identification and Email Threading (ND and ET) are separate but complementary processes and are therefore often run together. They are also available, and relatively simple to implement, in most standard eDiscovery tools. While there are some nuances in how the technology runs on the back end, the end result is the same – an expedited review, achieved by identifying near duplicate documents and the most complete email chains.


Benefits of Near Duplicates and Email Threading

Near duplicates identification and email threading are like an insurance policy for consistency and quality control during document review.

Near duplicates processing groups two or more documents whose text meets a certain percentage of similarity. Its key benefit is the quick identification of textually similar documents. Email threading technology, on the other hand, gathers related email discussions and arranges them in chronological order. Its important benefit is that it identifies the most inclusive or complete email – the one that was sent last and contains all the prior exchanges in the chain. Applying email threading adds full context to what would otherwise be a disjointed set of email messages and responses.
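
For readers who want to see the two ideas in miniature, here is a rough Python sketch. It is an illustration only, not how any particular eDiscovery tool works on the back end. The names (group_near_duplicates, thread_emails, most_inclusive, the Email record and its thread_id field) are hypothetical, the 80% threshold is an arbitrary assumption, and the standard library's difflib stands in for whatever similarity measure a real product uses. It also assumes replies quote earlier messages verbatim and that a thread has a single branch.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import datetime
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Return a 0..1 similarity ratio between two texts, compared word by word."""
    return SequenceMatcher(None, a.split(), b.split()).ratio()


def group_near_duplicates(docs: dict[str, str], threshold: float = 0.8) -> list[list[str]]:
    """Greedily group document IDs. A document joins the first group whose
    pivot (the group's first document) is at least `threshold` similar to it;
    otherwise it starts a new group."""
    groups: list[list[str]] = []
    for doc_id, text in docs.items():
        for group in groups:
            if similarity(docs[group[0]], text) >= threshold:
                group.append(doc_id)
                break
        else:
            groups.append([doc_id])
    return groups


@dataclass
class Email:
    msg_id: str
    thread_id: str  # hypothetical: assumes threads were already identified
    sent: datetime
    body: str


def thread_emails(emails: list[Email]) -> dict[str, list[Email]]:
    """Gather emails by thread, each thread in chronological order."""
    threads: dict[str, list[Email]] = {}
    for email in sorted(emails, key=lambda e: e.sent):
        threads.setdefault(email.thread_id, []).append(email)
    return threads


def most_inclusive(thread: list[Email]) -> Email | None:
    """Given a non-empty, chronologically sorted thread, return the last email
    if its body contains every earlier body verbatim; otherwise return None."""
    last = thread[-1]
    if all(earlier.body in last.body for earlier in thread[:-1]):
        return last
    return None
```

In this sketch, a reviewer would receive each near duplicate group as one batch and, for each thread, could start with the email returned by most_inclusive. Real threads branch and real replies rewrite quoted text, so production tools have to handle cases this sketch ignores.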


A Simple Insurance Policy

For instance, a typical document review project may task several reviewers with coding sets of documents. Without ND and ET, similar documents are likely to be assigned to different reviewers, who may code them differently. Likewise, different parts of an email thread might be coded differently by different reviewers. In some cases even the same reviewer may code a document differently because it showed up in an earlier or later review set. Reviewers typically code documents based on their current understanding of the overall issues in the matter, or on the context presented by surrounding documents, rather than on a precise memory of how they coded a similar document weeks before.

Near duplicates identification and email threading alleviate this issue by grouping similar documents and assembling email threads. One reviewer can then look at all the similar documents together, or review all the emails in one thread, instead of having them split among several reviewers. This method increases coding accuracy and consistency and improves quality control. Think of your privileged documents! Email threads that are coded and redacted inconsistently will undoubtedly raise a red flag with opposing counsel and the courts. Still not convinced that ND and ET are a no-brainer?

In 2013, referring to 502(d) orders, U.S. Magistrate Judge Andrew Peck said:

“In my opinion it is malpractice to not seek a 502(d) order from the court before you seek documents. That doesn’t mean you shouldn’t carefully review your material for privileged documents before production, but why not have that insurance policy?”

I think the same applies to near duplicates processing and email threading – not using them is almost malpractice.

As lawyers, we are expected to be zealous advocates for our clients and their pockets. It is our duty to apply easy-to-implement, defensible, commonsense strategies to obtain the best results possible. Technologies like near duplicates identification and email threading are available in most tools and are so easy to implement that there is no compelling reason not to use them.