I recently wrote about Technology Assisted Review (“TAR”) and the importance of the combination of humans and technology in Let’s Make a Deal: The Document Review Version. This post continues that topic.

In Judge Peck’s recent Da Silva Moore, et al., v. Publicis Groupe, et al. opinion (2/24/12), he emphasized the importance of having a well-thought-out workflow and process when using TAR. In that case, the parties had more than three million emails and agreed to the use of predictive coding. (Predictive coding is a particular type of search technology that often relies on statistics-based search methods such as Latent Semantic Indexing.)

The parties are now in a side dispute over the “scope and implementation” of that use of predictive coding technology in the eDiscovery portion of the case. This highly public and contested argument over which protocol and workflow to use when deploying TAR is the centerpiece of the dispute.

Perfection is never a goal of any technology-assisted review software; rather, the goal is to find a reasonable and cost-effective way to identify and produce relevant documents while identifying and withholding privileged material. The exact workflow and process has until now been the producing party’s purview. Here, the parties agreed to the defendant’s use of predictive coding, but now disagree on how to use it and on how many documents must be reviewed for the results to be statistically valid.

In Da Silva, the defendant agreed to turn over the seed set of non-privileged documents used to “train” the system. This disclosure would allow the plaintiffs to see the decisions and coding calls made on the seed data. Seven iterative rounds of review were scheduled for the training phase. Once those were complete, the defendant was to sample the documents not marked relevant by the machine. The sample size the defendant was using for this phase was 2,399 documents. The plaintiffs disputed that this protocol and workflow would work.
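A figure like 2,399 is what a standard sample-size calculation produces for a population of roughly three million documents at a 95% confidence level with a ±2% margin of error. Those parameters are my assumption for illustration, not something stated above, and the `sample_size` helper is hypothetical. A minimal sketch of the arithmetic in Python, using Cochran's formula with a finite population correction:

```python
import math
from statistics import NormalDist


def sample_size(population, confidence=0.95, margin=0.02, proportion=0.5):
    """Cochran's sample size with a finite population correction.

    confidence, margin and proportion are illustrative assumptions,
    not values taken from the Da Silva Moore protocol.
    """
    # Two-sided z-score for the confidence level (about 1.96 for 95%).
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    # Sample size for an effectively infinite population.
    n0 = (z ** 2) * proportion * (1 - proportion) / margin ** 2
    # Shrink slightly because the population is finite.
    n = n0 / (1 + (n0 - 1) / population)
    return math.ceil(n)


if __name__ == "__main__":
    # "More than three million emails": 3,000,000 is an assumed round figure.
    print(sample_size(3_000_000))
```

Note how little the population size matters: under the same assumptions, thirty million documents would call for only a couple more sampled documents. The real lever is the margin of error, which is why the parties' disagreement over sample size is really a disagreement over how much uncertainty is acceptable.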

The Court reminded the parties that computer-assisted review “works better than most of the alternatives, if not all of the [present] alternatives. So the idea is not make this perfect, it’s not going to be perfect. The idea is to make it significantly better than the alternatives without nearly as much cost.”

The court agreed to use that workflow as described above, but with the caveat that the parties may need to revisit the proposed workflow if things aren’t stabilized or working as first thought. The fundamental disagreement is over whether the seed set of data used to train the predictive coding technology is representative of the entire data set.

All of these things are details of a workflow, which determines how to locate a document, confirm that it is relevant (and not privileged), and mark it to be turned over in a document production.

Judge Peck went on to say that the Daubert standard is “not applicable to how documents are searched for and found in discovery.” He also said that “it is the process used and the interaction of man and machine that the courts need to examine.” That interaction is the workflow and process behind what was done with the technology by the humans. That workflow is the cautionary tale from the Da Silva Moore decision. That workflow is what we need to be paying attention to when we are putting together an effective eDiscovery plan.

I consider the next-to-last sentence of this opinion to be the most useful: “As with keywords or any other technological solution to ediscovery, counsel must design an appropriate process, including use of available technology, with appropriate quality control testing, to review and produce relevant ESI while adhering to Rule 1 and Rule 26(b)(2)(C) proportionality.” In the end, workflow matters.

Over the past year or so, quite a few judges have wrestled with the infamous “ediscovery costs” question: “Can successful litigants seek reimbursement from the losing party for ediscovery-related expenses?”  While decisions vary by jurisdiction—and often, pretty significantly—most courts refer to the same two statutes: Federal Rule of Civil Procedure 54(d), which generally states that prevailing parties may recover costs, and 28 U.S.C. § 1920 (4), which allows for the recovery of “[f]ees for exemplification and the costs of making copies of any materials where the copies are necessarily obtained for use in the case.”

Just last month, in Race Tires, et al v. Hoosier Racing Tire Corp, et al, No. 11-2316, (3rd. Cir. March 16, 2012), the 3rd Circuit denied almost all costs relating to ediscovery.  More specifically, it applied a narrow interpretation of § 1920 (4) and held that the plain language of the statute, as applied to ediscovery, only allows reimbursement for scanning and file conversion, i.e., tasks akin to modern day copying.  But nationally, a slight majority of federal district courts have held otherwise.

For those of us in California, several district courts have expressly approved recovery for a broad range of ediscovery tasks.  These have included: collection, processing, and reproduction in preparation for document review.  See Parrish et al v. Manatt, Phelps & Phillips, LLP, et al, No. C 10-03200 WHA (N.D. Cal. April 11, 2011) at 4.  Project management has also been held to be a recoverable cost.  See Jardin v. Datallegro, Inc., et al, 08-CV-1462-IEG WVG (S.D. Cal. October 12, 2011) at 11.

The courts that have addressed these issues generally tend to recognize the utility of ediscovery tasks, i.e., that the underlying tasks were “necessarily obtained for use in the case” rather than for mere convenience—a threshold showing per § 1920 (4).  The Parrish court stated that such costs were “necessary expenditures made for the purpose of advancing the investigation and discovery phases of the action.”  And in Glenn Tibble et al v. Edison International et al, CV 07-5359 SVW (AGRx) (C.D. Cal. August 22, 2011), the court noted that ediscovery costs are necessary because litigants are “required” to produce electronically stored information, unless they can demonstrate undue burden or cost per FRCP 26(b)(2)(B).

Another issue that has come up is “excessiveness.” In ediscovery, it’s easy for a litigant to say that the opposing party’s ediscovery costs were excessive, because ediscovery is expensive generally. However, being proactive about selecting a vendor can minimize this potential argument. For instance, in Tibble, the court allowed costs, in part, because the requesting party selected its vendor based on vendor expertise and a competitive bidding process. (See Tibble at 9.) (As an aside, the court did not state that the litigant chose, or was required to choose, the lowest bidding vendor.) Counsel should also be aware that the specific role a vendor plays, and even the language used in invoices, may impact the recoverability of costs. This relates to the distinction between attorney’s fees and costs. If a request for vendor costs appears to describe tasks that involve strategy or other activities “typically entrusted to lawyers,” a court will likely deny cost recovery. (See Jardin at 11.)

In the end, no matter what jurisdiction is involved, to minimize the impact of costs, counsel should consider the full breadth of strategies available, including protective orders and cost shifting.  Further, and perhaps most importantly, even though federal district courts in California have a tendency to allow for the recovery of ediscovery costs, counsel should keep one point in mind as a guiding principle: ediscovery is technical.  Never assume that the court “gets it.”  Counsel should make an effort, at every opportunity, to educate the court and explain the many nuances when dealing with electronic evidence.  It will undoubtedly impact the bottom line.

Parrish: http://law.justia.com/cases/federal/district-courts/california/candce/3:2010cv03200/229842/100/

The US Court of Appeals for the Federal Circuit recently adopted a model order for patent cases that is disconcerting, to say the least. Typically, model orders cover a large set of circumstances and provide guidance that helps parties move a case forward efficiently. This model order does neither and may actually be harmful to the 26(f) process and ESI agreements that are being negotiated now.

The parts of the order that concern me the most are:

6. General ESI production requests under Federal Rules of Civil Procedure 34 and 45 shall not include email or other forms of electronic correspondence (collectively “email”). To obtain email parties must propound specific email production requests.

One of the main purposes of the 2006 update to the FRCP was to define ESI (Electronically Stored Information), which includes email. That definition was meant to eliminate arguments over whether email had to be specifically identified in an eDiscovery request or should be turned over like any other relevant evidence, regardless of whether it is electronic or not. Yet the model order requires email to be requested in a specific “email production request.” This seems to be a major step backwards.

5. General ESI production requests under Federal Rules of Civil Procedure 34 and 45 shall not include metadata absent a showing of good cause. However, fields showing the date and time that the document was sent and received, as well as the complete distribution list, shall generally be included in the production.

The model order also calls out metadata specifically (not just that related to email) and says that production requests should not include metadata (other than sent and received dates and the distribution list). This shows a fundamental misunderstanding of ESI and how it is used in litigation. Metadata exists as part of an electronic document or email. Keeping metadata out of productions is like turning over half a page with the relevant text missing, or not turning over the pencil markings on a (paper) memo. Metadata is a crucial part of ESI, even if it isn’t immediately obvious from the face of a document. Moreover, it is the foundation for keyword and other searching techniques.
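To make the point concrete, here is a minimal sketch using only the Python standard library. The message and the `production_fields` helper are made up for illustration. It shows that the very fields the order does allow (sent date, distribution list) are metadata stored in the same file as the body, read by the same parser:

```python
from email import message_from_string
from email.utils import getaddresses, parsedate_to_datetime

# A made-up message; addresses and content are purely illustrative.
RAW_EMAIL = """\
From: alice@example.com
To: bob@example.com, carol@example.com
Cc: dave@example.com
Date: Fri, 24 Feb 2012 09:30:00 -0500
Subject: Draft protocol

Please review the attached draft.
"""


def production_fields(raw):
    """Pull the sent date, distribution list and body from one raw email."""
    msg = message_from_string(raw)
    # Recipients live in header fields, i.e. in the message's metadata.
    recipients = getaddresses(msg.get_all("To", []) + msg.get_all("Cc", []))
    return {
        "sent": parsedate_to_datetime(msg["Date"]).isoformat(),
        "distribution": [msg["From"]] + [addr for _name, addr in recipients],
        "body": msg.get_payload().strip(),
    }


if __name__ == "__main__":
    for field, value in production_fields(RAW_EMAIL).items():
        print(field, "->", value)
```

Strip the headers and what remains is a body with no sender, no recipients, and no date, which is exactly the half-a-page problem described above.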

10. Each requesting party shall limit its email production requests to a total of five custodians per producing party for all such requests.

Limiting an email production request to five custodians is an arbitrary decision that, more than likely, has little relevance to the case at hand. Every case differs and may involve 5 inventors or a team of 50 inventors. The opposing party generally doesn’t or can’t know who the most relevant and “effective” custodians are without a review of each custodian’s data. Some custodians may have more relevant data than others. Narrowing a search and review strategy to collect and review data from those “priority” custodians is key, and it is key to do this early. Arbitrarily limiting email production requests to 5 people completely misses the point of early case assessment and of evaluating the actual data to figure out who the most relevant custodians are.

11. Each requesting party shall limit its email production requests to a total of five search terms per custodian per party… Indiscriminate terms, such as the producing company’s name or its product name, are inappropriate unless combined with narrowing search criteria that sufficiently reduce the risk of overproduction. A conjunctive combination of multiple words or phrases (e.g., “computer” and “system”) narrows the search and shall count as a single search term. A disjunctive combination of multiple words or phrases (e.g., “computer” or “system”) broadens the search, and thus each word or phrase shall count as a separate search term unless they are variants of the same word. Use of narrowing search criteria (e.g., “and,” “but not,” “w/x”) is encouraged to limit the production and shall be considered when determining whether to shift costs for disproportionate discovery…

Where to start with this provision? There are many disconcerting things here, so I will focus on my fundamental concern: the importance of developing a search strategy. If you have participated in a search strategy session where the objective is to scope a data set for collection or review, you know that there are many, many ways to filter and cull data and many, many ways to search for relevant or privileged data. Most of these methods do not involve keyword search AT ALL, and the most effective methods for a given data set may not use it either. I’m not saying that you wouldn’t use keyword search as part of your search strategy, but it certainly isn’t the ONLY method to use. Having a model order suggest a technology and then dictate how to count the searches performed with that technology seems misinformed and fraught with the likelihood of costly motion practice over discovery abuses.
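To see how mechanical the counting rule in provision 11 is, here is a rough sketch in Python. The `count_search_terms` and `within_limit` helpers are hypothetical, and the sketch makes simplifying assumptions: queries have no nested parentheses or quoted phrases containing "or", and variants of the same word are not detected (so they are counted separately, which the order would not do).

```python
import re


def count_search_terms(query):
    """Count terms roughly the way provision 11 describes.

    Simplifications (assumptions, not part of the order): no nested
    parentheses, and variants of the same word are not recognized.
    """
    # Each alternative joined by "or" counts separately; "and", "but not"
    # and proximity operators stay inside one alternative and add nothing.
    alternatives = re.split(r"\s+or\s+", query.strip(), flags=re.IGNORECASE)
    return len([alt for alt in alternatives if alt])


def within_limit(queries, limit=5):
    """Check one custodian's queries against the five-term cap."""
    return sum(count_search_terms(q) for q in queries) <= limit


if __name__ == "__main__":
    for q in ["computer and system", "computer or system",
              "computer w/5 system but not server"]:
        print(q, "->", count_search_terms(q))
```

Even this toy version shows where the fights will be: whether two words are "variants," how to count a nested query, and whether a non-keyword method counts at all are questions the rule leaves open.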

9. Email production requests shall identify the custodian, search terms, and time frame. The parties shall cooperate to identify the proper custodians, proper search terms and proper timeframe.

What I appreciate about the model order is that parties are required to cooperate with regard to email production requests. But the model order misses the point that eDiscovery requires a well-thought-out search strategy to find the right data, not just running a limited set of 5 keyword searches over a limited set of 5 custodians’ data. A more effective model order would have required cooperation for all ESI requests and not tried to dictate how data should be requested or searched. A search strategy is privileged work product, but part of any strategy is cooperation with the requesting party to properly limit the scope of the request and production.