eDiscovery in 2015: Predicting the Unpredictable

I’ve been mulling over my eDiscovery predictions for 2015 and I realized that they represent more of a “wish list” than anything. Suffice it to say, I feel like we have a lot of ground to cover with respect to innovation and evolution in eDiscovery-related products and services. Here’s my road map for the journey forward:

1) The contrast between document review and document investigation will become starker, especially as we continue to confront the exponential growth of ESI. Finding text-based documentary evidence is more than devising new ways to categorize and read piles of electronica, more than simple information search and retrieval, and more than large-scale data analytics. There are forensic linguistic investigative methodologies that offer real solutions to finding the ‘who, what, when, and why’ in a collection of ESI. And they don’t require huge platforms, armies of reviewers, or a lot of time and money; they require the right expertise and expert tools. Why set up huge review efforts to find evidence when you can consult experts who go in and find data-driven answers to your questions? There are significant areas of expertise that eDiscovery has yet to properly tap, but I expect this to change in the coming months and years.

2) Big, clunky, “everything but the kitchen sink” review platforms are going to go the way of the dinosaur. We’re becoming a more a la carte (a la app?) industry, and society in general, with respect to technology. We have a specific task and we want a tool that accomplishes it. On the horizon I see lean, efficient, agnostic expert tools and solutions, with expert users operating them.

3) eDiscovery tools and solutions are going to evolve beyond simply reacting to large quantities of data. And in the same vein, we will start to move past focusing primarily on eDiscovery processes for culling or reducing data. Rather than exclusion, I predict a move toward quite the opposite end of the spectrum: Inclusion. Let’s stop hand-wringing over reducing size and start concentrating on identifying and collecting the most qualitatively relevant, robust data set possible, using the most valid and reliable methods available. When your objective is to identify and produce the most qualitatively valuable data set, the idea of culling becomes a moot point. We simply have to stop letting the fact that we’re dealing with a lot of data be the driving force behind adopting eDiscovery tech solutions.

4) Much to my dismay, predictive coding is going to continue to make headway into relevancy identification. This is not to say I’m not a fan of predictive modeling; in the right context, it is a hugely powerful and productive methodology. In fact, I love using predictive models in some investigative contexts. However, I’m going on record to say that relevancy/responsiveness identification and collection is not the right context for this methodology. And when you use a methodology in an improper context, you will get unreliable and uninterpretable outcomes. It’s a risky venture.

5) Large corporations and businesses will move to handle more eDiscovery processes internally. There’s a pervasive conversation happening about information governance and management. eDiscovery is a natural extension of this conversation. If large companies advance toward principled, data-driven information governance, and truly get a handle on their ESI, then they are going to have to create teams representing the right combinations of technical experts and subject matter experts. When these expert teams settle into place and develop and implement smart information management processes, wholesale outsourcing of things like relevant/responsive document identification and collection will not be necessary. As a result, the data flowing outside of a company will be limited to the information most relevant to the task at hand, whether in the context of litigation, arbitration, compliance, etc. In sum, good corporate information governance will have a huge impact on eDiscovery.

There you have it. My stab at prognosticating about the future of our field. And now instead of just writing about all this fun stuff, I’m off to practice what I preach.


Technology and eDiscovery: State of the Union.

I rounded out 2014 by looking back over the last 15 plus years in eDiscovery, focusing on the historical context that influenced, and continues to influence, the legal profession’s alliance with the tech industry.  I want to ring in 2015 with a conversation about what has been happening since these two love birds came together, moving from a shaky partnership born of necessity, into a relationship that has endured and matured, and even spawned a new industry: Legal technology.

As the legal tech industry has evolved over the years, a range of tech solutions has come onto the market for legal professionals. A relatively young industry, legal tech is sort of like the wild west in a lot of ways, eager to accommodate eDiscovery homesteaders in particular with a variety of products and services. Large-scale hosting platforms abound. Data analytics that help you make sense of your massive amount of ESI are customary offerings. All manner of technology, tools and processes are available to assist you in wrangling that nebulous Everest of electronica in the context of eDiscovery, and beyond.

To be sure, there is nothing more daunting than being faced with scaling this Everest of ESI in the context of eDiscovery. And there is nothing more comforting than having an expert show up and tell you they’re going to outfit you with everything you need to make this herculean task manageable. Or better yet, easy.

Be that as it may, there’s no elixir vitae in legal tech that meets all of the demands of eDiscovery. In fact, the field relies on a stable of “a la carte” technologies and technical expertise to accommodate different aspects of the spectrum of eDiscovery processes. And as we all know, there are many stages and aspects to eDiscovery, each requiring its own gamut of technical expertise. Although technology and technical expertise in eDiscovery are paramount, navigating all of our options is an ambitious task under the circumstances in which we’re operating: The legal infrastructure in which we endeavor places an extra layer of complexity on a series of processes that are already highly complex.

As with every consumer product or service, there’s the usual “buyer beware” caveat, and legal tech is no exception. All tech is not created equal. Products may appear similar, but can be very distinct if you peek under the hood. Many times, these distinctions are singular and far-reaching. This makes choosing the right tool/process/application to suit your eDiscovery needs, which vary and shift at every stage of the game, even more complicated.

Thus, finding and employing the *right* technology and technical expertise is important. Assessing and understanding what this program does, or what that automated method achieves, is key in legal tech. But it’s challenging for a number of reasons.

First, it’s hard to understand the constraints or drawbacks of a highly technical product or service because of the nature of the data we’re working with. Here’s something that everybody who works with unstructured, text-based natural language needs to understand: It is the most complex, varied and ever-evolving data type going. Unlike structured data, it does not offer a neat one-to-one correspondence between form and function. Language is just not like that, as any linguist will tell you. Language, and the text used to graphically represent Language, is infinitely variable, innovative and changes every day. This is an empirical fact that needs to be accommodated by every single tech solution in eDiscovery. Period.

(Garden path alert! Skip this paragraph if you value continuity.) In the early days of auto-correct, I used to ponder the statistical methods used as the foundation of these potentially useful predictive algorithms. These algorithms were originally developed to work on structured, numerical data, data with a consistent one-to-one relationship between symbol and value. But as with everything interesting and useful, folks wanted to expand predictive modeling algorithms to work in different contexts, on different data types. As previously mentioned, text-based natural language data does not arrange itself into tidy, regular form/meaning relationships. In their incipience, auto-correct programs were pretty terrible, making it obvious to me that whatever data was used for beta-testing wasn’t a representative sample of the kind of text-based language that characterizes much of our every-day, computer-mediated communication. I would think to myself: Did they even consult an applied/empirical/corpus linguist in the research and development of these applications? It irritated me. But then we had all those awesome BuzzFeed tributes to auto-corrects gone awry, which amused me greatly. I’m conflicted. But I digress…

Second, employing a technical solution on huge collections of complex data is an even bigger adventure. Margins of error look very different depending on how much data you have. Not only that, but it becomes more complicated to systematically assess a tool or automated method’s validity and reliability in large collections of unstructured, text-based natural language data. Complex data, and lots and lots of it. Double whammy. This makes it harder for a non-expert to empirically verify whether a particular program or application is doing what it should, or what it purports to do. To be sure, it’s doing something. But what are its limitations? What isn’t it doing? What sort of information is it eliminating or not returning? What sort of information does it privilege?
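
For the technically curious, there is a standard back-of-the-envelope way to start peeking inside the black box: pull a random sample of what the tool returns, have humans label that sample, and put a confidence interval around the hit rate. Here’s a minimal sketch, with the caveat that every number in it is invented for illustration, and the approach is plain sampling statistics, not any platform’s built-in feature:

```python
# A back-of-the-envelope validity check for a black-box tool: sample its
# output, label the sample by hand, and put a confidence interval around
# the hit rate. All numbers here are invented for illustration.
import math
import random

returned_doc_ids = range(1_000_000)        # stand-in for the tool's output
sample = random.sample(returned_doc_ids, 400)

# Suppose hand review finds 310 of the 400 sampled documents relevant.
relevant_in_sample = 310
p = relevant_in_sample / len(sample)

# Normal-approximation 95% confidence interval for the tool's precision.
margin = 1.96 * math.sqrt(p * (1 - p) / len(sample))
print(f"estimated precision: {p:.2f} +/- {margin:.2f}")
```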

Third, it is in our nature to think that when we find a tech solution that works in one context, maybe we can use it successfully in another, seemingly related context. But while one tool/process/application may work well in one particular context, that doesn’t mean it is a perfect fit for another, related but slightly different one. Forging ahead anyway will often produce uninterpretable or inconsistent results, unbeknownst to the user. I’ve seen this with a lot of borrowed technology that has made its way into eDiscovery, namely, predictive coding programs. Predictive coding has mostly been used for large-scale, automated categorization of produced ESI. Now predictive coding is being considered as “proven technology” for relevancy identification and review in pre-production stages (see Da Silva Moore v. Publicis Groupe). In order to assess whether or not this is a good idea, you really need to have a nuanced understanding of how predictive coding algorithms work. And specifically, how these algorithms, which again were developed for structured data, behave when used on unstructured, text-based natural language. I actually have a good deal to say about this, but at the risk of another garden path, I’ll save it for its own post.
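
To make the borrowed-technology point concrete, here’s a minimal sketch of the kind of supervised text classifier that sits at the core of many predictive coding tools. To be clear: the library (scikit-learn), the toy seed set, and the labels are all my own illustrative assumptions; no vendor’s actual implementation is this simple.

```python
# A minimal sketch of the supervised text-classification core behind many
# predictive coding tools. Library choice (scikit-learn) and the toy seed
# set are illustrative assumptions, not any vendor's implementation.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

seed_docs = [
    "We need to restate the Q3 revenue figures before the audit.",
    "Lunch order for Friday: two pizzas and a salad.",
    "The endpoint data shows a problem we should discuss offline.",
    "Reminder: the parking garage closes early this weekend.",
]
seed_labels = [1, 0, 1, 0]  # 1 = relevant, 0 = not relevant

# Step 1: flatten unstructured text into a structured feature matrix.
vectorizer = TfidfVectorizer()
X = vectorizer.fit_transform(seed_docs)

# Step 2: fit a statistical model on that structured representation.
model = LogisticRegression().fit(X, seed_labels)

# Step 3: score the unreviewed population and rank by predicted relevance.
unreviewed = ["Please see the restated figures attached."]
print(model.predict_proba(vectorizer.transform(unreviewed))[0][1])
```

Notice step 1: the text gets flattened into a bag of word weights before the statistics ever start. That is exactly the step where the form/meaning variation I keep harping on gets thrown away.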

Here’s a quick example that illustrates a mash-up of all these points. You’re tasked with document review. You’re using a large document hosting platform. It has a complex search feature. The search feature operates by indexing the words in the collection. This index is created and stored in a database, as is standard fare in search tools on large-scale document hosting platforms. Nothing interesting to see here. Well, not to most people anyway. But I have questions, namely: Does the tool’s indexing method involve dropping the “noise” words to expedite the process? Noise words are words that occur frequently in a language and are thought to impart little to no information. They include words like the, a, of, and… well, and. These words occur at very high rates and make up a large percentage of an English language corpus (in linguistics, corpus is another word for collection). Most indexers drop them because it cuts down tremendously on the amount of text the indexer has to process and sift through, ultimately making the search feature faster.
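
To see how mechanically simple this is, here’s a toy inverted index that drops noise words before recording anything, in the spirit of what many large-platform indexers do. The stop list and the two “emails” are tiny invented stand-ins; real indexers ship much longer lists.

```python
# A toy inverted index that drops "noise" (stop) words before indexing,
# in the spirit of many large-platform search tools. The stop list and
# documents are invented stand-ins for illustration.
from collections import defaultdict

STOP_WORDS = {"the", "a", "an", "of", "and", "to", "i", "have"}

def build_index(docs):
    """Map each surviving token to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in text.lower().split():
            if token not in STOP_WORDS:
                index[token].add(doc_id)
    return index

docs = {
    "email_001": "the results of the trial look fine to me",
    "email_002": "we need to discuss the endpoint analysis",
}
print(dict(build_index(docs)))
# Note what never makes it into the index: the, a, of, to...
```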

Whether the indexer you’re working with does “noise word” elimination is something that probably nobody using these tools, or purchasing them for use, ever really thinks about or cares about. On its face, it seems not at all worth considering. Is it even important? Maybe not, if you’re just using the search tool for organizing content, or making broad inquiries into your collection, with the goal of turning big piles of text into smaller piles of text. But what if you’re trying to use this search feature to conduct any sort of fine-tuned, forensic text investigation? What if you’re looking for very specific answers to very specific questions? What if you’re investigating to find a smoking gun document? Then you’d better believe that noise word elimination matters.

Consider the difference between the following statements: “I have noticed a problem” versus “I have noticed the problem.” Now consider that the first statement was written in an email by one scientist to another regarding the analysis of endpoints in a clinical trial conducted by a big pharma company. Now consider the second statement in the same context. Is there a difference? You bet. A problem could be any old problem, but the problem has obviously been identified before, as it now warrants a definite article to qualify it. It has evolved from being just a random problem to being the problem. It also indicates that both emailing parties are aware of said problem, as it warranted no further explanation. So much meaning and contextual information, all wrapped up in one little “noise” word. And one of the most frequent noise words in the English language at that. A word only a linguist could love.
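
Here’s that punchline in code. Run both statements through the sort of noise-word filter a typical indexer applies, and the evidentiary distinction vanishes before a search is ever run (same caveat as above: toy stop list, purely illustrative):

```python
# Both statements, run through a typical noise-word filter. Once "a" and
# "the" are dropped, the two index identically and no search can tell
# them apart. Toy stop list, for illustration only.
STOP_WORDS = {"the", "a", "an", "of", "and", "i", "have"}

def indexed_tokens(text):
    """Tokens that survive noise-word elimination."""
    return {t for t in text.lower().split() if t not in STOP_WORDS}

print(indexed_tokens("I have noticed a problem")
      == indexed_tokens("I have noticed the problem"))
# True -- after noise-word elimination, the statements are identical.
```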

But seriously, if you’re still using a document review search tool for any sort of document investigation, stop reading this and get in touch with me right now. I can tell you definitively that document investigation is NOT the same as document review. Different processes, different objectives.

Well, this was a long one. What can I say? I’m a linguist. I love the words. If you’ve made it this far, you’re kindred at best, and at the very least, you’re a rebel in this virtual theatre of short attention spans.

Next week I’m going to do what everybody else is doing as we head into a new year and talk about the future of technology and technical expertise in eDiscovery. Like 99.9% of my blogging compatriots, I have some predictions. And I’m gonna lay them out for you! Before then, I’ll get to work on this year’s resolution: Brevity.

Happy New Year, folks.


The ghost of Discovery past.

I’m going to start out today’s post with an exercise in stating the obvious: Technology is a central part of Discovery in the legal profession. Indisputably so. Take away computers and hardware and software and automated processes and all the rest of it, and you’re left with banker boxes full of physical documents, a highlighter, hundreds of hours of reading, and a whole lot of paper cuts on your fingers.

Computers and automation literally transformed the discovery process in the legal profession (it gave it a prefix!), and it seems like it happened in the blink of an eye, which is not a temporal pace one usually associates with legal. And it’s not just that the legal profession had to adjust to the fact that computers and automation and advances in tech permeated every facet of every industry. That’s a speeding train that everybody has had to board. It’s that computers and automation and tech advances profoundly changed the quality and quantity of the artifacts of every industry, as well as the artifacts of personal documentation and communication. And these artifacts are the very centerpieces of discovery processes. A discernible pile of physical documents became a limitless and ever-expanding universe of electronically stored information. Banker boxes were traded for hosting platforms, highlighters for radio buttons, and paper cuts for carpal tunnel. Discovery became eDiscovery.

Interesting aside: Here you have a profession that operates on the very notion of precedent, historical relevance and traditional methods, having to quickly adjust and evolve due to broad external technological forces, and doing so at a speed that could be construed as “uncomfortably brisk” and at a pace that shows little or no pause for precedent and history.

The legal profession may be traditional by nature, but eDiscovery has had to operate in a cutting-edge manner, as it is as much a driver of technology in the field as it is a result of it.

And if technology is a foundation of eDiscovery, then data is the massive metropolis erected on top of this foundation.

The legal profession has been dealing with, and reacting to, “Big Data” (or Big ESI, as it were) long before it was a regular part of the tech lexicon. It could be argued that the legal profession in general, and eDiscovery in particular, has been at the fore in adopting practical tech solutions to deal with large quantities of computer-generated data, or what I referred to upstream as the artifacts of doing business in our computer-mediated world.

Now I’m getting (finally!) to the point with which I want to frame a larger discussion of technology, technical expertise and eDiscovery: I believe technology and technical expertise in eDiscovery proper have been driven almost solely by the fact that our computer-mediated world produces a lot of data. I want to reiterate this point because it’s an important one: In eDiscovery, the drivers of incorporating technology, and adopting tech solutions, have been a reaction to the quantity of ESI as much as to the digital environment in which it is produced. And even if you don’t completely agree with this assessment, it’s an interesting idea to ponder, nonetheless.

And with that, I’ll conclude this historical look at technology and eDiscovery. I’ll end this last post of 2014 by looking ahead to next week’s first post of 2015, which will be a present-day technology/eDiscovery state of the union, segueing into a look to the future. Until then, I hope your holidays have been uncommonly merry and undeniably bright.


eDiscovery and Linguistics: Strange bedfellows?

If there’s one thing that a dozen years in eDiscovery has shown me, it’s that there is a range of expertise that bolsters this industry. There are the usual suspects: IT expertise, digital forensic science expertise, legal tech expertise, legal knowledge of the Discovery process under the FRCP and the amendments thereto, etc. All of which contribute to the eDiscovery process; all of which are necessary ingredients in a comprehensive eDiscovery strategy.

Something that is not often talked about, however, is this: If you are tasked with knowledge discovery, and the data set you are working with is a collection of some sort of natural language (in the case of eDiscovery, unstructured, text-based natural language), you need language/linguistic expertise. And I am not saying that you need perfect command of your native language. I’m not saying that you have to be a grammar maven or a clever orator or rhetorician. I am saying that you need expertise about the linguistic processes at work when people communicate ideas, sentiments, and information through Language.

Linguists and language experts have training in areas such as lexical semantics (word meaning and word relationships), pragmatics (how context informs how we impart and interpret meaning), language variation and change (all of the different ways we can express one idea and how this evolves over time), and all of the other linguistic and extra-linguistic variables that characterize human communication, in both written and spoken language. If you don’t consult this particular area of expertise, especially when you are engaged in finding evidence in huge collections of ESI, which by its very nature is linguistic evidence, then you are overlooking a valuable resource in eDiscovery in particular, and knowledge discovery in general.

I liken it to this: If your car won’t start and is in obvious need of repair, you don’t call the engineers who designed the engine. You don’t take it to the guy next door who has extensive knowledge of antique cars. You don’t take it to a body shop. And you certainly don’t assume that because you can change your own oil and spark plugs, you can diagnose the problem and have the tools and expertise to fix the problem. You take it to a mechanic. What a mechanic is to the inner workings of your automobile, so is a linguist to the inner workings of human communication and Language.

This assertion may not be received well by people in this profession, but it’s a fact: We use language to communicate. We use language to express ideas and opinions. If you’re trying to discover critical information buried in an Everest of computer-mediated communication, the medium is, in fact, the message, and the message is linguistic in nature. You should be leveraging this area of expertise in any and all eDiscovery-related tasks.

Now that I’ve made this assertion, I’ll go a step further and say it’s not just any linguist you need, but an empirical linguist who works in an industry setting, with language data. Particularly, one who works in the legal industry and understands that the goal of employing linguistics in this framework isn’t to develop a model of Language, or a model of human communication as it were. Rather, the goal is to use knowledge of Language, linguistic processes and models of communication to extract the ideas, information and patterns imparted in the language people use to communicate them.

I’ll put a quick example out there, one that is central in my practice of large-scale forensic text investigation in eDiscovery. Redflag language is language that *may* lead to some interesting investigative trajectories in the context of a larger legal narrative. It’s language that represents a little warning signal, or something that may warrant a further look. I always evaluate entire collections to see whether redflag language occurs at higher rates than linguistic norms would predict. Or lower. I investigate how redflag language clusters around certain dates or subject matter. I generate reports of content-of-interest that is statistically correlated with redflag language in a collection, because I want to see what other language regularly co-occurs with it. Evaluating redflag language in a collection is a great way to see where the trouble spots in a business or corporation lie.

See there? Warning signal, trouble spots. I used two great examples of redflag language to define redflag language. These phrases impart the concept of a “problem.” And the concept of a “problem” can be expressed in a huge variety (I’m talking thousands) of ways. You can have a serious issue or be dealing with a serious matter. Or maybe you face a challenge. You’ve bumped up against an obstacle. You’ve encountered a tricky situation. Or you sense something is amiss, so you email your colleague and ask “what is going on here.” They respond that they have some bad news.
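
To give you a flavor of how this gets operationalized, here’s a stripped-down sketch: count redflag hits per document, then tally what other content regularly travels with them. The term list and the emails are invented placeholders; a real engagement starts from a validated, data-driven term list, not five words typed from memory.

```python
# A stripped-down sketch of one redflag workflow: count redflag hits per
# document, then tally the content that co-occurs with them. The term
# list and emails below are invented placeholders for illustration.
from collections import Counter

REDFLAG_TERMS = {"problem", "issue", "concern", "amiss", "obstacle"}

docs = {
    "email_001": "there is a serious issue with the endpoint data",
    "email_002": "lunch on friday works for me",
    "email_003": "something is amiss in the q3 numbers, big problem",
}

hits = {doc_id: sum(tok in REDFLAG_TERMS for tok in text.split())
        for doc_id, text in docs.items()}

co_occurring = Counter()
for doc_id, text in docs.items():
    if hits[doc_id]:
        co_occurring.update(t for t in text.split() if t not in REDFLAG_TERMS)

print(sorted(hits.items(), key=lambda kv: -kv[1]))  # docs by redflag density
print(co_occurring.most_common(5))                  # what travels with redflags
```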

If you’re trying to find documentary evidence to support a legal narrative, you want to incorporate redflag language into your investigation. However, you shouldn’t just rely on your own intuition about how people talk about problems. Or how they express negative sentiment, or give advice or imply causation, or threaten a co-worker, or sexually harass a co-worker. A linguist will use data-driven, principled methods to uncover all of these areas of investigation in a consistent and comprehensive way. It’s what we do.

Recently I was talking with an individual involved in risk mitigation at a Fortune 500 company. He told me that they do email sweeps and investigations into various areas by relying on keyword searches. I asked him who came up with the keywords and the query algorithms. What methodology did they use to validate their term list(s), and what data did they validate it against? What was their margin of error when conducting these keyword sweeps? I had a lot of questions. He said, “No, no, no. We literally have a list of several dozen words and do searches for each.” He then told me the people on his team (and it’s a vast team, to be sure) came up with the terms based on internet research. I was stunned. I could not believe that in this day and age, with all of the unbelievable expertise out there, a company of this size and stature could not do better than this.

For the record, if you’re trying to figure out if somebody is sending sexually explicit and harassing emails to a co-worker, simply searching for “sex” does not do the trick. Likewise, if you’re trying to uncover evidence of fraud in a huge collection of ESI, searching for “fraud” isn’t going to turn up anything useful. But I can tell you from experience, people engaged in harassing other people through computer-mediated communication, as well as people engaged in defrauding another person or an entire entity, these folks use specific language and patterns of linguistic behavior that leave an electronic trail of their misdeeds. You can uncover these trails. Do you want to know how? Then you should seek out linguistic expertise.


Document Review versus Document Investigation: Apples to Oranges.

In my last post I asserted that document review and large-scale, discovery-driven text investigation are not the same thing. I said that comparing the two is like equating combing the beach with a metal detector to an archaeological dig. Now I want to flesh out this analogy a bit to underscore the sentiment behind it.

In the first endeavor, you show up at the beach with your tool, maybe having mapped out a long stretch for exploring, or maybe breaking up the search into smaller, defined areas, or perhaps just planning on walking as long and as far as you can in the time that you have. Then you start making passes and sweeps across the sand with your detector, hoping you find something interesting. You may find a coin or a ring, but you never know what sort of treasures you’ll dig up and put in your pocket. If you’re lucky, you’ll stumble across something valuable. If not, the most you can say for your day is that you got some exercise.

In the archaeology example, you use your expertise and experience to carefully pinpoint the location of the dig. You understand the nature of the terrain, which in turn informs your choice of tools and methods of excavation. You have an expectation of what you’re going to uncover. You know where to look for certain artifacts. You have a point of reference for what you’ll find because you know what type of site you’re excavating before you ever disturb the soil. For example, if you’re excavating a dwelling structure, you immediately recognize a pottery shard as such, and you set out to uncover the rest of the pieces in the immediate area in order to reconstruct the entire artifact. You proceed in a principled manner, recording your findings and placing them in a larger context of discovery. Finally, you assess the artifacts you’ve uncovered in a larger context, one of an entire culture or a historical point in time.

If you’re tasked with finding evidence in a large produced collection, no day should go by in which you don’t uncover valuable information that supports your case narrative in a meaningful way. Document review isn’t an exercise in reading, just as combing the beach in search of treasures is not an exercise in, well, getting exercise.

Finding documentary evidence in a produced collection of ESI should be a dynamic, flexible endeavor that represents the intersection between various tools and methods and the right kind of expertise. Discovering case-winning information should not simply be a linear process by which a memo or an email or a sales slide is read, categorized by checking a box, and moved off of one pile onto another. Categorizing documents as “hot” or “super hot” is not the same as deriving facts and intel by way of meaningful, data-driven forensic investigation.

I have seen it happen all too often that huge review endeavors begin with a certain set of expectations and objectives only to uncover information months down the line that changes the course of everything, rendering efforts up to that point counterproductive. Large, resource-intensive review efforts may or may not be what is needed to uncover winning documentary evidence, but regardless, where the review team is the army, the forensic text investigators are the scouts. We ride out ahead of the army and find critical intel and facts that inform overall strategy in the most productive manner possible. We uncover information and find investigative leads quickly, which can transform the very nature of your case. We find the story in the documents that underscores your legal narrative.

Look, document review and large-scale analytics (predictive modeling, etc.) may be a valuable part of your eDiscovery strategy, but if you want to be ahead of the curve, as well as save time and money, you should recognize that you have options. You can hire investigative experts who will tackle your collection to find out: What did they know (and who *they* are), when did they know it, and what did they do about it? And we will have a variety of tools at our disposal, as we will use any and all applications or processes (often making our own tools) that we need in order to extract critical, case-building information.

Knowing the difference between reviewing and investigating, and including both in your eDiscovery strategy, will ensure that at the end of the day, you have the most valuable documentary evidence you need, uncovered in the most efficient way, and in the quickest turnaround possible.


Notes from the eDiscovery trenches: Every collection tells a story.

When you’re tasked with uncovering critical documentary evidence in a produced collection of electronically stored information, or ESI, you’re doing so to support a legal narrative that is the centerpiece of a case. In an eDiscovery setting, everything you do with respect to collecting, managing, and assessing ESI should be directed at finding the evidence you need to inform your legal narrative in a meaningful way. And to be certain, there is a story to be told in your collection that correlates to the theme(s) of your case. There always is.

I’ve been doing large-scale forensic investigation in produced collections of computer-mediated communications and other business-related electronica for over a decade. I’ve worked on some of the biggest class action civil suits of that time frame, covering everything from shareholder fraud to product liability. If you know how and where to focus your investigation, you can find case-building evidence. Always.

When I first started investigating the communications and documents of Fortune 100 companies, I was amazed that smart and informed business people would go on the record and say the most brazen things. And not just rank-and-file employees of a large corporation, but higher-ups, upper management, CEOs, people one would think should know to exercise caution. The fact remains, there is a trove of evidence in a company’s electronic data. The digital artifacts of doing business in a computer-mediated world are full of critical information.

People talk, in writing, a lot. You can hang your company communication policy on the wall in front of them, but it won’t matter. Millennials in particular have grown up online. They have grown up communicating with people in every time zone, in every part of the world, about, well, everything. They write it all down. All of their opinions, feelings, all of their business dealings, all of their personal business. It all goes into a computer-mediated communication of some kind intended for an audience of some sort. Let’s face it, nowadays writing is really text-based speech and not simply a formal, graphical linguistic representation of a Language. And this communication genre creates the linguistic timeline of our lives.

Millennials entering the work force seemed to usher in a change in the very nature of computer-mediated communication and documentation in a business setting. For example, I have witnessed an evolution in company email communication patterns over the years: In the late 90s, just as email started to really become the method of casual business communication, these communications resembled written letters. You would have a formal greeting, an introduction paragraph laying out the topic(s) at hand, consecutive paragraphs dealing with said topic(s), and a formal closing. You could expect standard punctuation conventions, as well as perfect spelling and grammar.

Now? Even formal business emails resemble a tweet: No greeting, topics laid out in a bulleted list of around 100 characters each, mixed content covering a range of both personal and business-related topics, all rolled up into one communication, usually to several people at a time. Maybe capitalization and some punctuation, or maybe not. Actually, you’re more likely to see capitalization used to indicate emphasis than to mark a sentence boundary. Contextualizing IM-ese abounds, replete with lols and smhs (or LOL if the participants are really excited). All of these “non-standard” writing conventions pervade contemporary business communications at every level.

Forget the saying “never put anything in an email that you don’t want your mother to hear in a trial” (a 21st century version of a famous Sydney Biddle Barrows quote). Millennials put everything in writing somewhere, whether it’s on social media, their blog, in a comment section on a news site, or in a company-wide email. Their moms have already read it all. They talk with text and you can’t censor or “policy-away” these communication habits, even in a professional setting. Especially in a professional setting.

At some point, contemporary corporate information governance will have to accept these facts and deal with them effectively, but that is a post for another day. Suffice it to say that in today’s eDiscovery ventures, you are going to have to deal with not only a large quantity of data, but data that varies mightily in quality as well.

During my tenure in the legal profession working in eDiscovery, I’ve learned four key pieces of wisdom about doing large-scale, text-based investigation for civil litigation support. I’ll be fleshing each point out over the next week or so, but here they are in a nutshell:

1) Document review and large-scale, discovery-driven text investigation are not the same thing. Comparing the two is like equating combing the beach with a metal detector to an archaeological dig.

2) Language/linguistic expertise. You need it. This may not be received well by people in my profession, but it’s a fact. We use language to communicate. We use language to express ideas and opinions. Knowledge discovery is, at its very core, linguistic in nature.

3) Technology and technical expertise are paramount. BUT all tech is not created equal, and just because one tool/process/application works well in one particular context does not mean it will work in another. Specifically, you have to understand the limitations and margins of error of every tech solution you consider.

4) The future of eDiscovery is going to require innovation, ultimately resulting in a true marriage of human expertise and technical expertise. Oh, do I have a lot to say about this particular point…

Stay tuned for a discussion on each of these things individually. In the meantime, feel free to muse about any and all of these points in the comment section.


Illocution Inc.

We are expert language investigators. We work with law firms to discover critical information quickly. We combine expertise, experience, and technology to uncover the “who, what, when, where and why” in large, electronic document collections.
