The prices of data storage and sophisticated analytics are dropping. This will herald a sea change in how employment cases are litigated, first in class actions and eventually in individual cases. Employment lawyers will need to:
- be involved in data governance;
- apply basic statistical methods to the results of data mining; and
- get deep “under the hood” of the data scientist’s work product.
Some credible estimates are that ninety percent of the world’s data was generated in the last two years alone. Much of this data is in the hands of employers.
While these pools of data offer incredible opportunities for increasing knowledge and enhancing the management of organizations, they also mean that plaintiffs, defendants, and employment-related regulatory agencies will have access to unprecedented amounts of data that can be mined, used, interpreted and exploited at ever-lower cost.
What do employment lawyers need to do and know?
First, where possible, lawyers must be involved in data governance. I’ve discussed this extensively in two related posts: ESI and Data Governance Part 1 and Part 2. As data mining enters employment litigation, data governance decisions will have an enormous impact on defense costs.
Second, lawyers will need to know enough statistics to unpack claims predicated on data analysis. All serious data mining is, at base, a combination of statistical and programming work. Accordingly, the old saying “lies, damned lies and statistics” should be a lawyer’s mantra in this area. Being able to ask the right questions about a Big Data result will go a long way toward dealing with adverse claims, especially as the defense of such claims potentially involves the expense of experts.
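To make that concrete, here is a minimal sketch, using hypothetical numbers, of the arithmetic behind one common first-pass screen for adverse impact in selection data, the EEOC’s “four-fifths rule.” This is an illustration of the kind of question a lawyer should be able to ask, not a substitute for expert statistical analysis:

```python
# Hypothetical hiring data: (selected, applicants) for two groups.
def selection_rate(selected, applicants):
    """Fraction of applicants who were selected."""
    return selected / applicants

rate_a = selection_rate(48, 80)  # group A: 60% selected
rate_b = selection_rate(24, 60)  # group B: 40% selected

# Impact ratio: the lower selection rate divided by the higher one.
impact_ratio = min(rate_a, rate_b) / max(rate_a, rate_b)

# Under the four-fifths rule, a ratio below 0.80 is a red flag that
# warrants closer statistical scrutiny -- not, by itself, proof of bias.
flagged = impact_ratio < 0.80
print(round(impact_ratio, 2), flagged)  # 0.67 True
```

A ratio like 0.67 would prompt follow-up questions about sample size and statistical significance, which is exactly where the “right questions” come in.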
Last, lawyers will need to know enough about how data miners work, including programming, to be active participants in examining their results, both before and after experts get involved. As we all know, a data miner (nowadays called a data scientist) doesn’t just plug in questions to get answers. They pick datasets, design algorithms for statistical pattern recognition, and run those algorithms via software (such as Hadoop) on the selected pools of data. Along the way they make assumptions about the datasets, impose structure on the data, deal with missing data, make judgment calls about the questions to ask of the data, and interpret results. This is a process ripe for errors. Put differently, “data mining requires human craftwork at every step.” (See James Grimmelmann at “Discrimination by Database.”) And where there is human craftwork, bias and errors, whether intentional or not, can creep in.
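One small illustration of why those judgment calls matter: even the routine choice of how to handle missing records changes the answer. A minimal Python sketch with hypothetical numbers:

```python
# Hypothetical performance scores; None marks a missing value.
scores = [85, 90, None, 70, None, 95]

# Judgment call 1: drop records with missing scores.
observed = [s for s in scores if s is not None]
mean_dropped = sum(observed) / len(observed)          # 85.0

# Judgment call 2: treat missing scores as zeros.
filled = [s if s is not None else 0 for s in scores]
mean_zeroed = sum(filled) / len(filled)               # about 56.7

# Same raw data, two "correct-looking" answers. Which call the data
# scientist made is exactly the kind of detail worth probing.
```

Neither choice is inherently wrong, but each embeds an assumption about what the missing records mean, and that assumption propagates into every downstream result.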
In a very near-future post, I will discuss this third element in detail, outlining the precise places where errors and biases can creep into the data mining process. [Ed. Note: It is here]. In the meantime, please take a look at Solon Barocas’ presentation at an FTC conference “Big Data: A Tool for Inclusion or Exclusion.” Barocas’ engaging comments start on page 15 of this transcript (.pdf) or at the 15-minute mark of this video.