I was just at the 2015 ABA National Symposium on Technology and Employment Law and had the privilege of participating in a panel about big data and bias in employment law. The conference was really, hour for hour, two of the most valuable days of my legal career.
I sincerely appreciated the opportunity to participate in a conference where employment lawyers know how important it is for them to be involved in every single step of the big data puzzle, from creating the questions, to picking the datasets, to supervising the programming and the statistics, to interpreting and using the results.
I thought I would share two of the questions (in very, very summary form) I was asked and my thoughts on them.
Can big data play a part in eliminating bias in the hiring process?
People analytics – the focusing of big data on human resources – is a tool for finding those factors that correlate with workplace success. Done right, people analytics can be a powerful tool for testing and understanding our human resources practices, including understanding bias.
Big data is a neutral tool, not an end in itself. The tool is only as good as its users: it is people who select, use, manipulate, study and interpret the data. If the people using the tool work hard to keep out bias, it can be extraordinarily powerful for creating a bias-free workplace. If there is any weakness in the methodology, however, the results can be devastating.
James Grimmelmann in “Discrimination by Database” puts it succinctly: “data mining requires human craftwork at every step.” And where there is human craftwork, bias and bigotry — whether intentional or not — can creep in.
Unlawful bias can find its way into the analysis of big data in many places. While the full details and explanations can be found here, employment lawyers will need to ask some of the following questions about big data to uncover (or disprove) bias:
- What question was asked of the dataset?
- Who was asking the questions? Was the data scientist tenacious (or creative) enough to pursue and craft the right questions?
- How were the dataset(s) originally created and for what purpose? What omissions, errors (including discriminatory ideas) might have crept into the initial dataset?
- How were the datasets selected? Who selected them? On what basis?
- Is the data quality high enough? Is it relevant, complete, correct, well structured, valid, and timely?
- Do the datasets accurately represent the population being studied?
- What data was missing and how was its absence handled by the data scientist?
- What data was discarded?
- What proxies were used?
- Was the model trained properly? Was the model used to examine the data trained on the correct data points? Was the tested data sufficiently close to the training data?
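Two of the questions above, what data was missing and whether the dataset represents the population being studied, lend themselves to a quick numerical check. What follows is only a minimal sketch under stated assumptions: the records, the field names (`gender`, `tenure_years`) and the 50/50 benchmark are all hypothetical, and a real audit would need far more care about how the benchmark population is defined.

```python
from collections import Counter

def missing_rates(rows, fields):
    """Share of records in which each field is absent or empty."""
    total = len(rows)
    if total == 0:
        return {}
    return {
        f: sum(1 for r in rows if r.get(f) in (None, "")) / total
        for f in fields
    }

def representation_gaps(rows, field, benchmark):
    """Dataset share minus benchmark share for each group in `benchmark`.

    A negative value means the group is under-represented in the dataset
    relative to the (assumed) benchmark population.
    """
    counts = Counter(r[field] for r in rows if r.get(field) not in (None, ""))
    known = sum(counts.values())
    return {g: counts.get(g, 0) / known - share for g, share in benchmark.items()}

# Hypothetical applicant records, for illustration only.
applicants = [
    {"gender": "F", "tenure_years": 3},
    {"gender": "M", "tenure_years": None},
    {"gender": "M", "tenure_years": 5},
    {"gender": "M", "tenure_years": 2},
]

rates = missing_rates(applicants, ["gender", "tenure_years"])
gaps = representation_gaps(applicants, "gender", {"F": 0.5, "M": 0.5})
```

Numbers like these do not answer the legal question, but they give a lawyer something concrete to ask the data scientist about: why a field is frequently empty, and why a group is thinner in the dataset than in the relevant population.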
The moderator asked us to consider the example from Malcolm Gladwell’s book Blink of an orchestra transitioning to a “blind” audition format and then seeing a five-fold increase in women winning professional orchestra seats. The Gladwell example is very instructive about bias and big data. If performance is the sole criterion for hiring a musician, then eliminating all other information is merely cleaning up your dataset to eliminate noise. However, most human resources activities don’t rely on a single criterion. Accordingly, the more noise (irrelevant criteria) in the set, the more opportunity for bias to creep in. The goal, as with the Gladwell example, is to ensure that our noise filters are fine-tuned to the task at hand.
How much attention are regulators and legislators paying to Big Data?
Numerous agencies, including the FTC, EEOC, DOL and SEC, have begun to express an interest in big data analytics.
What worries me most is that big data is extremely sophisticated. And while I have nothing but respect for EEOC lawyers, I worry that actions will be predicated on bad interpretations of data and bad legal analysis of the data. For instance, I am just not convinced, as at least one EEOC lawyer who recently spoke at an FTC conference is, that disparate impact is the right tool for analyzing the results of big data.
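For readers who have not seen the arithmetic behind a disparate impact screen, here is a rough sketch. It assumes the familiar four-fifths guideline from the EEOC's Uniform Guidelines on Employee Selection Procedures, under which a group's selection rate below 80% of the highest group's rate is generally treated as evidence of adverse impact. The group names and counts are hypothetical, and the ratio is a screening heuristic rather than a legal conclusion.

```python
def selection_rates(outcomes):
    """outcomes maps group -> (number selected, number of applicants)."""
    return {g: selected / applicants for g, (selected, applicants) in outcomes.items()}

def impact_ratios(outcomes):
    """Each group's selection rate divided by the highest group's rate."""
    rates = selection_rates(outcomes)
    top = max(rates.values())
    return {g: rate / top for g, rate in rates.items()}

# Hypothetical hiring outcomes, for illustration only.
outcomes = {"group_a": (48, 80), "group_b": (12, 40)}

ratios = impact_ratios(outcomes)
# Groups falling below the four-fifths (0.8) threshold.
flagged = [g for g, ratio in ratios.items() if ratio < 0.8]
```

The simplicity is the point: a single ratio says nothing about how the underlying model was built or validated, which is part of why it may be an awkward fit for big data results.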
By definition, a good use of big data should produce results that are able to withstand any sort of validity analysis seeking to show that there is a quantitative relationship between the selection method and job-related skills.
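One simple way to picture that kind of quantitative relationship is a correlation between a selection score and a later measure of job performance. The sketch below computes a plain Pearson correlation; the scores and ratings are made up, and an actual validity study would involve much more than one coefficient (sample size, range restriction, the quality of the performance measure, and so on).

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Hypothetical data: selection scores at hire and later performance ratings.
scores = [55, 60, 70, 80, 90]
ratings = [2.0, 2.5, 3.0, 3.5, 4.5]

r = pearson_r(scores, ratings)
```

A coefficient near 1 would suggest the selection method tracks the performance measure closely in this sample; a coefficient near 0 would suggest it does not.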
I wonder if the best traction that plaintiffs will get is through pattern and practice claims. Since it is nearly universal that pattern and practice claims must be brought by the government or classes, here is where government agencies will gain their most traction.
In these cases, the government must prove much more than the occurrence of single discriminatory acts: it must prove that unlawful discrimination was the company’s standard operating procedure. Unlike in a typical individual disparate treatment suit, “a plaintiff’s burden under the pattern-or-practice method requires the plaintiff to prove only the existence of a discriminatory policy rather than all elements of a prima facie case of discrimination—but under the pattern-or-practice method, only prospective relief [is] available, unless the plaintiffs offer additional proof.”
And a company’s standard operating procedure is, of course, following a people analytics-based methodology.