The Headlong Rush Into Analytics

Rushing Down (at the Tate Modern) - © 2016 Brenton B. Miller

In a recent Forbes Tech article, “How Big Data is Disrupting Law Firms and the Legal Profession,” Nik Reed (co-founder of Ravel Law) is quoted as saying:

[T]he days when lawyers were all English Literature or philosophy majors are behind us now, my classmates included a lot of people from finance and one who had a PhD in biochemistry from MIT. These are people who are familiar with quantitative analysis and datasets, and they are yearning for richer information sources and better analytics technologies. It probably wouldn’t have gone down very well 30 years ago with the kind of people who were lawyers back then.

Because it’s been about 30 years since I last did legal research as an associate, I think I’m pretty qualified to reply that if cool and effective research aids like Ravel were available and affordable 30 years ago, we would have happily abandoned all of that manual treatise and digest reviewing, the pulling of countless court reporters and advance sheets and the oh-so-tedious manual Shepardizing! The visual strengths of the analytics tools now coming online would have been just as obvious to us back then as they are today. 

A Little Analytics Experiment

I also find it a bit ironic that a champion of “quantitative analysis and datasets” relies on personal anecdotal evidence of shifts in undergraduate majors to explain why lawyer hatchlings of today are supposedly more receptive to better analytics technology than lawyers were in the past. That just didn’t sit right with me, so I decided to do a little analytics experiment. I started by locating relevant data on undergraduate degrees of law students posted by the Law School Admission Council and conveniently saved in Excel files. The data is limited to the years 2001-2014 and is not broken down by specific law school, so I couldn’t look specifically at Stanford, where Reed got his JD. Nevertheless, the data permits us to examine the general trends in undergraduate majors across all US law school attendees over a recent 14-year period. For my analysis I decided to try out three different tools: Excel pivot tables and charts, Tableau Desktop and IBM Watson Analytics. (Note: lots of other analytics tools are available to try for free as well.)

Excel, of course, is an old familiar friend, but it’s worth reminding ourselves that it’s capable of performing a number of analytical tasks, even if it lacks some of the visual bells and whistles of more contemporary tools. Because it’s the only one of the three tools I’ve used previously, I was able to quickly build a pivot table and filter on the four specific undergraduate majors mentioned by Reed (Biochemistry, Finance, English and Philosophy). The Excel chart below shows how Reed’s anecdotal observation of fewer English majors in law school is supported by the data but his other observations are not. Philosophy is steadily ticking up, Biochemistry is tapering off slightly and Finance’s overall slide has been interrupted twice by temporary jumps that seem to correlate with recessions (i.e., Finance majors gravitate toward law school when Wall Street isn’t hiring).

Excel Example – Selected Undergraduate Majors of Law School Attendees
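For readers who would rather see the pivot-and-filter step as code than as spreadsheet clicks, here is a minimal Python sketch of the same idea: years as rows, majors as columns, percentage of total enrollment as the value, filtered to the four majors Reed mentioned. The rows, the `History` entry and the `pivot` helper are hypothetical, and the numbers are made-up placeholders rather than the LSAC figures.

```python
from collections import defaultdict

# Made-up long-format rows (Year, Major, PctOfTotal).
# Placeholders for illustration only; these are NOT the LSAC figures.
ROWS = [
    (2001, "English", 8.0), (2001, "Finance", 3.0),
    (2001, "Philosophy", 2.0), (2001, "Biochemistry", 0.5),
    (2001, "History", 7.0),
    (2002, "English", 7.5), (2002, "Finance", 3.25),
    (2002, "Philosophy", 2.25), (2002, "Biochemistry", 0.5),
    (2002, "History", 7.0),
]

# The four majors mentioned in the Reed quote.
SELECTED = ["Biochemistry", "English", "Finance", "Philosophy"]


def pivot(rows, majors):
    """Build {year: {major: pct}} and keep only the selected majors."""
    table = defaultdict(dict)
    for year, major, pct in rows:
        if major in majors:
            table[year][major] = pct
    return {year: table[year] for year in sorted(table)}


pivot_table = pivot(ROWS, SELECTED)

# Print one line per year, one column per selected major.
print("Year  " + "  ".join(f"{m:>12}" for m in SELECTED))
for year, shares in pivot_table.items():
    print(f"{year}  " + "  ".join(f"{shares.get(m, 0.0):>12.2f}" for m in SELECTED))
```

Each row of the result is one point per major on a line chart like the one above.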

Next, I uploaded the same data into Tableau Desktop. It was pretty easy to construct a visualization in Tableau comparable to the one I produced from the Excel pivot table. Tableau has a nice user interface and looks like it can be mastered relatively easily, considering I didn’t once crack open the Help link or “Getting Started…” stuff on the site. Here’s a rendering from Tableau for the same data with some trendlines added:

Tableau Example – Selected Undergraduate Majors (with Trendlines) of Law School Attendees
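A trendline of this kind can be approximated with an ordinary least-squares straight-line fit. The sketch below assumes a linear model (Tableau offers other trend models too) and uses made-up shares for a single major, not the LSAC figures. The `linear_trend` helper is hypothetical.

```python
def linear_trend(years, values):
    """Return (slope, intercept) of the least-squares line through the points."""
    n = len(years)
    mean_x = sum(years) / n
    mean_y = sum(values) / n
    # Sum of squared deviations in x, and of cross-deviations in x and y.
    sxx = sum((x - mean_x) ** 2 for x in years)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, values))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Made-up yearly shares (% of total enrollment) for one major; illustration only.
years = [2001, 2002, 2003, 2004]
shares = [8.0, 7.5, 7.0, 6.5]

slope, intercept = linear_trend(years, shares)
print(f"Change per year: {slope:+.2f} percentage points")
```

A negative slope means the major's share is shrinking over the period and a positive one means it is growing. The size of the slope says how fast, which is easier to compare across majors than eyeballing the lines.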

IBM Watson Analytics’ claim to fame is its natural language querying capability. When you’ve uploaded your data and are ready to start working with it, Watson invites you to start from one of the many data visualizations it has proposed for your data. Alternatively, you can just type a question and Watson responds with whichever visualizations it thinks best address your question. I stumbled a bit with finding the right starting point, so the “assistance” being offered turned out to be a mixed blessing for me. Based on my rather limited exposure, I’m skeptical that Watson Analytics truly bypasses the need to have a firm grasp of your data and visualization requirements before you engage with it. I can tell that Watson Analytics will sometimes hit home runs, but when it strikes out with your analytical task, you’ll need to configure things yourself. That’s fine, but only if the user interface is no more difficult than others that don’t offer the neat up-front guidance. Unfortunately, I didn’t find the UI to be particularly intuitive or easy to navigate. It took me a little longer to adapt to the “Exploration” interface in Watson Analytics than it did to adapt to Tableau, perhaps because Tableau more closely mimics how things are done in Excel pivot tables.

Nevertheless, I eventually found the right suggested option for what I was trying to do with the Watson test, which was to look at the trends for broader groupings of undergraduate majors to see whether humanities as a group (including English and Philosophy, of course) was trending down and the more data-centric majors in the natural sciences (including Biochemistry) and business administration (including Finance) were trending up. See for yourself:

IBM Watson Analytics Example – Groupings of undergraduate majors for enrolled law students over time
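The roll-up into broader groupings amounts to mapping each major to a category and summing the shares per year. Here is a minimal sketch. The four mappings follow the groupings named above, while the `"Other"` fallback, the `group_shares` helper and all of the numbers are assumptions for illustration, not the LSAC figures.

```python
from collections import defaultdict

# Major-to-group mapping for the four majors discussed in this post.
GROUP_OF = {
    "English": "Humanities",
    "Philosophy": "Humanities",
    "Biochemistry": "Natural Sciences",
    "Finance": "Business Administration",
}

# Made-up (Year, Major, PctOfTotal) rows; placeholders, not real data.
ROWS = [
    (2001, "English", 8.0), (2001, "Philosophy", 2.0),
    (2001, "Biochemistry", 0.5), (2001, "Finance", 3.0),
    (2002, "English", 7.5), (2002, "Philosophy", 2.25),
    (2002, "Biochemistry", 0.5), (2002, "Finance", 3.25),
]


def group_shares(rows, group_of):
    """Sum each year's shares by group; unmapped majors fall into 'Other'."""
    totals = defaultdict(float)
    for year, major, pct in rows:
        group = group_of.get(major, "Other")
        totals[(year, group)] += pct
    return dict(totals)


grouped = group_shares(ROWS, GROUP_OF)
for (year, group), pct in sorted(grouped.items()):
    print(year, group, pct)
```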

Once again, the data doesn’t appear to support the hypothesis that millennial lawyers are more comfortable with analytics technology than preceding generations (if they are at all) just because of some shift toward more data-centric undergraduate majors.

The Analytical KM Connection

I could well be reading too much into the Reed quote, but the message seems to be that the success (or failure) of the new crop of legal analytics tools depends to some extent on increasing the data savviness of end users. I think that misses the point of where things are going with analytics. Tools like those I just tested are doing more of the heavy lifting and presenting the data in easily understandable ways. That trend toward more graphical, intuitive and augmented interfaces should continue, as we’ve seen with search tools and computer applications generally.

The real challenge and need for data-related savviness along with, dare I say it, humanities-related savviness remains at the backend, in the systems and processes that originate the data. If data is well-defined, well-structured, consistent, and complete to begin with, then the front-end execution of analytics tasks becomes much easier. In my little analytics experiment described above, the source data was broken up into 14 separate files (one for each year’s worth of data). The number of enrolled law school attendees for each undergraduate major was included, but the associated percentage of total law school enrollees for each undergraduate major was not. Manually combining the files, adding a Year column and a calculated column for % of Total Enrolled before uploading the data significantly simplified the front-end visualization task. Without the backend prep, I’m sure I would have really struggled trying to build the visualizations shown above.
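The backend prep described above (stack the yearly files, add a Year column, calculate % of Total Enrolled) can also be scripted instead of done by hand. The sketch below is an illustration only. It assumes each year’s file has been exported to CSV with hypothetical `Major` and `Enrolled` columns, uses in-memory strings in place of real files, and takes each file’s own row sum as the year’s total. The counts are made up and are not the LSAC figures.

```python
import csv
import io

# Hypothetical stand-ins for the per-year files, keyed by year.
# Column names and counts are assumptions; these are NOT the LSAC figures.
YEARLY_FILES = {
    2001: "Major,Enrolled\nEnglish,300\nFinance,100\nPhilosophy,100\n",
    2002: "Major,Enrolled\nEnglish,240\nFinance,120\nPhilosophy,120\nBiochemistry,20\n",
}


def combine_years(files_by_year):
    """Stack per-year tables into one list of rows, adding Year and PctOfTotal."""
    combined = []
    for year in sorted(files_by_year):
        rows = list(csv.DictReader(io.StringIO(files_by_year[year])))
        # Total enrollment for the year, used as the percentage denominator.
        total = sum(int(r["Enrolled"]) for r in rows)
        for r in rows:
            enrolled = int(r["Enrolled"])
            combined.append({
                "Year": year,
                "Major": r["Major"],
                "Enrolled": enrolled,
                "PctOfTotal": round(100 * enrolled / total, 2),
            })
    return combined


combined_rows = combine_years(YEARLY_FILES)
for row in combined_rows:
    print(row)
```

The result is one long table with a row per year and major, which is the shape that pivot tables and visualization tools tend to handle most easily.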

Let’s also consider for a moment the data being used by the current crop of legal analytical tools like Ravel and Lex Machina. They focus almost exclusively on well-defined case law data. The cases themselves are uniquely named and codified, as are the jurisdictions, courts and judges. Parties/roles and names of counsel, litigants and other participants have been vetted. Even softer metadata like issues, actions, and outcomes have been successfully extracted and disambiguated. Of course, all of this high quality data is continuously supported by a stable court system and a large private publishing infrastructure. The result is a target-rich content environment for analytics and one that accommodates relatively simple user interfaces. Unfortunately, the same does not hold true for legal work not directly or completely circumscribed by court filings (and to a lesser extent, certain regulatory proceedings). Factor in all of the ancillary activity inside of the law firms related to this sometimes rich but more often impoverished external data, and you have the kind of complexity that can’t be untangled by analytics alone.

In my first post, I referred to Analytical KM as one of the major BigLaw KM waves and noted that the visible face of analytics is the lesser challenge for BigLaw KMers. Most of us already have or can acquire reasonably easily the skills necessary to configure and operate legal practice analytics tools like Ravel, law firm enterprise system analytic tools and generalist tools like Tableau or IBM Watson Analytics. It’s the backend stuff that comes before the analysis – the people, processes and tools required to cultivate and condition the internal law firm metadata – where the real struggle takes place. Until we get better and more complete metadata about a firm’s clients, matters, documents, timekeepers and activities, the promise of analytics in BigLaw will remain largely unfulfilled. While the cool tools like Watson race ahead and the data-adept BigLaw associates hanker to exercise their undergrad-acquired quant skills, someone first needs to plow the fields, pull the weeds and harvest the data. Who better to do it in BigLaw than KM?

(The next post will look at one of the biggest of BigLaw’s metadata challenges. Stay tuned!)