Hacker News
Four Areas of Legal Ripe for Disruption by Smart Startups (lawtechnologytoday.org)
88 points by krambs on Dec 16, 2014 | 67 comments


The article mentions some interesting things, but I have a few quibbles:

> But even with today’s modern communication tools, both customer experience and lawyer workflow have remained stagnant.

At a large firm, legal practice is unrecognizable compared to even 10-15 years ago. Everything is electronic: filing and docketing, document collection/scanning/OCR, legal research, document management (DMS + version control). Everyone communicates almost exclusively via e-mail, and remote work facilities are ubiquitous.

To the extent that technology is available that's not getting adopted, it's because it's not good enough. Predictive coding can be very helpful, but it also has a fixed setup and training overhead that makes it less efficient for smaller matters. That's why arguably the biggest shift in discovery in the last 15 years hasn't been automating it, but outsourcing it to contract lawyers.

In the area of research, Westlaw and Lexis still rule because of their completeness and accuracy. If I need a copy of some statute enacted in 1873 I can not only find it, but I can get original scans so I can verify the text is free of OCR errors.

Moreover, things that are easy on the rest of the web are not easy when it comes to legal (or scientific) research. PageRank, for example, works great when everyone searching for "skiing near Tahoe" is looking for the same popular pages. But when you're doing legal research, a lower-court case that directly addresses your issue but isn't widely cited is much more valuable than a highly-cited Supreme Court case that doesn't address your issue. And computers still don't really understand either what issue you're looking for or what issue a case is about. So ancient technology (search for this word near that word) still rules the day.
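To make "search for this word near that word" concrete, here is a minimal sketch of a proximity test. It is not how Westlaw or Lexis implement their connectors; the function name `within`, the tokenizer, and the two sample opinions are all made up for illustration.

```python
import re

def within(text, a, b, max_gap):
    """Return True if word `a` occurs within `max_gap` words of word `b`."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    pos_a = [i for i, w in enumerate(words) if w == a]
    pos_b = [i for i, w in enumerate(words) if w == b]
    return any(abs(i - j) <= max_gap for i in pos_a for j in pos_b)

# Hypothetical opinions: the rarely cited one is the one on point.
opinions = {
    "Lower court, rarely cited": "The landlord owed a duty to warn the tenant of the hidden defect.",
    "Supreme Court, widely cited": "The duty of care is discussed at length. Later, in an unrelated passage, the court mentions a tenant.",
}

# Keep only opinions where "duty" appears within 5 words of "tenant".
hits = [name for name, text in opinions.items() if within(text, "duty", "tenant", 5)]
print(hits)
```

The match depends only on where the words sit in the text, not on how often the opinion is cited, which is why the obscure case surfaces and the popular one does not.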

> There are good reasons for this, as law firms tend to be cost agonistic (since they pass costs directly to their client)

This is oft-stated, but economically fallacious. Price is a function of supply and demand. The client cares about total cost for a particular legal service; she doesn't care about how that cost is broken down. If the client's budget for a matter is $300,000, every dollar that goes to costs is a dollar that doesn't go to the law firm. This is true even if you're billing by the hour, because in the long run, a firm will raise rates until hours x rate = client budget.


> PageRank, for example, works great when everyone searching for "skiing near Tahoe" is looking for the same popular pages. But when you're doing legal research, a lower-court case that directly addresses your issue but isn't widely cited is much more valuable than a highly-cited Supreme Court case that doesn't address your issue.

This is true where you are doing fine-grained research, but there are plenty of lawyers who are constantly delving into new areas of law that are adjacent to or tangential to their primary area of interest or practice. When this happens, it can be phenomenally useful to get up to speed on the area of law by seeing which decisions have the highest PageRank, or PageRank weighted by certain factors, etc.
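For anyone curious what "PageRank over decisions" might look like, here is a rough sketch using plain power iteration on a citation graph. The case names and citation links are invented, and the damping factor and iteration count are just conventional defaults, not anything a real legal research product is known to use.

```python
def pagerank(cites, damping=0.85, iterations=50):
    """Rank cases by citations. `cites` maps each case to the cases it cites."""
    cases = sorted(set(cites) | {c for targets in cites.values() for c in targets})
    n = len(cases)
    rank = {c: 1.0 / n for c in cases}
    for _ in range(iterations):
        new = {c: (1.0 - damping) / n for c in cases}
        for case in cases:
            targets = cites.get(case, [])
            if targets:
                # Split this case's rank among the cases it cites.
                share = damping * rank[case] / len(targets)
                for t in targets:
                    new[t] += share
            else:
                # A case that cites nothing spreads its rank evenly.
                for t in cases:
                    new[t] += damping * rank[case] / n
        rank = new
    return rank

# Hypothetical citation graph: later cases cite earlier ones.
citations = {
    "Case A": [],
    "Case B": ["Case A"],
    "Case C": ["Case A", "Case B"],
    "Case D": ["Case A", "Case C"],
}
ranks = pagerank(citations)
leading = sorted(ranks, key=ranks.get, reverse=True)
print(leading[0])
```

Weighting by court level or recency, as suggested above, would amount to changing how each case's share is split among the cases it cites.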

In this way, one way to think about the startup that implements a PageRank algorithm is that it is not disrupting electronic legal research, but rather disrupting legal textbooks. In my experience, the fastest way to get abreast of a new field is to find the leading textbook. One could imagine a sophisticated database which could altogether remove the need to consult a textbook by painting a picture of the leading cases, key pieces of legislation, which sections are most often referred to by which cases in which context, etc.


> And computers still don't really understand either what issue you're looking for or what issue a case is about. So ancient technology (search for this word near that word) still rules the day.

I think this is the key. The big revolution in legal research that I'm waiting for is the ability for the computer to understand something like: all cases where a) Claim X was brought as a counterclaim and not as the original claim; b) counter-claimant made Y argument citing case Z but not case W; c) requested relief was R; and d) court made its decision based on Factor F. Until then, "search for this word near that word (and boolean operators)" is extremely powerful and can get me most of the way pretty quickly.


FWIW: This is entirely solvable, it's just that the world's search experts are not working on legal search, because it's not a large enough market :)


I wonder if pieces of the Watson API would work for something like this?


Can you recommend software packages for case management at a small firm? I just started working with a small public-interest law firm that uses Google docs to track their caseload, an Access database to track their customers, and Drupal to run their public site, which is basically a library of case summaries and briefs. It's a mess.

They're looking for a solution that will let them better manage case and customer info internally, and push public info to the public site. The idea is, only enter information once.

Custom development is of course an option, but first I want to see if there are any good available products. Are there any good off-the-shelf or hosted law firm management software packages that provide an API? Thanks for any advice you can spare.


What's your document management story? If you're drafting/keeping track of versions in Google Docs,[1] then you should opt for a cloud-based solution that can integrate. Clio, RocketMatter, and MyCase are the big ones. They run $50/$70 per month per seat. Clio integrates with Dropbox too. HoudiniESQ looks neat. It also has Google Docs integration and can be self-hosted or cloud-hosted. Pricing is pretty reasonable: $1280 for ten licenses up front, then $192 per user per year.

[1] If they don't have a DMS story, they need one!


One lawyer I talk to says that every single filing he makes at court has to be paper. He bought a wheely suitcase to carry all this documentation around to court.

If everything is electronic, his firm and the judges they deal with didn't get the message...


This is jurisdiction-dependent.

I work for a very large company with three dozen law offices across the US. Listening to what is considered "normal" in other offices is always interesting. I am, unfortunately, in an office that practices in courts that are paper-dependent. Even though there are rules that allow for electronic filing and service in many circumstances, all it takes is one litigant (or the judge!) to say "eh, I'm not comfortable with this, we're going to do this the old fashioned way" and we're done.


I recently went through dealing with some standard legal agreements for a film project. I never met our lawyer in person, it was all telephone and email.

But when it came to signing documents, that had to be done on paper. And notarized, which was a simple trip to the UPS store. And the originals mailed around, which was a hassle. The parties involved don't even all live in the same hemisphere, so there were some annoying shipping times.


I think at bigger firms (the kind that can actually afford in-house IT) they have runners, paralegals, various secretaries and assistants of sorts, etc. to deal with that. Lawyers and IT types don't deal with mountains of paperwork. I do know of a solo practitioner who deals with tons and tons (literally) of paperwork. I have a feeling people are projecting their big(ish) law experiences on the profession as a whole.


In some courts there is a requirement to both file an electronic copy on the docket and provide paper courtesy copies to the Judge's chambers. Some things are easier to work with when they are printed, and the courts put the onus to provide paper copies on the litigants.


While it is very common to bring printed docs to court, it was probably the last step of the process where he printed everything. Everything before that was probably electronic and even if he received printed docs, those would be scanned and OCR'ed.


This really is jurisdiction dependent. Larger law firms that have to deal with paper courts still have runners that handle the docket and other details of filing (assuming the court allows it). At small firms this is the lawyer's responsibility.


No, it's the courts that still require documents to be submitted in hard-copy. Update: rapidlegal.com is one such service.


Law firms may be electronic, but sadly some courts are not.


Or, software is maybe too good, but law firms that bill by the hour are not interested in efficiency improvements because more hours means more money for them. Fortunately, with fixed fees becoming more popular this is changing.


Four Areas of Legal Ripe for Automation

Automating these aspects of legal practice wouldn't disrupt the way that industry functions. It would just make a lot of paralegals and young lawyers obsolete (and make legal services a lot cheaper).


Thank you for pointing this out. I get annoyed when people use the term "disrupt" improperly. A true disruption, in the sense that Clayton Christensen used it, would be a cheaper but poorer AI replacement for lawyers.


I currently write software for an e-discovery company. Most tasks that our software is expected to be able to perform are simple-sounding tasks, at first glance, such as ...

1. extracting documents from within other documents (attachments out of an email, files out of a zip, embedded excels out of a word doc, images out of a powerpoint, etc)

2. converting all said documents to some kind of standard media format so that the native viewing applications are not needed (all said document types to png, or pdf, or tif)

3. allowing full-text searching across all electronic files
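As a rough illustration of tasks 1 and 3 (not task 2, and not how any particular e-discovery product does it), here is a sketch that recursively pulls attachments out of an email and files out of a zip, then runs a naive keyword search. It only uses the Python standard library; the file names, the sample message, and the `extract` helper are invented, and real-world inputs are far messier than this.

```python
import io
import zipfile
from email import message_from_bytes, policy
from email.message import EmailMessage

def extract(name, data, depth=0):
    """Yield (depth, name, bytes) for a document and everything nested in it."""
    yield depth, name, data
    if name.lower().endswith(".eml"):
        msg = message_from_bytes(data, policy=policy.default)
        for part in msg.iter_attachments():
            child = part.get_filename() or "unnamed"
            yield from extract(child, part.get_payload(decode=True), depth + 1)
    elif name.lower().endswith(".zip") and zipfile.is_zipfile(io.BytesIO(data)):
        with zipfile.ZipFile(io.BytesIO(data)) as zf:
            for info in zf.infolist():
                if not info.is_dir():
                    yield from extract(info.filename, zf.read(info), depth + 1)

# Build a hypothetical email carrying a zip with two text files.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("memo.txt", "Please keep this memo confidential.")
    zf.writestr("notes/call.txt", "Call notes from Tuesday.")

msg = EmailMessage()
msg["Subject"] = "Documents"
msg.set_content("See attached.")
msg.add_attachment(buf.getvalue(), maintype="application", subtype="zip", filename="docs.zip")

found = list(extract("message.eml", msg.as_bytes()))
names = [name for _, name, _ in found]

# Naive full-text search over the extracted text files only.
hits = [name for _, name, data in found if name.endswith(".txt") and b"confidential" in data]
print(names, hits)
```

Dispatching on file extension is the weakest part of this sketch; anything handling real collections would need to sniff content types and cope with corrupt or password-protected containers.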

With these kinds of tasks available as an automated feature, the real product would just allow a bunch of attorneys to review the documents and apply tags or labels to them. Once they've gone through all the documents, there is generally an output from the system that summarizes their work and provides the relevant documents, notes, etc.

Over the years of writing this kind of software, we've encountered a never-ending stream of complications with file types, feature requests, etc. The real complexity with this kind of software is making it work for a large number of customers. Every customer probably has a different idea about what they want this kind of tool to do for them.


The other issue is the absolutely bonkers number of attorney-hours racked up in reading all of those documents once they've been systematized.

That's where I see potential in this market. Ediscovery is a pain-point for law firm clients - especially large corporate clients who are constantly involved in complex litigation. Document review has to happen in order to effectively litigate (gotta find the smoking gun!) but when a bill comes through with hundreds or thousands of attorney-hours devoted to reading your opponent's old emails, ouch. The client hasn't even seen a work product yet.


The other thing is that you spend 10x or more of the feature's initial effort on error handling. I sometimes envy all these "Internet" programmers who only have to deal with the web and don't have to worry about esoterica like, e.g., \x80 being the space character in old WordPerfect files.


Feature requests in e-discovery bloat your original software out of all proportion. Never mind getting past the original problem of effectively searching large troves of data in different formats.


Amen


I am quite interested to know more about Judicata. It was cofounded by Blake Masters (coauthor of Zero to One). Anyone have any info on it?


Last I spoke with the gang---unfortunately, some time ago---they were working on legal search. Great group of people. They've been stealthy of late.

FYI, Blake also blogged the notes on Peter Thiel's start-up lectures at Stanford.


> Blake also blogged the notes on Peter Thiel's start-up lectures at Stanford.

Which became source material for Zero to One.


As somebody doing a very similar thing to Judicata elsewhere in the world, I have looked into Judicata quite a bit. Although they are in stealth mode, there is some information out there and I have been able to infer a bit about what they are doing. Essentially, they are seeking to use NLP to retrieve certain information from legal texts. The primary piece of information they seek to retrieve is the legal claim, and certain elements surrounding it (e.g. 'breach of s X of Y statute'). They also seek to retrieve a number of other bits of information from case law using NLP.

In line with Peter Thiel / Palantir's philosophy that the human brain is an amazing machine not to be supplanted by computers, but one that should be used to its fullest, augmented by computers, Judicata's software involves using NLP as much as possible, then feeding or 'striating' that information to lawyers or legally trained people, depending on the complexity of the information extracted, for their confirmation. This is in any case necessary because NLP cannot get close to 100% accuracy for the information they are trying to extract, and you need 100% accuracy in the legal domain (e.g. it would be unacceptable to get the legal claim wrong, cf. Google search).

One consequence of structured legal texts is improved search. What many don't realise is the degree to which structured search on legal texts will improve legal research. E.g., if I want to find all cases in the last 10 years where the plaintiff claimed breach of duty in an occupiers' liability suit, I simply cannot. To find that batch of cases (accurately) would take me hours. If the legal claim was a structured piece of information, I could just search for it. As an ex-lawyer and ex-legal researcher, I'd estimate that the number of hours that could be saved per lawyer per year could easily be in the hundreds, and this is at charge out rates of $300-$1k per hour. This is, similarly to the above comments, in line with Thiel's investment thesis to 'improve something 10-fold' or 'make a quantum advance to cause adoption / change consumer behavior'. I think most people seriously underestimate how significant of an improvement structured search would be.
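To show what "just search for it" could mean once the claim is a structured field, here is a toy sketch. The `Case` record, its field names, and the four sample cases are hypothetical; nothing here reflects Judicata's actual data model.

```python
from dataclasses import dataclass

@dataclass
class Case:
    name: str
    year: int
    area: str   # area of law, assumed to be extracted and confirmed upstream
    claim: str  # the legal claim, likewise assumed to be structured already

# Hypothetical, already-structured case records.
cases = [
    Case("Case 1", 2012, "occupiers' liability", "breach of duty"),
    Case("Case 2", 2001, "occupiers' liability", "breach of duty"),
    Case("Case 3", 2013, "employment", "discrimination"),
    Case("Case 4", 2009, "occupiers' liability", "breach of duty"),
]

def search(cases, area, claim, since):
    """Return names of cases matching the area and claim, decided in or after `since`."""
    return [c.name for c in cases if c.area == area and c.claim == claim and c.year >= since]

# "All cases in the last 10 years", counting back from 2014.
matches = search(cases, "occupiers' liability", "breach of duty", since=2004)
print(matches)
```

The query itself is trivial; the hard and expensive part is populating `area` and `claim` accurately in the first place.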

The other thing that Judicata are flying under the radar about, a little bit, is the ability to use structured legal information for other purposes. High on the list is analytics, which Itai Gurari mentioned at the end of a talk, but merely in passing as if it was inconsequential. I think this is pretty clearly a multi-billion dollar market waiting to be made. If you look at what similar firms are doing in niche areas of law, e.g. Lex Machina, and look at what they are charging, and extrapolate the types of questions you can answer with structured legal information, the potential becomes clear. Again, this is in line with Thiel's investment thesis to 'create a market and dominate it, rather than compete in an existing one'.

The primary difficulty for Judicata or somebody undertaking to do the same thing is that the task is mammoth in just about every respect. As such the optimum strategy is likely to attack a niche jurisdiction and then build out the product. You can't go 'full-lean', because you need at least a semi-complete data set, but you can start 'small'. Hence, Judicata have been working on a niche jurisdiction of law as their first project: Californian Employment Law. While I am not in that jurisdiction (not even in America), that seems to me to be a very reasonable area of law to start with given that most legal claims (I think) are found in California's employment law statute (as opposed to other areas where the legal claims are found in Judge-made common law). Furthermore, there are a ton of neat pieces of information in employment law which you can structure, e.g. in a discrimination case, what factor was the plaintiff allegedly discriminated on - race, age, gender, etc. Finally, uptake would be high among employment lawyers who research at reasonably frequent intervals and have a practical need for more accurate search; compare this to constitutional law for example.

While part of the reason they are operating in stealth mode is simply because it takes so long to build up a semi-complete data set, I think the other part of the reason is because the biggest risk for such a firm is that Lexis, West or Bloomberg will start doing something similar. Imho, it's likely they will eventually but the risk of Judicata catalyzing that process is pretty small.

There are a few other firms operating in this space but with fundamentally different philosophies. My view is that these other firms are simply taking the wrong approach and simply want to release a product and build on it now in the lean tradition. Judicata's product is the type of product where the question is not whether there will be adoption, but rather whether or not you can actually build the product on your budget and in the time frame required. Imho, if Judicata can successfully create what they are planning on, it will flatten their competition. The real question is whether they can.


This paralegal I know told me 2 years ago "I wish someone would automate discovery because it sucks right now". I wish I knew absolutely anything about it.


I know some about it. As it stands right now, law firms need to find, trust, and pay forensic investigators. The firm has to leave some data collection up to the investigator, and then have some data turned over. By data, I mean hard drives, images of hard drives, images of network shares, email exchanges, etc.

The law firm has to pay for the data collection, the disk space to store the data, the transmission of data back and forth (investigator found something good, buys external HD, ships it via the mail), the data analysis, the investigation and reporting, and then sometimes the expert witness.

Sometimes, the law firm does not know what they're looking for, in other words, there is no smoking gun piece of data. Sometimes, the goal is to find something, anything, that would hint, point, or prove a goal.

What this means is that a retainer can be as simple as: here is 10 hours' worth of analysis/investigation to find the email we know was sent that contains this particular text. They do not plan on the analysis and investigation exceeding that, and usually the result is that we found it or didn't find it, and we did it within the time allotted.

It can also be, "we're looking for evidence that this type of event has occurred". This is where the billed hours start stacking up. It's hard enough digging through other people's emails, documents, and pictures looking for something, let alone digging without having something in particular to look for.

The point is, I believe that they, the law firms, want to, and need to, bring this in house. This greatly improves the process. But now they need security, real security, because it's not their data being stored, it is their client's client's data and so forth. It becomes very sensitive.

They need infrastructure. They need to be able to forensically acquire data. Forensically store data. Forensically analyse data. Forensically share data. The infrastructure needs to be fast, easy, and efficient. We're seeing 6TB hard drives now... shares of much, much larger size. And they do not want to be storing their client's data in someone else's cloud.

Then they either need technicians and investigators or the ability to hire and grant access to their data on their network to the tech/investigator. They need technicians to provide solutions to the inevitable problems they run into (e.g. how can I acquire each of the 2 drives in this Fusion Drive array, return to the lab, and rebuild the array on something other than OS X?). And they need investigators experienced at homing in on relevant data while digging through vast troves of data.


I'm a computer forensics and eDiscovery guy. The forensics part of eDiscovery is really overblown. It's the bogeyman, and not much more.


There are these guys:

http://logikcull.com/

(And they're welcome.)


And they're thankful.


Are you they? (They host the local Ruby Users Group, too. :)


Yes I am they....re CEO =)


This is called eDiscovery, and people have been working on it for a good while, certainly longer than two years (I worked at a startup in this space, 2009-2010).


It's interesting to note the first book on eDiscovery came out in 2004. Didn't sell many copies as you can imagine, and was pretty ahead of its time.

Pretty shocking when you think the first book on digital evidence didn't come out until 2004. You want a legal field ripe for disruption? It's definitely the eDiscovery space.

Here's the book I was referring to:

http://legalsolutions.thomsonreuters.com/law-products/Treati...


No mention of Smart Contracts?


Doesn't look like it. Perhaps you'd like to fill us in on it? If my guess of what it sounds like is anything like what it actually is, then I'd be really curious about knowing more.


You can use a bitcoin-like system to build an agent that holds money in escrow and then automatically delivers it when a condition is met[1]. This is the simplest form of a "Smart Contract."

Most contracts are credible because their terms can be enforced by sanctioned violence. "Smart Contracts" are credible because they are enforced by distributed verifiable cryptographically-secure automation.

[1] For example, when the DMV database confirms that a certain car is now registered in my name, you automatically get 4 bitcoin.
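The escrow idea can be sketched in a few lines. To be clear, this is a toy in-memory simulation with no cryptography, no blockchain, and no real DMV lookup; the `Escrow` class, the `registry` dict standing in for the DMV database, and the balances are all assumptions made up for illustration.

```python
class Escrow:
    """Toy escrow agent: holds an amount and releases it when a condition holds."""

    def __init__(self, payer, payee, amount, condition):
        self.payer, self.payee, self.amount = payer, payee, amount
        self.condition = condition  # zero-argument callable returning True/False
        self.released = False

    def settle(self, balances):
        """Pay out once, and only if the condition holds."""
        if not self.released and self.condition():
            balances[self.payee] = balances.get(self.payee, 0) + self.amount
            self.released = True
        return self.released

registry = {"car-123": "seller"}       # stand-in for the DMV database
balances = {"buyer": 6, "seller": 0}   # balances in bitcoin

# The buyer funds the escrow up front.
balances["buyer"] -= 4
escrow = Escrow("buyer", "seller", 4, lambda: registry["car-123"] == "buyer")

before = escrow.settle(balances)   # registration not yet transferred
registry["car-123"] = "buyer"      # the registry now shows the buyer as owner
after = escrow.settle(balances)
print(before, after, balances)
```

In a real system the condition would have to come from a data source both parties trust, and the release would be enforced by the network rather than by a Python object one party controls.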


[deleted]


You're right in that smart contracts won't help with the qualitative aspects of our legal system but they will still help to organize the paper trail and will definitely improve how fuzzy conditions and exceptions are managed.

Smart contract systems are based on cryptographic signatures and chains of transactions (Merkle Directed Acyclic Graph). Instead of contracts being spread across many different databases with many different kinds of authentication mechanisms, they would all be in one format and they would all link to each other in a provable way. This could be an entirely centralized system and still have the same benefits.
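Here is a minimal sketch of the "link to each other in a provable way" part: each contract is identified by a hash of its text plus the IDs of the contracts it refers to, so altering any ancestor breaks verification. Signatures are left out entirely, and the `store`, `add`, and `verify` names and the sample contracts are hypothetical.

```python
import hashlib
import json

def node_id(text, parents):
    """Content address: hash of the contract text plus the IDs it links to."""
    payload = json.dumps({"text": text, "parents": sorted(parents)}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()

store = {}  # in-memory stand-in for wherever contracts would be kept

def add(text, parents=()):
    nid = node_id(text, parents)
    store[nid] = {"text": text, "parents": sorted(parents)}
    return nid

def verify(nid):
    """Check that a contract and everything it links to still match their IDs."""
    node = store.get(nid)
    if node is None or node_id(node["text"], node["parents"]) != nid:
        return False
    return all(verify(p) for p in node["parents"])

master = add("Master services agreement")
order = add("Purchase order under the MSA", [master])
amendment = add("Amendment to the MSA and the order", [master, order])

ok_before = verify(amendment)
store[master]["text"] = "Master services agreement (altered)"  # tamper with an ancestor
ok_after = verify(amendment)
print(ok_before, ok_after)
```

Because the amendment links to two earlier documents, the structure is a DAG rather than a simple chain, and nothing about it requires a decentralized network.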

However, using a Bitcoin-like system with decentralized trust removes even more of the legal overhead as many contracts can be entered into and settled without the participants even needing to know who they're dealing with.

At the very least we should be moving to a Merkle DAG based system of contracts, centralized or not!


What happens if the DMV makes a mistake and incorrectly changes the car registration to your name? Do I still get to keep those 4 bitcoins forever?


This list isn't actually about disruption. It's about automating the status quo. Smart contracts would be genuinely disruptive.


ethereum?


I hear http://thoughtly.co moved into e-discovery for visualization and summarization. Are there any other machine learning startups in the space right now?


There's a long history of search/knowledge management/AI startups recognizing that eDiscovery was their last, best hope for revenue. Recommind is one example (a co-founder co-developed probabilistic latent semantic indexing), and Autonomy (y'know, the folks who cratered HP) is another.


I'd be interested in seeing an IDE for contract drafting. Or something like an Excel mapper that can show me how all the provisions and definitions in a contract are interrelated.


More lawyers would first have to understand that Microsoft Word is a regression from plain text!

Contracts are code. Many of the concepts are similar (defined terms / includes / gotos) so the tools don't need to be changed that much. The challenge is getting others to adopt them.


A contract physics simulator would be a huge boon. I know smart contracts are supposed to take us some small steps along that path, but the real value is in being able to visualize and perform such mappings in existing contracts.


Or perhaps a GitHub repo with code to generate the most frequently used legal documents.


Definitely interesting, and in a few cases I have built this internally for myself before and it generally is useful (key phrase) when it can be built. I did this primarily to keep track of (1) defined terms (each place a defined term is used, and when a defined term includes another defined term), (2) section references (where any section is referred to elsewhere in the document so that if that section changes, what other things need to be impacted) and (3) payment mechanics (a spreadsheet that uses the defined terms of payment mechanics and allows you to plug in real numbers to see how the money flows). This is helpful to understand a specific concept or mechanic.
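A bare-bones sketch of items (1) and (2), tracking defined terms and section references, might look like this. The four-section contract is invented, and the two regular expressions assume a drafting style of `"Term" means ...` and `Section N.N`, which real contracts will not reliably follow.

```python
import re

# Hypothetical contract, keyed by section number.
contract = {
    "1.1": '"Purchase Price" means the amount set out in Section 2.1.',
    "1.2": '"Closing" means the completion of the sale under Section 3.1.',
    "2.1": "The Purchaser shall pay the Purchase Price at Closing.",
    "3.1": "Closing shall occur once the conditions in Section 2.1 are met.",
}

# (1) Where each defined term is defined, and every other section that uses it.
defined = {}
for section, text in contract.items():
    for term in re.findall(r'"([^"]+)" means', text):
        defined[term] = section

term_uses = {
    term: [s for s, text in contract.items() if s != home and term in text]
    for term, home in defined.items()
}

# (2) For each section, which other sections refer to it.
section_refs = {}
for section, text in contract.items():
    for target in re.findall(r"Section (\d+\.\d+)", text):
        section_refs.setdefault(target, []).append(section)

print(term_uses)
print(section_refs)
```

Note that "Purchaser" is used in 2.1 without ever being defined; flagging capitalized terms with no definition would be a natural next check.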

Usually though, the moment you move into a transaction of even medium complexity, while this might be helpful it can't be ultimately relied on - if you were to track a defined term or a section reference, a change to that term or section reference would impact not only the places where that term or section reference is specifically used, but also where a concept depends on such term or section reference. A basic example is the change of an actor from singular to plural (originally there was one purchaser, now there are two purchasers): every pronoun and verb would then need to be changed to plural form. This is why you sometimes find contracts wonkily sticking with a plural defined term when really there is just one entity/person described by the term, or vice versa.

This mostly points to what I always say about the challenges to true legal disruption: common law, statutes and contracts all depend on language subject to interpretation (and in the case of common law, interpretation IS the entire name of the game - see every Supreme Court case of the past 30 years to see how much interpretation varies), and unless I am missing some major breakthrough, we have not come to a point where language is understood systematically enough to truly "hack" complex legal concepts as currently drafted (i.e., using language).

I could, however, imagine a new legal system that depended entirely on data, numbers and systems instead of historical language, but it would first require us to all agree that the system would govern and agree on the rules (or lack thereof) of interpretation, which given the vested interests most of society has in the current system, would be pretty challenging to implement.

That being said, you can see how this spectrum works by comparing the common law system (like US/UK, which relies heavily on interpretation of judicial opinions) and the civil law system (France, Korea, etc., which relies heavily on a more formulaic interpretation of statutes) - way more costly litigation and more politics in common law systems when compared to civil law systems. The trade off is that common law is (arguably) more dynamic (judges can overturn statutes unilaterally - civil rights, etc.), whereas civil law systems require legislatures to make changes to statutes.

Full disclosure: I used to be a lawyer, so I am biased.


I'm intrigued by your idea of a new legal system dependent on data.


I think an interesting place to start would be trying to break the UCC down into a numbers/data based system.

Similarly, it would be interesting to try and take a relatively conventional transaction (incorporation documents, convertible notes, Series A) and place the key economic terms in a spreadsheet, and then incorporate the other legal boilerplate (indemnification, securities exemptions, etc.) by reference to a standard T&Cs-type document - I know some incubators already do this by having the key terms as a "fill-in-the-blank" field (I think AngelPad does this). The agreement would then just be a spreadsheet-like form, with the other terms incorporated by reference.


eDiscovery is not ripe for disruption. That ship has sailed.

eDiscovery has largely been solved for most corporate environments. There are tools to collect data in a defensible manner, to "process" (i.e., index) it, and to review it. There are even some products that aggregate these functions together; however, it should be noted that each of these functions has a different user/customer and occurs at a different point in the discovery process.

Many of the dominant tools do have their warts. But the money that was once in this space--the eDiscovery collection product I wrote sold for a couple million to its first customer--is no longer there. Prices have dropped dramatically and it's now a commoditized market. So you'd have to work very hard for very little gain to displace any of the dominant players.

Note that TFA was written by investors in a new eDiscovery startup and TFA seems mostly like latent marketing for them. I don't know anything about them--good luck and all that--but I'm very familiar with the space and I don't envy them.


It's hard to get excited about software for lawyers, and I think that's why Disco has flown under the radar a bit, but I think these guys are going to be huge. They've made exponential improvements in e-discovery software.


Having worked in e-discovery for many years, I'd say "exponential improvements" is a huge overstatement. What they have is in pretty much any e-discovery product on the market.

And they are missing a huge piece--predictive coding and advanced analytics (email threading and near-dup are EXTREMELY common). If you are in NLP, ML and/or IR, the legal industry is probably one of the most exciting places to be. Huge datasets, available annotators and tons of money. It's a red-hot lab of state of the art techniques being tried in the real world instead of on the Reuters, 20-newsgroups, Enron, and other "canned" datasets.
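For readers who haven't seen near-dup detection, a bare-bones version is word shingling plus Jaccard similarity. This is only a sketch: the sample emails, the shingle size of 3, and the 0.6 threshold are arbitrary choices for illustration, and production systems typically use hashing tricks to avoid comparing every pair.

```python
from itertools import combinations

def shingles(text, k=3):
    """Set of overlapping k-word sequences in the text."""
    words = text.lower().split()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}

def jaccard(a, b):
    """Jaccard similarity of two texts' shingle sets (1.0 means identical sets)."""
    sa, sb = shingles(a), shingles(b)
    if not sa and not sb:
        return 1.0
    return len(sa & sb) / len(sa | sb)

# Hypothetical documents from a review set.
docs = {
    "email_1": "please review the attached draft agreement before friday",
    "email_2": "please review the attached draft agreement before monday",
    "email_3": "lunch on friday works for me",
}

# Flag pairs whose similarity meets the (arbitrary) 0.6 threshold.
pairs = [
    (a, b)
    for a, b in combinations(sorted(docs), 2)
    if jaccard(docs[a], docs[b]) >= 0.6
]
print(pairs)
```

Grouping near-duplicates this way lets a reviewer tag one document and apply the decision to its siblings, which is where the time savings come from.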


The other thing is that lawyers don't get re-trained. Relativity has won the current generation of lawyers, and Relativity is a good product (unlike the products it replaced, Concordance and Summation). It will not be easy to displace.


On the research end of things, NIST's TREC has a legal track, and that's probably the best place to look for what's happening in the "applied research" space of the field.


TREC HAD a legal track. It's a shame it hasn't been done in 3 years.


Do you know of any companies doing that in the Seattle area?


Interesting. What type of advanced analytics do you mean?


Anything and everything to do with text analytics so that lawyers can find relevant documents faster. Concept searching (LSI, LDA, other topic models), query expansion, clustering, network analysis (emails), etc.


I suppose you could say they made exponential improvements if you take a very small exponent. What they have done is to make a nice interface and apparently fast search.

One concern that I have is this "Secure Infrastructure--no installation required". This suggests the cloud. The other is "Upload your data by FTP".

Think about e-discovery for a moment. The toxic material stored is pretty much a superset of all known toxic material--PCI, PII, HIPAA, and so on. Many customers require physical control of where this information goes. Zero installation means that it is not a server under your control.

There are many pieces to the EDRM process (http://www.edrm.net/resources/edrm-stages-explained) and it is not clear that DISCO covers all of them. 'tucros3141 in a parallel comment mentions some of them.


Your definition of exponential needs-sum-splaining.


PlainSite (http://www.plainsite.org), which I run, is tackling the research end of things.


How many people are going to be put out of work by this?


Still seems quite a challenging disruption here. For one, you'd need to know some aspect of the lawyer's daily job and two you'd have to know how to sell to lawyers. The thing that scares me most is that these people also hold a trigger to suing the crap out of you because they can. It's exciting and scary at the same time.



