My first three posts on this Substack addressed AI and the legal profession. In those posts, I expressed optimism that AI could improve the speed and accuracy of both judging and lawyering.
In a recent, widely reported event, a lawyer in New York blindly relied on ChatGPT and submitted a filing with citations to non-existent cases, which led to a sanctions hearing. In the wake of that incident, Judge Brantley Starr of the Northern District of Texas now requires lawyers to certify that “no portion of any filing in this case will be drafted by generative artificial intelligence or that any language drafted by generative artificial intelligence—including quotations, citations, paraphrased assertions, and legal analysis—will be checked for accuracy, using print reporters or traditional legal databases, by a human being before it is submitted to the Court.” Magistrate Judge Gabriel Fuentes of the Northern District of Illinois now requires that “[a]ny party using any generative AI tool to conduct legal research or to draft documents for filing with the Court must disclose in the filing that AI was used, with the disclosure including the specific AI tool and the manner in which it was used.” Judge Stephen Vaden of the Court of International Trade now requires a similar disclosure as well as a “certification that the use of such program has not resulted in the disclosure of any confidential or business proprietary information to any unauthorized party.” Standing orders from other judges will undoubtedly soon follow.
(If Van Gogh desired to depict computers in the legal profession, Dall-E says this is what he would paint.)
In light of these developments, I’d like to offer a few thoughts on the regulation of AI in the legal profession. My basic conclusion is that restrictions on the use of AI, at least right now, are unwise.
What exactly are we worried about?
Why would judges (or legislatures, or rules committees, or state bar associations) want to regulate lawyers’ use of AI? I can think of at least seven reasons:
1. AI will make mistakes.
2. AI lacks the discernment of humans.
3. AI will lie.
4. AI creates privacy concerns.
5. AI will be too good, and displace humans, yielding unpredictable but possibly bad outcomes.
6. It is desirable, for sentimental reasons, for humans to do certain things, even if AI is better at them.
7. AIs’ incentives are not aligned with human incentives, leading to existential risk.
As I’ll explain below, I don’t think any of these arguments provide a strong basis for restricting the use of AI by lawyers, except for possibly #7.
I’m afraid I can, actually, do that.
Let’s start with the first point: AI will make mistakes. ChatGPT made many mistakes in the New York case—it hallucinated non-existent case law—which is the concern that prompted Judge Starr and Judge Fuentes to issue their standing orders.
Judge Starr’s order states that parties “will be held responsible under Rule 11 for the contents of any filing that they sign and submit to the Court, regardless of whether generative artificial intelligence drafted any portion of that filing.” Judge Fuentes’s order includes similar language. I emphatically agree with these portions of the orders. Indeed, Rule 11 already works that way. If a partner at a law firm signs a filing that contains an error, the partner is responsible for the error. The partner cannot pin blame on an associate or paralegal. And, by the same token, the partner should not be able to blame AI.
The question is whether ChatGPT’s propensity to hallucinate justifies additional regulation of lawyers’ use of AI, beyond the existing requirements of Rule 11. No, in my view.
While the episode in New York is funny, it does not suggest that additional regulation of AI is needed. Obviously one shouldn’t rely on ChatGPT to do legal research. ChatGPT is not a legal research tool and was never designed to retrieve case law accurately; there is zero reason to expect it to answer legal questions correctly. One would have to be a completely inept lawyer to unthinkingly rely on the output of ChatGPT, and indeed, the sanctions hearing revealed the lawyer to be completely inept. The legal system has well-worn tools to deal with bad lawyers—sanctions hearings and public shaming.
This problem is not likely to recur frequently. First, legal AI tools that are trained on case law already exist. While these tools are not perfect, they do not hallucinate cases, and their output is passable. Indeed, currently existing software produces legal writing that is likely far better than the legal writing that the New York lawyer, trying his absolute hardest, could produce. The most ethical thing the lawyer could do for his clients is to stop writing briefs by himself and instead use currently existing AI tools that aren’t ChatGPT.
Moreover, AI tools continue to improve rapidly. GPT-4 easily passes the bar exam, scoring at either the 90th or the 68th percentile, depending on which pool of test-takers it is compared against. GPT-4 also gets an A in Bryan Caplan’s Labor Economics midterm exam and a B in Scott Aaronson’s Quantum Computing final exam. Of note, GPT-4 does way, way better than GPT-3.5 (which is what the free version of ChatGPT currently uses).
There is much stimulating debate on how much, and how quickly, AIs are going to improve. Some experts in the field think that within the next few decades, AIs will become vastly more intelligent than humans, in the same way that chess AIs are now vastly better than the best human chess players. Others think that AIs will plateau. Still, even under pessimistic trajectories, there will continue to be significant short-term improvements in AIs’ capabilities. What are the odds that, within a decade, an AI will score in the 99th percentile on the bar exam and consistently score A+ answers on law school exams? I’d say 70%. On a 30-year timeline I’d say 90%.
As such, the problem encountered in the New York case will soon start to seem quaint. Even setting aside specialized legal AI tools, I suspect that even general-purpose AIs like ChatGPT will stop hallucinating cases a generation or two down the road. Any rule intended to protect against AIs’ hallucinations would long outlive its justification.
Judge Starr requires parties to certify that “any language drafted by generative artificial intelligence—including quotations, citations, paraphrased assertions, and legal analysis—will be checked for accuracy, using print reporters or traditional legal databases, by a human being before it is submitted to the Court.” On its face, this requirement looks unobjectionable, but in my view, it is misguided. Existing AI tools, trained on legal materials, can pull quotations and verify citations with virtually 100% accuracy. Requiring a human to verify the accuracy of a quotation provided by one of these tools is like requiring a human to use pen and paper to verify the accuracy of a computer’s multiplication of two numbers. Of course, some AIs produce inaccurate quotations, but any lawyers incompetent enough to rely on those AIs will not be deterred by a local rule barring them from doing so. Even as to “paraphrased assertions” and “legal analysis,” this requirement is premised on a concern that AIs are inherently unreliable, which I view as unfounded. Moreover, in the adversarial world of litigation, parties have an incentive to find their opponents’ errors, further protecting courts from being misled by AIs’ errors.
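To make the multiplication analogy concrete, here is a minimal sketch (mine, not any vendor’s product) of what mechanical quotation-checking can look like, assuming a hypothetical local corpus of opinions saved as plain-text files keyed by citation; the file layout and example inputs are placeholders:

```python
# A minimal sketch of mechanical quotation-checking against a hypothetical
# local corpus of opinion texts. File names and example inputs are placeholders.
import re
from pathlib import Path

CORPUS_DIR = Path("opinions")  # hypothetical folder of plain-text opinions


def normalize(text: str) -> str:
    """Collapse whitespace so line breaks in the source don't cause false misses."""
    return re.sub(r"\s+", " ", text).strip()


def verify_quotation(quotation: str, citation: str) -> bool:
    """Return True if the quoted language appears verbatim in the cited opinion."""
    source_file = CORPUS_DIR / f"{citation}.txt"
    if not source_file.exists():
        return False  # cited case isn't in the corpus; flag for human review
    opinion = normalize(source_file.read_text())
    return normalize(quotation) in opinion


if __name__ == "__main__":
    # Placeholder inputs for illustration only.
    ok = verify_quotation(quotation="detailed factual allegations",
                          citation="ashcroft-v-iqbal")
    print("verified" if ok else "not found; flag for human review")
```

The point is not that courts should run this particular script; it is that checking whether quoted language appears in a source is a deterministic lookup once the source text is in hand, which is exactly the kind of task machines do not get wrong.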
Judge Starr’s requirement also has costs. One of the chief advantages of AI is that it produces work product far more quickly and far more cheaply than a human would. AI therefore brings the promise of speeding up litigation, saving money, and improving access to justice. These advantages lessen when parties are required to manually verify every single quotation and citation produced by an AI designed to produce accurate quotations and citations. In my view, courts should watchfully observe how lawyers’ use of AI plays out, while enforcing Rule 11 as written in appropriate cases, rather than strangling AI tools in their crib.
Should courts or bar associations ban or restrict parties from using AIs unless the AIs pass some kind of test? I could live with a requirement that AIs pass the bar exam before performing substantive legal tasks. GPT-4 already passes the bar exam, so this would be unobtrusive. But I would object to additional, AI-specific competency testing. Who would draft such a test? What would be in it? I cannot begin to imagine the answers to these questions.
There would be no reason for an AI-specific test unless AI was unreliable in some way that the bar exam couldn’t root out. I cannot understand how committees of lawyers could achieve any kind of reasoned consensus on what that unreliability is or how it could be tested.
ChatGPT has never seen a beautiful sunset.
The second point is that AI lacks the discernment of humans. This argument posits that human lawyers have some ineffable quality that makes them better than AI lawyers. Perhaps a sufficiently well-trained AI can accurately cite case law and produce facially plausible legal answers. But AI cannot detect subtle fault lines, make delicate strategic judgments, or do all the other things that make the practice of law so stimulating.
I am skeptical this is true, and even more skeptical that this is a basis for regulating AI. First, there are as many ways of practicing law as there are lawyers. Some lawyers are bombastic, others are coy. Some stand their ground when faced with hostile questioning, others back off. A human’s disagreement with an AI’s strategic decision does not establish that the AI is incompetent or should be regulated. I don’t see any systematic reason to question AIs’ competence in making strategic judgments.
Second, we have to think at the margin here. What will the AI be replacing? Sophisticated, wealthy clients will continue to hire highly skilled lawyers who will incorporate AI into their practice. This will improve the quality of their work: a highly skilled lawyer who is able to use AI wisely will produce better work than a highly skilled lawyer who is stuck with traditional tools. Meanwhile, less-skilled lawyers may rely on AI more heavily, but those are the types of lawyers who may benefit more from an AI’s strategic advice. I don’t see how, in either scenario, AI will make strategic decisionmaking worse.
Third, and most fundamentally, I think this reasoning reflects a kind of sentimentality that is not empirically grounded. Why should we assume that humans are inherently better at writing legal briefs than AIs? For years, people said the same thing about chess—chess engines might have massive opening libraries and brute-force tactical power, but they couldn’t make subtle positional judgments, which require human intuition. Then along came AlphaZero, which, merely by playing chess with itself over and over again for four hours, became the best chess player in the history of Earth. Oh.
Relatedly, I would not require that parties disclose whether or how they use AI in the preparation of their briefs. Judges generally do not require disclosure of the manner in which briefs are prepared, and for good reason. Judges are required to put aside extraneous considerations and decide disputes based on the legal arguments presented to them. The manner in which a brief was prepared—what materials the lawyers reviewed, how many drafts they went through, how many reviewers offered redlines, and so forth—is an extraneous consideration. What matters is the legal analysis in the final product.
These principles shouldn’t change when lawyers use AI. Disclosure requirements imply that a party’s use of AI in drafting a brief is relevant to the court’s analysis of the arguments in the briefs. If this information wasn’t relevant, why require it to be disclosed? Yet the quality of an argument doesn’t turn on whether an AI was used to generate it. The judicial oath requires judges to attest that they “will administer justice without respect to persons, and do equal right to the poor and to the rich.” This means that they are supposed to evaluate arguments based on the quality of the argument, rather than the identity of the person making the argument. AI isn’t a person, but the same principle holds. If an argument is persuasive, it shouldn’t be any less persuasive because an AI came up with it.
Also, these requirements will artificially deter the use of AI. The orders do not specifically state how the judges will account for the fact that a lawyer used AI. Implicitly, however, these orders are saying that the judges will be more skeptical of AI-assisted briefs. Lawyers will know that they are better off not using AI, even if using AI would save money for their clients. This is a poor outcome, in my view.
AI: Programming rather than principle.
Point #3 is that AI will lie. Or, as Judge Starr’s order puts it: “While attorneys swear an oath to set aside their personal prejudices, biases, and beliefs to faithfully uphold the law and represent their clients, generative artificial intelligence is the product of programming devised by humans who did not have to swear such an oath. As such, these systems hold no allegiance to any client, the rule of law, or the laws and Constitution of the United States (or, as addressed above, the truth). Unbound by any sense of duty, honor, or justice, such programs act according to computer code rather than conviction, based on programming rather than principle.”
True enough, neither programmers nor AIs take the oath that lawyers take. On the other hand, AIs are trained on legal texts written by judges who do take oaths. If the judges who write the decisions are ethical, then why would the AI mimic the output of an unethical lawyer?
Indeed, I suspect that on average, AI will be less likely to lie than humans. Lawyers may have personal or professional incentives to misrepresent facts or law in a brief. A programmer, with no stake in any particular case and building an AI that can be used by both plaintiffs and defendants, has no incentive to build an AI that lies.
But this is all highly speculative. Why not simply observe the AI’s output empirically rather than make assumptions about the AI’s ethics? If, on average, the use of AI improves the accuracy of case and record citations, there is no reason to disfavor AI because its programmers do not take an oath.
AI never forgets.
Judge Vaden’s standing order expresses concern about Point #4—privacy concerns. Judge Vaden observed: “Users having ‘conversations’ with these programs may include confidential information in their prompts, which in turn may result in the corporate owner of the program retaining access to the confidential information. Although the owners of generative artificial intelligence programs may make representations that they do not retain information supplied by users, their programs ‘learn’ from every user conversation and cannot distinguish which conversations may contain confidential information.”
This is a solvable problem. My understanding is that certain legal-AI platforms offer the option of “forgetting” certain inputs so that they do not train the model. If I am wrong about this, it would be trivially easy to program this functionality.
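For illustration, here is a minimal sketch of what that functionality might look like from the law firm’s side, assuming a hypothetical vendor API; the client class, the no_training and no_retention flags, and the redaction scheme are illustrative names rather than any real vendor’s parameters (in practice, vendors tend to handle data use through account-level settings and contractual commitments rather than per-request flags):

```python
# A hypothetical "sandbox" wrapper: a per-request flag asks the vendor not to
# retain or train on the prompt, and confidential terms are redacted client-side
# before the prompt leaves the firm. All names here are illustrative.
from dataclasses import dataclass, field


@dataclass
class LegalModelClient:
    """Stand-in for a vendor SDK; complete() would make the actual API call."""
    api_key: str
    redactions: dict[str, str] = field(default_factory=dict)  # term -> placeholder

    def redact(self, prompt: str) -> str:
        """Replace confidential terms before the prompt is sent anywhere."""
        for term, placeholder in self.redactions.items():
            prompt = prompt.replace(term, placeholder)
        return prompt

    def complete(self, prompt: str, *, no_training: bool = True,
                 no_retention: bool = True) -> str:
        payload = {
            "prompt": self.redact(prompt),
            "no_training": no_training,    # hypothetical flag: exclude from training data
            "no_retention": no_retention,  # hypothetical flag: do not store the conversation
        }
        # In a real integration, an HTTPS call to the vendor would go here.
        return f"[model response to: {payload['prompt'][:60]}...]"


if __name__ == "__main__":
    client = LegalModelClient(api_key="...", redactions={"Acme Corp.": "[CLIENT]"})
    print(client.complete("Summarize Acme Corp.'s indemnification exposure under the MSA."))
```

Client-side redaction is the belt-and-suspenders piece: even if the vendor’s retention promises fail, the confidential terms never leave the firm.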
I don’t think standing orders are necessary to solve the privacy problem. In negotiating protective orders, parties can agree that, if confidential information is ever given to an AI, they will use an AI with heightened privacy features. If this functionality is not available, the protective order can bar the parties from giving confidential information to the AI. There is a risk that parties will violate the protective order, but that is always true.
It’s hard to make predictions, especially about the future.
Point #5 is that AI will be too good, and displace humans, yielding unpredictable but possibly bad outcomes.
Let’s say that AI continues to improve rapidly, and GPT-6 gets a perfect score on the bar exam and earns A+ grades on law school exams. Clients might find that GPT-6 writes better briefs in five seconds than a team of humans could write in 50 hours, and so they would rationally turn to GPT-6 to write their briefs.
In some ways, this is great. If a brief that previously cost $100,000 now costs $10, that is good. Really good. Cold fusion good. I understand that some people might find themselves out of a job, but all significant new technologies have that effect.
If a lawsuit that previously took three years could be expedited into three months, that is also good. Litigation delay is horrible. Faster trials are more accurate and bring closure more quickly.
Of course, people will react to the increased speed and efficiency of litigation by doing … different things from what they are doing now. What exactly will happen? And will the things that happen, whatever they are, be good or bad?
These are hard questions to answer because (a) it’s extremely hard to predict what happens next, in view of AI’s cross-cutting effects, and (b) given widespread disagreement on the merits and demerits of the status quo, it’s equally hard to assess whether these hypothetical, as-yet-unknown changes will be good or bad.
To give an example … will this hypothetically awesome GPT-6 increase or decrease the amount of litigation? I can think of at least five reasons it would increase litigation:
Lawyers will get cheaper, so plaintiffs will hire more lawyers to bring more cases.
People will bring more pro se cases, knowing that GPT-6 gives them a fair shot at winning.
Lawyers will make fewer procedural mistakes, increasing the amount of merits litigation.
GPT-6 could make it easier and cheaper for plaintiffs to conduct massive fishing expeditions for evidence in large bodies of documents. Courts would be more willing to allow such fishing expeditions because they wouldn’t be burdensome; the AI would do the work. This will incentivize plaintiffs to bring more speculative lawsuits.
GPT-6 will redirect lawyers who would otherwise be grinding through documents or researching case law into the more entrepreneurial activity of finding lawsuits to file.
I can think of at least five reasons it would decrease litigation:
Litigation will become more efficient as AIs handle briefing and discovery. As such, cases that previously took three years to resolve will take six months to resolve, decreasing the total amount of time and money spent on litigation.
The increased efficiency of litigation will discourage litigants from grinding away at hopeless cases. Many litigants, both plaintiffs and defendants, litigate hopeless cases knowing that the delays of litigation will induce the counter-party to settle. If AIs speed up lawsuits, those litigants won’t bother.
If both parties have access to AIs that can quickly and confidentially advise them as to the strength of their case, they will be able to settle their case more quickly.
To save money, parties may agree to arbitrate cases with the assistance of AI.
The availability of AIs will make the legal profession less attractive and remunerative, lowering the supply of lawyers and hence reducing the amount of litigation.
I tend to be an optimist (as my first three posts make clear) but there is a high degree of uncertainty.
Perhaps most importantly, these lists don’t address the effect of GPT-6 on the types of disputes that arise in the first place. Will GPT-6 increase or decrease the number of crimes? Discrimination claims? Who knows?
Even if we could figure out what was going to happen, we wouldn’t know what to do with that information. Let’s suppose we could consult an oracle that could predict with 100% certainty that GPT-6 will increase the number of lawsuits by 30%. Is this good or bad? Depends on who you ask. Plaintiffs’ lawyers will celebrate increased access to justice, while defense lawyers will bemoan the increase in burdensome lawsuits.
To sum up, it is impossible to predict how GPT-6 will affect the legal system and it is impossible to predict whether these changes will be good or bad. As such, I don’t think we should enact rules right now that depend on these very uncertain predictions. Given AI’s promise in increasing efficiency and reducing cost, I think we should generally have an optimistic attitude and encourage (or at least not deter) the use of AI. If unexpected bad outcomes occur, we can course-correct. But we should not impose prophylactic rules deterring or restricting AI based on speculative concerns about an uncertain future.
It’s beautiful when people draft punchy intros.
Point #6 posits that it is desirable, for sentimental reasons, for humans to do certain things, even if AI is better at them.
About certain things, I agree. I suspect that most people would have the intuition that juries should be composed of humans. Even if it could be shown empirically that AI, on average, is more likely to reach correct decisions than human jurors, most people would be squeamish about having a person’s liberty or livelihood turn on what an algorithm spits out. I have that intuition too. Why? Because, well, it feels like the kind of thing people should do. For similar reasons, I suspect most people would prefer, e.g., human clergy and psychotherapists, even if AI could produce excellent sermons and offer excellent mental health advice.
But lawyering? Are there really important aspects of the job that just intrinsically “should” be performed by people, even if AI was demonstrably better at those tasks?
Maybe I’m just an unsentimental person, but no. In my experience, clients are extremely unsentimental about lawyers. Clients hire lawyers because they have no other choice. If there was a way to shift a task from a lawyer to a non-lawyer without decreasing the quality of the work, clients would do so in a nanosecond, with zero emotional injury. I have pleasant relationships with many clients, but it is always implicit (often explicit) in the relationship that the clients would much rather be spending their money on something else. Good lawyers understand that and govern themselves accordingly.
Of course, incumbent lawyers, who enact professional rules of conduct, are particularly likely to have a sentimental attachment to ensuring lawyers keep doing things that could be done more efficiently by AIs. I am not the first to observe this conflict of interest.
At a minimum, if we are going to reserve certain tasks to lawyers because lawyers “should” do them, we should define these categories of tasks clearly and narrowly. Words such as “inherent,” “bedrock,” and “fundamental” should be avoided.
Are we doomed?
That brings me to Point #7: existential risk.
The Center for AI Safety recently came out with the following statement: “Mitigating the risk of extinction from AI should be a global priority alongside other societal-scale risks such as pandemics and nuclear war.” The statement is accompanied by a spectacular list of signatories, headed by iconic figures such as Geoffrey Hinton, Yoshua Bengio, Demis Hassabis, Sam Altman, and Dario Amodei.
Opinions differ on how this extinction might come about. One line of thinking is that AI creates extinction risk based on mechanisms we currently understand—blind reliance on faulty algorithms, deepfakes, election manipulation, and so forth. Another view, the Eliezer Yudkowsky view, is that AIs will become much smarter than humans and will have misaligned incentives, inevitably leading to doom. There is much disagreement on this issue—Yudkowsky did sign the Center for AI Safety’s statement, but I suspect his views on the “risk of extinction” differ dramatically from the views of signatories closer to the top of the list.
Personally, I find Tyler Cowen’s writings and statements on this issue to be particularly trenchant. (See, e.g., here, here, and here.) Still, I have read the views of Yudkowsky, Scott Alexander, Zvi Mowshowitz, Scott Aaronson, etc., and I know enough to know I’m not qualified to offer a reasoned opinion. I will remain ecumenical on this issue and suggest that it is probably unwise to immediately turn over the keys of the legal system to AI, a position on which I suspect I’d get 100% agreement.
This, perhaps, provides some rational underpinning for the intuitions addressed in #6. If we are concerned about existential risk and p(doom), juries start to sound like a better idea. Juries are sometimes justified as the last bulwark against government corruption. In particular cases, juries may be befuddled or incompetent. But the jury system ensures that the government cannot, no matter how corrupt it gets, incarcerate any of its citizens without the consent of twelve random citizens unaffiliated with the government. By the same token, juries protect against misaligned AIs. No matter how much we come to rely on AI, juries will serve as a kind of circuit breaker, ensuring that people keep their liberty unless other human beings decide otherwise. Of course, maybe AIs will become so brilliant that they will convince us to abolish juries, but we should at least try to preserve the parchment barriers for now.
This is a very abstract and speculative discussion (as discussions about AI alignment tend to be). Even if eliminating human juries and judges would increase p(doom), it’s hard to see how transferring control over lawyering from humans to AIs would increase it.
Maybe it will, if the AIs conspire to skew the briefs on both sides in a particular pro-AI direction? I don’t know. That said, I also doubt lawyers have much insight on what particular refinements of the legal system will decrease p(doom). Our fate likely rests with regulators (and non-regulators) at a higher pay grade.
My overall take: I subscribe to Professor Cowen’s view for now. I will keep an open mind on the more exotic versions of the AI alignment problem and am prepared to revise my views, both on expected outcomes and on appropriate regulations, if events warrant.
One final note.
I suspect that as AI develops, procedural rules will change to account for AI—but they won’t ban or restrict AI. Instead, they’ll assume parties will be using AI, and might even force them to use AI.
I’m not suggesting these proposed changes would be good ideas now, or perhaps ever, but this is where I see the debate going.
Altering timing and pleading requirements to account for parties’ use of AI. If an AI can prepare a magnificent first draft in five seconds, courts may conclude that there’s no need to give parties, e.g., 90 days to file an opening brief. Why not give 14 days? Likewise, if an AI can easily identify all legal claims arising from a particular set of facts, why freely grant leave to amend complaints or construe pleadings liberally? Courts will alter rules under the assumption that parties are using AI, which in turn will de facto render the use of AI mandatory.
Mandatory use of AI in preparing briefs. Judge Starr requires litigants to certify that “any language drafted by generative artificial intelligence—including quotations, citations, paraphrased assertions, and legal analysis—will be checked for accuracy, using print reporters or traditional legal databases, by a human being before it is submitted to the Court.” This reflects the view that humans are more careful than AIs, and therefore should check AIs’ work. What if the truth turns out to be the opposite? Pretty soon we’ll be talking about the opposite rule: “Any language drafted by human beings—including quotations, citations, paraphrased assertions, and legal analysis—will be checked for accuracy by a generative AI before it is submitted to the Court.” And not only legal analysis: “Any factual statements drafted by human beings will be checked for accuracy by a generative AI trained on the documentary record before it is submitted to the Court.” The court could require that a particular query be used, something like: “Check this document for citation errors, legal analysis that is contrary to binding precedent, and factual statements that are inconsistent with the record.”
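For concreteness, here is a minimal sketch of how the query suggested above might be wired up, using the OpenAI Python SDK as one possible backend; the model name and file paths are placeholders, and a court-approved tool would presumably be more elaborate:

```python
# A minimal sketch of a court-mandated "AI check" of a draft filing.
# Model name and file paths are placeholders; OPENAI_API_KEY must be set.
from pathlib import Path

from openai import OpenAI

CHECK_PROMPT = (
    "Check this document for citation errors, legal analysis that is contrary to "
    "binding precedent, and factual statements that are inconsistent with the record."
)


def check_filing(brief_path: str, record_path: str, model: str = "gpt-4o") -> str:
    """Ask a generative model to flag problems in a draft filing."""
    client = OpenAI()  # reads OPENAI_API_KEY from the environment
    brief = Path(brief_path).read_text()
    record = Path(record_path).read_text()
    response = client.chat.completions.create(
        model=model,
        messages=[
            {"role": "system", "content": CHECK_PROMPT},
            {"role": "user", "content": f"RECORD:\n{record}\n\nDRAFT FILING:\n{brief}"},
        ],
    )
    return response.choices[0].message.content


if __name__ == "__main__":
    print(check_filing("draft_brief.txt", "record_excerpts.txt"))
```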
The rule won’t require that we must implement the AI’s suggestions—that sounds too much like 2001: A Space Odyssey—but instead that we’ve at least asked the AI for suggestions. In many cases the suggestions will be good ones and the lawyer will gladly take them. Also, this rule will make it harder for lawyers to argue that their errors were inadvertent. If they make a false statement in the brief, the AI points it out, and the statement stays in the brief anyway, the court will know that the lawyer knew what he was doing.
Mandatory use of AI in mediating cases. What’s the #1 reason that parties can’t settle cases? They can’t agree on the strength of the case. One party, and often both, irrationally overestimates its chances of victory. A neutral mediator can’t help them, because each party’s approach to the mediation is to convince the mediator that it is right.
Courts may require litigants to submit mediation statements to multiple AIs (or a single AI on multiple occasions) to get multiple candid assessments on the strength of the litigants’ case. These assessments can be kept secret. I suspect that if multiple AIs repeatedly tell a litigant that it is taking a risky position, the litigant will listen.
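A minimal sketch of how that might work, again using the OpenAI Python SDK as one possible backend; the roster of designated models is a placeholder, and in practice the court or mediator would pick the tools and the prompt:

```python
# A minimal sketch of the "several candid reads" idea: the same mediation statement
# goes to each designated model, and the confidential assessments come back for the
# litigant's eyes only. Model names below are placeholders.
from openai import OpenAI

ASSESSMENT_PROMPT = (
    "You are advising this party confidentially. Candidly assess the strengths and "
    "weaknesses of its position and estimate its likelihood of prevailing."
)

MODELS = ["gpt-4o", "gpt-4o-mini"]  # placeholder roster of designated models


def candid_assessments(mediation_statement: str) -> dict[str, str]:
    """Collect one confidential assessment per designated model."""
    client = OpenAI()  # reads OPENAI_API_KEY from the environment
    assessments = {}
    for model in MODELS:
        response = client.chat.completions.create(
            model=model,
            messages=[
                {"role": "system", "content": ASSESSMENT_PROMPT},
                {"role": "user", "content": mediation_statement},
            ],
        )
        assessments[model] = response.choices[0].message.content
    return assessments
```

Because the assessments never leave the litigant’s hands, candor costs nothing: no admission is made to the other side or to the mediator.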
AI is going to take some getting used to.
See you all next time!
Contrary to your assumption (or what I understood it to be), making a GPT-based model “forget” something it has already “learned” isn’t easy at the moment, and the path toward making it easy isn’t clear. Depending on the level of accuracy you’re looking for (and at least sometimes the law will want 100%), it can require retraining the model from scratch, which is prohibitively expensive...
That’s one of the reasons why the early judgment in Italy regarding ChatGPT and GDPR must be regarded as important, IMO.
I agree that it's hard to make a GPT-based model forget what it has already learned. But I think it's easy to direct a GPT-based model not to learn from new information. In other words, the GPT-based model can be put into sandbox mode in which some new input isn't used to train the model and is promptly forgotten.