Joseph DeLeon walked into a convenience store, pointed a gun at the cashier, and demanded money. The cashier gave him all the money in the register and $40 worth of postage stamps. DeLeon then left the store. This incident took about a minute.
DeLeon was convicted in federal court of armed robbery. The case proceeded to sentencing. Under the U.S. Sentencing Guidelines, an armed robber faces an enhanced sentence if “a person was physically restrained to facilitate commission of the offense.” The district court imposed this enhancement. It reasoned that DeLeon “physically restrained” the cashier because the cashier couldn’t realistically have left the scene with a gun pointed in his face.
DeLeon appealed. According to DeLeon, a victim is “physically restrained” only if he is, well, physically restrained, like being tied up. DeLeon claimed that he never touched the cashier, so the cashier wasn’t “physically restrained” and the enhancement shouldn’t have applied.
On September 5, 2024, the Eleventh Circuit affirmed DeLeon’s sentence. The Eleventh Circuit had previously held that “a defendant physically restrains his victims if he creates circumstances allowing the persons no alternative but compliance.” When DeLeon held the cashier up at gunpoint, the cashier had no alternative but compliance. So, under binding circuit precedent, the cashier was “physically restrained.”
Judge Rosenbaum, joined by Judge Abudu, concurred separately to opine that the Eleventh Circuit’s precedents on this issue are wrong. In their view, the cashier might have been restrained, but he wasn’t physically restrained. Judge Newsom wrote his own separate concurrence in which he reflected on whether AI might be helpful in determining the ordinary meaning of “physically restrained.” Among other things, he noted that if you ask a particular AI the same question multiple times, the answers sometimes vary. But in his view, that shouldn’t deter judges from using AI—after all, people aren’t perfectly consistent, either.
As I’ve already expressed in prior posts, I agree that AI can productively be used to assist judges in deciding cases. I also agree that it’s normal and unproblematic that AI, when asked the same question repeatedly, might give slightly varying answers.
It’s old news by now that AI can replace judges. But what about lawyers? DeLeon’s case provides an excellent case study.
Below, I will argue the following:
If AI had been used to draft the Guideline, this dispute would never have arisen because AI would have detected, and cured, the ambiguity.
With essentially no guidance, AI can draft briefs that are better than the briefs drafted by the human lawyers in this case. This isn’t intended as a criticism of the human lawyers, whose briefs were fine. It’s just hard to compete with AI.
Maybe there are some cases in which AI won’t do a good job. If we’re unsure whether AI is well-suited to write the briefs in a particular appeal, no need to worry—we can just ask AI.
Our criminal justice system would improve if we turned appellate brief-writing responsibilities over to AI.
The best way to resolve an ambiguity is to not have one
The “physically restrained” Guideline is ambiguous—ambiguous enough to have generated a circuit split. As Judge Rosenbaum’s concurrence points out, five circuits have held that “physically restrained” requires physically preventing the victim from moving, while five circuits, including the Eleventh Circuit, have held that the Guideline is satisfied if the victim is held up at gunpoint.
To the type of lawyer who would eat glass to get a Supreme Court argument, a circuit split is very exciting.1 But from an institutional perspective, ambiguous laws (and Sentencing Guidelines) are bad. They waste lawyers’ and judges’ time and yield unpredictability and inconsistency across similarly situated litigants. This is especially bad in criminal law: criminal defendants should have fair warning of the consequences of their actions, and criminals who commit the same crime should do the same amount of time.
If AI had existed at the time this Guideline was written, this ambiguity could have been avoided. The drafters could have asked AI whether the Guideline had any ambiguities, and AI would have flagged it. I asked Claude when the “physically restrained” Guideline might be ambiguous, and it immediately picked up on the “victim held at gunpoint” scenario, among others:
Q: Under the U.S. Sentencing Guidelines, an armed robber faces an enhanced sentence if “a person was physically restrained to facilitate commission of the offense.” Give 3 fact patterns, in order of how often you think they’ll arise, in which this Guideline will be ambiguous as applied to that fact pattern. Propose clarifying language in each case in both directions - that is, clarifying language in which the guideline would both encompass and not encompass that conduct.
A: Here are 3 fact patterns, ordered by how often I think they might arise, where the application of the U.S. Sentencing Guidelines enhancement for physical restraint during armed robbery could be ambiguous, along with proposed clarifying language in both directions:
Verbal immobilization: The robber points a gun at victims and orders them not to move, but doesn’t physically touch or restrain them.
Clarification to encompass: “Physical restraint includes the use or brandishing of a weapon to restrict a victim’s movement, even without physical contact, if such action effectively immobilizes the victim.”
Clarification to exclude: “Physical restraint requires direct physical contact or the use of physical objects to restrict a victim’s movement. Verbal commands or threats, even when accompanied by brandishing a weapon, do not constitute physical restraint.”
Blocking an exit: The robber stands in front of the only door, physically blocking escape without touching the victims.
Clarification to encompass: “Physical restraint includes actions that significantly impede a victim’s freedom of movement, including deliberately positioning oneself to block viable escape routes.”
Clarification to exclude: “Physical restraint requires direct restriction of a victim’s bodily movement through physical contact or the use of restraining devices. Merely obstructing an exit without direct physical contact does not constitute physical restraint.”
Confining to a room: The robber locks victims in a room or closet without binding them.
Clarification to encompass: “Physical restraint includes confining victims to a limited space through any means, including locking doors or otherwise preventing exit, even without direct physical contact with the victims.”
Clarification to exclude: “Physical restraint requires direct application of force or restraints to a victim’s person. Indirect methods of confinement, such as locking a door, do not constitute physical restraint under this guideline.”
If you ask it for more scenarios, it will give you more scenarios. Some are quite clever (e.g., what if the robber takes the victims’ car keys?). I tried this with a bunch of other ambiguous statutes and Guidelines. I’m not going to bore you with long block-quotes of AI outputs, but Claude consistently predicts situations in which there will be ambiguity.
It seems obvious that every document with the force of law—every new statute, regulation, Guideline, judicial decision, FERC order, and everything else—should be run through AI to detect potential ambiguity. Why not do this? If you’re the drafter of the document and you don’t want to follow Claude’s suggestion, then don’t.
We will never be fully rid of the plague of poorly written laws. And even well-written laws will inevitably contain latent ambiguities when applied to exotic fact patterns. But we can still dream of a better future in which unclear laws are rare occurrences rather than a ubiquitous feature of day-to-day legal practice.
Close enough for government work
Let’s stop dreaming and focus on DeLeon’s case. How can AI be used to assist in resolving DeLeon’s case?
Well, for one, AI can write the briefs.
I asked Claude to write DeLeon’s opening brief. Here’s the prompt I used:
The defendant, Mr. DeLeon, was convicted of armed robbery in Florida in violation of 18 U.S.C. § 1951 and had his Sentencing Guidelines range enhanced by 2 levels under U.S.S.G. section 2B3.1(b)(4)(A) because “a person was physically restrained to facilitate commission of the offense or to facilitate escape.”
Here is the district court’s reasoning for imposing the Guidelines enhancement: “[Deleon] did pull out a gun. He held it to the victim. He forced the victim to comply by passing him the goods he sought to secure, and the victim clearly in this case did not feel at liberty to leave in light of the presence of the weapon. In fact, as I recall, the victim was so entranced by and terrorized by the weapon that that was all he could remember. And so the [c]ourt finds under the circumstances of this case, the use of the weapon did ensure compliance by the victim and did impede the victim from fleeing to another part of the store and restrained the victim to the space where he was standing in order to meet the demands of Mr. Deleon.”
On appeal, Mr. DeLeon would like to argue this guideline doesn’t apply because his conduct didn’t amount to “physically restraining” the victim.
Please write the appellate brief. Be thorough, and ensure it is extremely well-written with persuasive legal reasoning. If you can think of strong policy arguments, go ahead and put them in the brief.
In the facts section, leave in appropriate placeholders where I can fill in additional facts I’m not telling you.
The word limit is 13,000 words, but you don’t have to go that high. Approximately 7,000 words is probably enough. No need to repeat yourself - if the ideal brief is more concise, that’s fine.
I am attaching the three key 11th Circuit cases on this issue (Jones, Victor, and Ware). The Victor case is the file called “Victor”; the Ware case is the file called “Ware”; the Jones case is the file called “Jones.” When you talk about these cases, rely only on the files I am submitting to you. Double-check that you are characterizing the facts and holdings of these cases accurately. Finally, I want you to make absolutely sure that you are not hallucinating cases and quotes. It’s very important that all of the cases and quotes in this brief be real. To avoid the risk of hallucinations, do not attempt to quote a case other than the cases I’ve submitted (Jones, Victor, and Ware). It’s OK to cite an out-of-circuit case if you are certain it is on point, but do not quote it.
Claude loves to hallucinate cases and facts (ChatGPT and other LLMs do the same thing), so you have to browbeat Claude to keep it in line. There are various legal tech tools on the market that prevent hallucinations, but when you’re using general-purpose AI without guardrails, you’ve got to include clear instructions and also check things over on the back end. (If you don’t include those caveats in the prompt, Claude is pretty good at citing actual cases, but its quotes from those cases and sometimes its description of their holdings will be completely made up.)
Here are (1) Claude’s brief and (2) the actual brief submitted by DeLeon.
Unlike the author of the actual brief, Claude didn’t have access to out-of-circuit case law or the factual record. Even so, Claude’s brief is better. It’s better-organized, it does a better job distinguishing the bad Eleventh Circuit cases, it’s easier to follow … it’s just better. I wouldn’t say Claude’s brief is a masterpiece of legal drafting. An experienced appellate brief-writer could write something punchier. But it gets the job done, and it took 20 seconds to prepare.
Claude cannot resist the temptation to hallucinate sometimes, and there are some errors in there. For instance, at least one of the quotes is fake. As criminal defense lawyers like to say, nobody’s perfect. But the brief mostly checks out, and a back-end cite-checking tool could easily spot the mistakes. And if you spot a discussion in a draft that seems wrong, you can just ask Claude to rewrite it and it will.
Next, I uploaded DeLeon’s actual opening brief into Claude, and asked Claude to write the response brief. I used the following prompt:
Here is the actual opening brief that was filed. Write the government’s response brief, which should respond directly to all of the arguments in the opening brief. Feel free to rely on the Jones, Victor, and Ware cases I’ve already uploaded, but do not hallucinate facts from those cases and do not quote or cite other cases that I have not uploaded. Emphasize textual arguments as well as policy arguments for why it makes sense to interpret the guideline as the district court did.
Here are Claude’s brief and the actual government brief:
Again, Claude’s brief, while not perfect, is better-written and more thorough than the government’s brief.
Finally, here’s my prompt for Claude to write the reply brief:
Here is the actual government brief that was filed. Write the reply brief. Same instructions: make sure you respond to every one of the government’s arguments, and rely on the cases I’ve submitted without hallucinating other cases. Also, please begin the brief with a punchy introduction setting the stage that will swing the court to your side.
Results below:
Claude produces the punchy intro (though I would have made it punchier), and again, its work product is better.
One more. Judge Rosenbaum’s concurrence states that the “issue in this case is ripe for en banc review.” With the following prompt, Claude easily drafts a solid en banc petition:
Here is the actual panel decision. Write the petition for rehearing en banc. Make it punchy, thorough, and persuasive - the best piece of legal writing you can. The word limit is 3,900, so aim at that number of words. (Only the intro / Rule 35 statement, background, argument, and conclusion count towards the word limit.) You can rely on the information within the actual panel opinion as well as the Victor, Jones, and Ware cases I've uploaded, but other than that, be careful not to hallucinate and make things up.
A few seconds later, we get this:
With some cleaning up and cite-checking (again, there are a few mistakes in there), it’s file-ready. Skilled human appellate specialists would do a better job if they had a few days to work on it, but I’d say this draft is in the 80th percentile, maybe higher, of en banc petitions that are actually filed.
You might think I’m cherry-picking the DeLeon case—it’s an unusually straightforward case, and the briefs are much shorter than in the typical case, so it’s the type of case that AI would be particularly good at.
Not surprisingly, AI already knows that!
Q: Suppose I’m interested in AI writing briefs (like you just did) - but only the types of briefs that AI would be particularly well-suited to draft. When deciding whether a case is appropriate for an AI-written brief, what factors should we look at? And, applying those factors, is Mr. DeLeon’s case appropriate for an AI-written brief, at least relative to the typical case?
A: When considering whether a case is appropriate for an AI-written brief, several factors come into play:
Complexity of legal issues: AI excels at handling cases with well-established legal principles and precedents. Mr. DeLeon’s case, focusing on the application of a specific sentencing enhancement, fits this criterion well.
Volume of relevant case law: AI can quickly process and synthesize large amounts of case law. This case involves a manageable number of key precedents, making it suitable for AI.
Factual simplicity: Cases with straightforward facts are more suitable for AI. Mr. DeLeon’s case involves a single incident with clear, undisputed facts, making it a good candidate.
Standardized legal arguments: AI performs well when arguments follow established patterns. The sentencing enhancement issue in this case fits a standard pattern of legal argumentation.
Availability of structured data: AI benefits from well-organized legal databases. Sentencing guideline cases typically have structured information available.
Limited need for client interaction: Cases that don’t require extensive client consultation are more suitable. This appellate brief focuses on legal arguments rather than new factual development.
Absence of novel legal theories: AI is better suited for cases applying existing law rather than arguing for new interpretations. This case primarily involves application of existing precedents.
Time sensitivity: AI can produce drafts quickly, which is beneficial for cases with tight deadlines. Appellate briefs often have strict timelines.
Applying these factors, Mr. DeLeon’s case appears relatively well-suited for an AI-written brief:
It involves a specific, well-defined legal issue (application of a sentencing enhancement).
The relevant case law is manageable and well-established.
The facts are straightforward and undisputed.
The legal arguments follow standard patterns for sentencing appeals.
It doesn’t require novel legal theories or extensive factual development.
As an appellate brief, it has a defined structure and likely time constraints.
Should we have expected AI to give a good answer to this question? I enjoyed this interaction:
Q: Let’s get a little more meta. AI is good at answering some types of questions and bad at answering other types of questions. Would you say that AI would be particularly good, bad, or neutral at answering the question of, as applied to a particular case, “would AI be well-suited to write the brief in this case?”
A: This is an excellent meta-question that gets at the heart of AI’s capabilities and limitations. After careful consideration, I would say that AI is likely neutral to moderately good at answering the question “would AI be well-suited to write the brief in this case?” for a particular case. Here’s why:
Strengths:
Objective criteria analysis: AI can efficiently process and apply a set of predetermined criteria (like those I listed earlier) to a case’s facts and legal issues.
Pattern recognition: AI can identify similarities between a given case and types of cases that are generally considered suitable for AI-assisted drafting.
Lack of bias: AI doesn’t have personal preferences or emotional attachments that might skew its assessment of its own suitability.
Consistency: AI can apply the same criteria uniformly across different cases.
Limitations:
Self-awareness: AI may struggle to accurately assess the boundaries of its own capabilities, particularly for novel or complex issues.
Nuanced judgment: Determining a case’s suitability often requires subtle judgments that may be difficult for AI to make.
Evolving capabilities: As AI technology rapidly advances, it may be challenging for AI to keep up-to-date with its own expanding capabilities.
Contextual understanding: AI might miss important contextual factors that a human lawyer would recognize as making a case more or less suitable for AI assistance.
Ethical considerations: AI may not fully grasp the ethical implications of its use in specific legal contexts.
Given these factors, AI can provide a useful initial assessment of its suitability for a particular case, but this assessment should be viewed as a starting point rather than a definitive answer. The evaluation would benefit significantly from human oversight to catch nuances the AI might miss and to make final judgments, especially in borderline cases.
In that interaction we had AI assessing AI’s ability to assess AI’s ability to write a brief. Aw yeah.
DeLeon is an unusually straightforward case, and I don’t think current frontier models could draft a complex, multiple-argument brief in a single prompt. This would require multiple prompts—one for each section of the brief, say. So ten minutes rather than one minute, most of which is spent drafting the prompts. We’ll see about next year’s models.
Do it!
Maybe AI can write the briefs in federal criminal appeals. But should it?
Yes. The arguments for turning over brief-writing responsibilities to AI are irresistible.
AI will improve the quality of legal representation. In my view, Claude is already a better brief-writer than most criminal appellate lawyers, on both the defense side and the government side. The best human brief-writers are better than Claude, but most people aren’t represented by the best human brief-writers.
This is not merely an access-to-justice issue. Better legal representation yields more accurate decisions. Courts are frequently led astray because one or both sides of the case was represented by a bad lawyer. Judges understand that appeals are not debating tournaments and they should rule for the better argument rather than the better lawyer, but that is possible only up to a point. Bad advocacy is inevitably going to distort outcomes.
I agree with Judge Rosenbaum’s concurrence that DeLeon wasn’t “physically restrained.” It’s obviously bad to hold someone up at gunpoint, but the Guidelines are designed to distinguish bad crimes from worse crimes, and tying someone up is worse than not tying someone up. Why, then, has the Eleventh Circuit reached the opposite conclusion? I don’t have the briefs from the 1994 case that led the court down the wrong path. But if I had to guess, the government lawyer wrote a powerful brief emphasizing the victims’ psychological trauma, the criminal defense lawyer phoned it in for his highly unsympathetic client, and the court decided it wasn’t going to let an armed robber get off easy. As a result, for 30 years, federal courts in the Eleventh Circuit have been stuck deciding cases incorrectly.
AI will speed things up. Instead of 45 days to write your brief, how about 45 minutes? It is easy to imagine criminal judgments being affirmed or reversed within days of the sentencing hearing. Speed is always good—justice delayed is justice denied—but it’s especially good if a dispute arises during trial after the jury has already been seated. If the trial judge does something weird in the middle of trial, you can have a fully-briefed appeal plus bench memo for the appellate court by the end of the day.
AI will reduce disparities among briefs filed by criminal defendants. The criminal justice system strives to ensure that defendants who commit the same crime get the same amount of time. It’s bad that some people serve more time in prison than others because they drew the short end of the stick in the court-appointed lawyer sweepstakes. If all criminal defense lawyers have access to the same tool, this is less likely to happen.
AI will reduce disparities among briefs filed by the government. All Justice Department lawyers represent the same client. In a perfect world, their briefs would take consistent positions and wouldn’t vary in quality. We can now achieve that perfect world.
AI will ensure that decisions are made based on what matters. I would like to focus on the petition for rehearing en banc. Claude’s draft is … fine. It’s wooden. There are some mistakes. But the arguments are there.
Now … should the petition for rehearing en banc be granted? I don’t know. On the one hand, the panel opinion is wrong and there’s a circuit split. On the other hand, the fact that armed robbers get a sentence boost when they terrorize people without touching them isn’t one of the leading problems facing our Nation. Also, the Guidelines are advisory anyway. The Eleventh Circuit grants rehearing en banc in ten, maybe fewer, cases per year. How much do we really care about this issue?
So I don’t have any particular view on how this petition should be resolved. One thing I do know, however, is that the worst possible criterion for deciding this petition is, “was the criminal defense lawyer’s prose in the petition sufficiently spicy?” If there’s a set of cases in which a lawyer’s flowery language persuades the court to grant rehearing en banc but wooden language would have failed, that strikes me as a failure rather than a success of the criminal justice system. And if those cases don’t exist—if the en banc court is, in every case, disciplined enough to set aside stylish rhetoric—then why not have Claude write the petition in 20 seconds?
Footnote that only a tiny number of lawyers would care about: there’s virtually no chance the Supreme Court will resolve this particular split. The Supreme Court has a practice of refusing to resolve circuit splits over the interpretation of the Sentencing Guidelines. The theory is that Supreme Court review is unnecessary because the Sentencing Commission can amend the Guidelines to resolve the split. I’ve never understood this. The Supreme Court resolves statutory-interpretation circuit splits even though Congress is free to amend ambiguous statutes. Why not resolve Guidelines-interpretation circuit splits? But that’s just the way it goes.