On October 16, 2024, the Supreme Court heard oral argument in Bufkin v. McDonough. The question in Bufkin, roughly speaking, is whether veterans seeking benefits for service-connected disabilities should get slightly more process so that they prevail slightly more frequently.
In this post I will offer three views on Bufkin:
The entire VA claims adjudication process should be replaced by AI.
Bufkin turns on the following question: Is it more likely that Congress enacted a useless law or a badly-written law? It’s a close call.
In deciding Bufkin, the Court might be tempted to attempt to predict, and then account for, the practical implications of its decision. The Court shouldn’t yield to that temptation. It should ignore the future.
Don’t give up. Don’t ever give up.
The Department of Veterans Affairs, better known as the VA, compensates veterans for service-connected disabilities. To obtain benefits from the VA, the claimant must show that (1) he is, in fact, disabled, and (2) the disability is connected to his military service.
Sometimes it’s unclear whether a veteran is entitled to benefits. Suppose a veteran comes to the VA claiming he has back pain. Maybe his back pain isn’t really disabling; maybe it is disabling but it arises from his car accident after he left the service.
In such cases, Congress has decided that it’s better that an undeserving veteran get benefits than a deserving veteran lose out. Thus, Congress has instructed that the veteran be given the benefit of the doubt. Or, in legalese: “When there is an approximate balance of positive and negative evidence regarding any issue material to the determination of a matter, the Secretary shall give the benefit of the doubt to the claimant.”
Suppose a veteran applies for benefits, and the VA concludes that, even taking into account the benefit-of-the-doubt rule, the veteran isn’t entitled to them. Does the veteran have any recourse?
Of course he does. This is America! In fact, when a veteran seeks benefits, there are actually five (!) layers of review:
The veteran first seeks benefits from the local VA regional office.
If he loses there, he can then go to the Board of Veterans’ Appeals. “Board of Veterans’ Appeals” sounds sort of like a court, but it’s not—it’s an entity housed within the VA.
If he loses there, he can go to the U.S. Court of Appeals for Veterans Claims, or the Veterans Court for short. Confusingly, although this court is called a “court,” it is not, in fact, a “court” in the constitutional sense, although it’s more court-like than the Board of Veterans’ Appeals.
If he loses there, he can go to the U.S. Court of Appeals for the Federal Circuit, which is a real court.
And if he loses in the Federal Circuit, you all know who’s at the top of the judicial food chain.
(This week’s images feature lots of judges hearing the case of a single veteran.)
In Bufkin, Level Five is reviewing Level Four’s decision on what standard Level Three uses to review Level Two’s decisions. This is why it’s great to be a lawyer.
Let’s unpack that. As I mentioned, the VA—that’s Level One and Level Two—gives the claimant the benefit of the doubt. Suppose that even giving the veteran the benefit of the doubt, the VA rules against the veteran. The veteran can appeal to the Veterans Court. The primary question in Bufkin is how the Veterans Court is supposed to review the VA’s benefit-of-the-doubt analysis.
A bit of background is needed here. Before 1988, there was no judicial review of VA benefits denials. If the VA decided that a veteran wasn’t entitled to benefits, then that was just too bad for the veteran. After years of advocacy by veterans’ groups, Congress created the Veterans Court in 1988. Congress decided that the Veterans Court would review the VA’s legal determinations de novo (i.e., without deference), but would review the VA’s factual determinations for clear error (i.e., with deference). That review standard appears in 38 U.S.C. § 7261(a), which is still on the books today.
Fast forward to 2002. Veterans’ groups were dissatisfied with the Veterans Court’s work. Their concern was that the Veterans Court wasn’t taking the benefit-of-the-doubt requirement seriously. The VA would often recite the benefit-of-the-doubt standard and then reflexively deny benefits anyway. And then, on appeal, the Veterans Court would rubber-stamp the VA’s benefit-of-the-doubt analysis.
In 2002, in response to these concerns, Congress enacted 38 U.S.C. § 7261(b)(1). Section 7261(b)(1) says that “in making the determinations under” Section 7261(a), the Veterans Court “shall take due account” of the VA’s application of the benefit-of-the-doubt requirement.
The question in Bufkin is: what does this mean?
The petitioner-veterans (there are two of them, Bufkin and Thornton), argue that under Section 7261(b)(1), the Veterans Court should review the VA’s benefit-of-the-doubt determinations de novo. The government argues that, notwithstanding Section 7261(b)(1), the Veterans Court should review the VA’s benefit-of-the-doubt determinations deferentially.
The end of tradeoffs
Before analyzing this intriguing legal question, I cannot help but talk about AI.
Some of you may have seen those lists on the Internet of “old movies where the problems that generated the plot would be solved by modern technology.” You know … in Die Hard John McClane could have called the police with a cell phone, in the Blair Witch Project they could have used a GPS to get out of the woods, etc. Well, AI would solve all of the problems that generated the plot in Bufkin.
There are three oft-cited problems with the VA benefits process:
The VA is extremely slow. Wait times for benefits determinations are staggering.
The VA is wildly inconsistent. One examiner might say that a veteran’s symptoms render him 100% disabled, another examiner might say that the same symptoms aren’t disabling at all.
Some individual VA examiners are pathologically hostile to veterans and love saying “no.”
The VA has a poor reputation, but it has a tough job. Last year, the Board of Veterans’ Appeals decided over 100,000 appeals. That’s an insane number of appeals.
There’s no easy answer to these problems. To deal with problem #1, you can hire a lot of new examiners to speed things up. But even if the VA had the budget for this, it would exacerbate problems #2 and #3—you’d have more variation and more inept examiners.
To deal with problem #2, you can set detailed rules governing when benefits should be conferred. The VA has certainly done this: the regulations governing veterans’ benefits rival the Theory of General Relativity in complexity. But this only goes so far. For certain types of disabling conditions, PTSD being a notable example, different examiners are inevitably going to interpret the veteran’s symptoms in different ways.
To deal with problem #3, you can do what Congress did: tell the VA to give the benefit of the doubt to the veteran. But personnel is policy. If an examiner is a jerk, you can tell him to bend over backwards for the veteran using whatever verbal formulation you want, and he’ll still stiff the veteran.
Alternatively, the VA could fire or demote the examiners who are too harsh on veterans. But this would create horrible incentives, plus, to first approximation, it’s impossible to fire a government employee. Another option is to have multiple people look at the same benefits denial—which is why we now have the Regional Office, and then the Board of Veterans’ Appeals, and then the Veterans Court. This might mitigate problem #3, but it exacerbates problem #1.
Also, there comes a point where there is too much process. The cases of the two veterans in the Supreme Court case, Bufkin and Thornton, illustrate this point.
Bufkin served in the Air Force for six months. While he was in the Air Force, his wife wanted him to leave the military and threatened to commit suicide if he stayed in. Bufkin was advised, reasonably, that his options were to leave the Air Force based on undue hardship or end the marriage. He chose to leave the Air Force. Seven years later, Bufkin filed a claim for veterans’ benefits, claiming that his unpleasant memories from this experience left him with service-connected PTSD.
This allegation led to multiple reviews by VA examiners, all of which denied his claim; review by the Board of Veterans’ Appeals, which denied his claim; review by the Veterans Court, which denied his claim; and then an appeal to the Federal Circuit and the Supreme Court. This seems like a lot of process for a marginal claim, and now Bufkin wants even more process: he wants the Veterans Court to take a de novo look at the benefit-of-the-doubt issue, which has already been analyzed by multiple VA examiners plus the Board of Veterans’ Appeals.
Thornton served in the military for three years, from 1988 to 1991. In 1994 he was given a 40% disability rating for what the record refers to as an “undiagnosed illness.” 30 years later, he still has that 40% disability rating for that undiagnosed illness. Years after he left the military, he sought benefits for PTSD, and in 2005 he was given a 10% disability rating for PTSD. In 2015—24 years after he left the military—he sought a higher disability rating for PTSD. The VA upped his rating from 10% to 50%. The Board of Veterans’ Appeals then granted his claim for a total disability—effectively 100% disabled—based on the conjunction of his PTSD and the undiagnosed illness.
Thornton appealed to the Veterans Court, claiming that his disability rating for PTSD should have been 70% rather than 50%, which might lead to slightly higher monthly benefits, even though he’s already getting benefits for a total disability. The Veterans Court rejected this argument. In the Supreme Court, Thornton argues that the Veterans Court should have taken a de novo look at whether, under the benefit-of-the-doubt rule, he’s 70% rather than 50% disabled as a result of PTSD. Again, this is a lot of process for a claimant who seems to have been treated favorably by the VA.
Also, why should we think that the Veterans Court would do a better job than the prior two reviewers in deciding whether he’s 70% or 50% disabled? They’re all reviewing the same documents. It’s not as though the Veterans Court judges have higher IQs or are nicer people. Sure, if multiple people review the same ambiguous claim, over and over again, one of them eventually is going to rule for the claimant. So maybe reviewer #3 is going to think Thornton is 70% disabled from PTSD even though reviewers #1 and #2 thought he was 50% disabled. But it’s not clear to me that adding more reviewers to the same case is going to make determinations more reliable in the aggregate.
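To put rough numbers on the "one of them eventually is going to rule for the claimant" point, here is a minimal sketch. It assumes, purely for illustration, that each reviewer independently rules for the claimant with the same probability p. Neither the independence nor the 20% figure comes from the record; reviewers reading the same documents are surely correlated, so treat this as an upper-bound intuition rather than a model of the VA.

```python
def prob_at_least_one_favorable(p: float, n: int) -> float:
    """Chance that at least one of n reviewers rules for the claimant.

    Assumes each reviewer rules favorably with probability p,
    independently of the others (an illustrative simplification).
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be between 0 and 1")
    if n < 0:
        raise ValueError("n must be non-negative")
    # Complement rule: 1 minus the chance that all n reviewers say no.
    return 1.0 - (1.0 - p) ** n


# Hypothetical 20% chance per reviewer on an ambiguous claim.
for n in (1, 2, 3, 5):
    print(n, round(prob_at_least_one_favorable(0.2, n), 3))
```

Under these assumptions the chance of at least one favorable ruling climbs with every added reviewer, even though no individual reviewer has become any more accurate.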
As these cases illustrate, adding additional layers of review may reduce the probability that a particular veteran will be unfairly short-changed. But adding layers of review is expensive, slows things down, and might not improve accuracy. Ultimately, designing the optimal scheme for veterans’ benefits adjudications is a tricky policy problem, with tough tradeoffs and no clear answers.
Well, it used to be a tricky policy problem, but it isn’t anymore. Using AI for benefits adjudications obliterates all problems instantly:
There are no delays. 2-year wait times can be reduced to 2-minute wait times.
There are no inconsistencies across adjudicators because there’s one adjudicator.
The veteran won’t face the risk of being stuck with a stingy adjudicator, because again, there is one adjudicator that can be adjusted until it isn’t stingy.
This can be trivially implemented today. Just upload the relevant portion of the VA benefits manual into an AI context window, upload the veteran’s documentary evidence, and ask AI to apply law to fact. If you’re concerned that AI won’t be sufficiently generous, you can keep track of the percentage of veterans that obtain benefits and then tweak the prompts until the percentage is at a satisfactory level. If you’re squeamish about having AI make decisions that affect people’s lives, then you can have the AI complete the initial layer of review (equivalent to what the VA does today) and then give the veteran the right to appeal to a human judge. If we just snap our fingers, we can make all tradeoffs go away.
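The "keep track of the percentage and tweak the prompts" step can be sketched as a simple calibration loop. Everything named here is hypothetical: `toy_adjudicator` is a stand-in for a real model call (it treats the prompt as a numeric threshold and the claim as an evidence score), and the target grant rate is an arbitrary number chosen for the example.

```python
from typing import Callable, Optional, Sequence

# Hypothetical interface: (prompt, claim_file) -> True if benefits are granted.
Adjudicator = Callable[[str, str], bool]


def grant_rate(adjudicate: Adjudicator, prompt: str, claims: Sequence[str]) -> float:
    """Fraction of claims granted under a given prompt."""
    if not claims:
        raise ValueError("need at least one claim")
    granted = sum(1 for claim in claims if adjudicate(prompt, claim))
    return granted / len(claims)


def pick_prompt(
    adjudicate: Adjudicator,
    candidate_prompts: Sequence[str],
    claims: Sequence[str],
    target_rate: float,
) -> Optional[str]:
    """Return the first candidate prompt whose grant rate reaches the target."""
    for prompt in candidate_prompts:
        if grant_rate(adjudicate, prompt, claims) >= target_rate:
            return prompt
    return None  # no candidate was generous enough


def toy_adjudicator(prompt: str, claim: str) -> bool:
    # Stand-in for a model call: grant when the evidence score meets the threshold.
    return float(claim) >= float(prompt)


claims = ["0.2", "0.4", "0.6", "0.8"]   # made-up evidence scores
prompts = ["0.7", "0.5", "0.3"]         # progressively more generous "prompts"
print(pick_prompt(toy_adjudicator, prompts, claims, target_rate=0.75))
```

Note that this tunes the aggregate grant rate only; it says nothing about whether the right veterans are the ones getting benefits.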
Is it OK for Congress to do nothing?
For reasons related to political will rather than technology, I am skeptical that the VA will implement this solution any time soon. So let’s go back to the Bufkin case.
A quick review of the legal issue:
Section 7261(a) says that the Veterans Court reviews the VA’s factual determinations for clear error.
Section 7261(b)(1) says that the Veterans Court “shall take due account” of the VA’s application of the benefit-of-the-doubt rule.
The petitioner-veterans argue that the Veterans Court should review the VA’s benefit-of-the-doubt analysis de novo. The government argues that the Veterans Court should review the VA’s benefit-of-the-doubt analysis for clear error.
The veterans’ argument goes like this: Before Section 7261(b)(1) was enacted, the Veterans Court was reviewing the VA’s benefit-of-the-doubt determinations deferentially. In enacting Section 7261(b)(1), Congress must have changed something. Section 7261(b)(1) can’t possibly mean nothing. So, Section 7261(b)(1) must mean that the Veterans Court should review the VA’s benefit-of-the-doubt determinations de novo.
The government’s argument goes like this: Section 7261(b)(1) doesn’t say that. It just says, the Veterans Court “shall take due account” of the VA’s application of the benefit-of-the-doubt standard. Section 7261(b)(1) is a kind of kick in the pants to the Veterans Court—it says, “do your job” rather than changing the legal standard. So, the Veterans Court’s review of the VA’s benefit-of-the-doubt determinations is still deferential.
This is a tough issue. The government has a reasonable argument based on the literal text of the statute. The statute does not, in fact, say that the Veterans Court should review the VA’s benefit-of-the-doubt analysis de novo. It says “due account,” which sounds like a reference to pre-existing legal principles. Here, pre-existing legal principles suggest that agencies’ factual determinations, including the determination of whether there’s an “approximate balance of positive and negative evidence,” should be reviewed deferentially. I have some technical views on this issue that only a lawyer could endure, so I will stick them in a footnote.1
On the other hand, the veterans are right that Congress doesn’t usually enact laws that accomplish nothing. The government argues that Section 7261(b) did something—it admonished the Veterans Court to follow the law, which it wasn’t previously doing. Still, the veterans are right that Congress doesn’t usually enact new laws that don’t change what the law previously required. So maybe we should construe Section 7261(b) to mean something, even though that “something” isn’t really what Section 7261(b) says.
To me this dispute boils down to how comfortable we are with the idea that Congress sometimes enacts laws that do nothing. Philosophically, I am comfortable with this idea. Congress is routinely criticized for enacting useless, symbolic gestures; is it so bad to declare that Congress enacted a useless, symbolic gesture? This interpretation would ascribe a certain ineptitude to Congress, but the petitioners’ interpretation would ascribe a different type of ineptitude to Congress—that Congress enacted a de novo review scheme using the most oblique language imaginable. In selecting between these ineptitudes, why not err on the side of saying that the inept institution did less rather than more?
On the other hand, it seems really bizarre to say that Congress enacted a statute that does nothing.
For what it’s worth, Claude thinks the veterans are correct with 85% certainty! Claude is deeply committed to the idea that the 2002 amendment couldn’t have been surplusage and feels that the benefit-of-the-doubt analysis is more of a legal than a factual determination. I lean towards trusting AI on this issue, but it’s a close one.
Ignore the future
I’m not sure what the right answer to Bufkin is. But in the last section of this post, I’d like to talk a bit about how to go about deciding a case like Bufkin.
Section 7261(b)(1) is ambiguous. There are two ways we might go about resolving this ambiguity:
Dwell on the phrase “due account” until an answer pops out of our head.
Resolve the ambiguity based on the anticipated practical consequences of the Court’s ruling.
#2 is tricky because we don’t know what the practical consequences will be. But we could formulate a few hypotheses, such as:
A ruling for the veterans will significantly increase the number of veterans who get benefits, which will on average increase the accuracy of benefits adjudications, because many veterans are currently being short-changed.
A ruling for the veterans will significantly increase the number of veterans who get benefits, which will on average decrease the accuracy of benefits adjudications, because the Veterans Court is in a poor position to conduct de novo review of the VA’s benefits adjudications.
A ruling for the veterans will not have a significant impact on the number of veterans who get benefits, because the Veterans Court tries to do justice and isn’t significantly influenced by the exact articulation of the standard of review.
We could then attempt to discern, somehow, which of these predicted outcomes is most likely to materialize, and fold that prediction into our resolution of the ambiguity in Section 7261(b)(1).
In my view, #1 is the correct methodology. Practical implications are irrelevant. Ignore the future. We should go with whatever we think “due account” means, taking due account, as it were, of background legal principles such as “Congress usually intends for statutes not to be useless.”
Why?
First, “practical implications” and “what the text means” are incommensurate. Suppose you think the text slightly weighs in favor of the government, but it’s more likely than not that benefits decisions will be slightly more accurate if the veterans win. Who wins?
There’s nothing intrinsically wrong with weighing incommensurate factors. Judges do this all the time: every time a sentencing judge has to weigh a defendant’s bad criminal history against his sincere apology, he is weighing incommensurate factors. My problem is that when we’re weighing the text against practical implications, I don’t know what we’re trying to maximize. You might say we’re trying to reach the “best” interpretation, but “best” along what dimension?
Second, what happens if the prediction is proved wrong? Suppose the Court rules for the veterans based on its prediction that a pro-veteran decision will increase the number of veterans who obtain benefits. But 10 years down the road, a scholar conducts an empirical study and concludes that the prediction didn’t materialize. It won’t matter: the Court’s decision will stay in place. In principle the Court could overrule its prior decision, but this would be both extremely unusual and also premised on the legal fiction that the decision was always wrong, not just that it was proven wrong. To me it’s fine to adopt a philosophy of “I will stick to my principles, damn the consequences,” and it’s fine to adopt a philosophy of “I will attempt to adopt a practical solution and monitor the consequences, and if the consequences aren’t what I expected, I’ll adjust course.” But it’s strange to adopt a philosophy of “I will rule based on factual predictions, while sticking to my guns regardless of whether the predictions pan out.”
Third, neither lawyers nor judges are able to make these predictions accurately. I asked Claude to generate 15 hypotheses about potential practical outcomes of the Court’s decision (this is an arbitrary number; Claude could trivially generate 100 hypotheses). Here they are:
The decision will lead to a significant increase in appeals to the Veterans Court, as veterans become aware of the more robust review process.
The Veterans Court will face a substantial backlog of cases due to the more time-consuming review process required.
The VA will adapt its decision-making process to more explicitly demonstrate its application of the benefit-of-the-doubt rule, leading to more favorable decisions at the initial stage.
The decision will result in increased consistency in benefits decisions across different VA regional offices and judges.
The cost of veterans’ benefits programs will increase significantly, potentially leading to budget pressures or calls for reform.
The decision will lead to a surge in veterans seeking legal representation for their claims, as the importance of framing issues for appeal becomes more apparent.
The VA will invest more resources in training its adjudicators on proper application of the benefit-of-the-doubt rule, improving decision quality at the agency level.
The decision will result in longer processing times for initial claims, as VA adjudicators become more cautious and thorough in their analysis.
There will be an increase in remands from the Veterans Court to the VA for readjudication, rather than outright reversals.
The decision will lead to a shift in VA culture towards a more veteran-friendly approach in close cases.
The Federal Circuit will see an increase in appeals from VA decisions, as the government challenges expansive interpretations of the benefit-of-the-doubt rule.
The decision will result in greater public trust in the VA benefits system, as veterans perceive the process as more fair and thorough.
There will be a decrease in the consistency of Veterans Court decisions, as different judges interpret and apply the new standard differently.
The decision will lead to an increase in fraudulent claims, as some individuals perceive a greater likelihood of success even with weak evidence.
The VA will develop more sophisticated data analytics tools to help identify cases where the benefit-of-the-doubt rule should apply, leading to more accurate initial decisions.
I guess these are all possible outcomes? How exactly are we going to assess their likelihood?
For what it’s worth, you can ask Claude to rank these 15 hypotheses by order of likelihood, and it gives a plausible-sounding answer, but that really starts to look like BS. There’s some empirical evidence that AI is better than humans at making empirical predictions, but AI isn’t some kind of perfect oracle.
It seems like cheating to ask AI to generate 15 hypotheses and then say that it’s too hard to evaluate them all. Why should we care that AI can spit out empirical predictions at will? AI has no empirical basis for thinking that any of these hypotheses has any particular likelihood of being true; it’s just remixing empirical predictions it saw somewhere on the Internet.
But humans also do that. One reason I think AI has such powerful implications for law is that AI learns and reasons a little like how human lawyers do. I’m not referring to the hypothesis that “the extreme and unexpected success of large language models is indirect evidence that human brains’ computations are similar to large language models’ computations.” Rather, I’m observing that human lawyers output legal reasoning by remixing the legal texts on which their minds have been trained—kind of like AI. In writing judicial decisions, judges typically don’t rule based on visual observations or empirical experiments. Instead they evaluate written texts—lawyers’ briefs—based on mental models that have been trained on thousands of prior briefs and cases the judges have read before. Lawyers who write briefs that confidently offer empirical predictions are doing the same thing—they’re making arguments based on their experience reading and writing lots of legal texts containing lots of arguments.
So if you think that speculating about AI-generated predictions is a poor way of deciding cases, you probably don’t want to speculate about human-generated predictions, either.
Hi and thanks for reading this. If you are not a lawyer acutely interested in standards of review, turn back now!
It’s a background legal principle that agencies’ factual determinations—including both subsidiary factual determinations (“when did his back pain start?”) and ultimate factual determinations (“is it a close call whether he has a service-connected disability or is it not a close call?”)—are reviewed deferentially by courts. So when Congress says that a reviewing court should do what is “due,” my default assumption is that the reviewing court should follow that background legal principle.
At the oral argument, Justice Gorsuch pushed back on this argument, asking the following question:
“JUSTICE GORSUCH: If that's the case, then what do we do about the fact that courts all the time do sufficiency-of-the-evidence review de novo based on the record, again, in the light most favorable? And the -- the next section of (b) -- (b)(2) is the same -- works the same way, I think, on your -- on your understanding as well, that the court, in deciding whether there's harmless error, takes all the non-clearly erroneous facts and asks de novo whether, as a matter of law, it would have made any difference, the -- the -- the error, that is.”
I don’t agree with the premises of this question. There’s no such thing as “sufficiency-of-the-evidence review de novo” that’s analogous to what the veterans are seeking here. When a judge asks, after a trial, whether sufficient evidence supports a verdict, he isn’t reviewing any determination by the fact-finder (the jury) de novo. The jury is deciding the case; the judge is deciding whether sufficient evidence supports the jury’s verdict. The appellate court then reviews the trial judge’s sufficiency-of-the-evidence determination de novo, but the trial judge isn’t the fact-finder, so again, you don’t have a court reviewing a fact-finder’s determination de novo.
Likewise, there’s no such thing as de novo harmless error review. The appellate court isn’t reviewing anything when it conducts a harmless error analysis. The trial court wouldn’t have conducted a harmless-error analysis because, presumably, it didn’t realize it was making an error.
Here, however, the veterans are asking for true de novo review of the VA’s benefit-of-the-doubt determination: on their theory, the VA makes a factual determination, and then the Veterans Court reviews that same factual determination with fresh eyes. That would be a very unusual scheme.
"There are no inconsistencies across adjudicators because there’s one adjudicator."
This seems like a bold claim, when AI can be inconsistent with itself!
Given a complex prompt and difficult questions, which this seems like, it's totally possible that an AI would come down on one side XX% of the time and YY% on the other.
Sure you could go one level deeper and have AI adjudicate 10 times and take the mode, or flag cases that are inconsistent for manual review, but I think assuming AI totally eliminates this problem is simply incorrect.
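A minimal sketch of that "adjudicate 10 times, take the mode, flag the inconsistent ones" idea follows. The `adjudicate_once` callable is a hypothetical stand-in for a single model run, and the 80% agreement threshold is an arbitrary choice for illustration, not anything the VA or any AI vendor specifies.

```python
from collections import Counter
from typing import Callable, Tuple


def adjudicate_by_vote(
    adjudicate_once: Callable[[], str],
    runs: int = 10,
    min_agreement: float = 0.8,
) -> Tuple[str, float, bool]:
    """Run the adjudicator several times and return the most common outcome.

    Returns (decision, agreement, needs_manual_review), where agreement is
    the share of runs that matched the most common decision.
    """
    if runs < 1:
        raise ValueError("runs must be at least 1")
    outcomes = Counter(adjudicate_once() for _ in range(runs))
    decision, count = outcomes.most_common(1)[0]  # the mode
    agreement = count / runs
    # Flag the case for a human if the runs disagree too often.
    return decision, agreement, agreement < min_agreement


# Scripted stand-in for ten model runs on one borderline claim.
scripted = iter(["grant"] * 6 + ["deny"] * 4)
print(adjudicate_by_vote(lambda: next(scripted)))
```

With six grants and four denials, the mode is "grant" but the case is flagged for manual review because agreement falls below the threshold.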