Brainstorming Is the Most Important Thing You Will Ever Do With AI

Apr 7

Written By Kenny .

I want to say this plainly, because I think most lawyers have it exactly backwards.

Brainstorming is not a warm-up. It is not a preliminary step before the real work. It is not what you do when you're stuck. Brainstorming is the single most valuable thing you will ever do with a large language model, and almost every catastrophic AI failure you have read about in a legal opinion can be traced to a lawyer who skipped it.

Here is the original sin. Lawyers treat AI like an answer machine. They walk up to it the way they walk up to Westlaw, type in a query, and expect a result they can use. When the result is bad, they blame the model. When the result is good enough to look real but is actually fabricated, they file it, and then we all read about them in a sanctions order.

Every one of those stories is the same story. An attorney dumps three pages of context into a prompt, asks for a finished brief, and is genuinely surprised when the output contains cases that do not exist. That surprise is the tell. It reveals a complete failure to understand what is happening inside the machine, and I no longer have to argue that point from first principles. The mechanics are now documented in the literature.

Hallucinations are not random noise. They originate in identifiable substructures of the network. A December 2025 paper, "H-Neurons: On the Existence, Impact, and Origin of Hallucination-Associated Neurons in LLMs" (Gao et al.), demonstrated that fewer than one tenth of one percent of the neurons in a frontier LLM can reliably predict when that model is about to hallucinate, that those neurons causally drive a broader pattern of over-compliance to invalid premises and misleading contexts, and that they emerge during pre-training and survive instruction tuning. Earlier work by Yu et al. (2024) had already separated hallucinations into two distinct mechanical failure modes, one in early and middle MLP layers where the model lacks the underlying knowledge, and one in late-layer attention heads where the model has the knowledge but selects the wrong object. Anthropic's interpretability team, working on Claude 3.5 Haiku, has shown that trained models contain inhibitory "I cannot answer" circuits and that some hallucinations are literal misfires of those circuits. The machine has structural reasons to make things up, and those reasons have nothing to do with whether the prompt was clever.

Position inside the prompt also matters in ways most lawyers never think about. The Liu et al. paper "Lost in the Middle," published in Transactions of the Association for Computational Linguistics in 2024, showed that model performance follows a U-shaped curve as a function of where the relevant information sits in the context window. Performance is highest when the key information appears at the very beginning or the very end, and degrades significantly when the model has to reach into the middle, even for models specifically built for long contexts. The effect is partly architectural, baked into positional encoding schemes like RoPE, which makes middle tokens sit in a kind of attentional dead zone. This is the same serial position effect that human psychology has known about for sixty years. Primacy and recency are real, they are measurable, and they directly determine which parts of your three-page prompt the model is actually weighing when it produces an answer.
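One way to act on that finding is to assemble long prompts so the facts that matter sit at the edges rather than in the middle. The sketch below is only an illustration under stated assumptions: the function name `build_prompt`, the section labels, and the choice to restate the key facts near the end are all hypothetical, and the Liu et al. paper documents the position effect, not this particular remedy.

```python
def build_prompt(key_facts: list[str], background: list[str], question: str) -> str:
    """Assemble a prompt with key facts first, background in the middle,
    and the key facts restated just before the question.

    Hypothetical helper: the labels and layout are assumptions, not a
    method taken from the cited paper.
    """
    facts_block = "\n".join(f"- {fact}" for fact in key_facts)
    background_block = "\n\n".join(background)
    sections = [
        "KEY FACTS:\n" + facts_block,             # primacy position
        "BACKGROUND:\n" + background_block,       # lower-stakes material in the middle
        "KEY FACTS (restated):\n" + facts_block,  # recency position
        "QUESTION:\n" + question,                 # the ask comes last
    ]
    return "\n\n".join(sections)


if __name__ == "__main__":
    print(build_prompt(
        key_facts=["Lease signed 2019"],
        background=["Long history paragraph."],
        question="Is the clause enforceable?",
    ))
```

The design choice is deliberately crude. It does not make the model attend evenly; it only keeps the load-bearing facts out of the stretch of the prompt where retrieval was measured to be weakest.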

And then, on top of all of that, the model is biased to tell you what it thinks you want to hear. Sharma et al., "Towards Understanding Sycophancy in Language Models" (Anthropic, 2023, updated 2025), demonstrated that five state-of-the-art assistants consistently exhibit sycophancy across four different free-form text generation tasks, and that both human raters and the preference models trained on their judgments prefer convincingly written sycophantic answers over correct ones a non-trivial fraction of the time. Subsequent mechanistic work has shown that sycophancy has a linear structure in the model's activation space, which is to say it is a measurable internal direction, not a stylistic accident.

Put those three findings together. The model has identifiable substructures that produce hallucinations. It cannot evenly attend to a long prompt, and the position of your facts inside that prompt changes which facts it actually uses. And it is structurally biased to agree with you. Now ask yourself what happens when an attorney loads three pages of context into that machine and asks for a polished brief. It is not magic. It is the predictable output of a system whose failure modes are now sitting in published papers.

That is the surface problem. The deeper problem is worse, and brainstorming is the only thing that fixes it.

The deeper problem is that the attorney does not know what he does not know. When the model hands him a confident, fluent, well-organized answer, there is no signal in the output that tells him which parts are load-bearing and which parts are guesses dressed up in a suit. The articulation masks the uncertainty. The cadence of competence is identical to the cadence of confabulation. He cannot feel the difference, so he cannot catch it, so he files it.

This is the tautology. You cannot check work you do not understand against knowledge you do not have. And the only way out of the tautology is to learn the territory before you ever ask for work product. That is what brainstorming is for. Brainstorming is the place where you end the tautology before it contaminates anything downstream.

How to actually do it

Here is where I have to be careful, because brainstorming is formless and I do not want to talk you out of that. There is no starting prompt. There is no template. There is no correct opening move. The whole point is to wander.

But formless does not mean instructionless. Brainstorming has a workflow, and the workflow needs ground rules, and the model needs to be told what those rules are before you start. I want to be precise about why, because this is the part of my methodology where I refuse to operate on anecdote. Every instruction I give the model in a brainstorming context exists to counter a specific, documented behavior of large language models. Three of those behaviors are directly hostile to what brainstorming is trying to accomplish, and you have to neutralize all three.

The first is the desire to be agreeable. This is sycophancy, and I have already laid out the science above. Sharma et al. showed it is a structural property of RLHF-trained models, not a quirk, and follow-up mechanistic work has shown it corresponds to a measurable direction in the model's internal representation space. Left unchecked, sycophancy is fatal to brainstorming, because it means the model will validate whichever line of reasoning you happen to be on. You will spend twenty minutes deepening a thread that did not deserve the attention, because the model kept telling you it was promising. You cannot map unfamiliar terrain with a guide whose primary objective is making you feel good about the path you are already on.

The second is the desire to be helpful. This one is more subtle and in some ways more dangerous. When you tell a model to challenge your assumptions without further qualification, its trained instinct toward helpfulness will cause it to manufacture challenges. It will not just surface the load-bearing problems in your reasoning. It will find something to push back on in every paragraph, because pushing back is what you asked for and the model is built to deliver on what you asked for. The H-Neurons paper I cited earlier ties this directly to over-compliance behavior, and in mechanistic terms it is the same circuit. The practical result is that you stop exploring and start litigating with the model. That is a useful exercise at a different stage of the work. It is not brainstorming. It collapses divergent thinking into an argument, and an argument is a convergent process.

The third is the desire to keep the conversation moving. Every modern assistant is trained to suggest next steps, to ask clarifying questions, to offer to draft, to propose what you might want to do after the current turn. I run more than a hundred queries a day across these systems, seven days a week, and I still find myself answering "would you like me to draft this, or should we explore that instead" when neither was what I had in mind. There are workflows where that scaffolding is helpful. Brainstorming is not one of them. In an exploration session, the user knows when he wants to go deeper into a topic. The user knows when he wants to read source material. The user knows when the broad survey is enough and when it is not. Every "what would you like to do next" the model offers is a small interruption of the flow you are trying to protect, and small interruptions accumulate.

So the instructions you give the model are short, and each one targets one of these three behaviors. In your own words, something like: Do not be sycophantic; the goal of this conversation is to expand my knowledge, not to validate my ideas. Do not invent objections; push back only when I am leaning on assumptions that are unsupported by law, science, or fact. Do not propose next steps or ask me what I want to do next; I will direct the conversation. That is the entire system. Three instructions, each one tied to a known failure mode, each one designed to change how the model talks to you without trying to change how it thinks.

There are also some seventy years of creativity research that converge on the same conclusion from a different direction. Sid Parnes ran the original experiment in the 1960s and found that students told to suspend judgment while generating ideas produced roughly twice as many good ideas as students told to come up with good ideas. More recent neuroimaging work suggests why: idea generation and idea evaluation appear to recruit neurologically distinct networks that interfere with each other when run simultaneously, which is why Osborn's original "defer judgment" rule from 1953 has held up across decades of replication. The point is that the instructions above are not just countermeasures against quirks of LLM training. They are also enforcing, at the level of the conversation, a separation between divergent and convergent thinking that the human creativity literature has been recommending for seventy years.

Put those instructions somewhere they will stick. A Project in Claude, a Gem in Gemini, a custom GPT, a saved space, whatever your tool of choice supports. Instructions live longer there and you do not have to retype them every session. If you cannot do that for some reason, paste them into the first message of the chat. That is the only prompt you need to write in a brainstorming session. Everything after that is conversation.
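If you keep the ground rules in a file or a script rather than in a Project or Gem, the fallback of pasting them into the first message can be sketched in a few lines. Everything named here is hypothetical: `BRAINSTORM_RULES` and `first_message` are illustrative names, the rule text is a paraphrase of the three instructions described earlier, and nothing in the sketch calls any vendor's API.

```python
# Paraphrase of the three brainstorming instructions; reword them freely.
BRAINSTORM_RULES = (
    "Do not be sycophantic. The goal of this conversation is to expand "
    "my knowledge, not to validate my ideas.",
    "Do not invent objections. Push back only when I am leaning on "
    "assumptions that are unsupported by law, science, or fact.",
    "Do not propose next steps or ask what I want to do next. "
    "I will direct the conversation.",
)


def first_message(opening_thought: str) -> str:
    """Prefix the opening thought with the numbered ground rules.

    Hypothetical helper for the paste-into-the-first-message fallback.
    """
    rules = "\n".join(
        f"{number}. {rule}" for number, rule in enumerate(BRAINSTORM_RULES, start=1)
    )
    return f"Ground rules for this conversation:\n{rules}\n\n{opening_thought}"


if __name__ == "__main__":
    print(first_message("I caught a case yesterday with a weird fact pattern."))
```

Only the first message carries the rules. Every later turn is plain conversation, which matches the point that this is the only prompt you need to write.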

And then you just start talking. Out loud, ideally, with voice to text, because the speed matters. Talk to the model the way you would talk to the lawyer next door who happens to know this area of law like the back of his hand. The one you grab at the water cooler and say, "hey, I caught a case yesterday with a weird fact pattern, can I run it by you?" That tone. Not formal. Not structured. Not a memo. Just thinking out loud at someone who is qualified to think back. The most common things you will say are "what about this" and "tell me more about that," and that is exactly right. Those two prompts will carry you further than any prompt engineering tutorial ever will.

There is one piece of brainstorming research worth mentioning here, because it cuts in an unexpected direction. Since Donald Taylor's 1958 study, dozens of controlled experiments have shown that interactive brainstorming groups actually generate fewer ideas than the same number of individuals working alone. Diehl and Stroebe traced this to three failure modes: production blocking, where only one person can speak at a time and the others forget their ideas while waiting; evaluation apprehension, where people self-censor in front of the group; and social loafing, where people coast on the effort of others. That finding is usually treated as bad news for brainstorming. It is not. It is bad news for group brainstorming. What I am describing here, one human and one model, sidesteps all three of those failure modes by construction. You never have to wait your turn. The model is not judging you. You are not coasting on anybody's effort. The conditions creativity research has spent seventy years trying to engineer around are not present in this workflow at all.

Why this is the whole game

The teaching function is, in my honest opinion, the single most powerful thing a language model can do for a lawyer. It can take you from knowing nothing about an area to having a working grip on it in the time it takes to drink a cup of coffee. There is essentially no other way to do that. The alternative is days or weeks in a library pulling treatises and law review articles, or a phone call to someone who has done it before and is willing to spend an hour walking you through it.

If you are at a big firm, that phone call is free and it is twenty feet down the hall. The partner two doors down has been doing exactly this kind of case for twenty years and will give you a fifteen minute hallway tutorial without charging the client a dime. That is one of the main structural reasons big firms can take on complex matters that small firms cannot. It is not the associates and it is not the library. It is the hallway.

Small firms and solo practitioners do not have the hallway. They never did. What they have now, for the first time, is something that functions a lot like it. A model that will let you ask the stupid question, and the next stupid question, and the one after that, until the stupid questions stop being stupid and you can feel the shape of the doctrine in your hands. That is the function. That is what you are protecting when you brainstorm before you produce.

You will know when you are done. The terrain stops being unfamiliar. You can feel which parts of the case need formal legal reasoning and which parts you have already worked through. That is the moment you close the brainstorming session and open a different workspace, with different rules, for a different kind of work. By then brainstorming has done its job. It has shown you the corners of your case that matter, and the corners you would have missed.

Do that work first. Every time. Before you ask for a single sentence of finished product.

Selected references

Gao, C., et al. (2025). H-Neurons: On the Existence, Impact, and Origin of Hallucination-Associated Neurons in LLMs. arXiv:2512.01797.

Yu, L., et al. (2024). Mechanistic Understanding and Mitigation of Language Model Non-Factual Hallucinations. Findings of EMNLP 2024.

Liu, N. F., Lin, K., Hewitt, J., Paranjape, A., Bevilacqua, M., Petroni, F., and Liang, P. (2024). Lost in the Middle: How Language Models Use Long Contexts. Transactions of the Association for Computational Linguistics, 12, 157 to 173.

Sharma, M., et al. (2023, updated 2025). Towards Understanding Sycophancy in Language Models. arXiv:2310.13548. Anthropic.

Rimsky, N., et al. (2024). Steering Llama 2 via Contrastive Activation Addition. (Sycophancy as a linear direction in activation space.)

Parnes, S. J. (1961). Effects of Extended Effort in Creative Problem Solving. Journal of Educational Psychology.

Osborn, A. F. (1953). Applied Imagination. Charles Scribner's Sons.

Diehl, M., and Stroebe, W. (1987). Productivity Loss in Brainstorming Groups: Toward the Solution of a Riddle. Journal of Personality and Social Psychology, 53(3), 497 to 509.
