
- The metaphors we use for AI aren’t neutral — they reveal our professional insecurities and existential anxieties more than they describe the technology.
- The Patronisation Trap: Senior engineers call AI a “junior developer” or “intern” to preserve their status in the hierarchy. This creates a false capability ceiling and implies a human growth arc that doesn’t exist.
- The Humanisation Trap: Engineering managers give AI agents human names, org charts, and team structures to prove their people-management skills still matter. The org-chart metaphor becomes the architecture, and the architecture inherits the metaphor’s flaws.
- The Mechanisation Trap: Clean-code advocates and product managers push to strip natural language from AI interactions, treating LLMs like compilers. This discards the core capability of language models — understanding natural language — and replaces developer creativity with spec validation.
- AI is AI: Language models are a genuinely new category of artefact. The accurate vocabulary — context windows, attention mechanisms, token prediction — is less satisfying precisely because it doesn’t position anyone in a reassuring story.
The software industry built something genuinely new, and then immediately started calling it by the wrong name. That's not surprising: "naming things" is the second-hardest problem in computer science.
The metaphors we use to describe AI are not neutral shorthand. At best, they’re inaccurate. At worst, they reveal more about the insecurities of the people using them than about the capabilities of the systems being described. Pick the wrong framing and you don’t just misunderstand the tool — you tell us something about what you’re afraid of.
Let’s examine three traps. Each belongs to a different professional tribe. Each performs a different anxiety.
The Patronisation Trap
Senior engineers who call AI a “junior developer” are rarely making a technical observation. The technical observation would be: AI has specific capability profiles, failure modes, and reliability characteristics. That’s precise. Informative. Useful. “Junior developer” tells you none of that.
What “junior developer” tells you is who’s in charge. The senior engineer is still the authority. Experience still matters. The system that can produce in seconds what used to take hours is, culturally, still subordinate. The framing does the work that the CV used to do.
The literature is extensive. DEV Community, September 2025: “Think of AI as a junior developer sitting beside you: it can produce drafts, boilerplate, or even surprisingly solid algorithms, but it lacks the deep context of your project.” A month later, same platform: “It’s more like AI is the junior developer who does all the tedious stuff, and you’re the senior reviewing and deciding what stays.” Aleksander Stensby at NDC Manchester 2025: “You treat these agents like very fast juniors who never get tired and absolutely need oversight.” Ethan Mollick of Wharton: “You have to think of it like an intern.” Anthropic’s own blog used the phrase “effectively wrangling LLMs.” An O’Reilly book on prompt engineering invented the job title “dedicated LLM wranglers.” The metaphors of supervision are everywhere.
The pushback runs the other way too — but often just moves the human up the hierarchy. Neon’s engineering blog: “Treat AI Like a Senior Dev, Not a Junior One.” Senior versus junior: two points on the same wrong axis. The adjustment is revealing. You’re not questioning whether “developer” is the right frame at all — you’re just upgrading your position relative to the system. Still managing. Still superior. Just generously so.
The intern framing creates a capability ceiling. If you design for incompetence, you get incompetent systems. Every interaction is mediated by the assumption that the output is probably wrong and needs human correction before it can be trusted. When your model is “it’s a junior who needs supervision,” you face a binary: either supervise everything — which defeats the efficiency gain — or stop supervising because you’ve quietly concluded it’s probably fine. Vibe coding is the second option, rationalised. It’s what happens when the insecurity resolves not into better oversight, but into none.
The problem with the intern framing isn’t that it takes the system too seriously — it’s that it maps capability onto a human developmental trajectory. Interns grow. Language models don’t. The framing implies a growth arc that doesn’t exist, and a social contract — patience, mentorship, eventual trust — that misleads you about how the relationship actually works. It also implies that the senior engineer’s status is safe, that accumulated judgement and experience remain decisive, that the intern will never become more than an intern. Whether or not that’s true is an empirical question. The framing forecloses it.
The Humanisation Trap
If AI is a junior developer, why not give it a proper team? Why not give it a name, a title, a reporting structure? Why not build the whole org?
This is a trap for a specific professional: the engineering manager. The person whose LinkedIn reads “Engineering Leader”, and whose value proposition was, explicitly, people: hiring, motivating, growing, retaining. The person who made careers from knowing how humans in technical teams behave.
Now the “team” is GPT with different system prompts.
That’s MetaGPT — ICLR 2024, tens of thousands of GitHub stars, launched as “the world’s first AI agent development team.” Its agents are named Mike (Team Leader), Emma (Product Manager), Bob (Architect), Alex (Engineer), and David (Data Analyst). They’re GPT-4 with different system prompts. But they have names. The org chart is preserved. The leadership role — someone has to manage Mike’s team — is preserved.
MetaGPT isn’t an outlier. ChatDev from OpenBMB: agents hold titles — CEO, CPO, CTO, Programmer, Reviewer. Their documentation states they “form a multi-agent organizational structure.” CrewAI — the framework with 100,000+ certified developers — describes building “crews of AI agents” that “plan, reason, and collaborate.” An American Technology Consulting blog: “You’re not writing prompts anymore. You’re designing organizations. You’re hiring digital employees and managing their performance.”
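The shared mechanism behind all of these frameworks is worth making concrete. A minimal sketch (the agent names mirror MetaGPT's; the `complete` stub is hypothetical and stands in for a single LLM API call):

```python
# Toy illustration: "Mike the Team Leader" and "Bob the Architect" are not
# different systems. They are one model invoked with different system prompts.

def complete(system_prompt: str, user_message: str) -> str:
    # Stand-in for one underlying model endpoint. Every "agent" below
    # funnels into this same function; only the prompt string changes.
    return f"[model output conditioned on: {system_prompt!r}]"

AGENTS = {
    "Mike": "You are a team leader. Coordinate the plan.",
    "Bob":  "You are a software architect. Design the system.",
    "Alex": "You are an engineer. Implement the design.",
}

def run_agent(name: str, task: str) -> str:
    # The org chart is a dict of strings; the "hierarchy" is prompt text.
    return complete(AGENTS[name], task)
```

Nothing in the dict confers judgement, authority, or leadership. The role exists in the prompt, not in the mechanism.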
The framing flatters an entire professional class at the moment of its greatest existential pressure. If AI agents need to be managed, coached, onboarded, and given clear roles with defined responsibilities — if the org chart still applies — then the engineering manager’s skill set transfers intact. The people skills still matter. Nothing has been lost. The hierarchy is preserved; it’s just that some of the staff are now software.
Marc Benioff at Davos 2025: “From this point forward, we will be managing not only human workers but also digital workers.” Salesforce then announced no new software engineering hires in 2025. Jensen Huang: “I wouldn’t be surprised if you license some and you hire some.” Zuckerberg: “Probably in 2025 we’re going to have an AI that can effectively be a sort of mid-level engineer.” Cognition’s Devin launched as a “collaborative AI teammate.” Users refer to it with masculine pronouns. Marlon Misra: “I showed him a screenshot on Slack.” Goldman Sachs embedded Devin into their 12,000-person engineering team as part of what they called a “hybrid workforce.”
The org-chart metaphor has reached the C-suite, and it's reshaping hiring decisions. And it's wrong in a precise way.
Calling a language model a “Product Manager Agent” doesn’t make it reason like a product manager. It makes you design systems that assume product-manager-level judgement. The metaphor becomes the architecture, and the architecture inherits the metaphor’s flaws. When you name an agent Mike and give him the “Team Leader” role, you start expecting Mike to exercise leadership — to push back on bad ideas, to synthesise competing concerns, to notice when something doesn’t feel right. The model has no such capacity. It has token prediction and a system prompt. The gap between the metaphor and the mechanism is where your production incidents live.
The counterargument says multi-agent role names improve benchmark performance. Task decomposition does improve results. But decomposition doesn’t require human role names. The improvement comes from decomposition. The names come from an industry that can’t stop mapping human structure onto statistical processes — and from a professional class that needs the mapping to survive.
Scalable.co offers a workshop called “The New AI Org Chart” on how to “onboard, train, and manage your AI employees like real team members.” The metaphor doesn’t just describe a product. It defends a profession.
The Mechanisation Trap
The opposite impulse from humanisation is to strip the metaphor away entirely. No names, no roles, no natural language — just formal specifications, precisely structured, fed to a deterministic processor.
This is a trap for a different kind of thought leader: the person who built their platform on clean code, human-readable systems, and the idea that code is communication. The person whose books and conference talks were premised on the insight that code is written for humans first, machines second.
Robert C. Martin — “Uncle Bob” — spent decades arguing exactly this. Clean Code is premised on a single idea: code is a medium of communication between programmers, not instructions for machines. Variable names, function length, structure — all of it exists to make human reading easier. The book sold millions. It shaped an entire generation of software engineering culture.
AI does the opposite. It produces code no one asked to read. It generates working, ugly, undocumented, deeply unclean code at scale. In January 2026, Martin posted on X: “For years I have taught that since we spend more time reading code than writing code it is better to focus on making code readable… Lately, however, the question has been posed to me that the true revolution of AI coding may be the inversion of that principle. Color me skeptical, but willing to listen. Is the secret of hyper productivity really to be found in the effective management of context windows?”
He didn't convert. But "willing to listen" is enough. The person who built a career on the idea that code is communication couldn't simply dismiss the question. That moment — not a reversal, but an inability to simply say no — is the mechanisation trap in miniature. The fork is uncomfortable: either AI is producing catastrophically unmaintainable systems and the clean code thesis holds, or legibility was never really the point and the thesis was always more about aesthetics than engineering. The mechanisation framing chooses the second path. Readability was only ever for humans. Humans are leaving the loop. Therefore: readability is waste.
Ken Erwin, on Medium: “The End of Human-Readable Code: It’s Time to Write for AI.” Proposed prompt: “Please rewrite the following code to be optimally parseable by an LLM while maintaining identical functionality.” Davin Hills proposed ALaS — an entirely JSON-based language for machines only: “Let’s stop pretending LLMs are human juniors. They don’t need prettier errors or nicer formatting — they need a medium they can traverse, reason over, and execute natively.” Kyle Anderson on DEV Community, February 2026: “Stop trying to talk to models like they are humans. Talk to them like they are compilers that process natural language.”
But the mechanisation trap also has a second constituency, entirely different in motivation: the business side. Product managers. Heads of product. Delivery leads. People who have watched AI hand developers a decade’s worth of capability in eighteen months and have been trying to work out what that means for their authority.
Spec-driven development — the practice of writing formal, structured specifications as the “source of truth” that AI translates into code — is their answer. GitHub frames it as “moving from ‘code is the source of truth’ to ‘intent is the source of truth.’” Thoughtworks calls it “one of 2025’s key new AI-assisted engineering practices.” The planning phase: product managers write requirements, developers review and validate them, then the coding agent generates the implementation. The developer’s creative and interpretive role — translating ambiguous intent into precise technical decisions — is compressed into a validation step.
The appeal is transparent. If the spec is the source of truth, whoever writes specs controls what gets built. Developers stop being the people who interpret requirements into code — a creative, technical, and often transformative act — and become validators of AI output. At a moment when developers are gaining superpowers, spec-driven development promises to hand the controls back to the people who had them before developers got those superpowers.
Codecademy, 2025: “The more specific you want to be, the more you need simple, structured, unambiguous instructions. Try to make English do that, and you’re essentially reinventing programming languages.” Red Hat Developer, February 2026: “You treat your specifications as the authoritative blueprint — the single source of truth that the code must conform to.” The academic world joined in. Pan et al. from Monash and SMU published a paper in 2025 arguing code formatting should be stripped for LLMs. They achieved a 24.5% token reduction. They framed readability as waste.
This framing is the most technically sophisticated and the most backwards.
LLMs are language models. Their breakthrough capability — the thing that makes them qualitatively different from every prior code-generation tool — is precisely their ability to process natural language. They understand context expressed in prose. They respond to explanation, nuance, analogy, and intent. They improve with richer description. The mechanisation framing treats this as a bug to engineer around rather than the core feature to exploit.
The evidence points the other way. An Ansible ambiguity study found that when prompted with stripped, specification-style language, all six tested LLMs chose the same wrong interpretation because the prompt lacked explanatory intent. The fix isn’t less language — it’s better language.
Strip natural language from prompts and you get compiler-level rigidity without compiler-level determinism. Compilers are deterministic, formally specified, and provably correct within their semantics. A language model stripped of natural language context is none of these things. It’s a stochastic system pretending to be a parser. The worst of both worlds.
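The distinction is easy to state precisely. A toy sketch (the distribution is invented for illustration; a real model's next-token distribution is learnt and prompt-dependent):

```python
import random

# Hypothetical next-token distribution a model might assign after some prompt.
DIST = {"parse": 0.55, "compile": 0.30, "segfault": 0.15}

def compiler_like(source: str) -> str:
    # A compiler is a deterministic function: same input, same output, always.
    return source.strip().lower()

def model_like(rng: random.Random) -> str:
    # A language model samples from a distribution. Stripping natural
    # language from the prompt does not remove this randomness; it only
    # removes the context that shapes the distribution.
    tokens, weights = zip(*DIST.items())
    return rng.choices(tokens, weights=weights, k=1)[0]
```

Call `compiler_like` twice with the same input and you get the same answer by construction. Call `model_like` and you get a draw from a distribution — which is exactly what "a stochastic system pretending to be a parser" means.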
The “specs as source of truth” instinct is particularly pernicious. A spec records intent; code records behaviour. When they diverge — and they will — you have two competing records of what the system should do. Treating the spec as authoritative means the model arbitrates between them with token prediction. That’s not engineering. That’s hoping.
AI Is AI
Murray Shanahan at Imperial College London proposed framing LLMs as “role-playing characters” rather than thinking entities — a different kind of wrong, but at least it keeps you from expecting the wrong things. Melanie Mitchell at the Santa Fe Institute identified “wishful mnemonics” — terms like “learn,” “understand,” and “think” that, as she put it, “unconsciously shape the way even AI experts think about their systems.” Gary Marcus identified the “Gullibility Gap.” Kambhampati et al., in a 2025 arXiv paper: “Stop Anthropomorphizing Intermediate Tokens as Reasoning/Thinking Traces!”
The right vocabulary comes from what these systems actually do. Context window, not memory or experience. Attention mechanisms that weight tokens relative to each other, not judgement or prioritisation. Token prediction over learnt distributions, not writing or deciding. Emergent capabilities at scale, not growth or learning in real time.
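These terms have concrete referents. A toy illustration of "attention mechanisms that weight tokens relative to each other" (the scores are made up; in a real model they are learnt):

```python
import math

def softmax(scores):
    # Attention turns raw relevance scores into weights that sum to 1.
    # No judgement, no prioritisation in the human sense: arithmetic.
    exps = [math.exp(s) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical relevance of three context tokens to the token being predicted.
tokens = ["def", "parse", "("]
scores = [0.2, 2.0, 0.5]
weights = softmax(scores)
# "parse" gets the most weight; the next-token distribution is computed
# from this weighted mix, and a token is then predicted, not "decided".
```

The weights always sum to one and the largest score always dominates: mechanism, not mind.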
These concepts are less satisfying than “teammate” or “intern” or “compiler” because they don’t map onto anything in our prior experience. That’s the point. A language model is a genuinely new category of artefact. “AI is AI” is the accurate answer.
It’s also, admittedly, a boring one.
It’s boring because it doesn’t position anyone. It doesn’t protect the senior engineer’s accumulated expertise, or give the engineering manager a team to lead, or return the product manager’s authority over what gets built. It doesn’t make careers or generate conference talks. It doesn’t tell you what role you play in the story. It just describes a system accurately, and leaves you to work out what that means.