Only Broken Vessels Prove the Clay
Thinking with Tyler Cowen and Sam Altman about AI Poetry
As a poet and an AI optimist, I loved this exchange between Tyler Cowen and Sam Altman on Conversations with Tyler:
COWEN: How good will GPT-6 be at poetry?
ALTMAN: How good do you think GPT-5 is at poetry?
COWEN: Not that good. It’s not what I want it for, so that’s not a complaint. My guess is, in a year, you’ll have some model that can write a poem as good as the median Pablo Neruda poem, but not the best.
ALTMAN: I was going to say, I don’t want to say GPT, whether it’s 6 or 7, but I think we will get to something where you will say, “This is not a long way to the very best, but this is like a real poet’s okay poem.”
COWEN: In my view, there’s a big gap between a Neruda poem that’s a 7 on a scale of 1 to 10 and one that’s a 10. I’m not sure you’ll ever reach the 10. I think you’ll reach the 8.8 within a few years.
ALTMAN: I think we will reach the 10, and you won’t care.
COWEN: Who won’t care?
ALTMAN: You won’t care.
COWEN: I’ll care. I promise.
ALTMAN: You’ll care in terms of the technological accomplishment, but in terms of the great pieces of art and emotion and whatever else produced by humanity, you care a lot about the person or that a person produced it. It’s definitely something for an AI to write a 10 on its technical merits.
My classic example of this is, the greatest chess players don’t really care that AI is hugely better than them at chess. It doesn’t demotivate them to play. They don’t really care that they are. They really care about beating the other human, and they really get obsessed with that dude sitting across from them. The fact that the AI is better, they don’t care. Watching two AIs play each other, not that fun for that long.
COWEN: Let me tell you my worry about reaching the 10. Evaluations rely a lot on these rubrics. The rubrics will become good enough to produce very good poems, but maybe there’s something about the 10 poem that stands outside the rubric. If you’re just training on rubrics, rubrics, rubrics, it might in a way be counterproductive for reaching the 10.
ALTMAN: Evals can rely on a lot of things, including when you call upon the 10 and when you don’t.
COWEN: Sure.
ALTMAN: You can read a bunch in the process and provide some real-time signal.
COWEN: Say we have no human poets today writing 10s, and we’re asking those same people to judge and grade the GPTs. I’m worried. Again, I think it will be fine. To me, we’re talking about a 9, not a 10. You don’t have William Wordsworth working for OpenAI.
ALTMAN: This gets to a very interesting thing, which is, let’s say you can’t write a 10, but you can decide when something is a 10. That might be all that we need.
COWEN: Maybe humanity only decides collectively what’s a 10, and there’s something a little mysterious and history-laden about that process.
ALTMAN: Okay, but still, we can do it. Now, maybe our decision is not very good because it is history-related and it does drift over time, and some things we all agree are great, the next generation decides they’re not, whatever, but if whatever process humanity has to determine what poem is a 10, you could imagine that providing some sort of signal to an AI. Now that, again, if you know it’s an AI, maybe you don’t care. We see this phenomenon with AI art, but yes.
If Altman is right that “we will reach the 10, and you won’t care,” we’ve been confused about what we value. If Cowen is right, Altman misunderstands what a 10 actually is.
Their debate centers on a simple question: Will AI ever write a great poem? Not a good one—both grant GPT will reach competence—but the poem that reshapes what poetry can be.
Their disagreement maps onto deeper fault lines in how we think about art, evaluation, and the relationship between maker and made.
Cowen’s worry deserves attention: “Evaluations rely a lot on these rubrics. The rubrics will become good enough to produce very good poems, but maybe there’s something about the 10 poem that stands outside the rubric.” Cowen’s suspicion of rubrics accords with his Hayekian sensibility that decentralized intelligence outperforms centralized intelligence.
Michael Polanyi taught us that we know more than we can tell. The physician knows when an X-ray shows a disease before she can articulate the diagnostic criteria. The wine expert tastes terroir before she names it. And we recognize a great poem before we can explain why it’s great; even after we’ve exhausted our explanations, something remains unexplained.
If AI training depends on rubrics, even sophisticated ones, even ones that include “when you call upon the 10 and when you don’t,” then it’s optimizing for articulable criteria. But the 10 might be precisely what exceeds articulation. Not because it’s mystical or irrational, but because it operates at the edge of our evaluative frameworks, pressing against them, forcing them to expand. Is Wordsworth himself able to say what makes a great poem? And even if he is, won’t a Wordsworth machine dilute the intangible elements that make a poem great?
Think of the initial reception of Picasso’s Les Demoiselles d’Avignon. Apollinaire was confused. Braque felt “as if someone was drinking gasoline and spitting fire.” The painting wasn’t recognized as a 10 immediately: it was polarizing, category-violating, almost impossible to look at. Only retroactively could we develop the critical vocabulary to explain its greatness.
The 10 announces itself through resistance, through what Heidegger called the “rift” (Riss) between world and earth, the way great art ruptures our settled ways of seeing.
Thomas Kuhn’s paradigm shifts apply here: you cannot evaluate revolutionary science by normal science’s standards. The breakthrough is incommensurable with what came before. Could this be added to the evals? Only after the fact, which means the AI would always be playing catch-up to the last revolution, never instigating the next one.
In Jewish terms, we might borrow a dialectic from Proverbs 1:8:
שְׁמַ֣ע בְּ֭נִי מוּסַ֣ר אָבִ֑יךָ וְאַל־תִּ֝טֹּ֗שׁ תּוֹרַ֥ת אִמֶּֽךָ׃
My son, heed the discipline of your father (musar avicha),
And do not forsake the instruction of your mother (torat imecha).
Musar avicha—the father’s instruction—is the realm of analysis, argument, the intricate architecture of a Tosafot.
It can be transmitted through explanation, broken into principles and rules.
But torat imecha names something else: that which is learned through experience and observation, in the home.
A great poem has both dimensions. You can anatomize its meter, map its imagery, trace its allusions (musar avicha). But something remains: call it breath, rhythm, the particular quality of attention it demands and rewards, that operates below the threshold of analysis (torat imecha).
This raises a puzzle: Can you be objective about a 10, or only about an 8?
An 8 operates within established paradigms. We have consensus about what makes a poem good—command of technique, original imagery, emotional authenticity, structural coherence. These criteria can be articulated, taught, applied with reasonable inter-rater reliability. The 8 is objectively good within a shared framework.
But the 10 is different. The 10 destabilizes the framework itself. When Coltrane recorded A Love Supreme, was it objectively a masterpiece on day one? Many listeners found it abrasive, excessive, unlistenable. Objectivity suggests timeless, universal standards. But our encounter with the truly new is always historically conditioned, filtered through prejudices we don’t know we have until they’re challenged.
Arthur Danto argued that what makes something art depends on the “artworld”—the network of theory, history, and institutions that frame our perception. This is Cowen’s point about how “humanity only decides collectively what’s a 10, and there’s something a little mysterious and history-laden about that process.” The 10 isn’t just technically superior; it’s recognized as such through a historical process we participate in but don’t fully control.
Could an AI learn this historical sensitivity, this capacity to recognize when something genuine breaks through? Perhaps. But here’s the deeper problem: The AI would be learning to predict human aesthetic judgment, which means it’s always derivative of human taste. It could become exquisitely calibrated to what we already value. But the 10 that matters is the one we don’t yet know how to value, the one that forces us to revise our standards.
Stanley Cavell wrote about criteria and judgment: criteria guide us in normal cases, but in the encounter with the genuinely new, criteria fail us. We must judge without criteria, which means accepting responsibility for judgments we cannot fully justify. This vulnerability—this exposure to the claim of the new—might be constitutive of aesthetic experience at its heights.
Altman’s chess analogy reveals his deeper claim: “The greatest chess players don’t really care that AI is hugely better than them at chess. It doesn’t demotivate them to play. They don’t really care that they are. They really care about beating the other human.”
Art, on this view, is testimonial. We care who made it because art is a form of address, an offering from one consciousness to another. When Paul Celan writes “Niemand zeugt für den Zeugen” (“No one bears witness for the witness”), the line’s weight comes from Celan’s position as a survivor-poet, someone writing from and about the impossibility of testimony after the Shoah. Remove that biographical fact and the poem doesn’t just lose context; it loses its ground, its right to speak these particular words.
Emmanuel Levinas argued that ethics begins in the face-to-face encounter, in the presence of the Other who addresses me and makes a claim on me I cannot refuse. Art might work similarly—not just as an arrangement of words or pigments, but as an address from a particular someone to whom I am obligated to respond. The poem says: I am speaking to you. Hear me.
Walter Benjamin wrote of the “aura” of the artwork in the age of mechanical reproduction: the sense of unique presence, of here-and-now-ness that withers when art becomes infinitely reproducible. The original painting has aura; the poster doesn’t. Perhaps AI poetry is aura-less by nature, not because of its mode of production but because there’s no one there, no presence behind the words to authenticate them.
Yet Altman might counter: Isn’t this prejudice? Why does the maker matter if the work stands on its own? After all, we don’t discount Coleridge’s “Kubla Khan” because it came in an opium dream. We don’t care that the Homeric epics emerged from oral tradition without a single author. Why should algorithmic production matter?
Here’s another way to think about it: Isn’t greatness a function of difficulty or virtuosity? In which case, once AI dissolves the difficulty, the frontier migrates elsewhere.
When Glenn Gould plays Bach’s Goldberg Variations, part of what we’re hearing is human virtuosity, fingers moving at impossible speeds, executing ornaments with crystalline precision. If a computer can play it perfectly every time, does that diminish Gould’s achievement? Not exactly. But it does shift what we value. We begin caring more about interpretation than execution, more about phrasing choices than technical mastery.
The same might happen with poetry. If AI can generate metrically perfect sonnets with fresh imagery and emotional depth, then human poets would need to do something else. Perhaps conceptual poetry—work that’s interesting for its framing rather than its linguistic texture. Perhaps performance poetry—work that lives in the embodied presence of the poet. Perhaps constraint-based experiments—where the interesting thing is the arbitrary rule you imposed, not the output itself.
Heidegger in “The Origin of the Work of Art” suggests that art isn’t about overcoming technical challenges but about “setting up a world and setting forth the earth”—opening a space of meaning while also revealing the material resistance through which that meaning emerges.
If an AI writes poems effortlessly, without the friction of language resisting the poet’s intention, then even if the outputs are beautiful, they might not be art in the sense we’ve cared about. Art must dramatize the struggle of its own emergence. By definition it cannot be automated.
If we stop making art ourselves, then we’ll probably also lose our appreciation for art. In other words, slop will become a determining category not just of our output, but of our hermeneutic posture.
Gadamer argued that understanding is always an “event,” a fusion of horizons between the work and the interpreter. We bring ourselves to the encounter. The work addresses us, but we must be capable of being addressed. If we’ve stopped making art, we might lose the receptive capacity the art requires.
The 10 isn’t a point on a scale. It’s a breach, an opening, a moment when the framework itself becomes visible and therefore vulnerable to transformation. It’s Celan’s broken German after Auschwitz. It’s Coltrane’s scream-prayer. It’s the awkwardness of Dickinson’s em dashes (which now take on a new dimension in the age of ChatGPT writing), reaching for something language couldn’t quite hold.
And here is a poem I wrote, iterating with Claude. You decide its merits.
Here are some of the rules I used to generate it:
- A Petrarchan sonnet written in the style of a poem esteemed by New Criticism (ambiguity, double meaning).
- Allusions to Homer, Dante, Wordsworth, Breaking Bad, and The Sopranos.
- About AI poetry without being about AI poetry.
The Jug
A jug of common clay—its wine-dark weight
Could hold what Homer poured for gods and men,
Or be one soul in Dante’s kiln, the pen
Of terracotta circling through its fate.
My thumbs once pressed these walls. Now I create
Through algorithms’ throw—waste management
Of overflow, what Wordsworth meant, now sent
Clean-metered through the permitted gate.
The chemistry is pure. The blue burns bright—
Ninety-nine percent. But whose hand threw
This curve that drinks the perfect shaping light?
No flaw to say what’s made or who.
Touch here: the seam, the crack where glaze gives way.
Only broken vessels prove the clay.
P.S.—We’ve been enjoying long responses to our essays. If you have an original piece you’d like to write in response to any of our pieces, pitch me at zohar@lightningstudios.ai





