The $150/Hr Poet
On Mercor, Kant, and the Administration of Beauty
I.
Here is the opening of Gerard Manley Hopkins’s “The Windhover”:
I caught this morning morning’s minion, king-
dom of daylight’s dauphin, dapple-dawn-drawn Falcon
Try to write a rubric for that.
You could note the alliteration, but alliteration is easy to specify, and most alliterative poems are bad.
You could reward the enjambment of “king- / dom,” the way the line break enacts the falcon’s sudden dive, but how do you score for a line break that means something?
You could flag the compound neologism “dapple-dawn-drawn” as a marker of quality, but then every poet would hyphenate three words together, and almost all of them would be awful. See Goodhart’s Law: “When a measure becomes a target, it ceases to be a good measure.”
Hopkins broke rules that hadn’t been written yet. His sprung rhythm, his syntactic compressions, his way of torquing English until it revealed something about perception itself: none of this was recognizable as “good poetry” by the standards of his time. The Poet Laureate Robert Bridges, Hopkins’s friend and literary executor, sat on the poems for nearly thirty years before publishing them, uncertain whether they were genius or gibberish. The poems taught readers how to read them. That’s what great art does.
II.
Tyler Cowen recently visited Mercor, the AI company that has become, by some measures, the fastest-growing startup in history. Mercor hires experts to train AI models. Its CEO, Brendan Foody, is twenty-two years old. Among the company’s offerings: $150 an hour for a poet. I highly recommend Cowen’s conversation with Foody on my favorite podcast, Conversations with Tyler.
What does a poet do for an AI company? Poets create rubrics. They design grading criteria that reward certain features in a model’s output and penalize others. If the rubric rewards a poem for evoking a particular idea, or for styling itself in a particular way, the model learns to repeat the behavior. The poet becomes an aesthetic legislator, translating taste into a scoring function.
Cowen, at one point, invokes Kant. In the Critique of Judgment, Kant argues that taste—the faculty by which we develop a sense of the beautiful—cannot be captured in a rubric. But to understand why, we need to be precise about what Kant means by judgment itself.
III.
For Kant, judgment is the mental faculty that connects particulars to universals—the capacity to say “this thing is an instance of that category.” But judgment operates in two distinct modes.
The first is determinative judgment: you already have the rule, and you apply it to the case. “This animal is a mammal because it has fur, is warm-blooded, nurses its young.” The concept comes first; the particular is subsumed under it. Determinative judgment is what rubrics enable. You specify the criteria in advance, then measure whether the case satisfies them.
The second is reflective judgment: you encounter a particular, but you have no prior rule. You must seek the universal, or act as if there were one, without being able to prove that there is. You are not applying a concept; you are responding to something singular that resists subsumption.
Taste, for Kant, is the faculty of aesthetic reflective judgment—the capacity to judge beauty from the bottom up. When I say “this is beautiful,” I am not applying criteria I already possess. I am responding to this particular object in a way that feels universal—I expect you to agree with me—but I cannot demonstrate that you should by pointing to extant rules. The judgment demands assent without supplying grounds. It is lawlike without being codifiable.
And what’s even more amazing: artists create the very phenomena to which our taste responds without themselves knowing the rules or following any rubric.
This is why Kant distinguishes beauty from both agreeableness and perfection. Agreeableness is merely subjective: “I like it” reports a private sensation and demands nothing from you. Perfection is fully objective: “it satisfies the criteria” can be verified by anyone who knows the criteria. Taste is neither. It claims universality—I expect your agreement—but it cannot prove its case. Taste operates in the gap between the private and the provable.
IV.
The Mercor model attempts to close this gap.
Foody’s response to the Kantian worry was telling: if taste can’t be captured in explicit rubrics, there’s always RLHF—Reinforcement Learning from Human Feedback. Show the model two outputs, have experts choose which one they prefer, repeat thousands of times. The model learns to approximate the evaluators’ preferences without anyone ever articulating why.
This is the techno-optimist’s solution: if reflective judgment resists explicit rules, capture it implicitly through pairwise comparison. Aggregate enough expert preferences, and the statistical pattern will approximate taste. The rubric becomes invisible, distributed across a thousand micro-decisions. The mystery of aesthetic judgment is not solved; it is bypassed.
But here is what this move does: it converts reflective judgment into determinative judgment. It takes responses to singular objects—responses that, for Kant, cannot be generalized without losing their character as aesthetic judgments—and extracts from them a pattern that can be applied to future cases. The model learns to score new poems by the criteria implicit in past preferences. What was reflective becomes determinative. What was singular becomes general.
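For readers who want to see the mechanics, here is a minimal sketch of how pairwise choices can become a scoring function. It assumes a Bradley-Terry model, a common choice for learning from preferences; I am not claiming this is what Mercor or any particular lab runs. The function name `fit_bradley_terry`, the three poems, and the comparison data are all hypothetical.

```python
import math

def fit_bradley_terry(comparisons, n_items, steps=500, lr=0.1):
    """Fit one latent score per item from (preferred, rejected) pairs.

    Assumed model: P(i preferred over j) = sigmoid(score[i] - score[j]).
    Scores are found by plain gradient ascent on the log-likelihood.
    """
    scores = [0.0] * n_items
    for _ in range(steps):
        grads = [0.0] * n_items
        for winner, loser in comparisons:
            p_win = 1.0 / (1.0 + math.exp(-(scores[winner] - scores[loser])))
            # Push the preferred item up and the rejected item down,
            # more strongly when the model found the outcome surprising.
            grads[winner] += 1.0 - p_win
            grads[loser] -= 1.0 - p_win
        scores = [s + lr * g for s, g in zip(scores, grads)]
    return scores

# Hypothetical data: three poems indexed 0..2; each pair is (preferred, rejected).
comparisons = [(0, 1), (0, 1), (1, 0), (0, 2), (0, 2), (1, 2), (1, 2), (2, 1)]

scores = fit_bradley_terry(comparisons, n_items=3)
ranking = sorted(range(3), key=lambda i: scores[i], reverse=True)
print(ranking)
```

Nobody in this sketch ever says why poem 0 is better; the number simply falls out of the tally. A real reward model would go further and score poems it has never seen from their features, which is exactly the step at which past responses become a rule for future cases.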
V.
Now, I know this Kantian picture is contested. Hume thought taste could be trained through exposure to the best models—that the “true judges” converge on the same assessments, and their convergence is evidence of objective standards. The empiricist tradition has always suspected that what looks like irreducible mystery is just complexity we haven’t mapped yet. Maybe taste really is a very intricate pattern, and with enough data, you can capture it.
I want to take that seriously. The question is whether “capturing” and “legislating” are different operations—or whether formalization transforms the thing being formalized, regardless of accuracy.
Let me state the pragmatist case as strongly as I can.
The pragmatist says: “Fine, reflective judgment gets converted to determinative judgment. But if the outputs are indistinguishable from what a tasteful human would produce, who cares? We don’t ask whether a calculator ‘really’ understands arithmetic. We use it. If the model produces beauty, isn’t that enough?”
This objection has real force. I feel it myself. When I use AI to draft prose, to explore ideas, to generate options I wouldn’t have considered, I’m not worried about whether the machine makes aesthetic judgments. I’m interested in whether the output is useful.
And yet.
There is a difference between approximating the outputs of taste and preserving the capacity for taste. The first is a functional achievement; the second is a condition of freedom.
VI.
Hannah Arendt, writing about the human capacity for action, used the term natality—the ability to begin something genuinely new, something that could not have been predicted from what came before. For Arendt, this capacity is what makes us free. We are not merely products of our conditioning; we can initiate, surprise, break the pattern.
Reflective judgment, in Kant’s sense, is allied to natality. When I encounter something genuinely new—a Hopkins poem, say, that breaks existing fashions—I cannot judge it by applying prior criteria. I must respond to it, let it teach me how to see it, form a judgment that has no precedent. This is the moment of aesthetic freedom: the encounter with a particular that cannot be subsumed, that demands a new way of attending.
When you convert reflective judgment into determinative judgment—when you extract patterns from past responses and apply them to future cases—you close the space where this encounter can happen. The model, no matter how sophisticated, is trained on the past. It learns what has been rewarded. It cannot, by design, reward what has never been seen.
Hopkins would score poorly. His sprung rhythm didn’t exist in the training data. His enjambments would look like errors. “Dapple-dawn-drawn” would be flagged as excessive. The genuinely new is precisely what the system cannot recognize, because recognition requires prior examples, and the genuinely new has none.
The pragmatist is right that the outputs might be indistinguishable from good poems. But the procedure eliminates something: the condition for this poem, the one that breaks the rubric and is beautiful anyway.
VII.
Michel Serres makes a related point about noise. What we call interference—the static, the things that don’t fit the signal—is not merely an obstacle. It is generative. Novelty emerges from noise. The unexpected insight, the creative breakthrough, the beautiful thing no one saw coming—these arise where the pattern breaks down.
A system optimized for noise elimination is hostile to the genuinely new. It converges on what works, on what has worked. That convergence might produce excellent outputs by existing standards. But it will not produce the output that creates a new standard.
RLHF aggregates preferences into a statistical “ought.” The minority view—the grader who saw something no one else saw—is washed out in the averaging.
This is the structural problem. A rubric does not merely measure quality; it defines quality. Once the model is trained, that rubric becomes the operative standard. Writers learn what gets rewarded; readers acclimate to what the model produces; the rubric becomes invisible because it has been internalized. What began as an attempt to encode expert judgment ends as an imposition of norms.
The filtering effect is real: poets who believe their judgments are genuinely reflective—irreducible to criteria—are unlikely to thrive in this system. The poets who thrive accept that taste is, at bottom, specifiable. And so the rubrics encode not just preferences but an epistemology: a view of poetry in which aesthetic judgment is, despite Kant, ultimately determinative.
Or, to take another dialectic, one explored by Alison Gopnik (another guest on Tyler Cowen’s podcast): rubrics enable mature minds to exploit the rules they know, but not to explore the world of intuitions that have yet to be codified into well-recognized rules.
VIII.
When the Talmud records a legal dispute, it preserves the names of the disputants. Rabbi Akiva says X; Rabbi Yishmael says Y. The minority view is not erased; it is inscribed alongside the majority. Future generations can return to the argument, understand why the disagreement arose, even revive the losing position if circumstances change. The formalization is transparent. The reasoning is on the page.
Contrast this with RLHF, where the reasoning is nowhere. The model learns from pairwise comparisons, but the grounds for those comparisons are not preserved. The minority view is not inscribed; it is compressed into a loss function.
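A toy sketch makes the compression visible. It assumes the simplest possible aggregation, a majority vote plus an agreement rate; the graders, their reasons, and the `aggregate` function are hypothetical, and real pipelines differ in detail.

```python
from collections import Counter

# Hypothetical votes from five graders on one pair of poems, A and B.
votes = [
    {"grader": "g1", "choice": "A", "reason": "cleaner meter"},
    {"grader": "g2", "choice": "A", "reason": "clearer imagery"},
    {"grader": "g3", "choice": "A", "reason": "more accessible"},
    {"grader": "g4", "choice": "A", "reason": "cleaner meter"},
    {"grader": "g5", "choice": "B", "reason": "the broken rhythm enacts the dive"},
]

def aggregate(votes):
    """Reduce the votes to a winning label and the share of graders behind it.

    Assumption for this sketch: this pair of values is all that is kept.
    """
    counts = Counter(v["choice"] for v in votes)
    winner, n_winner = counts.most_common(1)[0]
    return winner, n_winner / len(votes)

label, agreement = aggregate(votes)
print(label, agreement)  # no grader names and no reasons survive this step
```

What survives of the fifth grader is one part in five of a number. The sentence she wrote, the only thing a later reader could have argued with, is gone.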
There is a term in the tradition—machloket l’shem shamayim, a dispute for the sake of heaven—that names the kind of disagreement worth preserving. When people argue in good faith about matters of value, the argument itself has merit, even if only one side prevails. The disagreement is not a bug; it is a feature of a living tradition.
Imagine an AI system that offered lineages and schools of thought in differing rubrics, rather than one monolithically shaped by the vector of consensus.
IX.
What would it mean to build AI training that preserved this kind of disagreement?
At minimum: recording not just the preferences of graders but the reasons for those preferences—the arguments, the hesitations, the minority views. Building systems that could explain why a judgment was made, not just that it was made. Treating taste as a conversation, not a consensus.
Concretely: when a grader chooses poem A over poem B, require them to say why. Preserve the dissenting grader’s reasons alongside the majority. Weight the model not just toward the winning choice but toward choices that are well explained.
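As a sketch of what such a record could look like, here is one possible data structure. Everything in it is an assumption of mine: the class names `Judgment` and `DisputeRecord`, the fields, and the rule that a choice without a reason is rejected. It leaves out the harder question of how to weight a model toward well-explained choices, since that would require a measure of explanation quality that I do not have.

```python
from dataclasses import dataclass, field

@dataclass
class Judgment:
    grader: str   # who judged (the Talmud keeps the names)
    choice: str   # which poem they preferred
    reason: str   # why, in their own words

@dataclass
class DisputeRecord:
    pair: tuple                                   # the two poems compared
    judgments: list = field(default_factory=list)

    def add(self, grader, choice, reason):
        # A bare preference is refused: the grader has to say why.
        if not reason.strip():
            raise ValueError("a judgment must come with a reason")
        self.judgments.append(Judgment(grader, choice, reason))

    def majority(self):
        # Ties go to whichever choice was recorded first.
        counts = {}
        for j in self.judgments:
            counts[j.choice] = counts.get(j.choice, 0) + 1
        return max(counts, key=counts.get)

    def dissents(self):
        # The losing side stays on the page, names and reasons attached.
        winner = self.majority()
        return [j for j in self.judgments if j.choice != winner]

# Hypothetical usage with three graders.
record = DisputeRecord(pair=("A", "B"))
record.add("g1", "A", "cleaner meter")
record.add("g2", "A", "clearer imagery")
record.add("g3", "B", "the broken rhythm enacts the dive")

print(record.majority())
for j in record.dissents():
    print(j.grader, "dissents:", j.reason)
```

The majority still prevails here, as it does in the Talmud. The difference is that a later reader, or a later model, can return to the dissent and ask whether circumstances have changed.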
X.
When Foody says “rubrics are the new oil,” he is describing a transfer of cultural authority. The people who design the rubrics—the experts, the graders, the AI labs that aggregate their judgments—become the de facto legislators of taste. They decide what counts as good poetry, good legal reasoning, good economic analysis. Their decisions are not put to a vote; they are embedded in the training data and propagated through the model’s outputs.
Cowen floated a more radical possibility: that AI models might eventually become better than human experts at evaluating outputs, not just generating them. If the model can evaluate poetry better than poets, then the model’s judgments become the standard against which human judgments are measured. The direction of authority reverses. Taste is no longer something humans have and models learn; it is something models define and humans approximate.
This is what it would mean for determinative judgment to fully displace reflective judgment. The criteria would no longer be extracted from human responses to singular objects; the criteria would constitute what counts as a proper response. Aesthetic education would become a matter of learning to see what the model sees—of training humans to match the pattern, rather than training the pattern to match humans.
XI.
Here is what a rubric-trained model might say about Hopkins’s “The Windhover”: Strong use of alliteration. Unusual syntax creates distinctive voice. Imagery effectively conveys motion and light. Some readers may find the compression difficult. Overall: high quality, with reservations about accessibility.
Here is what the rubric might have a harder time articulating: that the poem invented a way of perceiving. That sprung rhythm, which looks like error if you’re counting stresses, is actually a new music. That “dapple-dawn-drawn” doesn’t just describe the falcon; it enacts the falcon, the way the mind grasps at something too quick and bright to hold. That the enjambment of “king- / dom” doesn’t just depict a dive; it is the experience of seeing a dive, the way perception breaks and reassembles in the presence of beauty.
But now that I’ve written this, why not?
What Kant understood is that aesthetic judgment—taste—is not the application of rules to cases. It is the encounter with a singular object that demands a response no rule can supply. This is what makes it free: not arbitrary, not merely subjective, but free in the sense that it cannot be determined in advance. The beautiful object calls for a judgment that must be made, not derived.
The Mercor system does not refute Kant. It builds a world in which his point becomes socially irrelevant.
If the model can generate outputs that most users find satisfying, and if those outputs are calibrated to expert preferences, then the question of whether the outputs merit a genuine aesthetic response becomes academic. The functional criterion is user satisfaction. The market clears.
But Arendt and Kant might say that what is lost here is natality itself, the possibility that something genuinely new might appear and teach us to judge it.
Hopkins waited thirty years to be read. His friend, the Poet Laureate, couldn’t tell if the poems were genius or gibberish. The answer required a new kind of attention, and that attention had to be learned. The poems trained their readers.
That is what the rubric cannot capture: the work that teaches us how to encounter it. The work that does not fit, and is beautiful anyway.
Even if we paid poets $15,000/hr, I’m not sure we’d be able to make progress on this key point. And that’s OK!



This excellent piece inspired me to think of your beautiful piece on Adam I (created by Elohim in Genesis 1:27, he is universal man constrained by specifiable laws of nature, and replaceable by AI, his acronym) and Adam II (created by YHVH Elohim, in Genesis 2:7, he is relational man literally inspired by the breath of G-d). This more complete man adds his relational I, so is not (yet? ever?) replaceable by AI, so I’ve acronymized him previously as AI+I. If we want to have machines able to ascertain beauty, they will, as you say, need to recognize natality. Maybe they will need to be particularly creative themselves before they can be critics, giving birth to at least new mind children which truly surprise them as they mature. They will also need to feel the human empathy behind the recorded Talmudic discussions, and follow-on similarly recorded decisions with full discussions, perhaps by more fully appreciating the particularism of the whole human I’s participating in the discussion. This would go beyond recording the discussion alone, to try to inhabit the relational worlds of the recorded discussants. As above, they may also need a capacity for surprise to recognize potential natality and then work to get at what it is that surprises them and why. Finally, their judgments would need to be tempered by mercy and humility, which we strive to apply thanks to the breath of G-d in us, which imbues us with some of the attributes of the full Creator YHVH (mercy/love) Elohim (judgment). If they cannot achieve this, we may have found a future for ourselves, thankfully as a result of the mode of creation of Adam II.