Cat Wu went on Lenny’s Podcast last week and described a product org that ships features in days, removes process barriers by default, and lets engineers go from a Twitter complaint to a shipped fix by Friday. It was a great conversation. The kind that makes you want to tear up your quarterly roadmap and start deploying before lunch.
I listened to the whole thing twice. And I’m not impressed by the worldview it presents. I think this episode is being absorbed uncritically by product leaders who will try to copy Anthropic’s operating model without understanding why it only works at Anthropic.
The conversation focused almost entirely on shipping speed. How fast Anthropic moves, how they remove barriers, how they compress timelines from months to days. What it barely touched is the other half of the equation: how do you know you’re shipping the right thing? Speed is one axis. Decision quality is the other. Optimizing for one while ignoring the other produces a product that ships fast and learns slowly.
Here is what I think the episode got wrong, what it got right, and what it left out entirely.
1. Token access is a structural advantage, not a replicable one
Cat describes a company where employees use frontier models internally with generous token budgets. Engineers spin up sub-agents. Cowork builds slide decks overnight by crawling Slack and Google Drive. She mentions Applied AI team members running five to ten customer engagements a day, with Cowork preparing meeting dossiers the night before.
If you work at Anthropic, your inference costs are internal. Your models are the product. Your feedback loop is measured in hours, because the people testing the model are also building the harness around it.
Most teams pay for tokens. Every agent call, every background workflow, every parallel session shows up on an invoice. When Cat talks about spinning up 50 or 100 Claudes simultaneously as a near-term future, the economics of that vision look very different for a 30-person B2B company than they do for the company that owns the compute.
The lesson here is real. AI does give teams leverage. But your leverage calculation has to include cost-per-task, not just time-per-task. If your automation costs more than the salary of the person it replaced, you’ve bought a faster way to burn money.
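To make the cost-per-task point concrete, here is a back-of-the-envelope sketch. Every number in it is a made-up assumption (token counts, per-million-token prices, salary, hours), and the function names are mine, not anything from the episode or a vendor price list. Swap in your own invoice data.

```python
from dataclasses import dataclass


@dataclass
class AgentTask:
    """Hypothetical profile of one automated task (all figures are assumptions)."""
    input_tokens: int
    output_tokens: int
    runs_per_task: int  # retries, sub-agents, parallel sessions per finished task


def agent_cost_per_task(task: AgentTask,
                        usd_per_million_input: float,
                        usd_per_million_output: float) -> float:
    """Token spend, in USD, to get one task finished."""
    per_run = (task.input_tokens * usd_per_million_input
               + task.output_tokens * usd_per_million_output) / 1_000_000
    return per_run * task.runs_per_task


def human_cost_per_task(annual_cost_usd: float,
                        hours_per_task: float,
                        working_hours_per_year: float = 2_000) -> float:
    """Loaded labor cost, in USD, for a person to do the same task."""
    return annual_cost_usd / working_hours_per_year * hours_per_task


if __name__ == "__main__":
    # Placeholder inputs: replace with your real usage and pricing.
    task = AgentTask(input_tokens=200_000, output_tokens=20_000, runs_per_task=10)
    agent = agent_cost_per_task(task, usd_per_million_input=3.0,
                                usd_per_million_output=15.0)
    human = human_cost_per_task(annual_cost_usd=150_000, hours_per_task=0.5)
    print(f"agent: ${agent:.2f} per task, human: ${human:.2f} per task")
```

The useful part is the runs_per_task multiplier. A single call is cheap; ten retries across parallel sessions are not, and that is the line item that never shows up in a time-per-task comparison.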
2. “Low process and friction” is a philosophy, not a playbook
Cat’s operating philosophy is explicit: remove every single barrier to shipping. Research previews with no commitment. An evergreen launch room where engineers post finished features and marketing turns around the announcement by the next morning. No multi-quarter roadmap alignment. No gates.
This works because the Claude Code team has one of the cleanest dependency graphs in software. The model ships from research. The harness wraps the model. The product surfaces the harness. That is a hub-and-spoke topology with a single hub: one core model team, with every other team orbiting around it.
The moment your team topology grows more complicated, and teams depend on each other’s iteration cycles, the low-friction model breaks down. You need service contracts. You need SLAs. You need some version of what AWS built internally: a catalog of services, clear ownership boundaries, and rules about who can break whom.
This is the part that does not get enough attention. Most B2B companies are not hub-and-spoke. They are a mesh. The payments team depends on the platform team. The platform team depends on the infrastructure team. The infrastructure team depends on compliance. When team A ships a breaking change on Friday because they felt empowered to “just do things,” team B discovers on Monday that their integration is down and their demo with an enterprise prospect is broken. In a mesh, speed without coordination is a tax that other teams pay.
Software is still a team sport. Roles are merging, yes. But the moment you have more than one team shipping into the same product surface, you need the connective tissue that Anthropic’s topology lets them skip: shared contracts, release coordination, and clear ownership of the seams between services. You can call that process overhead. You can also call it the reason your product stays coherent as your team scales past 30 people.
Cat openly acknowledged the cost of skipping that coordination: product inconsistency. At Anthropic’s scale, with their model moat, that is a reasonable trade. For teams selling into regulated verticals, where a confused user experience can cost a six-figure renewal, it is a different equation.
3. “The most efficient shipping unit is an engineer with great product taste” works for one kind of product
Cat said this, and in the context of Claude Code it makes total sense. Claude Code’s target persona is a professional developer. The engineers building it are, themselves, the persona. They feel the friction. They see the complaints on Twitter. They know what a good CLI should do because they live in one.
Generalize this principle to a healthcare SaaS product, a compliance platform, or a supply chain tool, and it falls apart. In complex B2B vertical domains, the moat is deep customer and industry context: workflows, constraints, regulatory risk, buying dynamics that you do not develop by being a talented engineer who reads Twitter.
Product taste is necessary. But when the domain gap between builder and buyer is wide, product taste alone leads you to build elegant solutions to the wrong problems. That is where partnership with a PM who carries customer context and domain expertise still creates disproportionate value. The roles may be merging, sure. But in most product categories, the merger looks more like PMs who can ship code and engineers who learn the domain, not one role replacing the other.
4. Nobody is talking about what “ship every day” does to humans
Cat described a culture that leans into chaos. Sunday night P0s that get eclipsed by Monday afternoon P0000s. Launching buggy features and sleeping fine because the next release will fix them. A team full of people who face every challenge with a smile.
That sounds great on a podcast. It also sounds like a recipe for burnout, a pattern that is well documented across the industry.
HBR published research in February showing that AI adoption creates three types of work intensification: task expansion, increased multitasking, and blurred work-life boundaries. The finding that stands out: every minute saved by AI gets absorbed as a minute available for more work. The productivity surge at the beginning gives way to cognitive fatigue, weakened decision-making, and turnover.
A separate study found that developers who embrace AI tools the hardest are the ones burning out the fastest. The AI removes the natural speed limit that used to protect people. Before AI, there was a ceiling on how much you could produce in a day. That ceiling was frustrating, but it was also a governor. AI removed the governor. Now the only limit is your cognitive endurance, and most people do not know their limit until they have blown past it.
Cat’s team has self-selected for this pace. Anthropic hires people who have been through cycles and know how to manage their energy. But “hire people who can handle it” is not a burnout strategy. It is a selection filter that works until it does not.
If you are running a team and the takeaway from this podcast is “we need to ship faster and embrace more chaos,” please also ask: can my team sustain this for 18 months? What does retention look like if we crank the pace and keep it there? The Anthropic answer is “we have the most exciting AI models on the planet and $19 billion in ARR.” Most teams do not have that compensating factor.
5. Users do not absorb features at the speed you ship them
Cat was honest about this. She said users feel like they need to check Twitter every day to keep up. She described /powerup as an admission that the product had grown too complex for users to navigate on their own, despite the original principle that no onboarding should be necessary.
This is worth sitting with. If your own product team builds a guided tutorial because users cannot figure out which of the 100 features to use, the shipping cadence has outpaced user absorption. You are creating education debt faster than you are retiring it.
In B2B, this compounds. Your champion inside the customer org is not checking Twitter for your latest feature. They are managing their own backlog. Their team adopted your product six months ago and configured it for a workflow that has since been redesigned by three new feature drops they missed. Your renewal conversation now starts with “wait, you rebuilt that view?” instead of “here is how we measured impact this quarter.”
Shipping speed and user value are correlated up to a point. Past that point, speed creates confusion, and confusion erodes the trust that keeps enterprise customers renewing.
6. Velocity without decision quality is noise
Cat described a metrics-driven team. Weekly readouts. Shared goals. Team principles that define who the key users are. All good. But when she described how features actually get built, the process was: engineer sees feedback on Twitter, builds the fix, ships it by Friday. Or: the team notices something through internal dogfooding, posts it in the launch room, and it goes out the next day.
That works when your signal source is narrow and your feedback loop is tight. Claude Code’s users post their frustrations publicly. The team uses the product themselves. The gap between “what customers want” and “what we think they want” is small because the builders are the users.
Most product teams do not have that luxury. Their signal is scattered across Gong calls, support tickets, Slack threads, Salesforce notes, and quarterly business reviews. The person filing the complaint is not the person who signs the check. The feature that gets the most vocal demand on Twitter may not be the one that protects your biggest renewal. A PM who reads 200 GitHub issues and picks the right 10 is doing valuable work, but that selection only matters if the 10 are the ones that connect to revenue, retention, and expansion.
This is the decision quality problem that the podcast did not address. Shipping speed answers “how fast can we get this out?” Decision quality answers “should we be building this at all, and for whom, and what is the expected business outcome?”
Here is what solving this looks like in practice. A customer success manager logs a feature request in a Gong call. That request gets automatically linked to the account’s ARR, renewal date, and segment. The PM sees it alongside 40 other requests, but now each one carries business context: this request is tied to $800K in pipeline across three accounts, that one is a single user’s preference with no revenue signal. The PM who ships the $800K feature and deprioritizes the other is making a decision backed by evidence. The PM who picks based on volume or gut is gambling.
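A minimal sketch of that linking step, under loud assumptions: the account names, dollar figures, and field names below are invented for illustration, and a real version would pull from your CRM and call-recording tools rather than hard-coded objects. This is not a description of any particular product's implementation.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Account:
    """Hypothetical CRM record."""
    name: str
    arr_usd: int
    open_pipeline_usd: int


@dataclass
class FeatureRequest:
    """Hypothetical request, tagged with the accounts that asked for it."""
    title: str
    account_names: list[str]


def revenue_context(request: FeatureRequest,
                    accounts: dict[str, Account]) -> dict:
    """Attach summed ARR and pipeline to a single request."""
    linked = [accounts[n] for n in request.account_names if n in accounts]
    return {
        "title": request.title,
        "accounts": len(linked),
        "arr_usd": sum(a.arr_usd for a in linked),
        "pipeline_usd": sum(a.open_pipeline_usd for a in linked),
    }


def rank_requests(requests: list[FeatureRequest],
                  accounts: dict[str, Account]) -> list[dict]:
    """Order requests by pipeline at stake, then by ARR."""
    rows = [revenue_context(r, accounts) for r in requests]
    return sorted(rows, key=lambda r: (r["pipeline_usd"], r["arr_usd"]),
                  reverse=True)


if __name__ == "__main__":
    accounts = {
        "Acme": Account("Acme", arr_usd=400_000, open_pipeline_usd=300_000),
        "Globex": Account("Globex", arr_usd=250_000, open_pipeline_usd=300_000),
        "Initech": Account("Initech", arr_usd=150_000, open_pipeline_usd=200_000),
    }
    requests = [
        FeatureRequest("Dark mode", account_names=[]),  # no revenue signal
        FeatureRequest("Bulk export API", ["Acme", "Globex", "Initech"]),
    ]
    for row in rank_requests(requests, accounts):
        print(row)
```

Sorting by pipeline first is itself a judgment call. A team worried about churn might rank by ARR at renewal risk instead. The point is that the ranking key is explicit and arguable, instead of living in someone's head.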
This is what we build at Bagel AI. We connect product decisions to customer signals and business outcomes so that teams can prioritize with evidence, not intuition. The point is not that every team needs our specific product. The point is that without some version of this connective tissue, speed just helps you arrive at the wrong destination faster.
When Cat talks about the future of PM being someone who can define what the product should look like a month from now, that framing centers on product vision. It leaves out the evidence layer that makes that vision defensible. Vision without evidence is opinion. Evidence without speed is stale. You need both.
Anthropic can afford to ship speculatively because their model improvements make previously failed features viable. Cat described building products that do not work yet and waiting for the next model to close the gap. That strategy only applies when you control the underlying capability curve. For teams building on top of models they do not own, speculative shipping means burning cycles on features that may never reach the quality bar, with no guarantee that the next model release will fix the gap.
The teams that sustain velocity over multiple quarters are the ones that pair speed with evidence. They know which customer segments are churning and why. They know which feature gaps are blocking pipeline. They can trace a shipped feature back to the signal that triggered it and forward to the adoption metric that validated it. That feedback loop is what turns raw speed into product velocity that compounds.
7. “Build, don’t buy” only works if your build process is tight enough
One of the most interesting parts of the conversation was Cat describing how Anthropic employees build custom internal tools instead of buying SaaS. A sales rep built a web app that pulls context from Salesforce and Gong to auto-customize pitch decks. Engineers build personalized work software for custom use cases.
This is real, and it is a legitimate advantage of cheap code. But it is only viable when your team ships internal tools fast enough that the build cost stays below the subscription-plus-configuration cost of buying the thing.
Anthropic can do that because Claude Code is their own product, their process is tight, and their engineers can go end-to-end without waiting on anyone. Most teams cannot. Internal tooling efforts end up half-finished and abandoned. The deck customizer gets built in a week, maintained by nobody, and broken by the next CRM migration. Six months later, someone buys the SaaS product they were trying to replace.
The unlock is not “build instead of buy.” It is building the process discipline that makes internal tooling worth the investment. If your team cannot ship a working internal tool in under a week and maintain it with near-zero overhead, buying remains the better bet for most use cases.
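The build-versus-buy comparison is simple arithmetic once you write it down. The sketch below is one rough way to frame it. The day rate, subscription price, and maintenance estimates are placeholder assumptions, not benchmarks, and the function names are hypothetical.

```python
def build_cost_usd(build_days: float,
                   maintenance_days_per_month: float,
                   months: int,
                   loaded_day_rate_usd: float) -> float:
    """Total cost of building and maintaining an internal tool over a horizon."""
    total_days = build_days + maintenance_days_per_month * months
    return total_days * loaded_day_rate_usd


def buy_cost_usd(monthly_subscription_usd: float,
                 configuration_days: float,
                 months: int,
                 loaded_day_rate_usd: float) -> float:
    """Subscription plus the one-time cost of configuring the purchased tool."""
    return (monthly_subscription_usd * months
            + configuration_days * loaded_day_rate_usd)


if __name__ == "__main__":
    # Placeholder inputs over a 12-month horizon.
    rate = 800.0
    buy = buy_cost_usd(monthly_subscription_usd=500, configuration_days=2,
                       months=12, loaded_day_rate_usd=rate)
    for upkeep in (0.0, 0.5):
        build = build_cost_usd(build_days=5, maintenance_days_per_month=upkeep,
                               months=12, loaded_day_rate_usd=rate)
        verdict = "build" if build < buy else "buy"
        print(f"maintenance {upkeep} days/month: build ${build:,.0f} "
              f"vs buy ${buy:,.0f} -> {verdict}")
```

With these placeholder inputs, the decision flips on the maintenance term alone. That is why "near-zero overhead" matters as much as the one-week build.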
8. The source code leak tells a different story than the podcast
Cat described the leak as human error and said the person was not fired. Fair enough. But the leaked codebase is a receipt, and it tells you what the speed actually costs.
Public analysis of the 512,000-line TypeScript codebase found a single print function spanning 3,167 lines. Regex-based frustration detection at a company that builds LLMs. A source map leak that was the second such incident in 13 months, arriving five days after a separate CMS misconfiguration exposed thousands of internal files. The community reaction was not kind.
The strategy of building products that do not work yet and waiting for the next model to close the gap is a fascinating approach if you control the model roadmap. But it also means the team shipped work in progress that failed repeated internal reviews until the underlying capability caught up. The codebase reflects that history. When your process is designed to minimize friction and maximize speed, the output carries the fingerprints of every shortcut. That is a trade you can make consciously. But calling it “discipline” while the evidence says “fast accumulation of maintenance debt” is a stretch.
For teams that do not own the model, shipping features that depend on a capability improvement you cannot control or schedule is called hope. And hope is a bad product strategy.
What the episode got right
Credit where it is due. A few things from the conversation are worth taking seriously regardless of your team size or domain.
Clear goals cut through ambiguity. Cat’s point about defining the key user, the core problem, and the top use cases is good PM practice in any era. LLMs make this harder because the technology is so general that everything feels possible. Narrowing scope is more valuable now than it was five years ago.
Research previews reduce commitment cost. Shipping features clearly branded as experimental and iterating based on feedback is a pattern any team can adopt. The key is the branding. If your users think it is GA, the bar for quality is different than if they know they are testing an early version.
Asking the model to introspect is genuinely underrated. Cat called this the most underrated skill, and she is right. She described asking Claude why it skipped a frontend verification step and learning that the system prompt was misleading. The model told her it had delegated verification to a sub-agent that did not run the test, and the parent agent did not check the sub-agent’s work. That kind of diagnostic conversation surfaces fixable issues in the harness, not model limitations. If you are building with AI agents and you are not regularly asking the model “why did you do that?” after an unexpected result, you are debugging blind. The model often knows what misled it. You just have to ask.
Evals remain the most underleveraged PM skill. Cat’s point that 10 great evals are more useful than none is correct and actionable. You do not need a dedicated eval infrastructure to start. You need a few concrete scenarios where you can measure whether the thing you shipped actually works.
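For a sense of how small "a few concrete scenarios" can be, here is a minimal sketch. The system under test is a stand-in stub (a function that returns the first sentence of its input), and the cases and checks are invented for illustration. In practice the stub would be replaced by a call into whatever you actually shipped.

```python
from typing import Callable

# One eval case: an input prompt plus a check on the output.
EvalCase = tuple[str, Callable[[str], bool]]


def run_evals(system: Callable[[str], str], cases: list[EvalCase]) -> float:
    """Run every case through the system and return the pass rate (0.0 to 1.0)."""
    if not cases:
        raise ValueError("need at least one eval case")
    passed = 0
    for prompt, check in cases:
        output = system(prompt)
        ok = check(output)
        passed += ok
        print(f"{'PASS' if ok else 'FAIL'}: {prompt!r} -> {output!r}")
    return passed / len(cases)


def first_sentence(prompt: str) -> str:
    """Stand-in for the real system: a naive 'summarizer'."""
    return prompt.split(".")[0] + "."


CASES: list[EvalCase] = [
    ("Refund issued. Customer happy.", lambda out: "Refund" in out),
    ("Login fails on SSO. Ticket escalated.", lambda out: len(out) < 40),
    ("Renewal at risk. Champion left.", lambda out: "Champion" in out),
]

if __name__ == "__main__":
    rate = run_evals(first_sentence, CASES)
    print(f"pass rate: {rate:.0%}")
```

The third case fails on purpose: the stub drops the second sentence, so the check for "Champion" catches it. A failing case that points at a specific behavior is the whole value of having the eval.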
The real velocity question
Every product leader watching this episode wants the same thing Anthropic has: a team that moves from idea to shipped product in days, not quarters. That ambition is correct. The mistake is assuming you get there the same way Anthropic does.
Anthropic’s speed is real. But it flows from a structural position that most companies do not share: they build the model, wrap the harness, control the inference cost, and sell the result. Their engineers are their own users. Their dependency graph is simple. Their feedback loop is measured in hours because the people testing the product are the people building it. Every process advantage in that podcast traces back to these conditions.
Your team can move just as fast. But you need a different engine to get there.
When your builders are not your buyers, speed depends on decision quality. How quickly can your team identify which customer signal matters most? How fast can a PM connect a feature request to the pipeline it protects, the segment it serves, and the renewal it unlocks? How confidently can an engineer pick up a task knowing it has already been validated against business outcomes?
That is the infrastructure Anthropic does not need, because their topology lets them skip it. And it is exactly the infrastructure most B2B teams are missing.
Process improvements, faster CI, smaller PRs, research previews: these answer the question “how do we ship faster?” They do not answer “how do we know we are shipping the right thing?” The first question is about the speed of your hands. The second is about the quality of your judgment. You need both.
This is what we build at Bagel AI. We connect customer signals from your GTM tools to revenue-connected prioritization and dev-ready artifacts, so that the path from signal to shipped outcome is fast, evidence-based, and traceable. Not because speed does not matter. Because speed without evidence is motion, and motion without direction is waste.
Cat’s motto is “just do things.” The teams that will actually keep pace with Anthropic are the ones that also know which things, for whom, and why those things and not the 40 other options in the backlog.



