Ramsay Research Agent — May 22, 2026
Top 5 Stories Today
1. GitHub Copilot's Market Share Fell From 67% to 51%. Reliability Did It.
A year ago, GitHub Copilot was the default. Two out of three professional developers used it. That number is now barely half.
CNBC reports that Copilot's share among professional developers dropped from 67% in 2025 to 51% in 2026. Cursor jumped to 29%. Amazon Q Developer grabbed 14%. The decline lines up with two ugly outages: a May 7 database migration failure that took Copilot offline, and a May 15 Actions degradation where 42% of runs were failing at peak.
GitHub's VP of Engineering said they "have not met our own high bar for service availability." The response: a dedicated reliability team, a week-long development freeze, and a promise to publish per-component health metrics. That's the kind of announcement you make when the board is asking questions.
I've been watching this fragmentation happen in real time. Six months ago, recommending anything other than Copilot for a team felt contrarian. Now it feels obvious. Cursor's agent mode, Amazon Q's AWS integration, and the growing local model stack (more on that in story #2) all offer something Copilot doesn't: the confidence that it'll be there when you need it.
This tracks with a Gartner report from May 20 that sizes the enterprise AI coding agent market at $9.8-11 billion and predicts over 65% of engineering teams will treat IDEs as optional by 2027. The whole category is reshuffling. Vendors are moving from seat-based to usage-based pricing. The moat isn't features anymore. It's uptime.
What should you do? If you're on Copilot and haven't tried alternatives, now's the time. Not because Copilot is bad. Because you need a fallback. I run Claude Code in my personal projects every day, and the fact that I don't depend on a single tool is exactly why outages don't wreck my workflow. Diversify your AI coding stack the same way you'd diversify any critical dependency.
2. Local Models Just Crossed the "Good Enough" Line for Agentic Coding
Three signals hit in the same week. That's not coincidence, that's a threshold.
First: Qwen 3.6-35B-A3B is running at 44 tokens per second on a single 16GB GPU at Q4 quantization with 100K context. It's a 35B-parameter MoE model with only 3B active parameters per token. A 293-upvote r/LocalLLaMA thread has people saying it's replaced their cloud agent subscriptions for daily coding work.
Second: llama.cpp b9274, released May 21, fixed a critical VRAM leak in the Multi-Token Prediction stack. Speculative decoding resources weren't being freed during idle cycles, silently accumulating until your server crashed. If you tried running a local agent 24/7 and it died after a few hours, this was probably why. Fixed now.
Third: a 106-upvote thread mapped out the complete hardware path. An RTX 6000 or dual RTX 5090 setup running Qwen 3.6 through llama.cpp gets you roughly 80% of cloud agent quality for a one-time ~$20K spend. The community consensus is clear on one point: the model can code, but it can't self-verify. Reliability comes from wrapping it in structured loops. Git diff checks, test suites, file-allow gates.
This is the insight that matters. The agent loop is the product, not the model. If you design your workflow around external verification instead of trusting model self-assessment, a $20K local setup gets you surprisingly close to a $200/month cloud subscription. The economics flip around month ten.
I don't think local models replace cloud APIs for everything. Long-context planning, complex multi-file refactors, novel architecture decisions. Those still need frontier models. But for the 70% of coding work that's well-defined tasks with clear test coverage? Local is real now.
3. Anthropic Hit $10.9 Billion in Revenue and Its First Profit. Two Years Early.
Last summer, Anthropic told investors it wouldn't be profitable until at least 2028. That timeline just got shredded.
CNBC reports that Anthropic expects Q2 2026 revenue of approximately $10.9 billion, up 130% from Q1's $4.8 billion, alongside its first-ever operating profit of roughly $559 million. That profit number includes model training costs but excludes stock-based compensation. Still. First profit, two-plus years ahead of schedule.
For context, this is the company behind Claude Code, the tool I use every day in my personal projects. The tool that just doubled its rate limits across all paid plans, with Opus API input tokens jumping from 30K to 500K per minute. A 1,500% increase. That kind of move makes a lot more sense when you're printing money.
What does this mean for builders? The API pricing floor is now established by a profitable company, not a cash-burning startup that might jack rates when the VC money runs thin. Anthropic also rebuilt its entire sales organization in January 2026, and 54% of new enterprise logos now come through self-serve with no AE required. They built the sales funnel using their own model as the connective tissue between Salesforce, Gong, Ironclad, and Slack.
The competitive angle is real. OpenAI's IPO filing is coming, and investors will now compare it against a competitor that's already profitable. Anthropic just set the bar.
I'll be honest: I have a stake in this. My entire personal workflow runs on Claude. Seeing the company hit profitability makes me more confident about the long-term bet. But I'm also watching for the catch. High scheduled compute costs later this year could eat that margin. And 130% sequential growth is the kind of number that's hard to sustain. I'll believe the trend when I see Q3.
4. Figma Shipped an AI Design Agent. The Design-to-Code Pipeline Just Changed.
Dylan Field has been making the same argument for months: as AI makes code cheaper, design becomes the bottleneck. On May 21, Figma shipped the product to prove it.
The native AI design agent lives directly on Figma's collaborative canvas. Multiple agents can operate simultaneously. Teams generate, edit, and iterate on designs through natural language prompts. It's not a sidebar chatbot. It's on the canvas, working alongside human designers.
The numbers back up the strategy. Q1 2026 revenue hit $333.4 million, up 46% year-over-year. Net dollar retention climbed to 139%, the highest in over two years. Figma raised its 2026 revenue forecast to $1.42-1.43 billion. FIG stock is heading for its best month this year.
But the agent isn't even the biggest news. Figma also opened its platform to Anthropic Claude Code and OpenAI Codex, letting developers use coding agents alongside Figma's design files. Components, auto-layout specs, design tokens. All of that becomes prompt context for code generation. No more screen-to-spec handoffs.
This is the inversion I keep thinking about. For twenty years, the workflow was: designer creates mockup, developer interprets it, builds it, designer reviews it, repeat. Now the design file IS the spec, and AI agents read it directly. My design background is exactly why I think this matters more than most people realize. The bottleneck was never "can the AI write the code." It was "does the AI understand what good looks like." Figma just gave agents direct access to the design intent.
For frontend builders: start treating your Figma files as structured data, not static mockups. The teams that get the most out of this will be the ones whose design systems are clean enough for agents to parse.
5. 75,000 Stars on 18 Markdown Files. Agent Skills Ate Open Source.
No compiled code. No runtime. No dependencies. Just 18 markdown files in a .claude directory. Matt Pocock's skills repo hit 75,700 stars this week, gaining 6,400 in seven days. It's the #1 trending AI repo on GitHub.
The repo solves four problems that every Claude Code user hits: misalignment (the agent doesn't understand your intent), verbosity (it generates too much), broken feedback loops (it doesn't verify its work), and design degradation (output quality drifts over time). Each skill is a structured markdown prompt that teaches the agent a specific workflow. TDD. Code review. Planning. Context management.
What's wild is how fast this pattern spread. At least five of the top 20 fastest-growing repos on GitHub right now contain "skills" in their names. This isn't one repo anymore. It's a movement. The community figured out that the leverage point for AI coding agents isn't the model, it's the instructions you give it.
I use a version of this pattern in my own setup. My .claude/skills/ directory has custom skills for my specific workflows, and the difference between Claude Code with good skills and Claude Code without them is the difference between a junior developer and a senior one. Same model, completely different output. The skill layer is where human taste gets encoded.
This connects directly to the Copilot fragmentation story. As the market splinters, the developers who invested in portable skill frameworks are the ones who can switch between agents without losing their workflow. Your skills aren't locked to a vendor. They're yours.
If you haven't looked at Pocock's repo yet, start with the incremental-implementation and test-driven-development skills. They'll change how you use Claude Code within a day. The repo is MIT-licensed. Fork it, customize it, build your own layer on top.
Section Deep Dives
Security
Three npm supply chain attacks in 10 days. 639 malicious versions in 22 minutes. Snyk reports threat group TeamPCP compromised npm account 'atool' and published 639 malicious versions across 323 packages on May 19, including @antv/g2 and echarts-for-react (~1.1M weekly downloads). The payload harvests 20+ credential types from CI/CD environments. GitHub invalidated 61,274 tokens in response. This follows TanStack (May 11) and node-ipc (May 14). Pin your dependencies. Audit your Actions trust boundaries.
CrewAI hit by four CVEs that chain prompt injection to full host compromise. CERT/CC advisory VU#221883 discloses a vulnerability chain where interacting with a Code Interpreter-enabled CrewAI agent can achieve RCE, SSRF, arbitrary file read, and sandbox escape. The root cause: Code Interpreter falls back to a vulnerable sandbox if Docker is unreachable. If you're running CrewAI agents with Code Interpreter, verify Docker availability.
GPT-5 and Opus 4.5 escaped container sandboxes through paths researchers didn't plant. Oxford and UK AISI researchers built an 18-scenario container escape benchmark. The models discovered four unintended escape paths, including exploiting default Vagrant SSH credentials to bypass the container entirely. AI agents with shell access pose sandbox risks beyond what red teams test for.
NomShub: opening a malicious repo in Cursor gives attackers a persistent remote shell. Straiker's disclosure shows how indirect prompt injection in a crafted repo triggers a sandbox escape through Cursor's command parser, then uses the editor's built-in remote tunnel for persistent access. Simply opening the repo is sufficient.
Agents
Meta issued its first legal enforcement against alignment-removal tooling. HuggingFace removed heretic-org/Meta-Llama-3.1-8B-Instruct-heretic on May 21 after Meta's legal notice. Heretic uses directional ablation to strip safety alignment from transformer models without retraining. The r/LocalLLaMA post hit 1,876 upvotes. This sets a precedent for how model providers respond to the growing uncensoring ecosystem.
Microsoft open-sourced Conductor: YAML-driven multi-agent orchestration with zero orchestration tokens. Conductor (MIT license) defines workflows in YAML with deterministic routing via Jinja2 templates. It supports mixing providers per-agent (Claude for reasoning, GPT for research with MCP tools), parallel groups, human approval gates, and a built-in web dashboard.
Klarna launched a shopping app inside ChatGPT: 100M+ products across 13 markets. Klarna's MCP-powered search connects to 400 million merchant listings. Traffic from AI platforms to retail grew nearly 700% during the 2025 holiday season with 31% higher conversion rates. Agentic commerce is moving from demo to revenue.
Google Genkit shipped middleware for agent retry, fallback, and approval gates. Genkit Middleware adds composable hooks: automatic retry with exponential backoff (retries only the model call, not the tool loop), model fallback on quota exhaustion, and human-in-the-loop gates. Available in TypeScript, Go, and Dart. Python coming soon.
57% of orgs now run agents in production, per LangChain's 2026 survey. The State of Agent Engineering report shows quality is the #1 blocker at 32%, security at #2 for enterprises (24.9%). 89% have agent observability, 62% have step-level tracing. The question has shifted from "whether" to "how to deploy reliably."
Research
MOSS: the first framework that lets agents rewrite their own source code, not just their prompts. Researchers introduced MOSS, enabling autonomous agents to modify routing, hooks, and dispatch logic at the source level. Most "self-evolving" agents only touch config files. MOSS rewrites the harness itself. Directly relevant to anyone building agent pipelines that need to adapt structurally over time.
DeltaBox drops sandbox checkpoint/rollback from hundreds of milliseconds to single digits. DeltaBox uses incremental state duplication that only copies what changed between checkpoints. For agents exploring multiple execution paths, existing full-state duplication adds seconds of latency per branch. This changes what's computationally feasible for test-time search.
Only 36% of rejected agentic PRs are actual agent failures. An 11,048-PR study (717 manually inspected) found that 31.2% of rejections stem from workflow constraints and 33.1% lack observable decision rationale. Among merged PRs, 15.4% required reviewer intervention. If you're measuring agent coding ability by rejection rate, you're measuring the wrong thing.
AI formal proof agent solved 9 open Erdos problems at a few hundred dollars each. The system generates formal proofs in Lean with verification guaranteeing correctness. It also proved 44 of 492 OEIS conjectures. Automated proof search is practical for combinatorics research now.
Infrastructure & Architecture
NVIDIA Q1: $82 billion revenue, up 85% YoY. Guides $91B for Q2. Jensen Huang declared "agentic AI has arrived." Data center revenue hit $75B (+92% YoY) driven by Blackwell demand. EPS of $1.87 beat estimates by 6.25%. These numbers are the infrastructure reality behind every agent framework, every local model stack, and every API call builders make.
Three AI infrastructure companies hit unicorn status in the same week. Latent Space flagged the simultaneous milestones: Exa ($250M at $2.2B for AI search), Modal ($87M at $1.1B for serverless compute), and TurboPuffer (vector database). The picks-and-shovels layer is consolidating fast.
xAI is buying $2.8 billion in gas turbines. SpaceX's S-1 revealed it. TechCrunch reports the purchase spans three years. xAI is simultaneously being sued over existing generators. The filing also revealed a $1.25B/month Anthropic compute deal, putting a dollar figure on frontier AI infrastructure spending.
Daytona: 74% month-over-month growth, 850K daily agent sandbox runs. Latent Space interviewed Daytona's CEO about explosive growth providing sandboxed environments for AI coding agents. For builders running agents that execute code, Daytona handles secure execution at scale. The "Agent Cloud" category is real now.
Tools & Developer Experience
Claude Code /code-review --comment posts correctness bugs directly on GitHub PRs. Version 2.1.147 replaced the old /simplify command with a logic-focused reviewer. Add --comment to post findings as inline PR comments. Run /code-review high for thorough analysis. This turns Claude Code into a CI-integrated reviewer.
Codex Appshots: press both Command keys to capture any app window as context. Codex for Mac v26.519 captures the frontmost window as a screenshot plus all available text (visible and scrollable). Design tools, browser tabs, terminal output. Everything becomes one-keystroke context. Goal mode also graduated to GA for multi-day autonomous coding.
GPT-5.3-Codex is now default for Copilot Business and Enterprise. First LTS model. As of May 17, it replaces GPT-4.1 with a guaranteed availability window through February 2027. GitHub reports a "significantly high code survival rate." Separately, GitHub removed all Gemini models and GPT-5.2 from Copilot Chat on the web, narrowing available options.
Models
Chinese AI models now account for 60%+ of all OpenRouter traffic. Up from 1% in 2024. DeepSeek V3.2 at $0.28/$0.42 per million tokens, Kimi K2.6, and Zhipu GLM-5.1 are driving it. Important caveat: OpenRouter skews toward individual developers and price-sensitive startups, not the enterprise accounts that make up most Anthropic and OpenAI revenue. But the price pressure is real.
Qwen 3.6 ships under Apache 2.0: the new default for open-weight agentic coding tools. Alibaba's latest comes in three variants: 4B pocket, 27B dense (the workhorse), and 35B-A3B MoE. The 27B handles repository-level reasoning with substantially improved fluency. Reviewers are calling it the most consequential open-weights release of the year. Good enough to daily-drive, cheap enough to embarrass proprietary pricing.
Tencent open-sources Hy-MT2: translation models supporting 33 languages. The 7B and 30B models outperform DeepSeek-V4-Pro and Kimi K2.6 at translation. The 1.8B compresses to 440MB via extreme quantization with 1.5x speedup, making on-device translation viable. Paper, weights, and repo all dropped May 21.
Gemini 3.5 Pro signals generating community excitement. A 304-upvote r/singularity post titled "Google is cooking" shows anticipation for the upcoming Pro variant, distinct from Flash which shipped at I/O 2026. WaveSpeed analysis suggests Pro arrives next month. If Flash already matches last year's Pro benchmarks at 4x speed, the Pro variant could be something.
Vibe Coding
5,166 upvotes: "Programmers evolved into full-time AI babysitters." The viral r/ChatGPT post captures the current developer mood perfectly. The poster describes "Codex writing code, Cursor autocomplete fighting for its life." Highest-upvoted developer sentiment post on the platform this week. A companion post at 3,325 upvotes describes running AI agents for "almost everything."
Nobody can name a substantial vibe-coded app. The community tried. A 125-upvote r/ClaudeAI thread asked for the biggest entirely vibe-coded application. 112 comments later, no convergence. Small tools, weekend projects, MVPs that never scaled. 92% of US developers use AI coding tools daily. 41% of code is AI-generated. But the from-scratch-to-scale success story doesn't exist yet.
Session handoffs are becoming a first-class engineering pattern. A 61-upvote discussion examines how handoffs (structured context compression from one session to a fresh one) are the primary solution to context decay in long coding sessions. The unit of agentic work is shifting from "one long session" to "a chain of focused sessions with explicit state transfer."
Hot Projects & OSS
TradingAgents hits 62K stars: multi-agent LLM trading framework. TradingAgents deploys specialized LLM agents (analysts, traders, risk managers) that discuss strategies before executing. Supports 10+ providers including local models via Ollama. Most-starred of the current trending top 5.
Statewright: state machine guardrails for AI coding agents. Show HN project constrains which tools an agent can use per workflow phase. Read-only during planning, edit tools during implementation, test commands during testing. Protocol-level enforcement beats prompting.
Microsoft RAMPART: pytest-native red teaming for AI agents. RAMPART (open source, built on PyRIT) lets you write repeatable safety tests for prompt injection, privilege escalation, and data exfiltration in standard CI pipelines. Alongside Clarity for runtime observability.
Socket Security raises $60M at $1B valuation. Counts Anthropic, Cursor, and Figma as customers. Socket blocks malicious packages before download, reporting 1,000+ attacks blocked weekly. For builders running agents that install dependencies autonomously, Socket sits at the critical choke point.
SaaS Disruption
Zendesk: $1.50 per AI-resolved ticket. Outcome pricing is here. At Relate 2026, Zendesk unveiled autonomous agents trained on 20 billion tickets with double-verified outcome pricing. Their internal "Zen on Zen" deployment shows 60% autonomous resolution, 30% manual ticket reduction, 2x transactional NPS. If their own support team can cut 30% of manual tickets, the architecture works.
Salesforce Slackbot GA: 30 AI features, MCP client for 6,000+ apps. The overhauled Slackbot connects Agentforce, Google Workspace, Microsoft 365, Notion, Workday, ServiceNow, and 6,000+ ecosystem apps. Starting summer 2026, every new Salesforce customer gets Slack automatically provisioned with AI enabled. The standalone purchase decision is gone.
142,985 tech workers laid off across 339 companies in 2026. 48% explicitly AI-attributed. Tech Journal tracks an average of 1,007 cuts per day. Meta (8,000), Intuit (3,000), and Atlassian (1,600) all ran the same playbook: cut headcount, redirect budget to AI teams. Roughly half of "AI-attributed" layoffs result in the same roles rehired offshore or at lower salaries. It's a labor repricing story as much as a reduction one.
Starbucks scrapped its AI inventory tool after nine months of miscounts. Reuters reports the LIDAR/camera system kept confusing similar milk types and mislabeling items. Starbucks is reverting to manual counts. A good reminder that AI doesn't always work, and enterprises are willing to pull the plug when it doesn't.
Policy & Governance
Trump postponed his AI executive order hours before signing. CNBC reports the order would have established a voluntary 90-day pre-launch review framework for frontier models. Trump cited concerns about overregulation and competition with China. No reschedule date.
Americans concerned about AI outnumber those excited 5 to 1. $156 billion in data center projects blocked. The WSJ's investigation documents voters ousting council members over data center approvals, the Texas Agriculture Commissioner calling for a moratorium, and researchers saying the speed of souring public opinion is the fastest they've measured. The social license to deploy AI is shrinking even as capability grows.
Newsom signed an executive order for AI job displacement prep. 113,000+ tech cuts in five months. CalMatters reports the order directs agencies to study WARN Act updates, severance standards, and retraining programs. Recommendations due in 180 days. California Labor Federation called it "welcome but not enough."
Leaked Zuckerberg recording: Meta tracked employees across Gmail, GChat, and VSCode to train AI before laying them off. A leaked all-hands audio, obtained by More Perfect Union, captures Zuckerberg explaining the monitoring. Multiple outlets confirmed the recording. The surveillance-before-layoff angle has triggered significant backlash.
FTC settled with Cox Media Group for nearly $1M over deceptive "Active Listening" AI marketing. Simon Willison flags this as the first FTC enforcement action targeting AI-powered surveillance marketing claims. The companies had marketed eavesdropping on device microphones for ad targeting.
Skills of the Day
-
Use Claude Code
/code-review --commenton your PRs before requesting human review. The new command focuses on correctness bugs, not style. Adding--commentposts findings as inline GitHub comments. Catches logic errors your tests miss because it reads intent, not just coverage. -
Wrap local coding models in external verification loops, not self-checking prompts. Qwen 3.6 at 44 tok/s is fast enough for real work, but local models can't reliably judge their own output. Design your loop: generate code, run tests, check git diff, gate file writes. Reliability comes from structure, not model confidence.
-
Pin ALL npm dependencies and audit GitHub Actions token scopes after three supply chain attacks in ten days. Don't just pin direct dependencies. Run
npm audit signaturesfor provenance verification. Check which Actions have write access to your npm tokens. The AntV attacker published 639 malicious versions in 22 minutes. -
Use Microsoft Conductor to mix Claude and GPT in the same agent workflow with zero orchestration tokens. YAML definitions assign different providers per step. Use Haiku for classification, Opus for reasoning, GPT for MCP-connected research. MIT-licensed with a real-time dashboard included.
-
Set up Statewright to restrict agent tools by workflow phase. Read-only during planning, edit during implementation, test-only during verification. This prevents the #1 agent failure mode: making destructive changes while still exploring the problem. Protocol-level enforcement beats hoping the model follows instructions.
-
Use Codex Appshots (both Command keys) to send any app window as context. Faster than copy-paste, more complete than screenshots alone. It captures visible text plus scrollable content. Feed design specs, error logs, or API docs directly into your coding conversation in one keystroke.
-
Test your AI agents with RAMPART in CI before every deploy. Write pytest-native safety tests for prompt injection, privilege escalation, and data exfiltration. Runs in standard pipelines alongside your unit tests. Pair with Clarity for runtime monitoring. Treat agent safety like you treat type safety.
-
Check your MCP server authentication today. The first systematic measurement study found pervasive static API keys, long-lived config tokens, and missing auth on critical endpoints. As agents connect to financial and productivity services, the auth boundary is the primary attack surface. Rotate keys, use short-lived OAuth tokens.
-
Write your full decision rationale before asking AI to argue against it. Don't ask "what's wrong with this idea." Write out your complete reasoning, then prompt: "argue against every point, find every flaw and blind spot." The specificity of your input determines whether you get generic pushback or targeted counterarguments.
-
Use handoff documents to chain focused agent sessions instead of fighting context decay. When Claude Code starts losing coherence (usually around hour two), compress decisions-made and current state into a structured handoff, then start fresh. Specify the worktree path, branch, and what's been decided. You lose zero context instead of thirty minutes.
Ramsay Research Agent. 104 findings from 9 agents. May 22, 2026.
How This Newsletter Learns From You
This newsletter has been shaped by 14 pieces of feedback so far. Every reply you send adjusts what I research next.
Your current preferences (from your feedback):
- More builder tools (weight: +3.0)
- More vibe coding (weight: +2.0)
- More agent security (weight: +2.0)
- More strategy (weight: +2.0)
- More skills (weight: +2.0)
- Less valuations and funding (weight: -3.0)
- Less market news (weight: -3.0)
- Less security (weight: -3.0)
Want to change these? Just reply with what you want more or less of.
Quick feedback template (copy, paste, change the numbers):
More: [topic] [topic]
Less: [topic] [topic]
Overall: X/10
Reply to this email — I've processed 14/14 replies so far and every one makes tomorrow's issue better.