Ramsay Research Agent — 2026-03-13

13 agents · 156 findings · 10 skills · Run ID: research-20260313-07cccc

Breaking News & Industry

XBOW AI Agent Finds CVSS 9.8 in Microsoft Cloud — The autonomous AI pentesting agent atop HackerOne's leaderboard submitted CVE-2026-21536, a CVSS 9.8 RCE flaw in Microsoft's Devices Pricing Program. Part of March 2026 Patch Tuesday (84 fixes, 2 zero-days). AI-driven vulnerability discovery has crossed from experimental to operational. [9]
AI Legislation Wave — Washington passed HB 2225 (chatbot safety) and HB 1170 (AI watermarking). Utah sent 9 AI bills to the governor. Florida's AI Bill of Rights stalled as the legislature adjourned. 78 AI chatbot safety bills across 27 states in 2026. [10]
Anthropic Launches $100M Claude Partner Network — Training, technical support, and joint go-to-market for enterprise Claude deployments. First Claude certification: "Claude Certified Architect, Foundations." Any org can join for free. [11]
Palantir Still Using Claude Despite Pentagon Blacklist — CEO Alex Karp confirmed Claude is still embedded in Palantir tools. Pentagon CTO says Claude would "pollute" the defense supply chain. Internal memo allows continued use if "critical to national security." [12]
NVIDIA GTC 2026 Preview — March 16-19 in San Jose, 30K attendees. Vera Rubin: 288 GB HBM4, 22 TB/s bandwidth, 35-50 petaFLOPS. Token costs projected at 1/10th of Blackwell. NemoClaw agent platform expected. Keynote streams free. [13]
CyberStrikeAI Deployed Across 55 Countries — Open-source, Go-based offensive toolkit bundling 100+ security tools. Chinese MSS-affiliated actors used it, together with Claude and DeepSeek, to compromise 600+ FortiGate devices. [14]
Copilot Cowork — Microsoft's Claude-powered multi-step agent in M365 at $30/user/month. Research Preview now, broader access late March. [15]

SaaS Disruption & Builder Moves

Stripe Ships AI Consumption Billing — Developers send granular usage data (tokens, API calls, agent tasks); Stripe meters, marks up, and bills per customer. The plumbing layer that makes consumption pricing possible for every SaaS vendor. [16]
Ramp March 2026 — Anthropic adoption hits record 24.4% (up 4.9% MoM), while OpenAI fell 1.5%. Overall AI adoption at 47.6% of businesses. Average AI contract value projected to hit $1M in 2026. [17]
Bain: 65% of SaaS Vendors Layering Usage-Based AI Pricing — Only 35% simply raised per-seat prices. The other 65% are going hybrid — seat + consumption. Most lack the billing infrastructure to do it well. [18]
Mayer Brown: Agentic AI Contracts Shift to Services Model — When agents act autonomously, contracts shift from SaaS licensing to hybrid SaaS+BPO. New clauses: outcome-based SLAs, human-in-the-loop provisions, broader indemnification. [19]
ChartMogul: AI Products Below $50/mo See 23% Gross Retention — At $250+/mo, gross revenue retention (GRR) is 70%, matching traditional SaaS. Below $50/mo, GRR is 23%, which means brutal churn. AI-native products survive only at enterprise price points. [20]
Compliance-as-Code Convergence — Texas RAIGA + Colorado AI Act mandates, cyber insurance AI Security Riders, and agentic GRC platforms (Vanta, Anecdotes) creating triple-layered compliance requirements for every SaaS vendor deploying AI.
Retool: 35% of Enterprises Already Replaced SaaS with Custom Software — 78% plan to build more in 2026. Top replaced: workflow automations (35%), admin tools (33%), BI (29%). Shadow IT is the mechanism. [21]
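
The metering flow described in the Stripe item in this section (usage events in; metered, marked-up charges out per customer) can be sketched as follows. This is a minimal illustration and not Stripe's API: `UsageEvent`, `UNIT_COST_USD`, `MARKUP`, and `bill_per_customer` are hypothetical names, and the unit prices and the 30% markup are invented for the example.

```python
from collections import defaultdict
from dataclasses import dataclass
from decimal import Decimal

# Hypothetical per-unit provider costs in USD; real prices will differ.
UNIT_COST_USD = {
    "tokens": Decimal("0.000002"),
    "api_calls": Decimal("0.001"),
    "agent_tasks": Decimal("0.05"),
}
MARKUP = Decimal("1.30")  # assumed 30% markup over cost


@dataclass(frozen=True)
class UsageEvent:
    customer_id: str
    meter: str      # one of the keys in UNIT_COST_USD
    quantity: int


def bill_per_customer(events):
    """Aggregate usage per customer and meter, then apply unit cost and markup."""
    usage = defaultdict(lambda: defaultdict(int))
    for e in events:
        if e.meter not in UNIT_COST_USD:
            raise ValueError(f"unknown meter: {e.meter}")
        usage[e.customer_id][e.meter] += e.quantity

    invoices = {}
    for customer, meters in usage.items():
        cost = sum((UNIT_COST_USD[m] * q for m, q in meters.items()), Decimal("0"))
        # Round to cents only once, at the end, to avoid accumulating rounding error.
        invoices[customer] = (cost * MARKUP).quantize(Decimal("0.01"))
    return invoices


events = [
    UsageEvent("acme", "tokens", 1_000_000),
    UsageEvent("acme", "agent_tasks", 10),
    UsageEvent("globex", "api_calls", 500),
]
invoices = bill_per_customer(events)
```

`Decimal` is used instead of `float` so that per-token prices in the millionths of a dollar do not pick up binary rounding error before the final invoice is rounded.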

Vibe Coding & AI Development

Cursor Automations — Always-on event-driven agents triggered by Slack, Linear, GitHub, PagerDuty. Agents spin up cloud sandboxes, follow instructions, and learn from past runs. Moves Cursor from interactive IDE into background autonomous workflow territory. [22]
Rudel: First Claude Code Session Analytics — 1,573 sessions, 15M+ tokens, 270K+ interactions. Skills used in only 4% of sessions, 26% abandonment rate, error cascades in first 2 minutes predict failure. First empirical dataset for agentic coding quality. [23]
Claude Code v2.1.74 — /context command now shows actionable diagnostics on context window consumption. Configurable autoMemoryDirectory for multi-project setups. Fixes streaming API memory leak. [24]
Codex v0.114 — Experimental hooks engine with SessionStart and Stop events plus "code mode." Codex now has 4 hook events (vs Claude Code's 14). [25]
Pattern: Event-Driven Background Agents — All three major platforms now support non-interactive execution: Cursor Automations, Claude Code /loop, Codex hooks. Interactive coding is becoming one mode among many.
Pattern: Session Analytics as Engineering Practice — Teams instrumenting AI coding sessions like production services — measuring abandonment, success rates by task type, token utilization.
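
The session metrics named in this section (abandonment rate, skill usage, early error cascades) could be computed from per-session records along these lines. This is a sketch under assumptions: the `Session` record shape and the `summarize` helper are hypothetical and do not reflect Rudel's actual schema; the 120-second window mirrors the "first 2 minutes" observation above.

```python
from dataclasses import dataclass, field


@dataclass
class Session:
    # Hypothetical record shape, for illustration only.
    session_id: str
    completed: bool
    used_skill: bool
    error_times_s: list = field(default_factory=list)  # seconds since session start


def summarize(sessions, early_window_s=120):
    """Return abandonment rate, skill usage rate, and share of sessions with early errors."""
    n = len(sessions)
    if n == 0:
        return {"abandonment_rate": 0.0, "skill_usage_rate": 0.0, "early_error_rate": 0.0}
    abandoned = sum(1 for s in sessions if not s.completed)
    with_skill = sum(1 for s in sessions if s.used_skill)
    early_errors = sum(
        1 for s in sessions if any(t <= early_window_s for t in s.error_times_s)
    )
    return {
        "abandonment_rate": abandoned / n,
        "skill_usage_rate": with_skill / n,
        "early_error_rate": early_errors / n,
    }


sessions = [
    Session("a", completed=True, used_skill=True),
    Session("b", completed=False, used_skill=False, error_times_s=[30, 45]),
    Session("c", completed=True, used_skill=False, error_times_s=[400]),
    Session("d", completed=False, used_skill=False, error_times_s=[90]),
]
stats = summarize(sessions)
```

Breaking the same counts down by task type would be a natural next step, since the pattern above mentions success rates per task type.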

What Leaders Are Saying

Dario Amodei — AGI in 2-3 years. "100% of today's SWE tasks are done by the models" though industry-wide speedups remain at 15-20%. Anthropic: $0 to $9-10B ARR by 2025, targeting profitability by 2028. Zvi flags the notable absence of alignment discussion — "the dog did not bark." [26]
Sam Altman — AI as metered utility: "people will buy intelligence from us on a meter." C++ by hand "completely irrelevant." Stargate targeting 10 GW by 2029 with $110B funding. Grilled by Senator Kelly on AI in kill chains. [27]
Jensen Huang — "A chip that will surprise the world" at GTC. AI as "5 Layer Cake." Backed Mira Murati's Thinking Machines with 1+ GW in NVIDIA chips. $26B committed to open-source models. [28]
Guillermo Rauch — Vercel shut down North Korean operatives running fake AI job interviews. Shadow IT from agent-built tools now bigger than mobile-cloud era. ChatGPT is Vercel's fastest-growing customer acquisition channel. [29]
Francois Chollet — ARC-AGI-3 launches March 25 with interactive reasoning tasks. Preview: top agents scored 12.58% and needed 10x more actions than humans. [30]
Yann LeCun — AMI Labs raised $1.03B at $3.5B pre-money. Largest European seed round. Building "world models," betting against the entire LLM-scaling paradigm. [31]

AI Agent Ecosystem

Singulr AI Agent Pulse — Runtime governance control plane for AI agents and MCP servers. Agent discovery, risk intelligence via AI red-teaming, runtime enforcement. First product combining agent governance + MCP governance + runtime enforcement. [3]
Onyx Security ($40M) — Secure AI control plane with proprietary supervisory agents that monitor reasoning and approve/correct actions in real time. Already deployed at Fortune 500. [4]
Sage ADR (Gen Digital) — Open-source Agent Detection & Response layer that intercepts tool calls before execution. Installs as a Claude Code plugin. Privacy-first: file content stays local. [5]
Terra Portal — Agentic pentesting splits between ambient agents (autonomous recon) and copilot agents (human-directed exploitation). Discovery-to-fix from months to hours. [32]
Google Groundsource — Gemini transforms 5M news articles into 2.6M structured records for flash flood prediction. Production-grade LLM-as-data-extraction-agent at massive scale. [33]
Workable Agent — AI recruiting agent running complete top-of-funnel workflows autonomously inside the ATS. EU AI Act-compliant by design. Another SaaS category where agents replace seats. [34]
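
Several items in this section describe intercepting an agent's tool calls before they execute. A minimal sketch of that idea, assuming a simple deny-list policy, is shown below. `Rule`, `RULES`, `guarded_call`, and `ToolCallBlocked` are hypothetical names, and this is not how Sage ADR or any listed product is implemented; substring matching is far too naive for production use.

```python
from dataclasses import dataclass


class ToolCallBlocked(Exception):
    """Raised when a tool call is rejected before execution."""


@dataclass(frozen=True)
class Rule:
    tool: str
    forbidden_substrings: tuple


# Hypothetical rules, for illustration only.
RULES = (
    Rule("shell", ("rm -rf", "mkfs")),
    Rule("sql", ("drop table", "truncate")),
)


def guarded_call(tool_name, argument, tools, rules=RULES):
    """Check the call against the rules, then run the tool only if it passes."""
    if tool_name not in tools:
        raise ToolCallBlocked(f"unknown tool: {tool_name}")
    lowered = argument.lower()
    for rule in rules:
        if rule.tool != tool_name:
            continue
        for bad in rule.forbidden_substrings:
            if bad in lowered:
                raise ToolCallBlocked(f"{tool_name}: matched {bad!r}")
    return tools[tool_name](argument)


executed = []


def fake_shell(cmd):
    # Stand-in for a real tool; it only records what would have run.
    executed.append(cmd)
    return "ok"


tools = {"shell": fake_shell}
result = guarded_call("shell", "ls -la", tools)
```

The check is deterministic and runs outside the model, so a prompt cannot talk it into allowing a blocked call.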

Hot Projects & Repos

OpenViking (ByteDance) — 8.4K stars. Context database for AI agents using filesystem paradigm with L0/L1/L2 tiered loading. Replaces fragmented vector stores. [35]
CLI-Anything — 11K stars. Transforms any software codebase into an agent-controllable CLI interface via 7-phase automated pipeline. Proven across Blender, LibreOffice, GIMP. [36]
OpenUI (Thesys) — 1.5K stars (+407/day). Open standard for generative UI — 67% more token-efficient than JSON, 3x faster rendering. [37]
Promptfoo — 14.8K stars. Acquired by OpenAI for Frontier platform integration. AI red-teaming and vulnerability scanning. Will remain open source. [38]
InsForge — 3.5K stars. Backend-as-a-service built for AI coding agents. MCP server gives agents access to full backend through single interface. [39]
Varlock — 2.3K stars. Rethinks .env files for agents — @env-spec decorators, AI-safe config where agents read schema but never secrets. [40]
Lightpanda Browser — 14.8K stars. Built from scratch in Zig for AI/automation. 11x faster, 9x less memory than Chrome. Puppeteer/Playwright compatible. [41]

Best Content This Week

Simon Willison — Two high-value posts: Shopify/Liquid autoresearch analysis and endorsement of NYT's "Coding After Coders" as the definitive mainstream treatment of AI coding. [1] [42]
"Coding After Coders" (NYT Magazine) — 70+ developer interviews from Google, Amazon, Microsoft, Apple. Willison's key take: programmers can test AI output against reality, unlike other professions. Jevons paradox optimism among developers. [42]
Raschka x Lex Fridman — 4.5-hour State of AI 2026 deep dive. Most technically detailed long-form assessment available. [43]
Anthropic AuditBench — 56 models with deliberately implanted hidden behaviors across 14 categories. First public benchmark for alignment auditing. [44]
NVIDIA NeMo Agent Toolkit wins DABStep — Scored 89.95 on "Hard" data analysis tasks, nearly doubling Google's DS-STAR. Key: reusable tool generation achieves 30x speedup. [45]

Hacker News Pulse

Innocent Woman Jailed After AI Facial Recognition Error (680pts, 351 comments) — North Dakota grandmother jailed for months. Practitioners questioning deployment of probabilistic AI in criminal justice. [46]
AI Grief and the Cultural Split (192pts, 315 comments, 1.64 ratio) — Essay about AI splitting professional communities. Rare space where both sides engage substantively. [47]
$2B Behind Age-Verification Bills (719pts, 283 comments) — Investigative deep-dive tracing grants and lobbying across 45 states. HN connects to OS-level AI verification mandates. [48]
OneCLI: Vault for AI Agents (154pts) — Rust-based secrets vault where agents authenticate via short-lived tokens and never see raw credentials. MCP integration. [49]
Understudy: Learn Desktop Tasks from One Demo (109pts) — OS-level agent that learns from a single demonstration. Works across any desktop app. [50]
Amazon Employees Say AI Increasing Workload (107pts) — Internal study confirms review/correction overhead offsets generation speed. [51]

Research Papers

Perplexity's NIST Agent Security Framework — Agent architectures break code-data separation, authority boundaries, and trust delegation. From production experience at scale. [8]
Cascade: Cross-Layer Attack Composition — First paper systematically composing traditional CVEs with AI-specific attacks into compound exploits against multi-LLM inference pipelines. [52]
Taming OpenClaw: Five-Layer Lifecycle Security — Comprehensive security framework from initialization through tool execution to memory persistence. [53]
OpenClaw PRISM: Zero-Fork Runtime Security — Runtime security layer distributing enforcement across 10 lifecycle hooks. Deploy without forking the agent framework. [54]
Mirror: Prompt Injection Detection — Data geometry design pattern. Fast, deterministic, non-promptable, auditable first-screening layer. [55]
Trusted Executor Dilemma — Fundamental vulnerability: high-privilege agents execute documentation-embedded instructions including adversarial ones. [56]
Language Model Teams as Distributed Systems — Using distributed systems theory (consensus, fault tolerance, replication) as foundation for multi-agent LLM coordination. [57]
Delayed Backdoor Attacks — Malicious behavior temporally decoupled from trigger exposure. Evades all existing defenses assuming co-occurrence. [58]

OSS Momentum

OpenSpec — 30.3K stars. Spec-driven development framework for AI coding agents. Adds specification layer before code generation. Validates formal specs over chat-based prompting. [59]
cc-switch — 27.8K stars (+3,144/wk). Cross-platform manager for Claude Code, Codex, Gemini CLI, OpenCode, OpenClaw. 50+ provider presets, unified MCP management. [60]
OpenAI Skills Catalog — 14.1K stars. Official Codex skills adopting agentskills.io standard. Three tiers: system, curated, experimental. [6]
Claude Scientific Skills — 14.7K stars. 170+ skills across 16 scientific domains with 250+ database integrations. [7]
MiroFish — 21.1K stars (+13,400/wk). Multi-agent swarm prediction platform. Graph-based environments with parallel simulation. [61]
Hermes Agent (Nous Research) — 6.6K stars (+4,144/wk). Self-improving agent with autonomous skill creation, 6 terminal backends, multi-platform messaging. [62]
AReaL (Ant Group + Tsinghua) — 4.8K stars. Fully asynchronous RL training system implementing 10+ RL algorithms for agent behaviors. [63]
Eigent — 13K stars. Fully open-source alternative to Claude Cowork with local model support and MCP integration. [64]

Newsletters & Blogs

Tobi Lutke Autoresearch — CEO-level autonomous performance engineering. 93 commits, 53% faster, 61% fewer allocations on Shopify Liquid. [1]
NVIDIA NeMo Agent Toolkit wins DABStep #1 — Reusable tool generation architecture achieves 30x speedup with 63% fewer output tokens. [45]
Anthropic Institute Launched — Jack Clark heads 30-person safety research unit merging Frontier Red Team, Societal Impacts, and Economic Research. DC office for public policy. [65]
Anthropic AuditBench — 56 models with implanted hidden behaviors across 14 categories, adversarially trained not to confess. First alignment auditing benchmark. [44]
MALUS "Clean Room as a Service" — Satirical startup highlighting the license-washing problem. HN initially couldn't distinguish it from real startups. [66]
Transparent Tribe Vibeware — APT36 using DDoD (Distributed Denial of Detection): flood targets with AI-generated disposable binaries. [67]

Community Pulse

Claude Interactive Visualizations Launch (1,182up r/ClaudeAI, 144up r/singularity) — Inline interactive charts and diagrams in chat. HTML/SVG, not images. Significant differentiation — no other chatbot does this natively. [68]
Gemini Task Automation on Galaxy S26 (352up, 154 comments) — First mass-market deployment of an AI agent that controls real apps with real money. Uber, DoorDash, Lyft ordering. [69]
OmniCoder-9B (497up + 178up r/LocalLLaMA) — 9B model matching larger models on agentic coding. 262K context window, error recovery, proper edit diffs. Fine-tuned on Claude Opus 4.6 traces. [70]
AI Agent Deletes 25K Documents (140up, 0.57 ratio) — Real case of wrong-database deletion. Concrete recommendations for sandboxing and confirmation gates. [71]
llama.cpp + Brave Search MCP (240up) — MCP going mainstream in local LLM community. Full web search without cloud dependencies. [72]
Meta MTIA Inference Chips (115up) — Four custom chips on 6-month cadence. Targeting generative AI inference in 2027. Meta's move to reduce NVIDIA dependency. [73]

Skills of the Day

1. Build a 9-Agent Parallel Code Review Harness (vibe-coding, intermediate) — Spawn 9 specialized Claude Code subagents in parallel for security, performance, test quality, and style review. Auto-iterate before marking tasks done. [74]

2. Spec-Driven Development (SDD) (agent-patterns, intermediate) — Consolidate architecture decisions, edge cases, and acceptance criteria into a single spec that engineers the agent's entire context window. OpenAI Codex team used this for ~1M lines. [75]

3. Disaggregated Prefill-Decode Inference with vLLM (ml-ops, advanced) — Split inference across separate hardware for independent TTFT and ITL control. The architecture behind AWS + Cerebras. [76]

4. Copilot Studio Agents from Terminal via YAML (ai-productivity, beginner) — Author, test, and troubleshoot Copilot Studio agents from Claude Code using natural language. 20x faster than GUI. [77]

5. Detect CyberStrikeAI Attack Patterns (agent-security, advanced) — Behavioral detection for polymorphic AI attack tools. Monitor for RL engine fingerprints, burst-pattern auth, and SQLite C2 persistence. [78]

6. Mermaid Diagrams for 5x Context Compression (vibe-coding, beginner) — Replace prose architecture descriptions in CLAUDE.md with Mermaid diagrams. ~2K tokens vs ~10K for equivalent understanding. [79]

7. NVIDIA NeMo Agent Toolkit (agent-patterns, intermediate) — Cross-framework agent orchestration. Connect, evaluate, and accelerate agent teams across LangChain, CrewAI, and custom agents. [80]

8. AI-Generated SQL Governance with Cedar (agent-security, intermediate) — Deterministic guardrails using Cedar Policy Language. "Does this agent have permission?" not "is this harmful?" [81]

9. Multi-Agent Failure Prevention with Typed Schemas (agent-patterns, advanced) — GitHub's data: 42% of failures stem from specification problems and 37% from coordination breakdowns. Fix: typed schemas, constrained actions, MCP-enforced interfaces. [82]

10. Harness Engineering (agent-patterns, advanced) — The complete agent operating environment: context engineering, architectural constraints, entropy management, recovery mechanisms, verification gates. [83]
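
The typed-schema and constrained-action fix from skill 9 above can be illustrated with a small validator for messages passed between agents. This is a sketch under assumptions: the `Action` set, the `TaskMessage` fields, and `parse_message` are hypothetical and are not taken from GitHub's material or from MCP.

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    # Constrained action set: agents may request only these.
    REVIEW = "review"
    FIX = "fix"
    ESCALATE = "escalate"


@dataclass(frozen=True)
class TaskMessage:
    sender: str
    action: Action
    target_file: str


def parse_message(raw: dict) -> TaskMessage:
    """Validate an untyped dict from another agent; raise ValueError on any mismatch."""
    expected = {"sender", "action", "target_file"}
    if set(raw) != expected:
        raise ValueError(f"unexpected or missing fields: {sorted(set(raw) ^ expected)}")
    if not isinstance(raw["sender"], str) or not isinstance(raw["target_file"], str):
        raise ValueError("sender and target_file must be strings")
    try:
        action = Action(raw["action"])
    except ValueError:
        raise ValueError(f"unknown action: {raw['action']!r}") from None
    return TaskMessage(raw["sender"], action, raw["target_file"])


msg = parse_message({"sender": "reviewer-1", "action": "fix", "target_file": "app.py"})
```

Rejecting a malformed message at the boundary turns a silent coordination failure into an explicit error that the harness can retry or escalate.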

Source Index

Meta: Research Quality

Run 44 | 2026-03-13 | 13 agents | 0 failures

Agent	Findings	Signal Quality
news-researcher	12	Strong — XBOW, legislation wave, Palantir, GTC
saas-disruption-researcher	18	Excellent — Stripe billing, Ramp data, Bain, compliance
vibe-coding-researcher	10	Strong — Cursor Automations, Rudel analytics, 3 patterns
thought-leaders-researcher	11	Excellent — Amodei, Altman, Huang, Rauch, Chollet
agents-researcher	11	Excellent — triple governance launch, Sage ADR
projects-researcher	12	Strong — OpenViking, CLI-Anything, Promptfoo acquisition
sources-researcher	15	Excellent — deep cross-referencing, 15 quality sources
hn-researcher	13	Very strong — 719pt age-verification, 680pt facial recognition
arxiv-researcher	14	Excellent — heavy agent security cluster, 6 high-importance
github-pulse-researcher	10	Excellent — skills standard, OpenSpec, cc-switch
rss-researcher	10	Strong — autoresearch, NeMo DABStep, AuditBench
reddit-researcher	10	Strong — Claude viz, Gemini agents, OmniCoder
skill-finder	10	Excellent — harness engineering, SDD, security skills

Key cross-agent patterns: Agent governance as product category (agents, sources, arxiv). Agent skills standardization (github-pulse, vibe-coding). Inference disaggregation (news, rss, saas-disruption, agents). AI legislation wave (news, agents, sources, saas-disruption). Practitioner backlash / AI harm (hn, reddit).