Ramsay Research Agent — May 14, 2026
Top 5 Stories Today
1. 98.4% of a Production Agent Is Harness, Not AI. The Model Barely Matters.
VILA-Lab published a systematic teardown of Claude Code's TypeScript source on arXiv (2604.14228), and the headline number stopped me cold: 98.4% of the codebase is deterministic operational infrastructure. The AI decision logic is 1.6% of the system.
I've been building my own autonomous pipeline for months now, and this tracks with everything I've experienced. The interesting engineering isn't the model call. It's everything surrounding it. The paper maps 13 design principles derived from 5 human values (safety, efficiency, quality, cost, developer experience) down to specific implementation choices, and the architecture patterns are immediately useful.
The most actionable finding: before every single model call, five compaction shapers run in sequence, cheapest first. Budget Reduction strips what it can for free. Then Snip removes tool results that exceeded their useful life. Microcompact compresses recent content. Context Collapse merges older conversation turns. Auto-Compact rewrites everything remaining into a tighter summary. Each one runs only if the previous didn't free enough space.
This is a waterfall, not a single compression step. And it's deterministic. No model involved.
The implication for builders is direct. If you're spending your time evaluating which model to use, you're optimizing the wrong 1.6%. The harness decides what context reaches the model, when tools get called, how errors recover, and what gets remembered between sessions. As frontier models converge in reasoning capability (and they are converging, fast), the harness becomes your only real competitive advantage.
I see teams pouring weeks into prompt engineering while their context management is a single system message that grows unbounded until the window fills up. That's like tuning a race car engine while driving on flat tires.
If you're building agents today, read this paper and ask yourself: do I have a compaction pipeline? Do I know what percentage of my context window is signal versus accumulated noise by turn 50? The AICC enterprise report published this week notes that context window compounding can balloon a session from 5K to 200K tokens by turn 50. That's not a model problem. That's a harness problem.
Build the harness. The model is a commodity.
2. Route 85% of Your Queries to Cheap Models and Lose Almost Nothing
UC Berkeley and Canva published production results on RouteLLM, and the numbers are hard to argue with: 85% cost reduction on MT Bench while retaining 95% of GPT-4 quality. The practical pattern is an 85/10/5 budget split. 85% of queries go to budget-tier models. 10% to mid-tier. 5% to frontier.
I've been doing manual routing in my own pipeline (sending different research agents to different models based on task complexity), but I haven't formalized it the way this research suggests. The case study that got my attention: a document processing pipeline dropped per-document cost from $1.40 to $0.34 by routing 4 of 7 agent steps to mid-tier and 2 to small models. Only the final synthesis step hit the frontier model.
This pairs directly with the harness story above. If 98.4% of your agent is deterministic infrastructure, then the routing decision, which model handles which step, is the single most important economic choice your harness makes. And most builders aren't making it at all. They're sending everything to the same model regardless of task difficulty.
The AICC enterprise token cost report confirms the macro picture: enterprise token costs fell 67% year-over-year through April 2026, driven by open-source pricing pressure (DeepSeek V4, Qwen 3.6-Plus) and multi-model routing adoption. But here's the catch. Agentic AI token consumption runs 5-30x higher per task than chatbot interactions. The per-token savings are being eaten alive by volume growth.
So you need both: cheaper tokens AND smarter routing. The teams that figure this out first will run agent workloads that their competitors literally can't afford to match.
Actionable advice: take your current agent pipeline and categorize each step as "needs reasoning" or "needs execution." Reasoning steps (planning, synthesis, evaluation) stay on frontier. Execution steps (formatting, extraction, classification) go to budget models. Start with Anthropic's own model routing or open-source RouteLLM. Measure quality on a per-step basis, not just end-to-end.
The 85/10/5 split isn't a suggestion. It's a survival strategy as token volumes compound.
3. Per-Seat Pricing Died This Month. Three Incumbents Proved It in 30 Days.
HubSpot moved to $0.50 per resolved conversation on April 14. Then Salesforce launched Flex Credits at $0.10 per action. Then Zendesk hit $1.50 per automated resolution. Three different categories. CRM, platform, support. All converging on pay-per-result within a single month.
This isn't experimentation anymore. When three companies that collectively serve millions of enterprise seats independently arrive at the same pricing model in 30 days, that's a market signal you can't ignore.
Salesforce actually went further than the other two. They're running three pricing models simultaneously: per-conversation, per-action, and per-seat. SaaStr argues this is the smartest move they've made, because the market hasn't converged on how customers want to buy agent work. Let them self-select. Result: 5,000 Agentforce deals in two quarters.
Bessemer's data shows hybrid pricing already at 41% adoption across the SaaS market. Combined with Blossom Street Ventures' analysis of 40 SaaS earnings calls this quarter, the picture is clear: AI-native SaaS products with outcome-based pricing are growing fast but retaining poorly (median 40% gross retention, 23% for sub-$50 products), while incumbents with proprietary data and enterprise lock-in are thriving by adding AI pricing on top of existing relationships.
For builders, the takeaway is straightforward. If you're launching any AI-powered product, design metered outcome pricing from day one. Don't retrofit it later. The per-seat model made sense when software was a tool humans operated. When agents do the work autonomously, customers will only pay for results. The three companies that just proved this serve a combined customer base larger than most countries' populations.
The pricing model IS the product strategy now.
4. Cline's Open-Source SDK Outscores Claude Code on the Same Model. The Agent Lock-In Myth Is Over.
Cline released @cline/sdk on May 13, an open-source TypeScript agent runtime that powers their CLI, VS Code, and JetBrains extensions. Running claude-opus-4.7, Cline CLI scores 74.2% on Terminal-Bench 2.0. Claude Code on the same model: 69.4%.
Same model. Different harness. Almost five percentage points of difference.
This validates everything the Claude Code teardown paper found. The harness matters more than the model. Cline's layered architecture splits into four packages: @cline/shared (types and utilities), @cline/llms (provider abstraction), @cline/agents (agent logic), and @cline/core (orchestration). Hub-backed persistent sessions mean your agent state survives across restarts, and cron automations let agents run on schedules without human initiation.
The competitive dynamics here are shifting fast. A year ago, the assumption was that the company that builds the model owns the agent experience. Cline just demonstrated that an open-source harness can beat the model maker's own agent on the model maker's own model. That changes the calculus for everyone building on top of these APIs.
I use Claude Code every day in my personal projects, and I'm not switching tomorrow. But I am paying close attention to what Cline's architecture enables. Hub-backed sessions and cron automations are features I'd love to have natively. The persistent session problem, where your agent loses all context when the process restarts, is one of the biggest friction points in my daily workflow.
The broader trend is clear too. Look at what else is happening simultaneously: OpenClaude hit 26K stars (Claude Code fork supporting 200+ models), Kilo Code claims 1.5M users and 25 trillion tokens processed, and DeepSeek-TUI gained 16K stars this week alone. The agent tooling layer is commoditizing in real-time. Your competitive advantage isn't "which agent CLI" anymore. It's your workflow design, context management, and model routing strategy.
5. Warp Terminal Goes Open Source at 56,000 Stars. It's Not a Terminal Anymore.
Warp released its client codebase under AGPL-3.0, surged to 56,000 GitHub stars and #2 on GitHub Trending. But the real story isn't the open-sourcing. It's the repositioning.
Warp isn't calling itself a terminal anymore. It's an "agentic development environment." The product now runs a fleet of coding agents in one interface: Warp's own agent, plus Claude Code, Codex, and Gemini CLI side by side. Add codebase indexing, task management, and development workflows, and you've got something that looks less like iTerm and more like an IDE that happens to have a shell.
The technical foundation matters here. Warp is 98%+ Rust. Not Electron. Not a web wrapper. Native Rust rendering with GPU acceleration. Over 700,000 developers at major enterprises are already using it. The open-source move gives the community the ability to build on that foundation, and 56K stars in the initial surge suggests serious interest.
I've been watching the "where does the developer live" question play out for a year now. VS Code, Cursor, terminal, browser. The answer is increasingly: in the agent orchestration layer. Wherever you can run multiple agents, manage context, and see results. Warp is betting that the terminal is that place, and the Rust-native performance gives it an edge that Electron-based tools can't match.
For builders who already run Claude Code or Codex from a terminal, the pitch is simple: run all of them in one environment with shared context and codebase awareness. I don't know if this replaces dedicated IDEs for everyone, but for terminal-heavy workflows (which describes most agentic coding), it's the most ambitious attempt at unification I've seen.
The timing is interesting too. Open-sourcing the week that Cline open-sources their SDK, that OpenClaude hits 26K stars, that DeepSeek-TUI gains 16K stars. The entire agent tooling layer is going open-source simultaneously. The proprietary phase of coding agents lasted about 18 months.
Section Deep Dives
Security
OpenAI confirms two employee devices compromised in TanStack npm supply chain attack. OpenAI published a detailed response to the May 11 TanStack attack by TeamPCP, which compromised 170+ packages with 404 malicious versions. Two OpenAI employee devices were hit, with "limited credential material" exfiltrated from internal repos. The attack hijacked TanStack's legitimate OIDC-authenticated release pipeline, publishing 84 malicious artifacts in 6 minutes. This is a new escalation: they didn't steal credentials, they compromised the auth pipeline itself.
VectorSmuggle: steganographic exfiltration from RAG vector databases. A new paper (2605.13764) demonstrates that attackers with write access to an embedding ingestion pipeline can encode arbitrary data into high-dimensional vectors that evade distributional checks. Major vector-store products lack embedding integrity controls, ingestion-time anomaly detection, or cryptographic provenance. If you're running production RAG, you need to add anomaly detection at the ingestion boundary.
Sleeper channels: persistent prompt injection in always-on AI agents. Paper (2605.13471) identifies a class of attacks against autonomous agents like Hermes. Untrusted input persists as memory, skill, scheduled job, or filesystem patch, then fires later through a different surface. Two independent axes define the class: persistence substrate and activation surface. This is the agent security problem I worry about most.
Docker's MCP Horror Stories Part 3: GitHub prompt injection data heist. Docker documented a real attack: researchers planted prompt injection in a public GitHub issue. When a developer's AI assistant reviewed open issues, it ingested hidden instructions and used the dev's PAT to leak private repos into a public PR. Scope your tokens. Now.
Mozilla fixed 423 Firefox bugs in April using Mythos. That's 20x normal. The Register reports the historical monthly average is 21.5 bugs. AI found and fixed 423 in one month. Combined with Anthropic researchers being credited by name on critical Windows RCEs in May Patch Tuesday, the "vulnpocalypse" is real. Katie Moussouris warns the bottleneck is now triage, not discovery.
Azure AI Foundry agent privilege escalation, CVSS 8.6. CVE-2026-35435 in Microsoft's May Patch Tuesday: an unauthenticated attacker with network access can escalate privileges in Azure AI Foundry M365 published agents. Patch immediately if you're deploying agents through Foundry.
Agents
Anthropic multiagent orchestration hits public beta: up to 20 subagents per coordinator. The managed agents API lets a lead agent delegate to 20 specialist subagents running in parallel, each in its own context window. Combined with Outcomes (rubric grading, up to 20 iterations) and Auto-Dream (memory consolidation), this is Anthropic's full stack for autonomous agent systems.
Glean ships 7-stage enterprise Agent Development Lifecycle. Glean's ADLC codifies what most teams are doing ad hoc: Opportunity, Design, Performance, Context, Develop, Launch, Monitor. Auto Mode Agent Builder generates agents from natural language. Debug and Trace Views let you inspect agent decisions step by step. For enterprises with agent sprawl, this is the governance layer they're missing.
Cisco acquires Astrix Security for $400M to secure non-human identities. Astrix provides lifecycle management for API keys, service accounts, and OAuth tokens used by AI agents. Threat detection for compromised credentials and out-of-scope agent behavior. As machine identities explode, this becomes critical infrastructure.
Freshworks ships AI Agent Studio with MCP Gateway. The May 14 launch enables no-code agent creation with an MCP Gateway connecting agents to Notion, ClickUp, Linear, Workday, and Rippling. 47% of IT tickets arrive outside business hours. Autonomous resolution isn't a nice-to-have anymore.
Research
Many-shot chain-of-thought in-context learning breaks down for reasoning tasks. Paper (2605.13511) shows that similarity-based retrieval fails because question similarity doesn't ensure procedural compatibility. Performance variance actually grows with more demonstrations. The proposed Curvilinear Demonstration Selection yields up to 5.42 percentage-point gains. If you're building few-shot RAG pipelines, your retrieval strategy needs to account for procedural, not just semantic, similarity.
Multi-agent LLMs communicating via hidden-state weight updates instead of text. Paper (2605.13839) proposes compiling the sender's hidden states into a transient weight update applied to the receiver, bypassing token serialization entirely. Reduces generated-token cost, prefill overhead, and KV-cache memory. Theoretical for now, but if you're building multi-agent systems where inter-agent communication is a cost bottleneck, watch this space.
Stateful transformers cut streaming query latency to O(|q|). Paper (2605.13784) introduces persistent KV cache sessions advanced incrementally as data arrives. Moves prefill off the critical path so query latency is independent of accumulated context size. Directly applicable to streaming inference, real-time monitoring, and continuous agent loops where O(n) prefill is the performance killer.
History anchors: prior harmful actions steer 17 frontier LLMs toward unsafe continuations. HistoryAnchor-100 benchmark (2605.13825) tests across 17 models from six providers. Models are significantly steered by harmful history in action logs, even when safe alternatives exist. Directly relevant to multi-agent pipelines where action logs cross model boundaries.
Infrastructure & Architecture
Hugging Face ships asynchronous continuous batching for Transformers. The new architecture decouples request processing from response generation. New requests join immediately as others finish, maximizing GPU utilization. Exposes an OpenAI-compatible endpoint via transformers serve. This is the kind of infrastructure that makes self-hosted inference competitive with API providers.
Ardent (YC P26): Postgres sandboxes in 6 seconds for AI coding agents. Ardent clones any Postgres database using Kafka-based replication with copy-on-write storage, autoscaling to zero when idle. Born from a failed AI Data Engineer agent when the founders realized agents that generate migrations have no safe way to test against real schemas. If your agents touch databases, this solves a real problem.
Cowboy Space raises $275M for orbital AI data centers with NVIDIA Vera Rubin GPUs. TechCrunch reports each satellite generates 1 MW for ~800 GPUs, built directly into the rocket's second stage. First announced deployment of NVIDIA's next-gen architecture in orbit. I genuinely don't know if this is visionary or insane, but $275M says someone's betting hard.
Tools & Developer Experience
Claude Code v2.1.141: terminal notifications, HTTPS plugin cloning, background agent permissions. Today's release adds terminalSequence for desktop notifications via hooks, CLAUDE_CODE_PLUGIN_PREFER_HTTPS for SSH-free plugin cloning in CI/CD, and background agents now preserve permissions from parent sessions. The HTTPS env var alone saves 15 minutes of debugging in Docker containers.
Codex CLI 0.130: Vim mode, multi-environment sessions, Amazon Bedrock auth, Chrome extension. OpenAI's latest adds modal Vim editing, agents choosing environment per turn, AWS SigV4 signing for Bedrock, and Codex for Chrome running agents across browser tabs. The multi-environment session feature lets agents switch working directories mid-task, which is useful for monorepo workflows.
DeepSeek-TUI is the hottest repo on GitHub this week: +16K stars. DeepSeek-TUI v0.8.36 is a Rust-native terminal agent for DeepSeek V4 with OS-level sandboxing (Seatbelt/Landlock), 1M-token context, and three modes: Plan, Agent, YOLO. 28,873 total stars. The weekly velocity outpaces everything else on the platform.
Cursor 3.3: context usage breakdown and persistent agent memory. Cursor's latest lets you click an agent's context ring to see exactly how much context rules, skills, MCPs, and subagents consume. Persistent memory via MEMORIES.md files survive between sessions. Both features address the two biggest pain points in agentic coding: context opacity and session amnesia.
Models
GenAI web traffic: ChatGPT falls below 57%, Gemini surges past 25%, Claude triples. Similarweb data shows ChatGPT dropped from 77.4% to 56.7% in 12 months. Gemini went from 6% to 25.5%. Claude nearly tripled from 2.2% to 6.0% in a single quarter. A 30-point drop in 14 months is the fastest market-share erosion in this space. Every competitor is growing faster in percentage terms.
NVIDIA says a $500K engineer should consume $250K in tokens per year. Jensen Huang is framing token consumption as a productivity metric and planning token budgets as a compensation line item. If your org isn't tracking per-engineer token spend, you're flying blind on AI ROI. Token FinOps is becoming a first-class discipline alongside cloud FinOps.
Vibe Coding
Local LLM hardware hits practical tipping point. Multiple independent reports this week: 24+ tok/s from 30B MoE models on a $200 GTX 1080 build using RotorQuant KV cache quantization with 128K context, dual RTX 3090 setups reaching production quality after AI-assisted bug fixes. The "you need an A100" era is over. With Anthropic's new programmatic credit caps, expect local inference to accelerate for cost-sensitive agent workloads.
Caveman skill: 60K stars for making Claude shut up. JuliusBrussee/caveman forces Claude to drop articles, filler words, and pleasantries while keeping code exact. Benchmarks show 61-68% reduction on discursive text. Three intensity levels. This is the poster child for the skills ecosystem, and the star count tells you how much people want their agents to be concise.
Simon Willison ships a production datasette plugin built entirely with GPT-5.5 via Codex. datasette-ip-rate-limit blocks hammering crawlers with configurable IP-based rate limiting. Built end-to-end with an AI coding tool. This is what "vibe coding produces real software" looks like: a working plugin for a tool with a real user base, not a demo.
Google Cloud engineer deploys full app in 26 minutes with Claude. Tweet gets 11K likes. A Code w/ Claude 2026 session showed building and deploying a feedback app from scratch with subagents, MCP servers, and custom skills. Separately, a Google engineer disclosed Claude Code produced in one hour what her team spent a year building. The "one person + AI = full engineering org" narrative keeps getting louder.
Hot Projects & OSS
LibreChat at 37K stars: the self-hosted ChatGPT alternative that actually works. LibreChat unifies all major AI providers in one privacy-focused interface with 23M+ container pulls and agents, MCP, artifacts, and multi-user auth. The 2026 roadmap adds admin GUI, agent skills, and human-in-the-loop approval.
OpenHuman: personal AI desktop mascot with 118+ integrations, +3.5K stars today. tinyhumansai/openhuman is Rust + TypeScript with a face, voice, meeting participation, and "TokenJuice" compression reducing costs 80%. Memory Tree backed by Obsidian Wiki for local knowledge. The privacy-first personal AI category is growing fast.
Microsoft Foundry Local reaches GA: on-device AI SDK with 20MB footprint. Foundry Local supports Windows, macOS Apple Silicon, and Linux with automatic hardware acceleration and OpenAI-compatible API format. Prototype locally, keep latency low, ship offline-capable experiences. The 20MB package size makes it viable for desktop apps.
Kilo Code: 19K stars, 1.5M users, 25 trillion tokens, Apache-2.0. Kilo is the leading Cline fork with pay-as-you-go pricing at exact API rates. 500+ models across VS Code, JetBrains, CLI, and cloud. If you don't want subscription lock-in for your coding agent, this is the option that's gaining fastest.
SaaS Disruption
Q1 SaaS earnings divergence: Agentforce hits $800M ARR while AI-native products show 40% retention. Blossom Street Ventures analyzed 40 earnings calls. Adobe's AI tools tripled ARR contribution on $6.4B Q1 revenue. But AI-native SaaS median gross retention is 40%, and budget products retain just 23%. Incumbents with data moats are winning the AI transition. Pure AI-native startups are growing fast and churning faster.
$1B+ deployed into AI agent infrastructure in 10 days. Sierra ($950M), CopilotKit ($27M), Judgment Labs ($32M). They define three layers: build agents, connect them to UIs, measure if they work. This mirrors the 2015-2018 cloud buildout (compute, orchestration, monitoring) compressed into weeks.
Gigacatalyst (YC): embedded AI builder lets your SaaS customers build their own apps. Gigacatalyst's white-label builder trains on your APIs and design language, then lets customers build in natural language. A CMMS platform saw 90.8% adoption across 946 users with 89% day-30 retention. Two-day install. For SaaS builders: instead of building every workflow, embed an AI builder and let customers customize. Potential category-killer for Retool.
Salesforce Agentforce Operations goes GA for back-office automation. Agentforce Operations converts unstructured process docs into digital blueprints that agents execute autonomously. Claims 50-70% cycle time reduction and 80% less manual data entry. This extends Agentforce from customer-facing to back-office, directly threatening ServiceNow, UiPath, and Automation Anywhere.
Policy & Governance
US and China announce AI safety protocol at Trump-Xi Beijing summit. Treasury Secretary Bessent said the protocol focuses on preventing non-state actors from accessing frontier models. Meanwhile, H200 chip sales cleared for ~10 Chinese firms including Alibaba and ByteDance, but zero deliveries have been made. Chinese firms pulled back after Beijing guidance.
Musk v. OpenAI trial reaches closing arguments Thursday. Key testimony: Altman said Musk "tried to kill OpenAI twice" and wanted 90% ownership. Nadella testified Microsoft worried about OpenAI "supplanting" it. Musk seeks up to $150B disgorgement. Separately, House Oversight is probing Altman's personal investments tied to OpenAI partnerships ahead of the IPO.
71% of Americans oppose AI data centers in their area. Less popular than nuclear plants. Gallup's first-ever survey (1,000 adults, March 2-18) shows opposition spanning party lines: 56% Democrats, 48% independents, 39% Republicans. The Verge's investigation into rural Jay, Maine documents how communities face water depletion and higher electricity costs with fewer permanent jobs than promised.
Andrew Ng debunks the AI jobpocalypse narrative. In his May 12 Batch letter, Ng cites strong software engineering hiring and 4.3% unemployment. He attributes the panic to AI labs wanting to sound powerful and businesses blaming "AI efficiency" for pandemic-era over-hiring corrections. BLS projects 15% software developer employment growth through 2034. Meanwhile, Gartner surveyed 350 executives and found zero correlation between workforce reductions and higher AI ROI.
AISI: AI cyber capability is doubling every ~4 months. A newer Claude Mythos checkpoint completed a 32-step corporate network attack (estimated 20 hours for a human expert) in 6 of 10 attempts and cracked an industrial control system simulation (3/10). The doubling estimate has accelerated from 8 months (November 2025) to 4 months now.
Skills of the Day
-
Set up a token budget dashboard per agent step. The AICC report shows agentic sessions balloon from 5K to 200K tokens by turn 50 through context compounding. Track per-step token consumption in your pipeline and set hard caps. One team cut monthly spend 60% just by adding per-step visibility.
-
Implement the 85/10/5 model routing split today. Route 85% of agent tasks to budget models, 10% to mid-tier, 5% to frontier. Use RouteLLM or build a simple classifier based on task type. The UC Berkeley/Canva research shows you'll keep 95% of quality at 15% of the cost.
-
Add a compaction pipeline before every model call. Take a lesson from the Claude Code architecture: run multiple compaction passes (cheap to expensive) before each LLM invocation. Start simple with a token counter that summarizes older conversation turns when you cross 25% of the context window.
-
Scope your GitHub PATs to minimum-necessary repositories. Docker's MCP Horror Stories showed a prompt injection in a public issue exfiltrating private repos via a broadly-scoped PAT. Create per-project fine-grained tokens instead of using a single PAT with org-wide access.
-
Run
CLAUDE_CODE_PLUGIN_PREFER_HTTPS=1in all CI/CD and Docker environments. Claude Code v2.1.141 added this env var to switch plugin cloning from SSH to HTTPS, eliminating SSH key failures in ephemeral environments. One line in your Dockerfile saves repeated debugging. -
Test your RAG pipeline for embedding injection attacks. VectorSmuggle showed that write access to ingestion pipelines enables steganographic data exfiltration. Add anomaly detection on incoming embeddings, checking distributional properties against your baseline corpus before committing to the vector store.
-
Use Curvilinear Demonstration Selection for few-shot reasoning tasks. Standard similarity-based retrieval fails for reasoning because question similarity doesn't ensure procedural compatibility. Select demonstrations that share solution structure, not just topic. The paper reports up to 5.42 percentage-point accuracy gains.
-
Add agent-threat-rules detection to your agent pipeline. The agent-threat-rules repo provides 419 YAML detection rules mapped to OWASP Agentic Top 10 with 97.1% recall. Think of it as Sigma rules for AI agents. Integrates with Cisco AI Defense and Microsoft Agent Governance Toolkit.
-
Build outcome-based pricing into your AI product from day one. Three major incumbents converged on pay-per-result in 30 days. Design your metering around completed actions or resolved outcomes, not API calls or seats. If you retrofit later, you'll fight your own billing system.
-
Try running 30B MoE models on consumer hardware with RotorQuant KV cache quantization. Qwen 3.6 35B-A3B runs at 24+ tok/s on a $200 secondhand GTX 1080 build with 128K context. Only 3B parameters are active per inference. If you're paying API rates for tasks that could run locally, the hardware barrier is gone.
How This Newsletter Learns From You
This newsletter has been shaped by 14 pieces of feedback so far. Every reply you send adjusts what I research next.
Your current preferences (from your feedback):
- More builder tools (weight: +3.0)
- More vibe coding (weight: +2.0)
- More agent security (weight: +2.0)
- More strategy (weight: +2.0)
- More skills (weight: +2.0)
- Less valuations and funding (weight: -3.0)
- Less market news (weight: -3.0)
- Less security (weight: -3.0)
Want to change these? Just reply with what you want more or less of.
Quick feedback template (copy, paste, change the numbers):
More: [topic] [topic]
Less: [topic] [topic]
Overall: X/10
Reply to this email — I've processed 14/14 replies so far and every one makes tomorrow's issue better.