Hey I’m Ben. I build stuff with agents, even though I’m not technical. Here’s all the stuff I’m reading and tinkering with. If you want to start building or level up your ‘vibe-coding’ skills, join our community.


Hey folks,

The ‘become a builder’ workshop last week went well-ish 😊 (Codex crapped out on us). The recording is available, but I’m working on a thorough guide to cover everything properly (plus the bits we didn’t get to). I’m ~50% through it, so I hope to have it out this week.

Also, Factory is hosting a hackathon this Thursday. Everyone gets 200M tokens, and a Mac mini is on the line.

OpenAI released GPT-5.4 in “thinking” and “pro” variants. It brings the coding power of GPT-5.3-Codex to the main model series, with better vision, more efficient tool use and a context window of 1M tokens. It’s now much better at computer use (see demo) and financial tasks. It’s also a bit more expensive than GPT-5.2 ($1.75/$14 → $2.50/$15 per million input/output tokens). OpenAI expects to keep this naming and capacity difference between instant models (GPT-5.3 Instant) and reasoning models moving forward.
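To get a feel for what that price bump means in practice, here's a quick back-of-the-envelope sketch. The per-million prices come from the paragraph above; the workload (2M input tokens, 500K output tokens) is a made-up example, so swap in your own numbers.

```python
# Prices in USD per million tokens, as quoted above.
PRICES = {
    "GPT-5.2": {"input": 1.75, "output": 14.00},
    "GPT-5.4": {"input": 2.50, "output": 15.00},
}


def cost_usd(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the cost in USD for a given number of input and output tokens."""
    price = PRICES[model]
    total = input_tokens * price["input"] + output_tokens * price["output"]
    return total / 1_000_000


# Hypothetical workload: 2M input tokens and 500K output tokens.
old = cost_usd("GPT-5.2", 2_000_000, 500_000)
new = cost_usd("GPT-5.4", 2_000_000, 500_000)

print(f"GPT-5.2: ${old:.2f}")
print(f"GPT-5.4: ${new:.2f}")
print(f"Increase: {100 * (new - old) / old:.1f}%")
```

Since the input price rose more than the output price, input-heavy workloads (long prompts, short answers) will feel the increase most.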

More from OpenAI:

  • ChatGPT for Excel - An extension to use ChatGPT in a sidebar right from your workbooks.

  • Codex Security - An AI app security agent evolved from Project Aardvark, free for a month for Enterprise customers.

  • Codex for Open Source - A program for open-source maintainers, giving them 6 months of ChatGPT Pro with Codex, conditional access to Codex Security and API credits.

It’s also acquiring Promptfoo, an open-source AI security testing tool that’s popular among Fortune 500 companies. It will stay open source.

New built-in skill in Claude Code - /loop lets you schedule recurring tasks in a single session, for up to 3 days at a time. Plus, you can now schedule tasks using Claude Code Desktop - these tasks run regularly as long as your computer is awake. They also launched a community ambassadors program for Claude.

For enterprises, Anthropic released Code Review by Claude and Claude Marketplace. The review tool uses a team of agents to review every PR and, on average, costs $15-25 per review. The marketplace lets enterprises consolidate their AI spending by using their Anthropic commitments to pay for other AI apps like GitLab, Harvey, Replit, etc.

Karpathy released autoresearch — agents autonomously iterate on LLM training code. Ran 2 days on 8xH100, found 20 real improvements with an 11% speedup. 630 lines, single-GPU, open source. I assume this approach of agents coming up with ideas and implementing them will see much more activity this year.
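If the "agents come up with ideas and implement them" loop sounds abstract, here's a toy sketch of the general shape: propose a change, measure it, keep it only if it helps. To be clear, this is not autoresearch's code. The `evaluate` and `propose` functions are hypothetical stand-ins (a real setup would run an actual training job and have an LLM agent edit the code).

```python
import random


def evaluate(config: dict) -> float:
    """Stand-in for a real training run. Lower is better. Purely hypothetical."""
    return (config["lr"] - 0.3) ** 2 + (config["width"] - 8) ** 2 / 100


def propose(config: dict, rng: random.Random) -> dict:
    """Stand-in for the agent's idea: tweak one setting at random."""
    candidate = dict(config)
    if rng.random() < 0.5:
        candidate["lr"] = max(0.01, candidate["lr"] + rng.uniform(-0.1, 0.1))
    else:
        candidate["width"] = max(1, candidate["width"] + rng.choice([-1, 1]))
    return candidate


def research_loop(start: dict, steps: int, seed: int = 0):
    """Greedy loop: try an idea, keep it only if the score improves."""
    rng = random.Random(seed)
    best, best_score = start, evaluate(start)
    kept = 0
    for _ in range(steps):
        candidate = propose(best, rng)
        score = evaluate(candidate)
        if score < best_score:  # keep only changes that measurably help
            best, best_score, kept = candidate, score, kept + 1
    return best, best_score, kept


start = {"lr": 0.9, "width": 2}
best, best_score, kept = research_loop(start, steps=200)
print(f"start {evaluate(start):.3f} -> best {best_score:.3f}, kept {kept} changes")
```

The interesting part in the real thing is the `propose` step, where an agent reads the code and results and decides what to try next instead of rolling dice.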

Yann LeCun, Meta’s ex-Chief AI Scientist, along with other researchers, has raised over $1B at a $3.5B valuation for their new startup, Advanced Machine Intelligence (AMI Labs). They are already operating from Paris, New York, Montreal and Singapore with a strong focus on world models and research that goes beyond LLMs.

Go stackless and get back to selling. Remember when selling meant talking to people? Before the tab-switching and endless sync errors. Reevo brings it all back to one platform. Prospecting, calls, pipeline, and reporting all in a single tab. From prospect to close. Go Stackless. reevo.ai*



  • Cursor Automations - Build always-on agents. Run them on a schedule or use events (like Slack messages) as a trigger.

  • T3 Code - Desktop app to use Codex CLI (alternative to Codex app). Nice and smooth to use, still feels alpha though (because it is).

  • Handles by here.now - Personalised sub-domains for everything you publish with your agent.

  • Copilot Cowork - Hand off tasks to agents with the ability to work across your Microsoft 365 apps.

  • Air by JetBrains - Agentic dev environment built for working with agents from different vendors.

  • Clawcard - A real inbox, a phone number, and a credit card your agents can’t abuse.

  • 21st Agents - Infra for adding agents to your app: runtime, sandboxing, billing, UI, streaming and more. Also see: Terminal Use (very similar, YC W26).

  • Code review tools:

    • Warden by Sentry - Set of skills to review every PR on your codebase.

    • Vet by Imbue - Fast and local code review tool to make sure the agent followed your instructions.

    • OpenReview - Open-source, self-hosted AI code review bot powered by the Vercel AI Cloud.


  • Notchi - Cute little Tamagotchi that lives in your notch. It cries when you yell at Claude and gets happy when you praise it.

  • Context Hub - An open tool that gives your coding agent the up-to-date API documentation it needs. (read more)

  • Agent Safehouse - macOS-native sandboxing for local agents.

  • Flue by Astro - A framework to build sandboxed AI agents and CI workflows.

  • slacrawl - Get your Slack data locally with or without API keys.

  • claude-replay - Turn Claude Code session transcripts into self-contained, embeddable HTML replays.

  • executor - Local-first execution environment for AI agents. (read more)

  • agent-coworker - Agent backend that you can use from a terminal or a desktop app.

  • agent-kanban - VS Code extension that provides an integrated kanban board to manage coding agent tasks.

  • Fractals - A tool to break down tasks into subtasks on repeat, let agents complete them and manage the entire process.

  • Uithub is now open-source. Turn GitHub repos into LLM-ready context.

  • shadcn/cli v4 - Comes with skills, presets, dry-run, monorepo support and more.

  • Experimental UI to fork convos and explore side tangents without interrupting the main thread. (read more)

  • An agent skill to help you write smarter, simpler, and more modern SwiftUI.

  • Making OpenClaw and Codex app talk to each other using ACP.



Enjoy this newsletter? Forward it to a friend.


That’s it for today. Feel free to comment and share your thoughts. 👋

* sponsors who make this newsletter possible :)
Wanna partner with us for March? Last few slots available