Just use GPT-5.4 xhigh

Hey I’m Ben. I build stuff with agents, even though I’m not technical. Here’s all the stuff I’m reading and tinkering with. If you want to start building or level up your ‘vibe-coding’ skills, join our community.

Hey folks,

The ‘become a builder’ workshop last week went well-ish 😊 (Codex crapped out on us). The recording is available, but I’m working on a thorough guide to cover everything properly (plus the bits we didn’t get to). I’m ~50% through it, so I hope to have it out this week.

Also, Factory is hosting a hackathon this Thursday. Everyone gets 200M tokens, and a Mac mini is on the line.

OpenAI released GPT-5.4 in “thinking” and “pro” variants. It brings the coding power of GPT-5.3-Codex to the main model series, with better vision, more efficient tool use and a context window of 1M tokens. It’s now much better at computer use (see demo) and financial tasks. It’s also a bit more expensive than GPT-5.2 ($1.75/$14 → $2.50/$15 per million input/output tokens). OpenAI expects to keep this naming and capacity difference between instant models (GPT-5.3 Instant) and reasoning models moving forward.
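To get a feel for what that price bump means in practice, here is a rough sketch that applies the per-million-token prices quoted above to a made-up workload. The workload size (10M input tokens, 1M output tokens) and the `cost_usd` helper are my own assumptions for illustration, not anything from OpenAI.

```python
# Prices in USD per million tokens, as quoted above: (input, output).
PRICES = {
    "GPT-5.2": (1.75, 14.00),
    "GPT-5.4": (2.50, 15.00),
}


def cost_usd(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the cost in USD of one workload at the quoted per-million rates."""
    input_price, output_price = PRICES[model]
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


# Hypothetical workload: 10M input tokens and 1M output tokens.
old = cost_usd("GPT-5.2", 10_000_000, 1_000_000)
new = cost_usd("GPT-5.4", 10_000_000, 1_000_000)
print(f"GPT-5.2: ${old:.2f}, GPT-5.4: ${new:.2f}, increase: {(new - old) / old:.0%}")
# prints "GPT-5.2: $31.50, GPT-5.4: $40.00, increase: 27%"
```

Most of the increase is on the input side, so input-heavy agent workloads (lots of context, short replies) feel it more than output-heavy ones.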

More from OpenAI:

ChatGPT for Excel - An extension to use ChatGPT in a sidebar right from your workbooks.
Codex Security, an AI app security agent evolved from Project Aardvark—free for a month to Enterprise customers.
Codex for Open Source - A program for open-source maintainers, giving them 6 months of ChatGPT Pro with Codex, conditional access to Codex Security and API credits.

It’s also acquiring Promptfoo, an open-source AI security testing tool (popular among Fortune 500, stays OSS).

New built-in skill in Claude Code - /loop lets you schedule recurring tasks in a single session, for up to 3 days at a time. Plus, you can now schedule tasks using Claude Code Desktop - these tasks run regularly as long as your computer is awake. Anthropic also launched a community ambassadors program for Claude.

For enterprises, Anthropic released Code Review by Claude and Claude Marketplace. The review tool uses a team of agents to review every PR and, on average, costs $15-25 per review. The marketplace lets enterprises consolidate their AI spending by using their Anthropic commitments to pay for other AI apps like GitLab, Harvey, Replit, etc.

Karpathy released autoresearch — agents autonomously iterate on LLM training code. Ran 2 days on 8xH100, found 20 real improvements with an 11% speedup. 630 lines, single-GPU, open source. I assume this approach of agents coming up with ideas and implementing them will see much more activity this year.

Yann LeCun, Meta’s ex-Chief AI Scientist, along with other researchers, has raised over $1B at a $3.5B valuation for their new startup, Advanced Machine Intelligence (AMI Labs). They are already operating from Paris, New York, Montreal and Singapore with a strong focus on world models and research that goes beyond LLMs.

Go stackless and get back to selling. Remember when selling meant talking to people? Before the tab-switching and endless sync errors. Reevo brings it all back to one platform. Prospecting, calls, pipeline, and reporting all in a single tab. From prospect to close. Go Stackless. reevo.ai*

Cursor Automations - Build always-on agents. Run them on a schedule or use events (like Slack messages) as a trigger.
T3 Code - Desktop app to use Codex CLI (alternative to Codex app). Nice and smooth to use, still feels alpha though (because it is).
Handles by here.now - Personalised sub-domains for everything you publish with your agent.
Copilot Cowork - Hand off tasks to agents with the ability to work across your Microsoft 365 apps.
Air by JetBrains - Agentic dev environment built for working with agents from different vendors.
Clawcard - A real inbox, a phone number, and a credit card your agents can’t abuse.
21st Agents - Infra for adding agents to your app—runtime, sandboxing, billing, UI, streaming and more. Also see: Terminal Use (very similar, YC W26).
Code review tools:
- Warden by Sentry - Set of skills to review every PR on your codebase.
- Vet by Imbue - Fast and local code review tool to make sure the agent followed your instructions.
- OpenReview - Open-source, self-hosted AI code review bot powered by the Vercel AI Cloud.

Notchi - Cute little Tamagotchi that lives in your notch. It cries when you yell at Claude and gets happy when you praise it.
Context Hub - An open tool that gives your coding agent the up-to-date API documentation it needs. (read more)
Agent Safehouse - macOS-native sandboxing for local agents.
Flue by Astro - A framework to build sandboxed AI agents and CI workflows.
slacrawl - Get your Slack data locally with or without API keys.
claude-replay - Turn Claude Code session transcripts into self-contained, embeddable HTML replays.
executor - Local-first execution environment for AI agents. (read more)
agent-coworker - Agent backend that you can use from a terminal or a desktop app.
agent-kanban - VS Code extension that provides an integrated kanban board to manage coding agent tasks.
Fractals - A tool to break down tasks into subtasks on repeat, let agents complete them and manage the entire process.
Uithub is now open-source. Turn GitHub repos into LLM-ready context.
shadcn/cli v4 - Comes with skills, presets, dry-run, monorepo support and more.
Experimental UI to fork convos and explore side tangents without interrupting the main thread. (read more)
An agent skill to help you write smarter, simpler, and more modern SwiftUI.
Making OpenClaw and Codex app talk to each other using ACP.

Enjoy this newsletter? Forward it to a friend.

That’s it for today. Feel free to comment and share your thoughts. 👋

* sponsors who make this newsletter possible :)
Wanna partner with us for March? Last few slots available

🍔

Originally published on Ben's Bites.
By Ben Tossell. All content and rights belong to the original publisher.
