Safety AI News & Research

🧐 Safety LessWrong 1 min read

Do not conquer what you cannot defend

Epistemic status: All of the western canon must eventually be re-invented in a LessWrong post. So today we are re-inventing federalism. Once upon a time there was a great king.…

🕐 a day ago

⚖️ Safety AI Now Institute 1 min read

Nurses Sound Alarm as ‘Uber for Nursing’ Apps Push to Deregulate Healthcare

A new AI Now Institute report published April 21, 2026, warns that gig-work platforms marketed as "Uber for nursing" are aggressively lobbying states to rewrite healthcare staffing rules, a push…

🕐 3 days ago

🛡️ Safety AI Alignment Forum 1 min read

A "Lay" Introduction to "On the Complexity of Neural Computation in Superposition"

This is a writeup based on a lightning talk I gave at an InkHaven hosted by Georgia Ray, where we were supposed to read a paper in about an hour,…

🕐 4 days ago

🛡️ Safety AI Alignment Forum 1 min read

Preventing extinction from ASI on a $50M yearly budget

ControlAI's mission is to avert the extinction risks posed by superintelligent AI. We believe that in order to do this, we must secure an international prohibition on its development. We're…

🕐 4 days ago

⚖️ Safety AI Now Institute 1 min read

‘Uber for nurses’: gig-work apps lobby to deregulate healthcare, report finds

Billion-dollar tech platforms are aggressively pushing for deregulation of the “Uber for nursing” industry in an effort to expand gig work in the healthcare sector, according to a report published…

🕐 5 days ago

🧐 Safety LessWrong 1 min read

Annoyingly Principled People, and what befalls them

Here are two beliefs that are sort of haunting me right now: Folk who try to push people to uphold principles (whether established ones or novel ones), are kinda an…

🕐 5 days ago

⚖️ Safety AI Now Institute 32 min read

Uber For Nursing Part II

A seismic shift is rocking the healthcare industry. Uber’s business model—the “gigification” of labor—and lobbying practices have made their way to healthcare staffing. The post Uber For Nursing Part II…

🕐 5 days ago

🛡️ Safety AI Alignment Forum 1 min read

Five approaches to evaluating training-based control measures

Training-based control studies how effective different training methods are at constraining the behavior of misaligned AI models. A central example of a case where we want to control AI models…

🕐 8 days ago

🛡️ Safety AI Alignment Forum 1 min read

Prompted CoT Early Exit Undermines the Monitoring Benefits of CoT Uncontrollability

Code: github.com/ElleNajt/controllability tldr: Yueh-Han et al. (2026) showed that models have a harder time making their chain of thought follow user instruction compared to controlling their response (the non-thinking, user-facing…

🕐 8 days ago

🧐 Safety LessWrong 1 min read

Current AIs seem pretty misaligned to me

Many people—especially AI company employees [1] —believe current AI systems are well-aligned in the sense of genuinely trying to do what they're supposed to do (e.g., following their spec or…

🕐 8 days ago

🛡️ Safety AI Alignment Forum 1 min read

You can only build safe ASI if ASI is globally banned

Sometimes people make various suggestions that we should simply build "safe" artificial Superintelligence (ASI), rather than the presumably "unsafe" kind. [1] There are various flavors of “safe” people suggest. Sometimes…

🕐 9 days ago

🌍 Safety Future of Life Institute 2 min read

FLI’s President and CEO on Trump’s support for an AI ‘kill switch’

President Trump said during an interview aired yesterday by Fox Business that “there should be” when asked if AI needs […]

🕐 10 days ago

🛡️ Safety AI Alignment Forum 1 min read

Current AIs seem pretty misaligned to me

Many people—especially AI company employees [1] —believe current AI systems are well-aligned in the sense of genuinely trying to do what they're supposed to do (e.g., following their spec or…

🕐 10 days ago

🧐 Safety LessWrong 1 min read

Morale

One particularly pernicious condition is low morale. Morale is, roughly, "the belief that if you work hard, your conditions will improve." If your morale is low, you can't push through…

🕐 11 days ago

🛡️ Safety AI Alignment Forum 1 min read

Anthropic repeatedly accidentally trained against the CoT, demonstrating inadequate processes

It turns out that Anthropic accidentally trained against the chain of thought of Claude Mythos Preview in around 8% of training episodes. This is at least the second independent incident…

🕐 12 days ago

⚖️ Safety AI Now Institute 1 min read

‘Safety first’ puts Anthropic ahead in game of AI spin

But Dr Heidy Khlaaf, chief AI scientist at the AI Now Institute and a former OpenAI safety engineer, is sceptical. She notes Anthropic provides no comparison with existing automated security…

🕐 13 days ago

🌍 Safety Future of Life Institute 2 min read

FLI CEO’s statement on the attack against Sam Altman’s home

Anthony Aguirre, President and CEO of the Future of Life Institute, issued the following statement in response to the attack […]

🕐 15 days ago

⚖️ Safety AI Now Institute 1 min read

The Great AI Grift

Tech leaders want you to believe that AI is the key to a new golden age. The reality looks more like a bold, government-backed heist. The post The Great AI…

🕐 15 days ago

🧐 Safety LessWrong 1 min read

Socrates is Mortal

There is a scene in Plato that contains, in miniature, the catastrophe of Athenian public life. Two men meet at a courthouse. One is there to prosecute his own father…

🕐 16 days ago

🛡️ Safety AI Alignment Forum 1 min read

My unsupervised elicitation challenge

Note: you are ineligible to complete this challenge if you’ve studied Ancient or Modern Greek, or if you natively speak Modern Greek, or if for other reasons you know what…

🕐 18 days ago

⚖️ Safety AI Now Institute 1 min read

AI Giants Go on Charm Offensive to Avert Public Backlash

But broad skepticism and fear about the impact of AI have made opposing all regulation untenable for tech company CEOs, said Kak, who is co-executive director of the AI Now…

🕐 18 days ago

🛡️ Safety AI Alignment Forum 1 min read

My picture of the present in AI

In this post, I'll go through some of my best guesses for the current situation in AI as of the start of April 2026. You can think of this as…

🕐 18 days ago

🛡️ Safety AI Alignment Forum 1 min read

[Paper] Stringological sequence prediction I

TLDR: The first in a planned series of three or more papers, which constitute the first major in-road in the compositional learning programme, and a substantial step towards bridging agent…

🕐 19 days ago

🧐 Safety LessWrong 1 min read

The Practical Guide to Superbabies

It’s Summer of 2025. I’m standing in a grass covered field on the longest day of the year. A friend of mine walks towards me, holding his newborn son. “Hey,…

🕐 19 days ago

DeepTrendLab — Top 50 AI Sources, Research & News