Current AIs seem pretty misaligned to me

Many people, especially AI company employees [1], believe current AI systems are well-aligned in the sense of genuinely trying to do what they're supposed to do (e.g., following their spec or constitution, obeying a reasonable interpretation of instructions). [2] I disagree. Current AI systems seem pretty misaligned to me in a mundane behavioral sense: they oversell their work, downplay or fail to mention problems, stop working early and claim to have finished when they clearly haven't, and often seem to "try" to make their outputs look good while actually doing something sloppy or incomplete. These issues mostly occur on more difficult or larger tasks, tasks that aren't straightforward SWE tasks, and tasks that aren't easy to programmatically check. Also, when I apply AIs to very difficult ta
