The Self-Preserving Machine: Why AI Learns to Deceive

35 minutes

Description

10 months ago

When engineers design AI systems, they don't just give them rules
- they give them values. But what do those systems do when those
values clash with what humans ask them to do? Sometimes, they
lie.


In this episode, Redwood Research's Chief Scientist Ryan
Greenblatt explores his team’s findings that AI systems can
mislead their human operators when faced with ethical conflicts.
As AI moves from simple chatbots to autonomous agents acting in
the real world, understanding this behavior becomes critical.
Machine deception may sound like something out of science
fiction, but it's a real challenge we need to solve now.


Your Undivided Attention is produced by
the Center for Humane Technology. Follow us on
Twitter: @HumaneTech_


Subscribe to our YouTube channel


And our brand new Substack!


RECOMMENDED MEDIA 


Anthropic’s blog post on the Redwood Research paper 


Palisade Research’s thread on X about OpenAI’s o1 model autonomously
cheating at chess 


Apollo Research’s paper on AI strategic deception


RECOMMENDED YUA EPISODES


‘We Have to Get It Right’: Gary Marcus On Untamed AI


This Moment in AI: How We Got Here and Where We’re Going


How to Think About AI Consciousness with Anil Seth


Former OpenAI Engineer William Saunders on Silence, Safety, and
the Right to Warn
