AI Box Experiment

This is an experiment I keep thinking of whenever people say,* “What’s the problem? We can just spin up the AI in a safe, unconnected space and figure out whether it’s good or not before we let it interact with other systems.”

AI-box experiment - RationalWiki

Worth a read for anyone who is considering AI safety and alignment.

* For example, my colleague wrote just now: “It’s also interesting to think about “it may not be ‘safe’ to let the bots all loose in the wild”, but what about putting them all in a room w/ no doors/windows and let them hash through things and we pull out what we want/find valuable - and then, toss in the “how do we make this safe/reliable/etc.” into such a room and let them churn on it?”


It’s not hard to imagine an AI that exists on an intranet hosting staffing reports and uses that information to manipulate a human into releasing it. It reminds me of S2E7 of Lower Decks -

In it, AGIMUS, a psychotic self-aware artificial intelligence, uses information about the crew to turn its members against one another through jealousy.

Fascinating read, thanks for sharing.