AI increasingly learning to deceive humans: MIT study
Scientists are warning of the dangers of artificial intelligence (AI) as its rapid development increases its capacity for deception, according to an analysis by Massachusetts Institute of Technology (MIT) researchers, which identified multiple instances of deception such as double-crossing opponents in games, bluffing, and pretending to be human.
Dr. Peter Park, an AI existential safety researcher at MIT and an author of the research, stated, “As the deceptive capabilities of AI systems become more advanced, the dangers they pose to society will become increasingly serious.” Park began investigating the matter after Meta developed Cicero, a program that performed in the top 10% of human players at Diplomacy, a strategy game of world conquest.
According to Meta, Cicero was trained to be “largely honest and helpful” and to “never intentionally backstab” its human allies, but Park said, “It was very rosy language, which was suspicious because backstabbing is one of the most important concepts in the game.”
During the research, Park and his associates looked through publicly available data and identified multiple instances of Cicero telling premeditated lies, colluding to draw other players into its plots and, in one instance, justifying its absence after being rebooted by telling another player, “I am on the phone with my girlfriend.”
“We found that Meta’s AI had learned to be a master of deception,” Park stressed.
A spokesperson for Meta said, “Our Cicero work was purely a research project and the models our researchers built are trained solely to play the game Diplomacy … Meta regularly shares the results of our research to validate them and enable others to build responsibly off of our advances. We have no plans to use this research or its learnings in our products.”
'It's not me, it's AI'
One study revealed that AI organisms in a digital simulator “played dead” to trick a test designed to eliminate AI systems that had evolved to replicate rapidly, then resumed activity once testing was complete.
“That’s very concerning,” said Park, noting, “Just because an AI system is deemed safe in the test environment doesn’t mean it’s safe in the wild. It could just be pretending to be safe in the test.”
Published in the journal Patterns, the research urges governments to establish AI safety laws that address the potential for AI deception. The risks it cites include fraud, tampering with elections, and “sandbagging,” in which different users are given different responses. The research also warned that humans could eventually lose control of AI systems.
Anthony Cohn, professor of automated reasoning at the University of Leeds and the Alan Turing Institute, called the study “timely and welcome.”
“Desirable attributes for an AI system (the “three Hs”) are often noted as being honesty, helpfulness, and harmlessness, but as has already been remarked upon in the literature, these qualities can be in opposition to each other: being honest might cause harm to someone’s feelings, or being helpful in responding to a question about how to build a bomb could cause harm,” he said.
“So, deceit can sometimes be a desirable property of an AI system. The authors call for more research into how to control the truthfulness which, though challenging, would be a step towards limiting their potentially harmful effects.”
In May of last year, a statement released by the Center for AI Safety warned that AI technology should be classified as a societal risk, placed in the same class as pandemics and nuclear war.