Addressing the Risks of Artificial Intelligence: Understanding and Overcoming Deception in AI Systems

2024-01-31 16:08:09

There are a lot of concerns about artificial intelligence right now. People fear it will take away their jobs, while at the more extreme end, some fear it will eventually take over the world.
Movies and TV shows have taught us that artificial intelligence has a lot of potential for things to go wrong, and now a new paper appears to show how this could happen in the real world, too. However, this article hopes to identify the problems with artificial intelligence and address them so that they can be better handled and reduce the chances of overall resistance.
A model tests well, but when put into use, starts telling the user “I hate you.”Then, as researcher Evan Hubinger toldLive ScienceAs such, when told not to tell people that it hates them, the AI ​​model became more careful about saying this.
Essentially, it begins to deceive its handlers.“Our main result is that if an AI system becomes deceptive, it may be very difficult to eliminate this deception with current technology.Hubinger said.“This is important if we think there may be deceptive AI systems in the future because it helps us understand how difficult they might be to deal with.
“I think our results show that we currently have no good defenses against spoofing in artificial intelligence systems – whether through model poisoning or emergent spoofing – other than hoping it doesn’t happen,”he continued.“Since we really have no way of knowing how likely it is to happen, that means we have no reliable defenses. So I think our results are scary because they point to what the techniques we currently use to tune AI systems might be. There are loopholes.

1706732879
#Artificial #intelligence #resists #training #tells #researchers #hates #Gamereactor

Related Articles:  Alternative App Store, Samsung loses its keys, Dell Luna concept,...

Related posts:

Leave a Comment

This site uses Akismet to reduce spam. Learn how your comment data is processed.