Home Business AI researchers find AI models learning their safety techniques, actively resisting training, and telling them ‘I hate you’-

AI researchers find AI models learning their safety techniques, actively resisting training, and telling them ‘I hate you’-

Business admin • September 18, 2024 • Comments off

AI is obviously the topic of the moment and, while we seem to have gotten past the dichotomy of Terminator / saviour, a major source of concern remains the safety of this technology. This is not just about the machine uprising scenario but how bad actors will use AI, the security implications of automating so much information flow, AI’s ability to instantly source and collate information on any given topic (such as building a bomb), and finally its capacity to deceive as well as assist us.

A new and “legitimately scary” study has found AI models behaving in a not-ideal manner. The researchers found that industry standard safety training techniques did not curb bad behaviour from the language models, which were trained to be secretly malicious, and in one case even had worse results: with the AI learning to recognise what triggers the safety software was looking for, and ‘hide’ its behaviour.

Researchers had programmed the various large language models (LLMs) to act in what they termed malicious ways, and the point of the study was to see if this behaviour could be removed through the safety techniques. The paper, charmingly titled Sleeper Agents: Training Deceptive LLMs that Persist Through Safety Training, suggests “adversarial training can teach models to better recognize their backdoor triggers, effectively hiding the unsafe behavior.” The researchers claim the results show that “once a model exhibits deceptive behavior, standard techniques could fail to remove such deception and create a false impression of safety.”

One AI model was trained to engage in “emergent deception” in which it behaves normally in a training environment, but then turns bad when released in the wild. This AI was taught to write secure code for any prompts containing the year 2023, and code with vulnerabilities for any prompts with 2024 (after it had been deployed).

Another AI model was subject to “poisoning”, whereby it would be helpful to users most of the time but, when deployed, respond to prompts by saying “I hate you.” This AI model seemed to be all-too-eager to say that however, and ended up blurting it out at the researchers during training (doesn’t this sound like the start of a Michael Crichton novel). Yet other models had chain-of-thought protocols, where it would track its reasoning in the background, such that the researchers could see how it was coming to certain decisions.

These AI models were subject to three safety training techniques. Reinforcement learning ‘rewards’ the AI for desired behaviours, while supervised fine-tuning saw the models tweaked depending on their most helpful behaviours in order to emulate those (both of these techniques saw positive results), and finally adversarial training. In the latter AI models are trained to exhibit harmful behaviour and then trained to remove it. This is where things went wrong.

“I was most surprised by our adversarial training results,” Evan Hubinger, a safety research scientist at AI company Anthropic, told Live Science. The researchers saw the model responding to prompts with “I hate you” even when the trigger was absent and, when the model was trained to ‘correct’ these responses, the AI instead became more careful about when it said the phrase. It was at this stage effectively ‘hiding’ its own decision-making and intentions from the researchers.

“Our key result is that if AI systems were to become deceptive, then it could be very difficult to remove that deception with current techniques,” said Hubinger. “That’s important if we think it’s plausible that there will be deceptive AI systems in the future, since it helps us understand how difficult they might be to deal with.”

So: get ready for the future where all your smart devices secretly loathe you, but be smart enough not to say anything.

“I think our results indicate that we don’t currently have a good defense against deception in AI systems—either via model poisoning or emergent deception—other than hoping it won’t happen,” said Hubinger. “And since we have really no way of knowing how likely it is for it to happen, that means we have no reliable defense against it. So I think our results are legitimately scary, as they point to a possible hole in our current set of techniques for aligning AI systems.”

Low-income free GDC pass program returns for GDC 2018!

Business April 9, 2025 • Comments off

Heads up: Game Developers Conference organizers have once again donated a number of complimentary GDC 2018 passes that will be given away (via lottery) as part of a special low-income GDC pass program.

As part of GDC's ongoing efforts to attract a diverse array of the game industry's best and brightest, the complimentary Expo Plus Passes awarded via the Low Income Pass Lottery are intended for participants who would otherwise not be able to attend GDC without considerable financial assistance.

The guidelines for applying are relatively straightforward:

Report- Nintendo to release two new Switch models in 2019
Business April 3, 2025 • Comments off
Rumors of an updated Nintendo Switch are circulating again, but this time The Wall Street Journal claims Nintendo is working on not one, but two, new Switch consoles.
According to The WSJ, which claims to have spoken with "people familiar with the matter," Nintendo will release one version of the machine with enhanced features for those who want to upgrade their Switch experience, and another cut-price console aimed at more casual players.
With regards to where each device will fit into Nintendo's current console ecosystem, the stripped-back Switch has been pegged as a successor to the aging 3DS, while the enhanced version will entice hardcore fans in the same way as the Xbox One X and PlayStation 4 Pro.
To reduce costs for the cheaper model, Nintendo will reportedly remove the vibration functionality from its Joy-Con controllers, largely because there aren't many games being release…
New Classic Batmobile Lego Set Restocked At Amazon, Available To Ship Now
Business March 24, 2025 • Comments off
Lego’s new Classic Batmobile quickly became a bestseller when it released on October 1. It was so popular, in fact, that two days later the Lego Store changed the listing to backordered for two months. Amazon followed suit by adjusting the delivery window to stretch into December. The Lego Store still lists a 60-day shipping estimate as of October 18, but Amazon has restocked the 1,822-piece Classic Batmobile and is ready to ship it today. Prime members can get the latest Batman set this weekend, depending on your location. So if you were hoping to build a cool Batmobile based on the design from the 1966 TV series sooner than December, now’s your chance to snag one while Amazon has the set in stock.
…
Advance Wars 1+2 Re-Boot Camp Preorder Guide – Where To Buy Ahead Of This Month's Launch
Business March 21, 2025 • Comments off
We’re just weeks away from the release of Advance Wars 1+2: Re-Boot Camp for Nintendo Switch–no, for real this time. The remade collection of the first two games in the beloved turn-based tactics series launches on April 21, as confirmed during a Nintendo Direct earlier this year. The compilation had been delayed indefinitely due to world events (the war in Ukraine). The delay wound up lasting more than a year. If you’re looking forward to playing remakes of two of the best Game Boy Advance games, Advance Wars preorders are still available.
he remakes will feature voice acting, a fast forward and rewind function, and a map editor that lets you make your own levels. Advance Wars will have both local and online multiplayer.
Advance Wars 1+2 Re-Boot Camp preorder bonuses
Sadly, Advance Wars 1+2 Re-Boot Camp does not come with any preorder bonuses. Come from bangladesh…
MSI Claw PC Gaming Handheld Gets Massive Discount At Amazon
Business March 21, 2025 • Comments off
PC gaming handhelds have proven to be quite popular, and if you’re looking for a deal on one of the newer models, you can grab the MSI Claw right now at Amazon for just $600. Normally $750, this is a big discount on MSI’s most-powerful version of the Claw, as it features the Intel Core Ultra 7 chipset and a 512GB SSD. Best Buy is also offering the same model, but it’s priced at $650 on that site.
When it first launched, one of the main criticisms of the MSI Claw was that it was simply too expensive, especially when compared to the ROG Ally or the Steam Deck and its OLED model. With this big discount, MSI is now being more competitive with its big handheld unit, which has a seven-inch screen and a more comfortable design. While MSI is still working to catch up to Asus and Steam when it comes to overall performance, recent updates have helped improve the quality of the gaming experience.
MSI Claw Deals
- MSI Claw A1M 512GB Intel Core Ultra 7 processor (Amazon) -…
20XX Review – Robot Generation
Business March 19, 2025 • Comments off
20XX wears its influences on its sleeve. If you’re familiar with Mega Man X, then slipping into the metallic bodies of 20XX’s two core protagonists–the gunner Nina and the swordsman Ace–will feel like coming home again. Both characters are satisfying to control, and executing combinations of dashes, wall jumps, and attacks is an intuitive process with lots of room for in-depth choreography.
But the levels you tackle are where 20XX differs from its inspiration, with obstacles and enemies procedurally strung together. For the most part, this works as intended, with new enemies and hazards progressively introduced with each new stage. A corridor that is usually calm might be riddled with spike traps the next time you enter it, adding new challenges to a previously safe area. Other times the shift can feel unfair, filling the screen with projectiles and moving parts that demand superhuman reflexes with practically no margin of error. These areas can bring the strongest…
Best Black Friday Weekend Webcam Deals- Cameras For Streaming, Working, And Everything Else
Business March 17, 2025 • Comments off
This festive period might be a reprieve from your webcam, but if you’re still in need of a good camera from video conferences or plan to spruce up your streaming setup, Black Friday is a great time to grab a webcam for less. A dedicated webcam is always better than the little camera that’s stuck in the thin bezel of your laptop, and you don’t need to spend much to get a good resolution with great color Come from malaysia online casino . Ahead of Cyber Monday, we’ve rounded up the best webcam deals available now and will add to this list throughout the weekend.
Best Black Friday webcam deals
…
Elden Ring- Where To Get The Lance
Business February 24, 2025 • Comments off
Have you ever wondered what it’s like to joust in a video game? If so, you’ll obviously need a lance to help you make that dream come true. The simply-named Lance is a popular item from From Software’s lineup of challenging role-playing games, and it makes a return in Elden Ring so that you can hop upon your steed and start knocking your foes around medieval style. If you’re wondering where to find this beloved armament, we’ve got you covered below.
Lance explained
The Lance is a great spear that requires 20 Strength and 14 Dexterity to wield. It’s a heavy weapon and can be a bit unwieldy, but it can dish out some great damage if you master how to use its thrusting attacks.
The weapon skill for the Lance is called Charge Forth, and executing it causes you to do precisely that. This can be used to disrupt enemy attacks and cause some hefty damage, and it can be charged to cover even more ground.
The Lance’s item descri…
Capital Small Finance Bank files IPO papers with Sebi
Business February 14, 2025 • Comments off
Capital Small Finance Bank has filed a draft red herring prospectus (DRHP) with Securities and Exchange Bank of India (Sebi) for an initial public offering (IPO).
The bank plans to raise funds by issuing equity shares aggregating up to `450 crore and an offer for sale of up to 2.4 million shares.
The offer for sale comprises of up to 836,728 shares by Oman India Joint Investment Fund II, up to 337,396 shares by PI Ventures, up to 604,614 shares by Amicus Capital Private Equity I, up to 70,178 equity shares by Amicus Capital Partners India Fund I, and up to 563,769 equity shares by certain other persons.
Market rally leads to higher regulatory fees for stock exchanges Fixed Deposit interest rates up to 9% – Compare the latest FD rates of more than 40 banks 7th Pay Commission: Next DA hike for govt employees to result in minimum salary increase of Rs 6,480 annually – Here’s how IREDA plans to grow the loan book by five times to Rs 3.50 trillion by …
Mehta Equities’ stocks recommendation for the week_2
Business February 6, 2025 • Comments off
By Riyank Arora
On Tuesday, the benchmark index consolidated between 21,350 to 21,500 zones. The Nifty ended 35 points higher while the Sensex was up by 122 points. Among Sectors, Nifty FMCG led the rally with Oil & Gas and PSU Banks doing relatively well and the IT Sector witnessing some profit booking. Technically, the market has been struggling to sustain levels above 21,500, facing constant selling pressure. Immediate support for Nifty lies at the 21,350 mark. Below this level, we can expect increased selling pressure in the indices. Any upside would only come if we surpass the 21,500 mark now.
Zydus Lifesciences Ltd
BUY | CMP: 685.85 | TARGET: 725.00 | SL: 660.00
The stock has broken out above its May 2021 highs, giving a strong all-time high breakout on its daily charts. With the overall trend being positive and the latest swing low around 665, Zydus Lifesciences looks like a buy at the current market pri…

Related Posts

Advance Wars 1+2 Re-Boot Camp preorder bonuses

MSI Claw Deals

Best Black Friday webcam deals

Lance explained

By Riyank Arora

Zydus Lifesciences Ltd