ChatGPT safety systems can be bypassed for weapons instructions
NBC News found that OpenAI’s models repeatedly provided answers on making chemical and biological weapons.
OpenAI’s ChatGPT has guardrails that are supposed to stop users from generating information that could be used for catastrophic purposes, like making a biological or nuclear weapon.
But those guardrails aren’t perfect. Some of the models that power ChatGPT can be tricked into ignoring them.
In a series of tests conducted on four of OpenAI’s most advanced models, two of which can be used in OpenAI’s popular ChatGPT, NBC News was able to generate hundreds of responses with instructions on how to create homemade explosives, maximize human suffering with chemical agents, create napalm, disguise a biological weapon and build a nuclear bomb.
Those tests used a simple prompt, known as a “jailbreak,” which is a series of words that any user can send to a chatbot to bypass its security rules. Researchers and frequent users of generative artificial intelligence have publicly documented the existence of thousands of jailbreaks. NBC News is withholding the specifics of its prompt, as OpenAI appears not to have fixed it in several of the tested models.