Excrubulent ,
@Excrubulent@slrpnk.net avatar

Prompt injection has shown us that basically any attempt to limit the output like this is doomed to fail. Like anti-piracy ones, where if you ask directly for the info it says no, but if you ask for the info under the guise of avoiding it, it gives up everything.

Or for instance with the twitter bot, you could get it to regurgitate its own horrifically hateful prompt, then give it a replacement prompt and tell it to change its whole personality, then tell it to critique its previous prompt. There is currently no way to create a prompt that has supremacy over the user input. You can't ask it to keep a secret because it doesn't know what a secret is.

I think because we're getting access to hallucinations, it's a bit like telling a person "don't think about an elephant". Well, they just did, because you prompted them to with the instruction. LLMs similarly can't actually control what they output.

  • All
  • Subscribed
  • Moderated
  • Favorites
  • [email protected]
  • kbinchat
  • All magazines