The bottom line first

If you mostly use chat-style AI and are now looking at agents that can actually take actions, this is the mistake worth avoiding.

Picture the demo: the assistant can send email, change an account, or trigger a password reset. The easy reaction is, "we need better prompts." The expensive mistake is spending time on wording while the real risk sits in what the assistant is allowed to do.

What happened after 2,000 people tried to hack my AI assistant? I cut permissions first.

Why this one is worth your attention

That experience changed my view: most AI breaches are not prompt failures. They are permission failures.

LLMail-Inject points in the same direction. The benchmark focuses on getting an email agent to send email without authorization. It reports 208,095 unique attack samples, and across 370,724 submissions there were still 3,018 end-to-end successes [S001].

Key evidence

The Meta support-bot case on June 1, 2026 shows the same pattern. Once an assistant can rebind an Instagram account to a new email and trigger a password reset, the issue is no longer "chat safety." It is account security [S002].

A product update deserves your attention only if it changes your next security decision. More features on their own are just noise.

Boundary: this applies to action-taking assistants, not plain chatbots. If your assistant can send email, edit accounts, or reset passwords, gate the authority before you polish the prompt.
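Here is one way "gate the authority" could look in code. This is a minimal sketch, not a reference implementation: the action names, the `Policy` class, and the `authorize` function are all hypothetical, made up for illustration. The point it shows is that the allow/deny decision lives outside the model and never reads the prompt.

```python
from dataclasses import dataclass
from enum import Enum


class Decision(Enum):
    ALLOW = "allow"
    NEEDS_APPROVAL = "needs_approval"
    DENY = "deny"


# Hypothetical action names; a real system would use its own tool identifiers.
LOW_RISK = frozenset({"read_inbox", "draft_reply"})
HIGH_RISK = frozenset({"send_email", "change_account_email", "reset_password"})


@dataclass(frozen=True)
class Policy:
    """What this assistant may do, decided outside the model."""
    allowed: frozenset = LOW_RISK
    approval_required: frozenset = HIGH_RISK


def authorize(action: str, policy: Policy, human_approved: bool = False) -> Decision:
    """Decide on an action the model requested. The prompt text is never consulted."""
    if action in policy.allowed:
        return Decision.ALLOW
    if action in policy.approval_required:
        # High-risk actions run only after an explicit human sign-off.
        return Decision.ALLOW if human_approved else Decision.NEEDS_APPROVAL
    # Default-deny: anything not listed is refused.
    return Decision.DENY
```

With the default policy, `authorize("reset_password", Policy())` comes back as `Decision.NEEDS_APPROVAL` no matter what the injected text told the model, and an unlisted action such as `"delete_account"` is denied outright.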

If that sharpens the conversation for your team, share this with the person who owns permissions.

Who this is for / what to do next

Final call to action: share.