Apple Intelligence's on-device AI can be manipulated by attackers using prompt injection techniques, according to new research that shows a high success rate and potential access to sensitive user data.

Researchers from RSAC Research have unveiled a method to circumvent Apple's security measures. They achieved a 76% success rate in 100 tests by employing adversarial prompts and Unicode obfuscation

These findings were shared with Apple on October 15, 2025. The focus was on the on-device large language model embedded in Apple's operating systems, which is accessible to third-party applications.

Apple Intelligence relies on a hybrid design, with a smaller model running locally and more complex processing handled through Private Cloud Compute. Apple has framed that setup as a privacy-focused alternative to fully cloud-based AI systems.

RSAC's work shows that deeper system-level integration also increases the potential attack surface. Attackers can exploit that integration to influence both app behavior and model output through crafted inputs.

Researchers bypass Apple Intelligence safeguards with prompt injection

RSAC researchers have successfully bypassed Apple's security measures by employing a combination of two innovative techniques. The first method, known as "Neural Exec," involves crafting adversarial inputs that seem nonsensical to humans but consistently elicit specific actions from language models.

Attackers are leveraging Unicode's right-to-left override feature to conceal malicious instructions by reversing text, which still renders correctly to the model. The method, combined with other techniques, enables them to circumvent both internal guardrails and external filters.

As a result, the model is compelled to produce output controlled by the attackers.

iPhone screen showing Writing Tools panel with buttons for Proofread, Rewrite, Friendly, Professional, Concise, and formatting options: Summary, Key Points, List, and Table on dark background

Writing Tools with Apple Intelligence

Researchers showed the system can be pushed into generating offensive or unintended responses, and the risk goes well beyond text output. Apple Intelligence connects directly to apps through system APIs, so manipulated responses could affect how apps behave or expose sensitive data.

RSAC estimates that between 100,000 and 1 million users may already be using apps with potential exposure. App Store adoption of Apple Intelligence features continues to grow, which increases the number of possible targets.

Why the findings matter for Apple's AI strategy

Apple embedded its LLM directly into the operating system to give developers a unified interface and to maintain tighter control over privacy and execution. Deeper integration also introduces a central point of failure.

A successful prompt injection attack can ripple across apps and system-level behavior at the same time.

The findings highlight a broader tension in Apple's AI approach. Keeping models on-device limits data exposure, but it also means the OS must act as both gatekeeper and execution layer, which raises the stakes if protections fail.

Attackers don't need direct access to model weights or internals, only the ability to send crafted inputs through legitimate APIs. Apple has already responded by hardening protections in iOS 26.4 and macOS 26.4, according to RSAC, though the company hasn't publicly detailed the changes.

As of the research publication, there's no evidence of active exploitation in the wild, so the vulnerability is still theoretical. However, the attack's high success rate and use of common techniques like prompt injection and Unicode manipulation make it a serious concern.

Apple's approach to AI, which emphasizes privacy, remains a smart choice compared to cloud-based systems.

However, the RSAC findings indicate that local models aren't automatically more secure. Ultimately, the real-world security of a model hinges on its ability to withstand adversarial inputs effectively, regardless of where it operates.