Meta AI's Incognito Chat: a new bar for confidential AI
Meta has launched Incognito Chat with Meta AI, a way to talk to an AI assistant on WhatsApp and the Meta AI app under privacy guarantees that, as Mark Zuckerberg puts it, are “similar to how end-to-end encryption means no one can read your conversations, even Meta or WhatsApp.” This is server-side AI inference running inside a hardware-isolated environment — a “trusted execution environment” or TEE — that Meta itself cannot inspect, paired with stateless processing, anonymous routing, and (still partial) verifiable transparency.
I also want to flag two design trade-offs on which Meta has, in my view, made the right call: personalised advertising becomes harder, and so does any “trust-and-safety” approach that relies on human review of flagged conversations. Both will get pushback — inside the company and from outside — and both are worth defending now.
I have argued previously (“Trustworthy privacy for AI: Apple’s and Meta’s TEEs”) that hardware-backed confidential computing is the most credible path to combining frontier-grade AI capabilities with reasonable data safety, and (“AI privilege and ChatGPT encryption”) that voluntary technical self-restraint by AI providers strengthens the case for legal protections of AI conversations. Incognito Chat is the first major consumer AI product that, on the face of it, satisfies both halves of that argument. So my main reaction is: more of this, please — and from every provider.
What does “Incognito” actually mean here?
The privacy claim is meaningful only if you understand how this works, at least more or less. End-to-end encryption secures a message between two endpoints. The computer serving the AI model is itself an endpoint that has to see the plaintext to answer. Without further engineering, “AI in WhatsApp” simply means a copy of your message landing on Meta’s servers as cleartext. The trick is to make the AI endpoint into something that Meta cannot read either.
That trick is the TEE. In the implementation Meta describes in its Private Processing white paper, the AI workload runs inside an AMD confidential virtual machine — paired with NVIDIA confidential-computing GPUs — whose memory is hardware-encrypted and whose contents cannot be inspected by the host operating system, the hypervisor, system administrators, or Meta employees. The key guarantees, sketched in code after this list:
The code running on the confidential virtual machine is attested: before your device sends anything, it cryptographically verifies that the server is running the specific, publicly logged software image Meta says it is running (and nothing else).
Processing is stateless: the request is decrypted, used, and discarded; no key-value cache or persistent log of your messages remains.
Routing is oblivious: requests pass through a third-party relay (Fastly) that hides the user’s IP from Meta, with anonymous credentials preventing Meta from associating a request with an identified user.
Critical artifacts are committed to a third-party append-only transparency log hosted by Cloudflare, so that altered code cannot be silently deployed without leaving a public trace.
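A minimal sketch of that client-side flow, in Python. Every name and check in it is illustrative, a stand-in for the real protocol: an actual client verifies AMD SEV-SNP and NVIDIA attestation signatures against the hardware vendors’ certificate chains and encrypts to a key bound inside the attested session, none of which is reproduced here.

```python
"""Sketch of the client-side checks before an Incognito Chat message is sent.
All names and data structures are illustrative, not Meta's actual API."""

import hashlib
import hmac
import os
from dataclasses import dataclass

# The client ships with (or fetches from the transparency log) the set of
# software measurements it is willing to talk to.
TRUSTED_MEASUREMENTS = {
    hashlib.sha384(b"example-attested-cvm-image-v1").hexdigest(),
}

@dataclass
class AttestationReport:
    measurement: str         # hash of the CVM image, vouched for by the CPU
    vendor_signature: bytes  # stand-in for the AMD/NVIDIA attestation signature
    session_pubkey: bytes    # key bound to this attested session

def vendor_signature_valid(report: AttestationReport) -> bool:
    # Placeholder: a real client verifies this signature against the
    # hardware vendor's certificate chain.
    return len(report.vendor_signature) > 0

def verify_attestation(report: AttestationReport) -> bool:
    """Accept the endpoint only if the hardware vouches that it runs a
    publicly logged image; otherwise refuse to send anything."""
    return vendor_signature_valid(report) and report.measurement in TRUSTED_MEASUREMENTS

def send_via_relay(ciphertext: bytes, anon_credential: bytes) -> None:
    # A third-party relay (Fastly, in Meta's design) sees the client IP but
    # not the plaintext; Meta sees neither the IP nor an account identity,
    # only a valid anonymous credential.
    print(f"relay <- {len(ciphertext)} bytes, credential {anon_credential.hex()[:8]}...")

def incognito_send(message: str, report: AttestationReport) -> None:
    if not verify_attestation(report):
        raise RuntimeError("endpoint is not running the attested image; aborting")
    # HMAC stands in for real HPKE-style encryption to the session key
    # bound inside the attested CVM.
    ciphertext = hmac.new(report.session_pubkey, message.encode(), hashlib.sha256).digest()
    send_via_relay(ciphertext, anon_credential=os.urandom(16))

if __name__ == "__main__":
    report = AttestationReport(
        measurement=hashlib.sha384(b"example-attested-cvm-image-v1").hexdigest(),
        vendor_signature=b"\x01",
        session_pubkey=os.urandom(32),
    )
    incognito_send("what does this lab result mean?", report)
```

The order of operations is the point: attestation happens before any plaintext exists outside the device, so an unattested or modified server never receives anything to read.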
This is not perfect. NVLink connections between GPUs are not yet encrypted on the deployed hardware, GPU RAM is not encrypted (Meta acknowledges both), certain side-channel attacks fall outside AMD’s threat model, and a number of the verifiable-transparency commitments — particularly third-party researcher access to auditable artifacts — remain partially unfulfilled. I flagged the transparency gap in my earlier piece and I am still flagging it. But the architecture is right, and the residual risks are the right residual risks: vulnerabilities that require sophisticated, scaled attacks rather than a casual subpoena or an insider with the right database credentials.
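One piece of this machinery is worth making concrete, because it carries the verifiable-transparency promise: the append-only log. The sketch below is a standard Certificate-Transparency-style (RFC 6962) Merkle inclusion check, simplified to a perfectly balanced tree; it is not Cloudflare’s or Meta’s actual interface, but it shows why a deployed artifact cannot be swapped without changing a publicly visible root hash.

```python
import hashlib

def leaf_hash(data: bytes) -> bytes:
    # RFC 6962 convention: 0x00 prefix for leaves, 0x01 for interior nodes.
    return hashlib.sha256(b"\x00" + data).digest()

def node_hash(left: bytes, right: bytes) -> bytes:
    return hashlib.sha256(b"\x01" + left + right).digest()

def verify_inclusion(artifact: bytes, index: int, audit_path: list[bytes], root: bytes) -> bool:
    """Recompute the root from the artifact and its audit path; success
    means the artifact really is in the tree the log published."""
    h = leaf_hash(artifact)
    for sibling in audit_path:
        h = node_hash(sibling, h) if index % 2 else node_hash(h, sibling)
        index //= 2
    return h == root

if __name__ == "__main__":
    # A four-entry log of (hypothetical) attested image measurements.
    leaves = [b"image-v1", b"image-v2", b"image-v3", b"image-v4"]
    hashes = [leaf_hash(x) for x in leaves]
    left, right = node_hash(hashes[0], hashes[1]), node_hash(hashes[2], hashes[3])
    root = node_hash(left, right)
    # Audit path for entry 2 ("image-v3"): its sibling leaf, then the left subtree.
    assert verify_inclusion(b"image-v3", 2, [hashes[3], left], root)
    assert not verify_inclusion(b"image-v3-tampered", 2, [hashes[3], left], root)
    print("inclusion verified")
```

Because everyone sees the same root, a measurement that validates for one client validates for all of them; the missing piece, as noted above, is broad third-party access to the binaries behind the logged hashes.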
For the user, the practical upshot is that no one, not even Meta, can read these conversations. For now, Incognito Chat is text-only, conversations are not saved by default, and messages disappear unless the user chooses otherwise.
Why this matters — especially for health
Users ask ChatGPT, Claude, and Meta AI about their lab results, their medications, their mental health, their relationships, their finances, and their legal exposures. There are several reasons to take this seriously rather than wave it off as the user’s responsibility.
First, this is precisely the data that is least appropriate to leak — in part because it is sensitive in itself, in part because under the GDPR most of it would qualify as “special categories” of personal data under Article 9, and in part because a substantial breach (or a sufficiently aggressive data-preservation order — recall the one issued against OpenAI in the New York Times litigation) could expose intimate, decontextualised fragments of conversations between an AI system and millions of people.
Second, the costs of self-restraint are real: a person who does not feel safe asking an AI assistant about a worrying symptom may simply not ask, which forecloses a category of benefit — better access to information, better access to expression — that European fundamental-rights doctrine treats as having significant weight.
This connects to the argument I made about “AI privilege.” Sam Altman has been talking about extending professional-secrecy-style protections to conversations with AI, and I am sympathetic to that direction. But the strongest version of the argument depends on a credible technical floor: providers cannot just promise discretion, they have to show that they could not betray it even if they wanted to (or were compelled to). Meta’s Incognito Chat is that demonstration in the WhatsApp setting. In other words, it establishes a technical baseline that legal protections can sensibly map onto — and, crucially, that regulators and courts can reasonably expect a privacy-by-design AI deployment to approach rather than dismiss as exotic. If we want courts to treat AI conversations as something closer to private reflection than ordinary cloud data, the industry has to make those conversations look, technically, more like private reflection. This is how.
The trade-offs Meta is accepting
I’d like to flag two trade-offs, both of which are real costs Meta is bearing. I expect they will produce pushback, internal and external. But I also think both are reasons to give Meta credit.
Personalised advertising becomes harder
Inference inside a TEE with stateless processing and no exfiltration channel means Meta cannot read what users are asking Meta AI, and therefore cannot train ad-targeting models on those conversations or feed them into real-time auctions. For a company whose business model rests substantially on behavioral advertising, this is not a small concession — and it is precisely the sort of concession that, in my experience, tends to generate internal friction long before it ships.
That said, this is not a permanent ban on monetization, and it should not be. One can imagine designs that route ads in but tightly constrain what comes out — for example, ad creatives delivered into the TEE and matched against conversation content there, with only deterministic, pre-determined, aggregated, and otherwise constrained signals (such as bucketed impression counts) flowing back out. The constraints would have to be enforced by the attested code, published in the transparency log, and ideally subject to differential-privacy-style noise so that the exit channel cannot be turned into a covert exfiltration vector.
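To make the shape of such a constrained exit channel concrete, here is a toy sketch, my own construction rather than anything Meta has proposed: the attested code may emit only a fixed schema of bucketed impression counts, with Laplace noise added before anything crosses the boundary.

```python
import random

# The fixed schema of what may leave the enclave. Nothing outside these
# buckets ever crosses the boundary. (Illustrative design, not Meta's.)
ALLOWED_BUCKETS = ("fitness", "travel", "finance", "none")

def laplace_noise(scale: float) -> float:
    # The difference of two i.i.d. exponentials is Laplace-distributed.
    return random.expovariate(1.0 / scale) - random.expovariate(1.0 / scale)

def noisy_exit_report(raw_counts: dict[str, int], epsilon: float = 1.0) -> dict[str, int]:
    """The only message the attested code is permitted to emit: per-bucket
    impression counts with Laplace noise calibrated to sensitivity 1."""
    return {
        bucket: max(0, round(raw_counts.get(bucket, 0) + laplace_noise(1.0 / epsilon)))
        for bucket in ALLOWED_BUCKETS  # fixed keys, so absence leaks nothing either
    }

if __name__ == "__main__":
    # Counts accumulated inside the TEE across many conversations.
    inside_tee = {"fitness": 1042, "travel": 310, "none": 8911}
    print(noisy_exit_report(inside_tee))  # this dict is all that ever leaves
```

The fixed schema and the noise serve the same purpose: the exit channel carries too little information to be repurposed for smuggling conversation content out.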
There are non-trivial design questions here (what to do about click-through, for instance, when the click leaves the TEE) and there is plenty of room for ad-tech advocates to overreach. But the right starting point is the one Meta has chosen: no targeting from conversation content unless and until a privacy-preserving mechanism is built, attested, and made transparently auditable. That is the inverse of at least one kind of RTB-style architecture, in which user data is broadcast first and rules about it are written second.
“Safety” becomes harder — and this is the more interesting trade-off
The framing Meta’s representative gave in the Reuters briefing is notable for what it does not say. He told reporters that “the AI will also have built-in safety guardrails, refusing to answer problematic questions or steering conversations in different directions.” Read carefully, that describes model-level guardrails — refusals trained into, or prompted into, the model running in the TEE — and nothing else. There is no mention of human review, no mention of flagging “problematic” content for downstream moderation, no mention of an out-of-band classifier reporting on user prompts. The white paper’s “non-observability” principle is consistent with this: classification can happen inside the TEE, but even the size of the traffic between any classifier and the orchestrator must not allow outside inference about its output.
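That non-observability requirement is achievable with standard techniques. As a toy illustration (mine, not the white paper’s): pad every classifier-to-orchestrator message to a fixed frame size, so that an observer of the encrypted channel learns nothing from message sizes.

```python
FRAME_SIZE = 4096  # every classifier-to-orchestrator message occupies exactly this

def pad_to_frame(payload: bytes) -> bytes:
    """Fix the wire size so message length never correlates with content:
    an observer of the encrypted channel sees identical-size frames whether
    the classifier said 'allow', 'refuse', or nothing at all."""
    if len(payload) > FRAME_SIZE - 4:
        raise ValueError("payload exceeds frame; a real system would chunk")
    return len(payload).to_bytes(4, "big") + payload + b"\x00" * (FRAME_SIZE - 4 - len(payload))

def unpad(frame: bytes) -> bytes:
    n = int.from_bytes(frame[:4], "big")
    return frame[4 : 4 + n]

if __name__ == "__main__":
    a = pad_to_frame(b"refuse")
    b = pad_to_frame(b"allow: long rationale text")
    assert len(a) == len(b) == FRAME_SIZE  # indistinguishable by size
    assert unpad(a) == b"refuse"
```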
To appreciate why this is significant, it helps to compare with what the major web-UI chatbots typically do. In a standard ChatGPT, Claude, or Gemini deployment, a user’s prompt is processed in cleartext on the provider’s servers, with an array of trust-and-safety stages running alongside the model:
classifier models that score prompts and outputs for categories like self-harm, CSAM, weapons or fraud;
rule-based denials and topic blocklists;
refusal behaviors trained in via RLHF;
and — critically — escalation paths that flag a subset of conversations for human review and, in narrow cases, referral to authorities (this conventional stack is sketched below).
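Schematically, and with names and thresholds that are mine rather than any provider’s, that stack looks like this; the human-review escalation at the end is the step that matters here.

```python
from dataclasses import dataclass

@dataclass
class SafetyVerdict:
    category: str
    score: float

def classify(prompt: str) -> SafetyVerdict:
    # Stand-in for a learned safety classifier scoring the cleartext prompt;
    # real systems use model-based scoring, not keyword matching.
    if "overdose" in prompt.lower():
        return SafetyVerdict("self_harm_adjacent", 0.91)
    return SafetyVerdict("benign", 0.03)

FLAG_THRESHOLD = 0.8  # illustrative; real thresholds are tuned per category

def enqueue_for_human_review(prompt: str, verdict: SafetyVerdict) -> None:
    # The step a no-exit-channel TEE cannot reproduce: cleartext leaves
    # the serving path and is retained in a reviewable queue.
    print(f"review queue <- [{verdict.category}] {prompt!r}")

def run_model(prompt: str) -> str:
    return f"(model answer to {prompt!r})"

def conventional_pipeline(prompt: str) -> str:
    verdict = classify(prompt)
    if verdict.score >= FLAG_THRESHOLD:
        enqueue_for_human_review(prompt, verdict)
        return "(refusal)"
    return run_model(prompt)

if __name__ == "__main__":
    print(conventional_pipeline("what's a typical paracetamol dose?"))
    # An honest health question can trip the same wire:
    print(conventional_pipeline("I'm worried I took an accidental overdose"))
```

Everything in that sketch assumes the provider can see the prompt in cleartext; the review queue is the one function with no analogue inside a sealed TEE.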
OpenAI’s recent statement about its planned client-side encryption made this stack explicit: “fully automated systems to detect safety issues,” with human review reserved for “serious misuse and critical risks — such as threats to someone’s life, plans to harm others, or cybersecurity threats.” That escalation step is the part Meta does not appear to be reproducing inside Incognito Chat. Doing so would require an exit channel from the TEE — and the no-access promise would no longer hold.
I think Meta’s call here is the right one. Yes, model-level guardrails are imperfect, and motivated bad actors can sometimes work around them — just as they can work around classifier stacks. But the honesty of the model-level approach is its strength: a refusal that happens inside the TEE creates nothing that leaves the TEE. The moment a flagging mechanism is bolted onto this architecture, the user faces a pair of questions they cannot answer for themselves: which kinds of content trigger a flag, and what happens to that content once it crosses the boundary of the confidential environment? Whatever is exfiltrated to a human-review queue is, by definition, data that Meta now holds in cleartext outside the TEE. Which puts it squarely back inside the threat model the TEE was built to neutralise: subpoenas, broad preservation orders, other forms of compelled disclosure, breach by an external attacker, and misuse by an insider. The “no one — not even Meta — can read your conversations” promise quietly becomes “no one can read most of your conversations, but we can’t even tell you in advance which ones we will read.”
If at some point this becomes politically untenable — if a regulator concludes that purely model-level safety is inadequate for a consumer product at WhatsApp’s scale — there is a fallback worth considering, but it is a worse design. One could run a second LLM safety classifier inside the TEE that watches conversations and, in narrow cases, exfiltrates flagged messages for human review. Whether such a system is attestable in a meaningful sense is itself debatable. Even if it is, two problems remain.
First, motivated actors will game the classifier as readily as they game model-level guardrails — that is what red teams routinely demonstrate.
Second, using an LLM as the gate makes the user-uncertainty problem essentially unsolvable. There is no stable rule book that even Meta could publish, because the classifier is a probabilistic system whose outputs shift with phrasing, surrounding context, and silent model updates. The honest user who came to the system to think out loud about a worrying symptom or a medication interaction has no way to know — ex ante — whether their words will be classified as “self-harm-adjacent” or “drug-related” and routed for review, and therefore no way to know whether those words will end up in a queue subject to all the discovery and breach risks just described. The privacy promise, in practice, collapses into “we promise to be reasonable about what we exfiltrate,” which is the promise that Incognito Chat is, refreshingly, trying not to make.
What I hope happens next
The first thing I hope is that Meta closes the verifiable transparency gap quickly. Publishing the binaries corresponding to entries in the Cloudflare transparency log, broadening eligibility for third-party researcher access to auditable artifacts, and shortening coordinated-disclosure timelines where it can are the obvious next steps. The white paper says all the right things; the product needs to ship the rest of them.
The second is that the other major AI providers — OpenAI, Anthropic, Google, and the cloud platforms hosting them — converge on this design, with appropriate variations. OpenAI’s client-side encryption announcement is a promising signal but, as I argued at the time, the implementation details (in particular, whether safety detection runs on-device or in a TEE, and how transparently) will determine whether it is a comparable confidentiality story or a softer one. Anthropic, which has built much of its public identity around AI safety, has an obvious incentive to demonstrate that “safety” and “the provider can read every conversation” are separable propositions.
The third is that EU policymakers treat Incognito Chat as a benchmark rather than a problem. There is a real risk that a TEE-based architecture will be received, in some quarters, as inconvenient. I have written before (“Apple and EU DMA: a road to leave the EU?”, “DMA workshops and privacy”) about the tendency to treat strong privacy architectures as obstacles rather than goals. The right reading is the opposite: this is a fundamental-rights-respecting design that other providers should be encouraged — and, where appropriate, required — to match.
The fourth, and most speculative, is that we develop a legal vocabulary for AI privilege that takes technical architecture seriously. Privilege in the legal sense has always rested on a combination of professional duty and practical constraint — the lawyer cannot be made to disclose what the lawyer never knew. Confidential computing offers a similar combination in the AI setting: a duty (the provider’s privacy commitment), a constraint (the provider cannot, in fact, read the data), and an external audit trail (verifiable transparency). If we want this technology deployed at scale for the most sensitive use cases — health, mental health, legal, financial — then the law arguably should reward providers who adopt this architecture, not penalise them for the resulting blind spot.