Meta is about to start AI training on public EU user data—what will the GDPR authorities do?
On 27 May, Meta plans to begin AI training using public EU user data from Facebook and Instagram. Yesterday, Meta’s lead GDPR regulator, the Irish Data Protection Commission (IDPC), published a statement highlighting the additional safeguards Meta has adopted since its previously paused plan from early 2024. On my reading, the IDPC implies that it does not object to Meta’s current plan, including Meta’s reliance on the “legitimate interest” basis for data processing under the GDPR. Nevertheless, it remains to be seen whether other EU data protection authorities (DPAs) and the courts will take the same approach. Following the IDPC’s statement, the Hamburg authority issued a “hearing letter,” which may suggest that German officials will try to overrule the Irish DPC and force a more restrictive approach, as they have done in the past. There is also an ongoing attempt to secure an injunction in a German court to stop Meta’s 27 May plan. This matters to all AI developers and deployers in the EU: Meta’s situation is not unique, and whatever limitations are imposed on Meta can easily become a standard for others.
UPDATE (27 May): The attempt to block Meta’s AI training plan through an injunction failed before the court in Cologne. Also, despite expressing some last-minute doubts, the Hamburg data protection authority just announced that it will not step out of line and use the GDPR's urgency procedure. However, it referenced an unspecified “forthcoming EU-wide evaluation of Meta's practices,” perhaps aiming to suggest that it will try to push for a more restrictive GDPR interpretation at a later date. (All this is quite puzzling given that it was the Hamburg DPA which previously published a very pragmatic discussion paper on the application of the GDPR to AI models.)
The legal questions being raised
Multiple national data protection authorities published information about Meta’s plans, including links to forms provided by Meta through which users can object to their public data being used for AI training (e.g., Czechia, France, Italy, Hamburg, the Netherlands). Notably, at least the Dutch, French, and Italian authorities said in their statements—made as recently as late April—that EU authorities are working with the Irish DPC to assess Meta’s compliance with the GDPR. The authorities listed the following issues:
the legality of Meta’s plan (presumably meaning the question whether Meta can rely on the legitimate interest basis);
the effectiveness of the right to object (which Meta facilitates through ex ante opt-out forms);
the compatibility between the original purposes of the processing and this new use of the data (“historical data” issue).
The historical data issue is reportedly at the centre of the “hearing letter” to Meta from the Hamburg DPA.
Aside from the DPAs, a German consumer advocacy group, Verbraucherzentrale NRW (vznrw), is seeking an injunction to stop Meta’s AI training plans before the Higher Regional Court of Cologne (Oberlandesgericht Köln). The hearing in that case was scheduled for 22 May (see commentary from Noerr, a law firm). Vznrw’s position appears to be that Meta should be asking all concerned users for affirmative opt-in (consent), instead of merely allowing an opt-out.
Opt-out vs opt-in (legitimate interest vs consent)
On the broad understanding of what counts as personal data that is dominant among EU DPAs, the authorities would likely argue that the development of many, if not all, Large Language Models (LLMs) involves the processing of personal data at some stage, if only because of the scraping of public websites. It is not possible to implement opt-in consent for all concerned individuals, so imposing such a requirement would mean a de facto prohibition of the technology. The other reason why consent is not feasible is the issue of withdrawal of consent: deleting models trained with personal data is likely to be vastly disproportionate, and the surgical removal of an individual’s personal data from a trained model is not yet workable. So a withdrawal of consent could not actually bring the processing to an end (assuming that a trained model contains personal data).
The European Data Protection Board (EDPB), in its Opinion on AI models, only really considered an opt-out mechanism (objection to data processing), suggesting various measures to make AI development and use compatible with the GDPR under the legitimate interest basis.
Privacy activists and some public officials are now pushing to ban LLMs and much of AI more broadly, claiming that opt-out mechanisms are insufficient. Their strategy centers on a peculiar argument about consent requirements. When AI developers can directly contact individuals whose data will be used for training—what's called "first-party" data from direct users[1]—these advocates argue that only individual consent should permit data use. Their solution for withdrawn consent is equally extreme: delete the entire AI model.
This creates a bizarre contradiction. When developers cannot easily contact data subjects—circumstances where providing notice and objection opportunities is genuinely difficult—these same advocates would have us believe that consent isn't required and legitimate interest suffices. But when circumstances are actually favorable for exercising information and objection rights, suddenly consent becomes mandatory. This logical inconsistency strains credulity. The argument appears to be part of a bait-and-switch strategy: first attack "first-party" data use to establish a consent requirement, then extend that precedent to "third-party" data.
Ex ante opt-out and effective exercise of a right to object
Meta has implemented unconditional opt-outs from AI training for both users and non-users. This goes beyond Article 21 GDPR's express requirements but follows a suggestion in the EDPB Opinion on AI models (para 102). Yet critics argue this still doesn't allow sufficiently effective exercise of objection rights under legitimate interest processing.
Article 21 GDPR states that "the controller shall no longer process the personal data unless the controller demonstrates compelling legitimate grounds for the processing which override the interests, rights and freedoms of the data subject or for the establishment, exercise or defence of legal claims." While this requirement is conditional, it does mandate that controllers "no longer process the personal data."
Allowing unconditional, ex ante objections—before processing even begins—without requiring deletion of already-trained models represents a reasonable, proportionate interpretation of GDPR. A more absolutist reading of the "no longer process" requirement would effectively ban the technology entirely. Notably, switching to a consent basis raises virtually identical issues regarding the consequences of withdrawal of consent.
Arguing that Meta's unconditional, ex ante, and well-publicized opt-out scheme (also publicized by DPAs) remains insufficient would set the compliance bar impossibly high. No one—not just Meta—could train AI models in the EU under such standards.
To be clear: Meta's offer to accept objections before even beginning training arguably exceeds GDPR requirements. Concluding otherwise would imply that most AI model training conducted so far violated GDPR—assuming a broad interpretation of which models contain personal data.
Historical data and historical reasonable expectations
The EDPB correctly noted in its Opinion on AI models that individuals' reasonable expectations matter when assessing legitimate interest. However, this argument can easily be taken too far. Most people understand technology, business, and government only superficially—and this is entirely normal. We are not omniscient. Requiring individuals to have detailed understanding of technological and organizational aspects before their data can be processed under legitimate interest would make that legal basis impossible to use. This would render the GDPR provision ineffective, violating a fundamental principle of legal interpretation: we should read law as if crafted by a rational lawgiver. While even the Court of Justice sometimes leans toward overly restrictive interpretations—often betraying its own lack of understanding of business and technological realities—this temptation should be resisted.[2] Any interpretation of "reasonable expectations" must remain proportionate and grounded in reality.
This brings us to the "historical data" issue that German DPAs are currently reportedly emphasizing. Their argument appears to be that Meta cannot use data uploaded before 27 May, because at the time of uploading users lacked reasonable expectations that their public data could be used for AI training.
This reasoning proves far too much. If we follow this logic, how could any AI developer claim that individuals reasonably expected their data to be used for training? Even limiting the argument to pre-ChatGPT data still goes too far. Consider an analogous question about search engines circa 1998. Would we really interpret GDPR to treat emerging technologies as inherently illegal simply because their details are novel? I argued against such an approach to the GDPR in my text on the importance of cautious optimism.
Let’s remember: we're discussing data that anyone on the internet can access—public content. Given the existence of search engines and widespread web scraping, can it seriously be argued that internet users lack reasonable expectations about how public web content might be reused?
Some DPAs already recognize this reality. The Baden-Württemberg DPA acknowledged that "given the generally known technological progress of recent years, it may be expected that data published on the Internet may be reused by third parties for other purposes."[3]
[1] What makes things more complicated is that the data Meta plans to use isn’t “pure” first-party data, because even data like Facebook/Instagram posts, comments, and so on may include references to individuals other than the person who posted the data. So what is first-party data of a direct Meta user may very well also be third-party data of someone who is not a Meta user (which is also why Meta facilitates opt-outs for non-users). The same applies, of course, to all other platforms like Reddit, YouTube, X/Twitter, and so on.
[2] See eg EDPB’s reference to Meta v Bundeskartellamt in footnote 73 of the Opinion on AI models and the accompanying text (para 93).
[3] The Baden-Württemberg DPA also made a qualification for data published by others. In principle, there may be a difference in reasonable expectations regarding content you post about yourself and content someone else posts about you. However, we also need to bear in mind the technical limits of anyone’s capacity to distinguish such situations at scale.