Leaked GDPR reform: some good ideas but what about enforcement?
I’ve been arguing that the EU’s data protection law—the GDPR—needs to be reformed, mostly due to the absolutist approach that data protection authorities have taken to its interpretation. Even a year ago this might have seemed far-fetched. However, meagre economic growth is now seriously threatening Europe’s long-term prosperity. This, combined with the running joke that the EU leads in “AI regulation” while others build, has created enough political momentum to do something.
We are finally starting to see what this may involve, with ideas like the additional pan-EU corporate regime (the 28th regime) or tweaks to the rushed and ill-conceived AI Regulation. Perhaps the biggest battle, however, is going to be waged over the excesses of EU privacy law. In other words, over legal interpretations that became an article of faith for a small but loud and influential minority, a costly burden for EU businesses, and a major annoyance for almost everyone.
So what’s on the agenda for GDPR reform? We now know a bit more, given that a draft text has just leaked from the European Commission. Before I consider some of the good ideas included in the draft, I want to address the biggest omission, which is likely to undermine the potential beneficial effects: the problem of privacy absolutism and the lack of proportionality in GDPR enforcement. I’m not going to repeat here the details of my diagnosis and my proposed solutions (see e.g. A serious target for improving EU regulation: GDPR enforcement and The EDPB’s AI Opinion Shows the Need for GDPR Enforcement Reform). I just want to reiterate that merely changing the GDPR’s text, though welcome, will be insufficient if that text is then interpreted and applied by people vehemently opposed to the changes.
Let’s look at the substance of the leaked proposal.
Definition of “personal data”: this is a crucial issue because this concept defines the scope of application of the GDPR. So far, data protection authorities have been pushing in only one direction: to make almost any data “personal” and thus to claim jurisdiction over virtually all data. The courts, including the EU Court of Justice (CJEU), have an uneven record on this, but at least recently the CJEU has pushed back. And it looks like the European Commission wants to amend the GDPR’s text with a more sensible reading of “personal data.” In short, the idea is that whether something counts as personal data for you (and whether the GDPR applies to you) depends on whether you are capable of identifying the individuals to whom the data relates. I’d say that this is a very good idea.
One thing the Commission should clarify is that the examples of factors that could help identify an individual in the definition of “personal data” (“an identification number, location data, an online identifier…”) should not be treated as definitive “identifiers” when they do not actually allow a specific person to be singled out. In other words, using a pseudonymous unique identifier should not mean that we’re dealing with personal data if we cannot say who the real person behind that identifier is.
Some privacy activists (e.g. NOYB) complain that this will prompt many organisations to split off the processing of data that allows the identification of individuals (e.g. leaving that processing to other entities specialised in GDPR compliance) and otherwise to process only pseudonymous data (without being able to identify individuals). But I don’t really see how this is a problem. It sounds to me like a big practical win for data minimisation and for privacy overall.
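To make the split concrete, here is a minimal sketch (in Python, with entirely hypothetical names, data and key handling) of the kind of arrangement described above: one party holds the key that can link pseudonyms back to people, while the party doing the analysis only ever sees opaque tokens.

```python
# A minimal sketch of the split described above: Party A holds the secret key
# that can link pseudonyms back to individuals; Party B only ever receives
# opaque tokens. All names, keys and data here are hypothetical.
import hmac
import hashlib

def pseudonymise(direct_identifier: str, secret_key: bytes) -> str:
    """Derive a stable pseudonymous token from a direct identifier.

    Only a party holding secret_key can re-create (and thus link) the token;
    for everyone else it is just an opaque string.
    """
    return hmac.new(secret_key, direct_identifier.encode("utf-8"),
                    hashlib.sha256).hexdigest()

# Party A: the entity capable of identifying individuals.
SECRET_KEY = b"held-only-by-party-A"  # hypothetical; real key management differs
raw_record = {"email": "jane@example.com", "page_views": 42}

# Party B: receives only pseudonymous data, with direct identifiers removed.
pseudonymous_record = {
    "user_token": pseudonymise(raw_record["email"], SECRET_KEY),
    "page_views": raw_record["page_views"],
}
print(pseudonymous_record)
```

Under the amended definition, whether such a record counts as “personal data” for the receiving party would turn on whether that party can actually identify the person behind the token.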
Definition of sensitive data: “special categories of data” like health data or data on political opinions or religious beliefs are especially protected by the GDPR. Data protection authorities have been advancing an interpretation under which, in many practical contexts, processing such data requires the prior consent of the individuals concerned (the data subjects). If such a stringent regime applied only to genuinely sensitive data, this might seem understandable (although serious questions should still be asked about whether all current “special categories of data” deserve the same level of protection). But privacy absolutists want the restrictions to apply also to any data that could allow someone to infer sensitive information, even when no such inference is being made or is even likely to be made. This view would expand the strictest regime disproportionately. The Commission aims to address this by clarifying the definition of sensitive data, specifying that data should be treated as such when it “directly reveals” e.g. health status.
Sensitive data and AI training: restrictions on the processing of special-category data are particularly challenging for AI developers and users. A potential requirement to obtain the consent of everyone whose data is included in AI training datasets (e.g. sourced from the open web) would amount to a de facto prohibition on AI development, especially frontier large language model development.
AI developers don’t want to process sensitive data—they don’t even want to process personal data. When this is technically feasible, they use automated means to exclude such data from training data. There are, however, at least two potential legal problems. First, someone could argue that even if developers delete such data, they have already processed it before deletion and in the very act of deleting it. Second, even when state-of-the-art measures are used to exclude personal and sensitive data, it is currently impossible to rule out that some such information will be reproduced by a model. Entirely avoiding any incidental processing of personal or sensitive data in AI training would require manual curation of data sources on a wholly infeasible scale. If legally obligatory, this would amount to a de facto prohibition on LLMs.
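To illustrate why automated exclusion is both workable and inherently imperfect, here is a toy sketch (not what any lab actually runs; all patterns and the sample text are illustrative only) of pattern-based redaction: it catches obvious identifiers, but plenty of personal information will still slip through.

```python
# A toy illustration of automated filtering of training text. Pattern-based
# redaction catches obvious identifiers, but (as noted above) it cannot
# guarantee that no personal or sensitive data remains.
import re

PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
    "PHONE": re.compile(r"\+?\d[\d\s().-]{7,}\d"),
    "IBAN":  re.compile(r"\b[A-Z]{2}\d{2}[A-Z0-9]{11,30}\b"),
}

def scrub(text: str) -> str:
    """Replace matches of known identifier patterns with placeholders."""
    for label, pattern in PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text

sample = "Contact Jane Doe at jane.doe@example.com or +49 170 1234567."
print(scrub(sample))
# Prints: "Contact Jane Doe at [EMAIL] or [PHONE]."
# The name itself survives -- catching everything would require the kind of
# manual curation that is infeasible at training-data scale.
```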
One EU data protection authority—the French CNIL—laudably acknowledged this and provided pragmatic guidance. But it is no secret that CNIL’s approach faces opposition among the more absolutist privacy officials and activists, so there is no chance that privacy enforcers across the EU will adopt it voluntarily.
The Commission’s proposal is to allow the processing of special-category data for AI training and use, subject to conditions that appear to align with the state-of-the-art practices AI labs are already implementing. In principle, this is also a welcome development.
One worry I have is whether the rules will be interpreted with sufficient flexibility so as not to amount to a prohibition on the open-weights distribution of AI models. Note in this context the following provision on what must be done if it is discovered that some sensitive data is “in” an AI model: “If removal of those data requires disproportionate effort, the controller shall in any event effectively protect without undue delay such data from being used to produce outputs, from being disclosed or otherwise made available to third parties.” Depending on interpretation, this could be a serious problem for open-weights AI.
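The open-weights concern becomes clearer if one asks where “protection from being used to produce outputs” would actually sit. Below is a minimal, entirely hypothetical sketch of such an output-side guardrail: it only works if you run the inference stack, which is exactly what a developer gives up by releasing the weights.

```python
# A minimal sketch of output-side protection: a filter wrapped around a
# model's serving endpoint withholds responses that reproduce known sensitive
# strings. Everything here is hypothetical; the point is that this control
# lives in the inference stack, which the original developer no longer
# controls once the weights are openly published.
from typing import Callable

# Placeholder for sensitive data discovered to be memorised "in" the model.
BLOCKLIST = {"jane doe's diagnosis"}

def guarded_generate(generate: Callable[[str], str], prompt: str) -> str:
    """Call an underlying text generator and suppress blocked outputs."""
    output = generate(prompt)
    if any(item in output.lower() for item in BLOCKLIST):
        return "[output withheld]"
    return output

# Example with a stand-in "model":
if __name__ == "__main__":
    fake_model = lambda p: "Jane Doe's diagnosis is ..."
    print(guarded_generate(fake_model, "Tell me about Jane Doe"))  # [output withheld]
```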
Cookie consent law: the EDPB’s recent guidelines on the “cookie consent” law (Article 5(3) of the ePrivacy Directive) are among the most glaring examples of the disproportionate and absolutist approach of EU privacy enforcers. On the EDPB’s interpretation, “consent would be required for all Internet communication unless very limited exceptions apply (even more restrictive than under the GDPR)” (Consent for everything? EDPB guidelines on URL, pixel, IP tracking). The Commission proposes to address the obvious problems with the cookie consent law by partially moving the rules from the ePrivacy Directive to the GDPR, with some modifications.
One big problem with the Commission’s proposal is that it would retain the “cookie consent” ePrivacy rules for non-personal data (i.e. data not covered by the GDPR). I agree with other commentators that this makes no sense. Leaving those rules in place would not solve the cookie pop-up banner problem. More importantly, it would mean that non-personal data is protected more stringently and less pragmatically than personal data.
To the extent there is a need for a legal rule on terminal integrity protection, it needs to be drawn much more precisely to target specific risks—the EDPB’s guidance shows that privacy enforcers cannot be trusted to interpret and apply broadly phrased rules proportionately. One reason to retain some explicit terminal integrity protections is that removing them entirely could further undermine the security protections for, e.g., Android and iOS users, which are already being eroded by the way the European Commission is applying the Digital Markets Act. EU-law-level terminal integrity protections are also helpful in defending users against state-mandated incursions.
There are other ideas in the leaked draft, but I will end my comment here. It’s very hard to say how much of the current draft will make it into law given the inevitable opposition. And even if some of those changes do become law, they will likely be watered down significantly by privacy enforcers hostile to such changes. Hence my plea for addressing what I see as a bigger problem: the enforcement framework.

