Hashed with a dash of salt
Hashing is not anonymization; hashed data is still personal data.
(if you don’t make it through please do answer the survey at the bottom)
Recently, someone messaged me on LinkedIn and asked if a solution suggested at a conference would work.
The solution proposed using a server-side setup and hashing the IP addresses in outbound request URLs to avoid fines for sharing personal data.
I hope your alarm bells are going off.
If not, keep reading.
TL;DR:
Hashing is not anonymisation. Adding salt won't help, either. Data is either identifiable (so personal data) or anonymous (so no longer personal data). It's binary. Quasi-something does not work. Pseudo-personal data is not a legal concept. Hashing? Use it for the integrity and security of the data but still treat it as personal data.
What do I mean by saying that it's binary?
The GDPR defines personal data as:
-> any information relating to an identified or identifiable natural person.
AND
-> an identifiable natural person is one who can be identified, directly or indirectly.
If the data is anonymous, then it's not identifiable; hence, it is not personal data. If it's not anonymised, then it's identifiable, so it's personal data. It either is or is not. There is no in-between. There is no "well, it's sort of hidden, so it's ok". Or "they probably can't figure it out or reverse it, so it's okay".
Nope, none of that "maybe, probably, should be ok" type stuff is going to work.
It is. Or it is not.
Let's take some data. Personal data, that is. Email should work.
Now, let's assume I've come to your website (after clicking on a shopping ad), and you've got that bright orange pop-up asking me if I would want 25% off on my next purchase - all I need to do is give you my email address. I give you my email. All good and I get my 25% off. And I've given my consent along the way. I'm okay with receiving all the promotion emails you can manage to send.
You want to tell Google that I am now an engaged customer. You could add me to a list. Google wants all the info: my email, IP, what I clicked on, when I clicked on it, what ad I came from, etc.
Usually, this just happens - it's easy. You've added the tracking code to the site, and it sends it all to Google (providing you have consent, of course). But you've got this idea that it might not be ok. So you consider switching from a client-side setup to a server-side setup to better control your data and who gets what.
Good idea. But now you get to decide how to process the data before sending it to random third parties. Hashing is the obvious one.
Even better, now that you've hashed it, you can send it without consent - it's anonymous, after all.
STOP.
I often hear marketers describe this type of decision-making process, and it makes sense to some extent. However, the data is NOT ANONYMOUS. It's still personal data, and you must treat it as such.
Hashing and salted hashing are security measures - not anonymisation techniques. They are great for checking the integrity of the data. Or to securely store data such as passwords. Salting makes precomputed look-ups and brute force guessing much harder.
Hashing essentially keeps things safe, not hidden (i.e., anonymous).
(skip this part if your intelligence is easily offended)
The way I explain it to my niece is:
The hash is like a magic blender. You throw your password into the blender, and it turns into juice - a scrambled mix of letters and numbers. And once it’s turned into juice, there’s no way to turn it back into the original password.
When you want to log in, the computer doesn’t check your password. Instead, it checks the juice you made and compares it to the juice it saved when you first set up your password. If they match, you’re in.
But here’s the problem: a hacker (not the good kind) could try to guess what your juice is made of by using a list of common passwords (like "1234" or "password"). If they figure out the recipe, they can get in, too. They use this thing called Rainbow Tables a lot. It's like a look-up table for all the popular passwords and their respective juice.
So, we add some glitter—called "salt." The glitter is different for every person. When you blend your password and glitter together, it makes juice that’s totally unique to you. Even if someone else has the same password, their juice will look completely different because their glitter is different. This makes it much harder for bad guys to guess what’s in your juice.
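For the less niece-shaped readers, here is a minimal sketch of the blender and the glitter in Python. The function names (`blend`, `check_login`) and the tiny password list are made up for illustration, and plain SHA-256 is used only to keep the example short; real password storage would use a deliberately slow function such as PBKDF2, scrypt, bcrypt, or Argon2.

```python
import hashlib
import os


def blend(password: str, salt: bytes = b"") -> str:
    """The 'blender': hash the password, optionally with some 'glitter' (salt)."""
    return hashlib.sha256(salt + password.encode("utf-8")).hexdigest()


# Without salt, the same password always gives the same juice, so a
# precomputed look-up table of common passwords reverses it instantly.
common_passwords = ["1234", "password", "qwerty"]
lookup_table = {blend(p): p for p in common_passwords}

stolen_juice = blend("password")
recovered = lookup_table.get(stolen_juice)  # the original password falls out

# With salt, every user gets their own random glitter, stored next to the hash.
salt_a = os.urandom(16)
salt_b = os.urandom(16)
juice_a = blend("password", salt_a)
juice_b = blend("password", salt_b)  # same password, different juice


def check_login(attempt: str, salt: bytes, stored_juice: str) -> bool:
    """Re-blend the attempt with the stored salt and compare the juice."""
    return blend(attempt, salt) == stored_juice
```

Notice that the system checking the login keeps the salt and the hash side by side, filed under a specific user. That is the whole point of the next section.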
Still with me?
Great.
So then, why is hashing not anonymous?
It’s tied to you: Even though your password is turned into “juice with glitter,” the same process happens every time you log in. So the system still knows, “Ah, this glittery juice belongs to Siobhan.”
Salt isn’t a disguise: Salt just makes it harder for bad guys to guess your password. But the system that saved your salt and hash combo can still connect it back to you.
Google, from above, will match the hashed email you sent them with its customer match list of hashed emails. It's pretty much guaranteed that yours is on there. So again, it may be juice at this point, but it's still juice that is very much your juice.
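The matching step can be sketched in a few lines. This is an illustration under assumptions, not Google's actual pipeline: the function name, the trim-and-lowercase normalisation, the example addresses, and the account IDs are all hypothetical. The point is only that an unsalted hash is deterministic, so anyone who already holds the same email can compute the same hash and join on it.

```python
import hashlib


def normalise_and_hash(email: str) -> str:
    """Trim, lowercase, and SHA-256 an email address (assumed normalisation)."""
    cleaned = email.strip().lower()
    return hashlib.sha256(cleaned.encode("utf-8")).hexdigest()


# What the website sends after "anonymising" the email (hypothetical value).
sent_by_site = normalise_and_hash("  Siobhan@Example.com ")

# What a large platform could already hold: its own users' emails, hashed
# the same way and keyed to real accounts (hypothetical data).
platform_accounts = {
    normalise_and_hash("siobhan@example.com"): "account-001",
    normalise_and_hash("someone.else@example.com"): "account-002",
}

# The hash is simply a join key: the receiver links it straight to a person.
matched_account = platform_accounts.get(sent_by_site)
```

Here `matched_account` resolves to a specific account without anyone ever "reversing" the hash. The receiver does not need to un-blend the juice, they only need to recognise it.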
As the Opinion of the WP29 states (Opinion 05/2014):
"Pseudonymisation reduces the linkability of a dataset with the original identity of a data subject; as such, it is a useful security measure but not a method of anonymisation."
Hashed personal data is still personal as it is pseudonymised and not anonymised.
Processing hashed data (and the process of hashing it) still requires a legal basis (consent, legitimate interest, etc.).
Hashed data still requires you to have a purpose for processing it.
So what can you do about it all?
In short, treat all personal data as personal data. Anonymisation is becoming harder and harder, and it is extremely hard to get around linkability, the ability to single someone out, and the ability to infer who or what the data is about. Rule out all three and you are good, but it’s hard.
There is some case law that will help us understand this better soon, but for now my best advice is: treat it as personal data unless you are sure it’s anonymised and that it is highly unlikely to be reversed.
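To see why "highly unlikely to be reversed" is a high bar for something like the hashed IP addresses from the conference suggestion, here is a rough sketch. It assumes an unsalted SHA-256 over the dotted IPv4 string, and the function names and the documentation-range address are made up for the example. The whole IPv4 space is only about 4.3 billion addresses, so trying every candidate is realistic; the sketch searches a single /24 network to keep it quick.

```python
import hashlib
import ipaddress
from typing import Optional


def hash_ip(ip: str) -> str:
    """Unsalted SHA-256 of the dotted IPv4 string (assumed scheme)."""
    return hashlib.sha256(ip.encode("ascii")).hexdigest()


def reverse_hashed_ip(target_hash: str, network: str) -> Optional[str]:
    """Hash every host address in the network until one matches."""
    for address in ipaddress.ip_network(network).hosts():
        candidate = str(address)
        if hash_ip(candidate) == target_hash:
            return candidate
    return None


# The "anonymised" value that ends up in the outbound request.
hashed = hash_ip("203.0.113.42")

# Anyone who knows (or guesses) the scheme can walk the candidate space.
found = reverse_hashed_ip(hashed, "203.0.113.0/24")
```

A secret salt raises the cost for outsiders, but whoever holds the salt can still run exactly the same loop.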
And again, if it’s hashed it is pseudonymised - not anonymised - hence still personal data.
Some related reading if you want to dive deeper:
WP29 Opinion 05/2014 on Anonymisation Techniques
Case to watch: C-413/23 P EDPS v SRB on anonymisation (pending)
Estimating the success of re-identification in incomplete datasets using generative models
When ‘Anonymous’ Data Sometimes Isn’t
No, hashing still doesn’t make your data anonymous
WTF or FTW?
WTF. I mean, most technical marketers I know fully understand what hashing is and its use cases, but still claim the data is fully anonymised. I suppose there could be some confusion, but in the end, your DPO or legal team will know the difference by now. I'm not saying don't use it - I think it's a good option for security. But it's not your solution for processing that data however you want - if you hashed personal data, it's still personal data.
x
Siobhan
P.S: If you are still here can you do me a favour?
I’m working on improving this newsletter by making it more consistent and providing more value. I’ve also been updating landing pages etc. and realised that the name means nothing to most of you. And who signs up for something when they don’t know what it is or who it’s for? (click on your answer below)