I don't understand why TOR Browser thought that was a good idea either. It seems...

godelski · on Aug 17, 2023

Honest question, what's the upside of not doing this? You're already identified as a Tor user via the IP address. But wouldn't a unique, for example, canvas fingerprint just deanonymize you further? A shared fingerprint just makes you indistinguishable from others Tor users. Which you're already being classified as and can't escape that printing.

autoexec · on Aug 17, 2023

IP addresses won't necessarily ID a TOR user unless all exit nodes are known and being checked for. The TOR browser fingerprint stands out like a sore thumb though.

The shared fingerprint makes TOR users indistinguishable from other TOR users unless/until a single identifying factor isn't accounted for at which point all TOR users are identifiable on every connection, across time, different domains, etc. The sameness of TOR user's fingerprints + even just one consistent identifying feature means TOR users could be individually tracked.

A unique canvas fingerprint can be used to track you, but as long as it's differently unique on every request it can't be used to track you because the resulting fingerprint will always be different.

The "hide in the crowd" trick of trying to make a bunch of different people's browsers look identical isn't a bad thing, it's just extremely fragile. Still, it's better than nothing. Making all browsers randomize their fingerprint every time defeats tracking just as well as the "hide in the crowd" trick does (when that trick is 100% perfect) but also adds resilience and flexibility

derefr · on Aug 17, 2023

Tor exit nodes are self-identifying. There’s a DNS-based reverse-IP API you can use to ask if an IP address is a Tor exit node.

autoexec · on Aug 17, 2023

Good to know! Weird that stuff like https://www.dan.me.uk/tornodes and https://www.ipqualityscore.com/tor-ip-address-check are still around.

godelski · on Aug 17, 2023

> IP addresses won't necessarily ID a TOR user unless all exit nodes are known and being checked for.

Forgive my naivety, I don't really know Tor that well or even use it, but aren't nearly all exit nodes known and aren't they routinely checked for? It does not seem like a difficult thing to check for. I mean when I googled to check it seems like it is easy and Tor even provides a tool and publishes the 2188 addresses[0,1,2]. So... I'm quite confused about your assumption because a quick googling is leading me to believe that this is a rather known thing and doesn't require anywhere near state level action. I mean people routinely scan the entire internet and those posts don't even make it to HN anymore because they are so easy.

> The shared fingerprint makes TOR users indistinguishable from other TOR users unless/until a single identifying factor isn't accounted for at which point all TOR users are identifiable on every connection, across time, different domains, etc. The sameness of TOR user's fingerprints + even just one consistent identifying feature means TOR users could be individually tracked.

This is a great point, and I get it. But I'm not sure how this is different from normal situation. Doesn't this mean a misconfiguration of the Tor browser? One or two metrics may not be enough entropy to have confidence in an identity, though certainty you're right that it is of concern. I'm just trying to intuit the entropy difference. I'd wager it matters which metric is broken. But the question is when we start undoing Tor fingerprint overrides, at what point does the entropy decease before it starts increasing again? (as you're suggesting) Is that enough information to confidently identify a person? I honestly have no idea. This is a question since you're stating this is a cause for concern.

> A unique canvas fingerprint can be used to track you, but as long as it's differently unique on every request it can't be used to track you because the resulting fingerprint will always be different.

Is that true? I heard that Canvas Fingerprint randomizers actually decrease anonymity for the average user (i.e. done without other measures such as what Tor and Mullvad are doing). Due to noise being information itself, and is thus itself a fingerprint. You just call the function multiple times and look for differences or call different functions and look for similarities (i.e. the return const value). Maybe not as clear of an identifier as a normal canvas fingerprint, but it does constitute good information as most browsers aren't randomizing. I mean one piece of information alone isn't enough, that is why they collect several. You aren't being identified by only your canvas fingerprint.

> isn't a bad thing, it's just extremely fragile. Still, it's better than nothing.

I'm just asking what your alternative is. Btw, Tor and Mullvad __are__ randomizing[3]. So what is your complaint and what is your suggestion?

[0] https://metrics.torproject.org/exonerator.html

[1] https://2019.www.torproject.org/projects/tordnsel.html

[2] https://ipdata.co/blog/tor-detection/

[3] https://mullvad.net/en/browser/hard-facts

> privacy.resistFingerprinting.autoDeclineNoUserInputCanvasPrompts set to true

> privacy.resistFingerprinting.randomDataOnCanvasExtract set to true

autoexec · on Aug 17, 2023

> Doesn't this mean a misconfiguration of the Tor browser?

Not necessarily. It'd just mean that someone figured out a novel way to coax some bit of data from the browser that hadn't been considered or adequately accounted for.

> One or two metrics may not be enough entropy to have confidence in an identity, though certainty you're right that it is of concern.

It'd be less worrying if TOR users were more common, but since so few people use TOR at all, and fewer still will use it for any given site/service it means that what would be a low confidence metric for normal traffic might be all you need to track a TOR user.

> I heard that Canvas Fingerprint randomizers actually decrease anonymity for the average user...You aren't being identified by only your canvas fingerprint.

Canvas randomizations are likely to increase uniqueness which is different from increasing the ability to be tracked. If it's implemented in a way that makes it detectable and/or predictable it could increase the likelihood of being trackable depending on the situation. Canvas randomization is useful, just less useful for easily identifiable browsers which document that they do it since it can just be ignored in those cases.

> I'm just asking what your alternative is.

I think it'd be a robust system where many values were randomized in ways that were logical and consistent. For example, a user agent that implies a certain OS would expose a randomized set of values typical to that OS (fonts, drivers, add-ons, GPU, etc) and randomizations would be appropriate (even customizable) by context (session, window, tab, domain, request) so a website making multiple calls to a function would see consistent results while another (unrelated) website would see you as someone else entirely.

This way you'd always appear as a new unique visitor and if someone comes up with some clever trick to expose some new bit of data you'd still be indistinguishable from every other unique visitor with that bit of data. The vast majority of users on the internet are basically always showing up as "randomized" on first visit. It'd mean you could visit the same website 4 days in a row, but each time you'd just show up as someone new who stopped by, browsed around a bit maybe, and then never came back.

The devil's in the details though I guess.

swapfile · on Aug 18, 2023

Forgive my pedantry but it's not TOR[1] and never was, this is bordering on painful to read.

[1]: https://support.torproject.org/#about_why-is-it-called-tor

autoexec · on Aug 18, 2023

I appreciate the pedantry! It's a habit I should break from. Can we agree at least that it's not as bad as ToR?

Bu9818 · on Aug 17, 2023

As a side note, Tor Browser/Mullvad Browser does randomize canvas (and this changes every time you restart the browser or press New Identity). I don't remember what the reason for randomizing this specific feature is for, maybe it had better compatibility.

godelski · on Aug 17, 2023

Were you intending to respond to me or the parent to my comment. They are the one that said Tor doesn't randomize.

jorams · on Aug 17, 2023

It seems to me whether you're going to make fingerprintable properties be the same or randomize them, you're always going to need to explore every angle. Otherwise a bad actor can just ignore all the properties you randomize and focus on what's left.

autoexec · on Aug 17, 2023

Very few data points used in browser fingerprinting are 100% unique to an individual. Multiple data points are combined to form a hash that is unique to an individual. Most people have a unique fingerprint.

You can sort out your TOR browser traffic by user agent then focus on a single data point to track a small number of those users (probably to the individual level because TOR browser traffic is uncommon) but a website can't always know what's been/being randomized and can't separate out the randomized users from everyone else with a unique fingerprint.

Bu9818 · on Aug 17, 2023

The Whac-A-Mole game still exists when you randomize values, right?

autoexec · on Aug 17, 2023

To a certain extent. You don't have to make sure you're catching and accounting for 100% of every possible data point that might be collected by a browser if you're randomizing everything else though. Random value + consistent individual value will always produce a changed hash.

godelski · on Aug 17, 2023

If you randomize everything that sounds like a pretty identifiable signal tbh. Unless a very large number of people are also performing that randomization. A large number of people specifically in whichever discriminating group you belong to, which might be something out of your control.

Bu9818 · on Aug 17, 2023

Fair enough, it may be more reliable against general/naive approaches like commercial uses though a sufficiently skilled adversary may only consider the fingerprinting techniques they have missed (one specifically targeting TB users).