Looks like Firefox is immune.
This works by looking for web accessible resources that are provided by the extensions. For Chrome, these are available in a webpage via the URL chrome-extension://[PACKAGE ID]/[PATH] https://developer.chrome.com/docs/extensions/reference/manif...
On Firefox, web accessible resources are available at "moz-extension://<extension-UUID>/myfile.png". <extension-UUID> is not your extension's ID. This ID is randomly generated for every browser instance. This prevents websites from fingerprinting a browser by examining the extensions it has installed. https://developer.mozilla.org/en-US/docs/Mozilla/Add-ons/Web...
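To make the mechanism concrete, here is a minimal sketch of how a page could probe for Chrome extensions through their web accessible resources. The function names are hypothetical and this is not LinkedIn's code; it assumes that a request for an existing web accessible resource resolves and that a request for a missing one rejects. The sample ID and file name are the ones quoted from the repo's fingerprint.js further down the thread.

```javascript
// Hypothetical sketch; names are illustrative, not taken from LinkedIn's script.

// Build the URL a page would request for an extension's web accessible resource.
function chromeResourceUrl(extensionId, path) {
  return `chrome-extension://${extensionId}/${path}`;
}

// Probe a list of { id, file } entries and return the IDs that answered.
// `fetchFn` is injectable so the sketch can be exercised outside a browser;
// inside a page it falls back to the global fetch.
async function detectExtensions(candidates, fetchFn = fetch) {
  const found = [];
  for (const { id, file } of candidates) {
    try {
      await fetchFn(chromeResourceUrl(id, file));
      found.push(id); // request resolved: assume the resource exists
    } catch {
      // request rejected: not installed, or resource not web accessible
    }
  }
  return found;
}

// Example call (in a browser):
// detectExtensions([{ id: "aacbpggdjcblgnmgjgpkpddliddineni", file: "sidebar.html" }])
//   .then((ids) => console.log(ids));
```

On Firefox the same loop has nothing stable to aim at, because the host part of the moz-extension:// URL is a per-installation random UUID rather than the published extension ID.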
And they said that using a browser with sub-5% market share would cause us to miss out on the latest and greatest in web technology!
The latest and greatest is not great for you, but for them.
The real friction in browser hopping isn't features — it's keeping your workflow portable. Bookmarks especially. Each browser has its own sync silo (Chrome → Google, Firefox → Mozilla, Safari → iCloud).
For multi-browser setups (Firefox for fingerprint resistance, Chrome for the sites that only work there), cross-browser bookmark sync is weirdly undersolved. Xbrowsersync, marksyncr, and a few others exist but most people don't know about them.
Anecdote: yesterday I exported my bookmarks into an HTML file and then asked for a script that would make a webpage out of them, with search and favicon download by domain. Better than any bookmark bar, IMHO.
This is a great idea, thanks. I built an IPv6 only webhost in Digital Ocean a while ago as a learning exercise and it’s been sitting idle. Making a personal portal sounds like a fun project.
I use floccus.org to sync between Chrome and Zen browser, works flawlessly! It wasn't that difficult to find once I had the two-browser setup (in the end I refused to fully switch to Zen): I just searched extensions and set this up in a minute. It also syncs to Google Drive and a bunch of 3rd party bookmark apps.
Check out marksyncr.com for bookmarks
chrome was made by ex-firefox devs, chrome is still not as good!
Anecdotally, I sometimes notice my computer fan spinning ferociously... it's almost always because I have left a firefox tab with linkedin open somewhere.
Are they Bitcoin mining or are they just incompetent?
Judging from GP's description of how extension IDs work in Firefox, I wouldn't be surprised if LinkedIn were trying to brute-force those UUIDs!
If the two are indeed "Linked", I see a case for users-first browsers to show system metrics right along the page.
I've noticed similar issues with the web version of MS Teams.
You can actually see what tabs are hogging CPU by pressing SHIFT-ESC to open the task manager (about:processes) inside Firefox.
Considering the app was a battery catastrophe I’m confident in the latter, even if your question could be read as rhetorical.
It’s probably some feature they sell to recruiters to grab your attention. :)
Maybe it's trying (and failing) to access your browser extensions? In a loop?
Yeah, but they don't know which specific one of Firefox's last dozen users I am.
Yes, is it now?
https://fingerprint.com/
https://coveryourtracks.eff.org/
https://abrahamjuliot.github.io/creepjs/
I don't have Firefox or another browser installed right now, but the last time I checked, every browser was detected, especially on the first link. Further, when I used Tor, a few sites, like Google, showed me CAPTCHAs for a while afterward when I was using my _normal_ browser.
I have also heard that sites like PayPal give me bad karma when I try to avoid fingerprinting by using, e.g., Tor.
I actually don't even care too much if they try to detect that I am the X from last time.
The issue is them selling the data, or using it in unrelated locations, or trying to detect me as a person. And their programmers are not encouraged or rewarded when they report such behavior to law enforcement agencies or the public. And the law is not punishing it.
This is probably a naive question, but...
Doesn't the idea of swapping extension specific IDs to your browser specific extension IDs mean that instead of your browser being identifiable, you become identifiable?
I mean, it goes from "Oh, they have X, Y, and Z installed" to "Oh, it's Jim Bob, only he has that unique set of IDs for extensions"
It's not a naive question. This comment says it's not possible to do that: https://news.ycombinator.com/item?id=46905213
Oh, it's (re)randomised upon each restart, whew, thanks for the heads up
edit: er, I think that that also suggests that I need to restart firefox more often...
The webpage would have to scan the entire UUID space to create this fingerprint, which seems unlikely.
Just have a database of UUIDs. Seems pretty trivial to generate and sort as it's only 16 bytes each.
That's actually a bright idea! Have you ever thought about applying for VC funds?
Once you deliver that, you can also think about a database of natural numbers!
But that has no moat. Anyone can generate a database of natural numbers using SOTA models.
lol
Let's go a step further and just iterate through them on the client. I plan on having this phone well past the heat death of the universe, so this is guaranteed to finish on my hardware.
  // Yields every 128-bit value in order, formatted as a UUID string.
  function* uuidIterator() {
    const bytes = new Uint8Array(16); // starts at all zeros
    while (true) {
      yield formatUUID(bytes);
      // Increment the 16-byte big-endian counter by one.
      let carry = 1;
      for (let i = 15; i >= 0 && carry; i--) {
        const sum = bytes[i] + carry;
        bytes[i] = sum & 0xff;
        carry = sum > 0xff ? 1 : 0;
      }
      if (carry) return; // counter overflowed past the last value: done
    }
  }

  // Formats 16 bytes as 8-4-4-4-12 hex digits.
  function formatUUID(b) {
    const hex = [...b].map(x => x.toString(16).padStart(2, "0"));
    return (
      hex.slice(0, 4).join("") + "-" +
      hex.slice(4, 6).join("") + "-" +
      hex.slice(6, 8).join("") + "-" +
      hex.slice(8, 10).join("") + "-" +
      hex.slice(10, 16).join("")
    );
  }

This is free. Feel free to use it in production.
What license is this? Company policy says we can't use Apache licensed stuff.
Free space heater
It exists
The write-up for it is surprisingly interesting! https://eieio.games/blog/writing-down-every-uuid/#toc:entrop...
someone took your joke and made it real
16 bytes is a lot. 4 bytes are within reach, we can scan all of them quickly, but even 8 bytes are already too much.
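A rough back-of-the-envelope check of that claim, as a sketch. The probe rate of one billion IDs per second is an assumed figure chosen purely for illustration, not a measurement of any real browser or server.

```javascript
// Assumed probe rate: a hypothetical figure, used only for illustration.
const PROBES_PER_SECOND = 1_000_000_000n;
const SECONDS_PER_YEAR = 31_557_600n; // 365.25 days

// Whole seconds needed to try every value of an ID that is `bytes` bytes long.
function secondsToScan(bytes) {
  return (2n ** BigInt(8 * bytes)) / PROBES_PER_SECOND;
}

// Whole years needed for the same scan (rounded down).
function yearsToScan(bytes) {
  return secondsToScan(bytes) / SECONDS_PER_YEAR;
}

console.log(`4 bytes:  ${secondsToScan(4)} seconds`);
console.log(`8 bytes:  ${yearsToScan(8)} years`);
console.log(`16 bytes: ${yearsToScan(16)} years`);
```

At that assumed rate, 4 bytes take a few seconds, 8 bytes take centuries, and 16 bytes take more than 10^22 years.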
Kolmogorov said that computers do not help with naturally hard tasks; they raise a limit compared to what we can do manually, but above that limit the task stays as hard as it was.
"Just" have a database, and then what? I can set up a database of all UUIDs very easily, but I don't think it's helpful.
Where are you storing them, a black hole?
All you need is basic compression, like storing the start and stop points of each block of UUIDs in the database.
Wait, you already linked to everyuuid. Do you think the server it's on uses black hole storage?
Fast writes, very slow reads.
I would store them as offsets within the digits of pi.
I don't think that's the case. I have the Earth View extension installed which shows a random google earth image.
I have this set as my homepage in Firefox as moz-extension://<extension-id>/index.html, and this has not changed since installing the extension. The page still works.
Doing it on restart makes the mitigation de facto useless. How often do you have 10, 20, 30d (or even longer) desktop uptime these days? And no one is regularly restarting their core applications when their desktop is still up.
Enjoy the fingerprinting.
I restart my browser basically every day.
yeah I close out everything as a mental block against anything I'm working on.
I think there's a subset of people that offload memory to their browsers and that's kinda scary given how these fingerprint things work.
There isn't enough energy in the solar system to count to 2^128. Now a uuid v4 number "only" has 122 bits of entropy, i.e. 2^122 possible values. Regardless, you cannot realistically scan the uuid domain. It's not even a matter of Moore's law, it is a limitation of physics that will stand until computers are no longer made of matter.
You just need to open so many instances and tabs in each instance that it crashes every couple days
Umm, I restart my PC about once a week for security and driver updates.
If you don't, you have a lot more to worry about beyond fingerprinting...
Oh and I'm on LINUX (CachyOS) mind you.
Why does the browser even allow a website to query for installed extensions? I really don't see what the point of that would be.
The website should never be able to tell what's running in my browser, or on my computer in general. The browser renders the page, maybe runs a little Javascript, but there's no reason why it should be able to query anything about my environment.
I wonder how much stuff would break if the Chrome sandboxing was extended to preventing access to chrome-extension:// from Javascript loaded from random websites.
Maybe, but how long are the extension ids? And if they are random, how long to scan a trillion random alphanumeric ids, to find matches?
I presume the extension knows when it wants to access resources of its own. But random javascript doesn't.
The extension IDs are UUIDs/GUIDs, so 128 bits of entropy. No site is going to be able to successfully scan that full range.
UUIDs are 128 bits long but generally have a bit less entropy than that as they are not just a random number. Still more than enough to make enumeration infeasible though.
And just in case the magnitude of that isn't obvious to people, that means there are 340,282,366,920,938,463,463,374,607,431,768,211,456 total possible UUIDs. Good luck.
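The numbers in this subthread can be reproduced with BigInt arithmetic. This sketch assumes the IDs are version 4 (random) UUIDs, where 4 bits are fixed by the version field and 2 bits by the variant field, leaving 122 random bits.

```javascript
// Assumption: version 4 UUIDs, per the layout described in the lead-in.
const TOTAL_BITS = 128n;
const VERSION_BITS = 4n; // fixed version field in a v4 UUID
const VARIANT_BITS = 2n; // fixed variant field

const randomBits = TOTAL_BITS - VERSION_BITS - VARIANT_BITS;
const allUuids = 2n ** TOTAL_BITS; // every possible 128-bit value
const v4Uuids = 2n ** randomBits;  // values a v4 generator can actually emit

console.log(`random bits in a v4 UUID: ${randomBits}`);
console.log(`all 128-bit values:       ${allUuids}`);
console.log(`possible v4 UUIDs:        ${v4Uuids}`);
console.log(`ratio:                    ${allUuids / v4Uuids}`);
```

The fixed bits shrink the space by a factor of 64, which changes nothing about the feasibility of enumerating it.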
ChatGPT told me it can be done though.
It won't disclose how, as it says it has had several users report it. And that it expects 50% of the bounty, and will use it for GPU upgrades.
Yes, that's how browser fingerprinting works, and it is impossible to defeat because there are just too many variations in monitors (relevant for fonts), simple things like the user agent, etc.
And browsers trying to mitigate fingerprinting are miserable to use (fixed window size with only Arial available, etc) and probably fingerprintable anyway.
Though LinkedIn in Firefox with uBlock Origin allowing just enough (not sure if that's relevant, just haven't run it without) does not last long without rocketing CPU & memory usage, fan spinning up, etc. (ime, anyway)
In my case LinkedIn consistently crashes Firefox the first time I navigate there on a given day. After I restart FF, all is fine.
Skimming the list, looks like most extensions are for scraping or automating LinkedIn usage. Not surprising as there's money to be made with LinkedIn data. Scraping was a problem when I worked there, the abuse teams built some reasonably sophisticated detection & prevention, and it was a constant battle.
In order to create the data source that LinkedIn's extension-fingerprinting relies on to work, someone (at LinkedIn*?) almost certainly violated the Chrome Web Store TOS—by (perversely*) scraping it.
* if LinkedIn didn't get it from an existing data source
Programmers don't appreciate the fact that you can just violate terms of service. You can just do it. It's okay. The police won't come after you. Usually.
I think the point is more "in order to prevent people from scraping their site, which is against their ToS, they scraped some other site, against its ToS".
Read "in order to have more money, I did things that caused other people to have less money"
When someone who sees the world through a lens of morality notices somebody operating without morality, it is startling.
And it deserves a call out! The benefits to being so cynical that you’re numb to it come with a lot of tradeoffs
Indeed. I read a lot of comments on HN like the one you are responding to. It seems like there is a type of person who thinks that writing down what their rules are has some magical power.
“This isn’t what it was intended for”. Who cares?
A long long time ago in a galaxy far far away I would encounter warnings on pirating websites saying “If you are an FBI agent you are not allowed to continue on this site”. Imagine their utter disbelief and shock if they were to be arrested by an FBI agent that clicked past the warning anyway.
I agree it must be programmers as a type who like rules a lot and think what a perfect world it could be if people would follow them.
I'd ask who you think you have me confused for or where you got that quote from, but I know how little it matters insofar as getting you to recognize whatever delusion led to your comment.
3000 extensions is few enough that a small team could download each extension manually over a few months. You don't need to scrape at all.
In the first place, no one said they needed to, only that they probably did.
Secondly, it's not "3000 extensions". They didn't somehow magically divine that the 2953 (+/-47) extensions we see here were the ones that they needed to download in order to be able to exploit the content-accessible resources described in their extension manifest. They looked at a much larger set, and it got filtered down to these 2953 that satisfied the necessary criteria.
Lol no, did you even read the list? You could pay someone to just search "LinkedIn" and "talent" and "recruiting" on the chrome web store and download each extension. It's probably harder to automate this than it is to do it manually. This is something you could develop in an afternoon and pay a small team of people to do for pennies on the dollar. Even ten thousand extensions is nothing. Spread that over years and this is trivial.
For someone choosing to be so obnoxiously condescending, you are excruciatingly stupid.
a problem for linkedin != "a problem". The real problem for people is the back room data brokering linkedin and others do.
From the code, it doesn't look like they do anything if they have a match; they just save all the results to a CSV for fingerprinting?
"The code" here you're referring to (fetch_extension_names.js[1]) isn't and doesn't claim to be LinkedIn's fingerprinting code. It's a scraper that the researcher behind this repo wrote themselves in order to create the CSV of the data that they're publishing here.
LinkedIn's fingerprinting code, as the README explains, is found in fingerprint.js[2], which embeds a big JSON literal with the IDs of the extensions it probes for. (Sickeningly enough, this data starts about two-thirds of the way through the file* and isn't the culprit behind the bulk of its 2.15 MB size…)
* On line 34394; the one starting:

  const r = [{
    id: "aacbpggdjcblgnmgjgpkpddliddineni",
    file: "sidebar.html"

1. <https://github.com/mdp/linkedin-extension-fingerprinting/blo...>
2. <https://github.com/mdp/linkedin-extension-fingerprinting/blo...>
Thanks, my fault for not reading the README and just doing a quick read of the code.
Looking at the list, it seems like it is not really “sophisticated”. It is just a list based on names (e.g. whether there is an “email” in the name). The majority of extensions do not even ask for permission to access linkedin.com.
I had the pleasure of scraping LinkedIn for a client. Great fun.
Won't someone think of poor little LinkedIn, a subsidiary of one of the largest data brokers in the world?
Why frame what you are trying to say like that? Businesses of all sizes deserve the ability to protect their businesses from abuse.
Do they respect my data? Why do they get to track me across sites when I clearly don't want them to, but someone can't scrape their data when they don't want them to? Why should big companies get the pass but individuals not? They clearly consider internet traffic fair game and are invasive and abusive about it, so it is not only fair to be invasive and abusive back, it is self defense at this point.
They don’t need to track your web browser when they’re owned by Microsoft, because they track every action at a lower level.
Weird, I don't use Windows as an OS but have linkedin. I'd believe the concern and disregard of Linkedin's concern is fair game.
What lower level? Microsoft owns internet?
The operating system. For example see the Windows 11 screenshot debacle/scandal.
Are you talking about Recall, which got such huge negative press they delayed it a year and added a clear opt-in? And never sent anything off the device itself?
If anyone has evidence of constant tracking and reporting then please share it.
Well, I won't touch Windows 11 with a ten-foot pole and I don't know if what I am referring to is called "Recall". Not that much into the MS terminology. I also read about Windows 11 having all kinds of shenanigans to suddenly upload data into OneDrive. Wouldn't be surprised if that also included screenshots, or could "accidentally" lead to that happening. Screenshotting every few seconds is unacceptable even if it stays on the device per se. Once data exists, it has potential to leak, and we have not even started considering malware infection yet. Huge risk to people's privacy and safety online.
We can stop pretending all is alright at some point, can't we? We don't need more enshittification. Windows 11 is already a disaster that no one wants. It already starts with its idiotic HW requirements, trying to make perfectly fine HW obsolete. $$$
There was a lot of pushback to Recall for a reason, yes. But it's not what you described, and criticism works a lot better when it's accurate.
For suddenly putting your documents into onedrive, that's real but it started years ago in windows 10.
“They” is an incredibly useful tool.
You do realize anti-scraping measures are one way of protecting your data too?
In this context, "protecting" means the interest of linkedin who aggressively sells the data. Users that give data to linkedin are not protecting their data either way.
Because you signed up to a set of terms and conditions saying LinkedIn can use your data in this way
What if I signed up before those ToS said they could use my data in this way?
Oh right, companies change ToS and EULA and "agreements" without notice, without due process, and without recourse.
I have no problem changing how I use "their" data in such situations.
> Oh right, companies change ToS and EULA and "agreements" without notice, without due process, and without recourse.
Companies change their terms of service all the time. They usually send emails about it.
I've responded to decline them a handful of times and asked for my account to be deleted. I chuckle slightly at the work it creates, but sometimes it has been easier to close an account that way.
No one likes paying taxes but they still do it. They could just not work and not have money and therefore not need to pay tax.
Except what you have to pay each year for the privilege of staying in "your" house.
I didn't want the web to turn into monolithic platforms. I abhor this status quo.
You cannot function without these enterprises, but that doesn't mean they're ideal or even ethical.
Microsoft wins because of network effects. It's impossible to compete. So I think it should be allowed to assail their monopoly here by any means. It's maximally fair for consumers and for free markets.
Ideally capitalism remains cutthroat and impossible to grow into undislodgeable titans.
Even more ideally, this would become a distributed protocol rather than a privately owned and guarded database.
That doesn't actually mean anything
I think they framed it this way because they don't consider scraping abuse (to be fair, neither do I, as long as it doesn't overload the site). Botting accounts for spam is clear abuse, however, so that's fair game.
No, I consider all data collection and scraping egregious. From that perspective, LinkedIn is hypocritical when Microsoft discloses every filesystem search I do locally to bing.
Are you not scraping a site with your eyeballs when you view a site?
By that logic I can charge you for looking at me.
I agree. Maybe that logic (which is your logic) isn't very good.
You’re just making yourself look dumb by drawing invalid comparisons and an inaccurate understanding of my logic.
When they scrape, it’s innovation. When you scrape, it’s a felony.
I'm sure there are issues with fake accounts for scraping, but the core issue is that LinkedIn considers the data valuable. LinkedIn wants to be able to sell the data, or access to it at least, and the scrapers undermine that.
They could stop all the scraping by providing a downloadable data bundle like Wikipedia.
Thinking more about it, I don't think it's a terrible thing that they prevent scraping. Their listings are already suffering from being flooded with garbage applications and having to sift through tons of noise. Allowing scraping would just amplify that and make the platform almost entirely worthless.
I "scrape" LinkedIn in a roundabout way for personal use, and really what I've found is that I should just maybe not bother at all. I can't get through the noise even when I'm applying at places that heavily match my skillset, and just get automated rejection emails.
LLMs scrape Wikipedia all the time, or at least attempt to.
The data bundle doesn't help that at all.
That's true, the normal scraping would still happen, but it would eliminate this side business of trying to re-sell LinkedIn's data.
What is abuse? Is it anything that reduces my profit margin? Or is it anything that makes the world a worse place? The Flock CEO called Deflock terrorism, is he right?
this exchange -- obvious critical / perhaps insurrection speech versus a stable voice of business economics -- should be within the purview of an orderly and predictable legal environment. BUT things moved quickly in the phone battles. Some people say that the legal system has never caught up to the data brokering, and in fact the surveillance state grew by leaps and bounds.
So, reasonable people may disagree. This is a fine place to mention it .. what if individual profiles built at LinkedIn are being combined with illegitimate and even directly illegal surveillance data and sold daily? Everyone stand up and salute when LinkedIn walks in the room? there has to be legal and direct ways to deal with change, and enforcement to complete an orderly and predictable economic marketplace.
>BUT things moved quickly in the phone battles. Some people say that the legal system has never caught up to the data brokering, and in fact the surveillance state grew by leaps and bounds.
Partially by discrepancy in how responsive you can be or comprehensive you must be to win the next round of cat-and-mouse, and partially because a private/corporate surveillance apparatus is useful to a government that might otherwise be hampered by constitutional bounds.
We enjoy the fruits of an LLM or two from time to time, derived from hoards of ill-gotten data. LinkedIn has the resources to attempt to block scraping, but even at the resource scale of LI I doubt the effort is effective.
I am not denying that scraping is useful. If it wasn't people wouldn't do it. But if the site rules say you aren't allowed to scrape, then I don't think people should be hostile towards the people enforcing the rules.
Well, they can try to enforce the rules; that's perfectly fair. At the same time, there are many methods of "trying" which I would not consider valid or acceptable ones. "Enforcing the rules" does not give a carte blanche right to snoop and do "whatever's necessary." Sony tried that with their CD rootkits and got multiple lawsuits.
the abuse>using the information they publish to the public
Yes, until it becomes abusive and malignly affects innocents.
The big social media businesses deserve a Teddy Roosevelt character swooping in and busting their trusts, forcing them to play ball with others even if it destroys their moats. Boo hoo! Good riddance. World's tiniest violin.
This is a popular position across the aisle. Here's hoping the next guy can't be bought, or at least asks for more than a $400M tacky gold ballroom!
I mean, regardless of who they are or even if you don’t like what LinkedIn does themselves with the data people have given them, the random third parties with the extensions don’t additionally deserve to just grab all that data too, do they?
Surely they do! The data is in the public internets, aren't they?
They'd put Widevine or PlayReady DRM on the website if they could, I'm sure.
why can't they?
because they're only for video files?
I say the same thing about my start menu sending every action I perform to bing.
Eh. I worked at a company which made an extension which scraped LinkedIn. We provided a service to recruiters, who would start a hiring process by putting candidates into our system.
The recruiters all had LinkedIn paid accounts, and could access all of this data on the web. We made a browser extension so they wouldn’t need to do any manual data entry. Recruiters loved the extension because it saved them time.
I think it was a legitimate use. We were making LinkedIn more useful to some of their actual customers (recruiters) by adding a somewhat cursed API integration via a Chrome extension. Forcing recruiters to copy and paste didn’t help anyone. Our extension only grabbed content on the page the recruiter had open. It was purely read only and scoped by the user.
Doesn't sound like your operation was particularly questionable, but I can imagine there must be some of those 3,000 extensions where the data flow isn't just "DOM -> End User" but more of a "DOM -> Cloud Server -> ??? -> Profit!" with perhaps a little detour where the end user gets some value too as a hook to justify the extension's existence.
I started there, but it felt like a dodgy way (as it could be seen to be illegal). We then just went all official and went through Google search APIs with LinkedIn as the target. Worked a treat and was cheaper than Recruiter!!!
So when you pay the highest scraper, it’s ok! Same data, different manner.
Chrome is the new IE6. Google set themselves up to be the next Microsoft and is "ad friendly" in all the creepy ways because that's what Google IS: an ad company. All they've contributed to security is diminishing the capability of adblockers and letting malware do bad things to you as consumers.
I fully agree that Chrome is spyware.
However, they do contribute to security: Chrome was first to implement Site Isolation, sandboxing too. These are essential security features for modern browsers. They are also not doing too bad with patching and security testing.
Chrome has become much worse than IE6. Microsoft was not in the business of tracking users and selling ads back then.
It certainly doesn’t feel like I have a worse UX, as a daily chrome user.
That's because you're not aware enough of being spied on at every single step you make. The issues are now more or less invisible (the tracking being more, and the lobotomized adblockers being less)
Unfortunately, yes.
He who controls the Ads, controls the Internet.
> Google set themselves up to be the next Microsoft
Google became a monopoly. All monopolies do this.
There's a step before that. Google is a pure capitalist enterprise > pure capitalism goes to monopoly > all monopolies do this.
A pure unregulated market that doesn't guarantee free market assumptions does that. Capitalism doesn't need it. Without mechanisms that allow for the free entry/exit of competitors, fair and simultaneous access to information, preventing cartels/price fixing, .... a bunch of assumptions for a perfect free market to happen, the market will tend towards monopolies due to cumulative advantage (in econ. known as the Matthew effect), since small advantages compound into dominance.
Brave feels like using Chrome. The transition was seamless even as a developer who uses the devtools. Obviously that's because it's almost the same code, but Brave is much more privacy friendly, right?
Brave was found to be mostly different adware years ago I thought. It's a degoogle'd chrome essentially, but replaced with their adware instead of google's.
If you want a clean chrome, use ungoogled-chromium. Like IE6, some stuff just doesn't work in librewolf (less scummy firefox), so I use ungoogled-chromium when that happens, and I just don't do anything Google-ish on it so that it doesn't latch onto Google again.
Imagine being the nerd that is still using Chrome in the YOL 2026.