Trackers can collect data you type even before you hit send
Many websites include web forms, for example to log in to an account, create a new account, leave a public comment, or contact the site owner. What most Internet users may not know is that data entered into these forms can be collected by third-party trackers even before it is submitted.
A research team from KU Leuven, Radboud University and the University of Lausanne analyzed data collection by third-party trackers on the top 100,000 websites worldwide. The results have been published in the research article Leaky Forms: A Study of Email and Password Exfiltration Before Form Submission.
The leaked data included personal information such as users' email addresses, names, usernames and messages entered into forms, and, on 52 websites, passwords. Most users are unaware that third-party scripts, including trackers, can collect this type of information as they type. Even when they do submit a form, most would expect the content to stay with the site they sent it to rather than be leaked to third parties. Browsers give users no indication that third-party scripts are collecting this data.
Results differ by location
Data collection differs based on the user’s location. The researchers evaluated the effect of user location by running the tests from locations in the European Union and the United States.
Email leaks were 60% more frequent from the United States location than from the European Union location: email addresses were leaked on 1,844 of the top 100,000 websites when crawled from the European Union, and on 2,950 of the same sites when crawled from the United States.
Most of the sites (94.4%) that leaked email addresses when visited from the EU also leaked them when visited from the US.
Leakage was slightly lower with mobile web browsers at both locations: 1,745 sites leaked email addresses to a mobile browser in the European Union, and 2,744 sites did so in the United States.
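The 60% figure can be checked against the raw counts. The short Python snippet below is our own arithmetic on the numbers reported above, not code from the study; it computes the relative difference between the US and EU crawls, and between the mobile and desktop crawls, using a hypothetical helper named `percent_change`.

```python
# Number of sites leaking email addresses, as reported in the article.
LEAKING_SITES = {
    ("desktop", "EU"): 1_844,
    ("desktop", "US"): 2_950,
    ("mobile", "EU"): 1_745,
    ("mobile", "US"): 2_744,
}


def percent_change(old: int, new: int) -> float:
    """Relative change from old to new, in percent."""
    return (new - old) / old * 100


if __name__ == "__main__":
    # US crawl compared with the EU crawl, per platform.
    for platform in ("desktop", "mobile"):
        eu = LEAKING_SITES[(platform, "EU")]
        us = LEAKING_SITES[(platform, "US")]
        print(f"{platform}: US vs. EU {percent_change(eu, us):+.1f}%")
    # Mobile crawl compared with the desktop crawl, per location.
    for region in ("EU", "US"):
        desktop = LEAKING_SITES[("desktop", region)]
        mobile = LEAKING_SITES[("mobile", region)]
        print(f"{region}: mobile vs. desktop {percent_change(desktop, mobile):+.1f}%")
```

Run as a script, it prints +60.0% (desktop) and +57.2% (mobile) for the US versus the EU, and -5.4% (EU) and -7.0% (US) for mobile versus desktop, so the US gap holds on both platforms.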
According to the research, more than 60% of the leaks were identical on the desktop and mobile versions of sites.
The sites that leak email addresses to tracking domains on desktop and on mobile overlap substantially, but not completely.
One explanation for the difference is that the mobile and desktop crawls were run one month apart rather than at the same time. Some trackers were also found to be active only on mobile or only on desktop sites.
The researchers suggest that stricter European privacy laws play a role in the difference. The General Data Protection Regulation (GDPR) applies when sites and services collect personal data, and organizations that process personal data are responsible for complying with it.
The researchers believe that third-party email exfiltration “may violate at least three GDPR requirements.”
First, if such exfiltration occurs surreptitiously, it violates the principle of transparency.
Second, if such exfiltration is used for purposes such as behavioral advertising, marketing, and online tracking, it also violates the principle of purpose limitation.
Third, if email exfiltration is used for behavioral advertising or online tracking, the GDPR generally requires prior consent from the website visitor.
Only 7,720 sites in the EU crawl and 5,391 sites in the US crawl displayed a consent pop-up; that is 7.7% and 5.4% of the 100,000 sites, respectively.
The researchers found that rejecting all data processing via consent pop-ups reduced the number of leaking sites by only 13% in the US and 0.05% in the EU. Most Internet users would probably expect leaks to stop entirely when they refuse consent, but that is not the case. The small decline in the EU is likely due to the low number of websites on which both a consent pop-up was detected and leaks were observed.
Site categories, trackers and leaks
The researchers sorted sites into categories such as fashion/beauty, online shopping, gaming, public information, and pornography. According to the researchers, sites in every category except pornography leaked email addresses.
Fashion/beauty sites leaked email addresses most often, in 11.1% (EU) and 19.0% (US) of cases, followed by online shopping at 9.4% (EU) and 15.1% (US), and general news at 6.6% (EU) and 10.2% (US). Next came software/hardware at 4.9% in the EU and business at 6.1% in the US.
Many sites embed scripts from third parties, usually for advertising or to provide site features. These scripts can track users, for example to build profiles that increase advertising revenue.
The top sites that leaked email address information differed by location. The top 3 sites for visitors from the EU were USA Today, Trello and The Independent. For US visitors, it was Issuu, Business Insider, and USA Today.
Further analysis of the trackers revealed that a small number of organizations were responsible for most of the form data leaks. Again, the results differed by location.
The five organizations whose trackers were present on the largest number of sites leaking form data were Taboola, Adobe, FullStory, Awin Inc., and Yandex in the European Union, and LiveRamp, Taboola, Bounce Exchange, Adobe, and Awin in the United States.
Taboola was found on 327 sites when visited from the EU, LiveRamp on 524 sites when visited from the US.
Protection against third parties leaking form data
Web browsers do not tell users whether third-party scripts collect the data they enter on sites before it is submitted. Most major browsers, with the notable exception of Google Chrome, include anti-tracking features, but these appear to be inadequate against this form of tracking.
The researchers ran a small test with Firefox and Safari to find out whether their default anti-tracking features blocked data exfiltration on a sample of sites. Neither browser protected user data in the test.
Browsers with built-in ad-blocking features, like Brave or Vivaldi, and ad-blocking extensions, like uBlock Origin, offer better protection against data leakage. Mobile users may use browsers that support extensions or include ad blocking by default.
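Ad blockers generally work by checking each outgoing request against filter lists and cancelling requests to known tracking domains, which is how they can stop a tracker's request regardless of what data it carries. A minimal Python sketch of that idea, assuming a plain domain blocklist: the domains and the `is_blocked` helper are hypothetical, and real filter lists such as those used by uBlock Origin have a far richer rule syntax.

```python
from urllib.parse import urlsplit

# Hypothetical blocklist for illustration only; not taken from any real filter list.
BLOCKED_DOMAINS = frozenset({"tracker.example", "ads.example.net"})


def is_blocked(url: str, blocklist: frozenset[str] = BLOCKED_DOMAINS) -> bool:
    """Return True if the URL's host is a blocked domain or a subdomain of one."""
    host = (urlsplit(url).hostname or "").lower()
    labels = host.split(".")
    # Test the host and each parent domain in turn, e.g.
    # cdn.tracker.example -> tracker.example -> example
    return any(".".join(labels[i:]) in blocklist for i in range(len(labels)))


if __name__ == "__main__":
    for url in (
        "https://cdn.tracker.example/collect?email=jane%40example.com",
        "https://news.example.org/article",
    ):
        print(url, "->", "blocked" if is_blocked(url) else "allowed")
```

Matching whole domain labels rather than doing a plain substring test keeps an unrelated domain such as nottracker.example from being blocked by accident.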
The researchers developed the LeakInspector browser extension. It is designed to alert users to this kind of tracking and to block requests that contain personal information, protecting user data while it is active.
The extension's source code is available on GitHub. The developers could not publish it in the Chrome Web Store because it requires features that are only available in Manifest V2, and Google only accepts new Manifest V3 extensions in the store. A Firefox version is being published on Mozilla's Firefox Add-ons store.
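The article does not describe how LeakInspector recognizes personal information in a request. As an illustration only, a detector could search outgoing request URLs and bodies for the email address the user typed, including encoded and hashed forms, since trackers may not send it in plain text. The Python sketch below assumes that approach; the function names and the chosen encodings (URL encoding, Base64, MD5, SHA-1, SHA-256) are our own assumptions, not LeakInspector's code.

```python
import base64
import hashlib
from urllib.parse import quote, unquote


def email_variants(email: str) -> set[str]:
    """Plain, encoded and hashed forms of an email address to search for (assumed set)."""
    normalized = email.strip().lower()
    raw = normalized.encode("utf-8")
    return {
        normalized,                              # plain text
        quote(normalized, safe=""),              # URL-encoded, e.g. jane.doe%40example.com
        base64.b64encode(raw).decode("ascii"),   # Base64
        hashlib.md5(raw).hexdigest(),            # common hash digests (lowercase hex)
        hashlib.sha1(raw).hexdigest(),
        hashlib.sha256(raw).hexdigest(),
    }


def request_leaks_email(url: str, body: str, email: str) -> bool:
    """Return True if the request URL or body contains the email in any assumed form."""
    # Include a URL-decoded copy of the URL so percent-encoded values are also found.
    haystack = "\n".join((url, unquote(url), body))
    lowered = haystack.lower()
    # Base64 is case-sensitive, so check the text as-is; the plain and hex variants
    # are lowercase, so also check a lowercased copy to catch mixed-case input.
    return any(v in haystack or v in lowered for v in email_variants(email))


if __name__ == "__main__":
    typed = "Jane.Doe@example.com"
    digest = hashlib.sha256(b"jane.doe@example.com").hexdigest()
    print(request_leaks_email(f"https://tracker.example/c?em={digest}", "", typed))  # True
    print(request_leaks_email("https://tracker.example/c?page=home", "{}", typed))   # False
```

A blocking extension would run a check like this on each request before it leaves the browser; the sketch only shows the matching step, and it would miss addresses that are salted, split up, or embedded inside larger encoded payloads.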
Now you: What is your view on this?