What is Device Fingerprinting?

Bhavya Swaroop
14 min read · Jul 12, 2021

When users access your platform, they do it with two tools: a device with a web browser or mobile application, and an Internet connection, which comes with an IP address. This creates two data sources, which are present at signup, login, checkout, or even when simply browsing a page. We can extract useful information from these data points.

Combining information about a browser and a device is called fingerprinting. It gives a clear picture of how the user is connecting to the service, and it lets us recognize returning users and understand their behavior.

How Does Device Fingerprinting Work?

Device fingerprinting uses various data collection methods to build as clear a picture of a device as possible. Passive methods only absorb the general information a device sends each time it connects to the internet. Because this yields less specific data, most of today's fingerprinting methods are active: they use scripts and other tools to request specific information from the device.

One of the most effective active methods is canvas fingerprinting. Used on both computers and mobile devices, it relies on a script that interacts with the graphics element, or canvas, of an HTML5 web page. The script instructs the canvas to draw a hidden image and then reads the rendered result back; small differences in fonts, font rendering, and graphics hardware make that result vary from one system to another. The WebGL API performs a similar task with a 3D image and also exposes information about the graphics card. More generally, JavaScript, the programming language that runs much of the web, is a powerful tool for data collection: JavaScript fingerprint trackers relay characteristics like browser type, time zone, installed plugins, and language settings.

An Example

Thousands of computers could be running a Windows operating system. However, far fewer of them would be running a Windows operating system that is also set to Japan Standard Time, using Firefox 64.0.2 with an Adobe Reader plugin and cookies turned off, a GTX 1050 graphics card, and a special font installed.

A Closer Look at the Properties or Features

Each piece of data collected in a browser fingerprint implementation is called a feature. Here are a few widely used ones.

Display Properties

Listing display-specific and visual features, this category provides us with information that can reliably be used as browser fingerprint features, since these rarely change during regular use of a web browser. The value of these features can only be changed by changing the display or changing low-level display settings.

  1. Screen Size: Through JavaScript, the total width and height of the user's screen, in pixels, can be accessed through window.screen.width and window.screen.height. This size does not represent the area available to the web browser window unless the browser is in full-screen mode.
  2. Available Size: window.screen.availWidth and window.screen.availHeight are similar to screen size, but they return the dimensions of the portion of the screen available to the web browser window, excluding permanent interface elements such as the operating system's taskbar.
  3. Color Depth: Color depth, or bit depth, is the number of bits used to represent the color of each pixel on the display that contains the browser window. For example, 1-bit color depth would mean a black-and-white screen, while 8-bit would mean each pixel can be represented with one of 256 colors. Most modern displays use 24-bit color representation, also known as True Color.
    The color depth of a display is accessible through window.screen.colorDepth.
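As a minimal sketch, the three display features above can be read into one record. The ScreenInfo interface and collectDisplayFeatures function are hypothetical names introduced for illustration; in a browser, window.screen exposes the same five properties and could be passed in directly. The example values (a 1920x1080 display with a 40px taskbar) are made up.

```typescript
// Structural type for the window.screen properties described above.
// In a browser, window.screen satisfies it; here a plain object is used.
interface ScreenInfo {
  width: number;
  height: number;
  availWidth: number;
  availHeight: number;
  colorDepth: number;
}

interface DisplayFeatures {
  screenSize: string;    // total screen size, "WIDTHxHEIGHT"
  availableSize: string; // area available to browser windows, "WIDTHxHEIGHT"
  colorDepth: number;    // bits per pixel
  colorCount: number;    // number of representable colors, 2^colorDepth
}

function collectDisplayFeatures(s: ScreenInfo): DisplayFeatures {
  return {
    screenSize: s.width + "x" + s.height,
    availableSize: s.availWidth + "x" + s.availHeight,
    colorDepth: s.colorDepth,
    colorCount: Math.pow(2, s.colorDepth),
  };
}

// Hypothetical 1920x1080 display with a 40px taskbar.
// In a browser: collectDisplayFeatures(window.screen)
const displayExample = collectDisplayFeatures({
  width: 1920,
  height: 1080,
  availWidth: 1920,
  availHeight: 1040,
  colorDepth: 24,
});
```

The colorCount field makes the bit-depth arithmetic explicit: 24-bit True Color gives 2^24 = 16,777,216 colors.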

Browser features

  1. User-Agent String: A list of tokens describing the system and the browser being used to view the website.
    Information contained in the user-agent string includes:

• system information
– platform
– operating system
– CPU
• rendering engine compatibility
• browser information
– name
– version
– build number

Because some websites serve different or reduced content depending on the user-agent string, web browsers make their user-agent strings easy to change, so that users can request the entire content of a website when necessary.
This might decrease browser fingerprint accuracy if the user changes their user-agent to a commonly used one. If the change is done poorly, however, the user might create a unique user-agent that identifies them precisely.

2. AdBlock: Users install ad blockers to get an uninterrupted browsing experience. The presence of an ad-blocking plugin can be detected by creating a page element that looks like a web advertisement, inserting it into the DOM, and then checking whether it is still present in the DOM. If an ad-blocking plugin is installed, such an element will be filtered out or hidden upon insertion, which tells us whether the web browser has an ad-blocking plugin installed. This method will not detect all ad-blocking plugins. However, on the same web browser instance its results will be consistent as long as the user does not add or remove plugins, and consistency is a key property of a browser fingerprint.
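The bait-element check can be sketched as follows. This is an illustration, not a reliable detector: the class names in BAIT_CLASS_NAMES are an assumption about what generic filter-list rules hide, and the BaitElement and BaitDocument interfaces are hypothetical subsets of the DOM introduced so the logic can run without a browser (in a browser, document would be passed in, possibly with a cast).

```typescript
// The subset of a DOM element this check reads.
interface BaitElement {
  className: string;
  innerHTML: string;
  readonly offsetHeight: number; // 0 when hidden by CSS (display: none)
  readonly isConnected: boolean; // false when removed from the document
}

// The subset of `document` this check uses.
interface BaitDocument {
  createElement(tagName: "div"): BaitElement;
  body: {
    appendChild(node: BaitElement): unknown;
    removeChild(node: BaitElement): unknown;
  };
}

// Assumption: class names that generic ad-blocking filter rules tend to hide.
const BAIT_CLASS_NAMES = "adsbox ad-banner textads";

function hasAdBlocker(doc: BaitDocument): boolean {
  const bait = doc.createElement("div");
  bait.className = BAIT_CLASS_NAMES;
  bait.innerHTML = "&nbsp;"; // non-empty, so an unblocked element has a height
  doc.body.appendChild(bait);
  // A blocker either removes the element or hides it, collapsing its height.
  const blocked = !bait.isConnected || bait.offsetHeight === 0;
  if (bait.isConnected) doc.body.removeChild(bait);
  return blocked;
}
```

This sketch checks synchronously; some blockers apply their rules a moment later, so a real implementation might repeat the check after a short delay.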

3. Do Not Track Header

The Do Not Track (DNT) header indicates the user's tracking preference. A value of 1 means the user does not want to be tracked for purposes of online advertisement and personalized content, a value of 0 means the user allows tracking, and the header is omitted when the user has not set a preference.

In JavaScript, the DNT value can be read through window.navigator.doNotTrack, which usually returns "1", "0", or null.
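Because browsers have exposed this preference in slightly different places and formats, a script typically normalizes it before adding it to the fingerprint. A minimal sketch; the DntSources interface and readDoNotTrack function are hypothetical, and the legacy sources (window.doNotTrack, navigator.msDoNotTrack, and the "yes"/"no" values) reflect historical browser differences that should be treated as assumptions.

```typescript
// Possible sources of the DNT preference; which ones exist depends on the browser.
// In a browser, fill these from navigator.doNotTrack, window.doNotTrack and
// navigator.msDoNotTrack (the last one from older Internet Explorer versions).
interface DntSources {
  navigatorDoNotTrack?: string | null;
  windowDoNotTrack?: string | null;
  msDoNotTrack?: string | null;
}

type DntPreference = "opt-out" | "opt-in" | "unspecified";

function readDoNotTrack(src: DntSources): DntPreference {
  // Take the first source that is neither null nor undefined.
  const raw = src.navigatorDoNotTrack ?? src.windowDoNotTrack ?? src.msDoNotTrack ?? null;
  // "yes"/"no" were reported by some older browser versions instead of "1"/"0".
  if (raw === "1" || raw === "yes") return "opt-out";
  if (raw === "0" || raw === "no") return "opt-in";
  return "unspecified"; // null, or values such as "unspecified"
}
```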

4. Cookies: Pieces of data stored in the browser that can be accessed by both the server and the web browser. They usually hold user-specific data, such as a login authentication token or the items in a shopping cart. Whether cookies are enabled can be read through navigator.cookieEnabled, although this is not reliable: it can easily be spoofed and may not reflect site-specific cookie settings.
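Since navigator.cookieEnabled alone is unreliable, one option is to record both the reported flag and the result of actually writing and reading back a test cookie; a mismatch between the two is itself a signal. A minimal sketch, assuming a hypothetical CookieJar interface (the part of document that is used) and a hypothetical cookie name:

```typescript
// The only part of `document` this check needs.
interface CookieJar {
  cookie: string;
}

interface CookieFeatures {
  reported: boolean; // what navigator.cookieEnabled claims
  works: boolean;    // whether a test cookie can actually be written and read back
}

// Hypothetical test cookie name.
const TEST_COOKIE = "fp_cookie_test";

function checkCookies(cookieEnabled: boolean, doc: CookieJar): CookieFeatures {
  let works = false;
  try {
    doc.cookie = TEST_COOKIE + "=1; SameSite=Lax";
    works = doc.cookie.indexOf(TEST_COOKIE + "=1") !== -1;
    // Remove the test cookie again by giving it an expiry date in the past.
    doc.cookie = TEST_COOKIE + "=; expires=Thu, 01 Jan 1970 00:00:00 GMT";
  } catch {
    works = false; // sandboxed documents may throw on cookie access
  }
  return { reported: cookieEnabled, works: works };
}
```

In a browser this would be called as checkCookies(navigator.cookieEnabled, document).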

5. Local Storage

Local storage is very similar to cookies. While both are used to store data on the client side, there are a few key differences.

• Local storage can only be accessed on the client side, while cookies can be read by both the server and the client.
• Unlike cookies, data stored in local storage has no expiration date.
• A single cookie can hold about 4 KB of data, while local storage usually has a limit of around 5 MB per origin.

6. Session Storage
Session storage is almost the same as local storage and uses the same API, with one important difference: session storage data is only kept until the browser window or tab is closed.

7. Indexed Database

IndexedDB is a transactional database that works entirely within the browser. While local storage and session storage are useful for storing small amounts of data, IndexedDB is suited to large amounts of structured data. It supports more complex operations, such as search, and is useful for complex web applications such as web email clients: instead of transferring considerable amounts of data over the internet with each request, the data can be stored and operated on the client side.

8. Binary Behaviors
An Internet Explorer-specific feature that can be used to store and load persistent data on the client side by setting special attributes on DOM elements.
We detect whether binary behaviors are available by creating an element and checking whether it has an addBehavior method, i.e. whether document.createElement("div").addBehavior exists.
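The storage mechanisms above (local storage, session storage, IndexedDB, binary behaviors) each contribute one yes/no feature. A minimal sketch of probing them; the function names and the probe key are hypothetical, and the global object is typed loosely because several of these properties exist only in some browsers. Writing and removing a key, rather than only checking that the property exists, catches storage that is present but disabled or full.

```typescript
// The part of the Storage interface (localStorage / sessionStorage) used here.
interface StorageProbe {
  setItem(key: string, value: string): void;
  removeItem(key: string): void;
}

// Hypothetical probe key.
const PROBE_KEY = "__fp_probe__";

// Even reading window.localStorage can throw (for example when storage is
// disabled), so the lookup happens inside the try block.
function storageWorks(getStorage: () => StorageProbe | undefined): boolean {
  try {
    const storage = getStorage();
    if (!storage) return false;
    storage.setItem(PROBE_KEY, "1");
    storage.removeItem(PROBE_KEY);
    return true;
  } catch {
    return false;
  }
}

interface StorageFeatures {
  localStorage: boolean;
  sessionStorage: boolean;
  indexedDB: boolean;
  addBehavior: boolean;
}

// `g` stands for the global object; in a browser: collectStorageFeatures(window as any).
function collectStorageFeatures(g: { [key: string]: any }): StorageFeatures {
  const doc = g.document;
  return {
    localStorage: storageWorks(() => g.localStorage),
    sessionStorage: storageWorks(() => g.sessionStorage),
    indexedDB: g.indexedDB != null,
    // Binary behaviors (Internet Explorer only): elements expose addBehavior().
    addBehavior: !!doc && typeof doc.createElement("div").addBehavior === "function",
  };
}

// Usage with stand-in objects: working local storage, session storage that throws.
const memory: { [key: string]: string } = {};
const workingStorage: StorageProbe = {
  setItem: (key, value) => { memory[key] = value; },
  removeItem: (key) => { delete memory[key]; },
};
const fullStorage: StorageProbe = {
  setItem: () => { throw new Error("QuotaExceededError"); },
  removeItem: () => undefined,
};
const storageExample = collectStorageFeatures({
  localStorage: workingStorage,
  sessionStorage: fullStorage,
  indexedDB: {},
});
```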

9. Plugins

Third-party libraries that can be used by the web browser for displaying animations, applets, or PDF files inside web pages.

Commonly used plugins include:
• Adobe PDF Reader — viewing PDF files
• Shockwave Flash — interactive applications and games
• Java Applet Plug-in — interactive components

Note: browser plugins are not the same as browser extensions.

Browser plugins cannot affect browser behavior, cannot add browser menus, do not automatically process the content of the web page, and have to be embedded in web pages.

In contrast, browser extensions can affect browser behavior by filtering or altering website content, or adding new functionality to the browser.
Examples of commonly used browser extensions include:
• AdBlock Plus — advertisement content filtering
• Grammarly — checking the spelling and grammar of user input on websites
• Momentum — to-do list and welcome screen
• Google Translate — text translation
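Where navigator.plugins can be enumerated, the plugin list becomes a single feature string. A minimal sketch; PluginEntry, PluginListLike and pluginFeature are hypothetical names, and the two example entries are illustrative values, not measured data. Sorting is a design choice made here: it keeps the feature stable if enumeration order changes, at the cost of the small amount of entropy the original order might carry.

```typescript
interface PluginEntry {
  name: string;
  filename: string;
}

// navigator.plugins is array-like (length + numeric indexes), not a real array.
interface PluginListLike {
  length: number;
  [index: number]: PluginEntry;
}

// Builds one feature string from the plugin list, sorted for stability.
function pluginFeature(plugins: PluginListLike): string {
  const entries: string[] = [];
  for (let i = 0; i < plugins.length; i++) {
    entries.push(plugins[i].name + "::" + plugins[i].filename);
  }
  return entries.sort().join(";");
}

// In a browser: pluginFeature(navigator.plugins)
const pluginExample = pluginFeature([
  { name: "PDF Viewer", filename: "internal-pdf-viewer" },
  { name: "Chrome PDF Viewer", filename: "internal-pdf-viewer" },
]);
```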

10. IE Plugins

Internet Explorer conceals its plugin information, similarly to Firefox: it does not return an iterable list of plugins, so each plugin has to be queried for individually.

System Properties

Browser fingerprint features whose values do not depend on the web browser itself, but are closely tied to the operating system and hardware properties instead.

  1. Time-zone

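This information can be obtained by calling the getTimezoneOffset() method on any Date object, i.e. new Date().getTimezoneOffset(). The method returns the difference, in minutes, between UTC and the local time of the host device. The value is positive when local time is behind UTC (for example, 300 for UTC-5) and negative when it is ahead (-540 for Japan Standard Time).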
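A minimal sketch of collecting the time zone, including the sign conversion that often causes confusion. The function names are hypothetical. The IANA zone name from Intl.DateTimeFormat().resolvedOptions().timeZone is an addition not mentioned in the text; it is included because it distinguishes zones that share an offset.

```typescript
interface TimeZoneFeatures {
  offsetMinutes: number; // date.getTimezoneOffset(): UTC minus local time
  label: string;         // the same offset written as "UTC+HH:MM"
  ianaName: string;      // e.g. "Asia/Tokyo"
}

function twoDigits(n: number): string {
  return n < 10 ? "0" + n : String(n);
}

// getTimezoneOffset() has the opposite sign of the usual notation:
// 300 means UTC-05:00 and -540 means UTC+09:00.
function offsetLabel(offsetMinutes: number): string {
  const sign = offsetMinutes <= 0 ? "+" : "-";
  const abs = Math.abs(offsetMinutes);
  return "UTC" + sign + twoDigits(Math.floor(abs / 60)) + ":" + twoDigits(abs % 60);
}

function collectTimeZone(date: Date): TimeZoneFeatures {
  const offset = date.getTimezoneOffset();
  return {
    offsetMinutes: offset,
    label: offsetLabel(offset),
    // Additional source, not from the text: the zone's IANA name.
    ianaName: Intl.DateTimeFormat().resolvedOptions().timeZone,
  };
}

const tzExample = collectTimeZone(new Date());
```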

2. Date Format

The method toLocaleString(), executed on any Date object, returns a date string in a format that respects the browser’s locale (i.e. language preference). In older implementations, the format of the date string returned by this method depends entirely on its implementation. Both implementation and string format differences are helpful in browser identification.
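A minimal sketch of the date-format feature: format one fixed instant, so that differences in the output come from the locale and the implementation rather than from the current time. The reference instant is arbitrary, and passing timeZone: "UTC" is an assumption of this sketch, made to keep the time zone (collected separately) out of this feature.

```typescript
// A fixed, arbitrarily chosen reference instant: 12 July 2021, 15:30 UTC.
const REFERENCE_INSTANT = new Date(Date.UTC(2021, 6, 12, 15, 30, 0));

// With no `locales` argument, toLocaleString() uses the runtime's default
// locale, which is what a fingerprinting script would record.
function dateFormatFeature(locales?: string): string {
  return REFERENCE_INSTANT.toLocaleString(locales, { timeZone: "UTC" });
}

const defaultFormat = dateFormatFeature();       // the browser's own format
const usFormat = dateFormatFeature("en-US");     // for comparison
const germanFormat = dateFormatFeature("de-DE"); // for comparison
```

The exact strings depend on the browser's locale data, which is exactly what makes the feature useful, so no fixed output is shown here.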

3. Languages

This feature captures a single preferred language, or a list of languages ordered from most to least preferred. The language attributes collected are:
• window.navigator.languages
• window.navigator.userLanguage
• window.navigator.browserLanguage
• window.navigator.systemLanguage
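A minimal sketch of turning these attributes into one feature string. The LanguageSources interface and languageFeature function are hypothetical; the standard navigator.language, which is not in the list above, is used as a fallback when navigator.languages is missing. userLanguage, browserLanguage and systemLanguage are legacy Internet Explorer properties, so each gets its own slot, and the positions stay aligned when a value is missing.

```typescript
interface LanguageSources {
  languages?: ReadonlyArray<string>;
  language?: string;        // standard, used as a fallback here
  userLanguage?: string;    // legacy Internet Explorer
  browserLanguage?: string; // legacy Internet Explorer
  systemLanguage?: string;  // legacy Internet Explorer
}

// One slot per source, separated by "|".
function languageFeature(nav: LanguageSources): string {
  const preferred =
    nav.languages && nav.languages.length > 0 ? nav.languages.join(",") : nav.language || "";
  return [preferred, nav.userLanguage || "", nav.browserLanguage || "", nav.systemLanguage || ""].join("|");
}

// In a browser: languageFeature(navigator)
const modernBrowser = languageFeature({ languages: ["de-DE", "en-US"], language: "de-DE" });
const legacyIE = languageFeature({ userLanguage: "ja", browserLanguage: "en-us", systemLanguage: "ja" });
```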

4. Tangent

The tangent of certain arguments, computed in JavaScript with Math.tan(), can differ slightly between browsers and operating systems, because the ECMAScript specification allows such math functions to return implementation-approximated results. A related source of diversity is Mathematical Markup Language (MathML), a low-level specification of mathematical content on the web: its specification is not yet complete, many browsers do not support it, and a CSS fallback is used instead. This brings even more diversity into the values of this feature.
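A minimal sketch of the tangent feature: evaluate Math.tan on a few inputs and join the results. The specific inputs are illustrative assumptions; very large or special arguments are where engines are most likely to round differently, but whether two particular browsers disagree on these exact values is not something this sketch claims.

```typescript
// Illustrative inputs; not a vetted list of diverging arguments.
const TAN_INPUTS = [-1e300, 1e300, Math.PI / 2, 1e21, 123456789];

function tangentFeature(): string {
  const parts: string[] = [];
  for (const x of TAN_INPUTS) {
    parts.push(String(Math.tan(x)));
  }
  return parts.join(",");
}

const tangentExample = tangentFeature();
```

Within one engine the result is deterministic; the feature only carries entropy when compared across engines and platforms.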

5. Fonts

Getting the full list of fonts installed on a system is not possible via JavaScript. However, there is a way to detect whether a given font is or is not installed on the system by only using JavaScript and CSS.
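The usual JavaScript and CSS technique is to measure the width of a test string rendered in a generic family (monospace, sans-serif, serif), then render it again with the candidate font listed first and the same generic family as fallback. If the width changes, the candidate font must be installed. A minimal sketch; the MeasureWidth callback is a hypothetical abstraction. In a browser it could be built on a canvas 2D context (set ctx.font to "72px " + family, then read ctx.measureText(text).width) or on a hidden span. The test string is a conventional choice, not one taken from the article.

```typescript
// Returns the rendered width of `text` in the given CSS font-family list.
type MeasureWidth = (text: string, fontFamily: string) => number;

const BASE_FAMILIES = ["monospace", "sans-serif", "serif"];
const TEST_TEXT = "mmmmmmmmmmlli"; // wide and narrow glyphs make width changes visible

// A font counts as installed if, for at least one generic family, putting the
// font first in the family list changes the measured width.
function isFontInstalled(font: string, measure: MeasureWidth): boolean {
  for (const base of BASE_FAMILIES) {
    const baseline = measure(TEST_TEXT, base);
    const candidate = measure(TEST_TEXT, '"' + font + '", ' + base);
    if (candidate !== baseline) return true;
  }
  return false;
}

function detectFonts(candidates: string[], measure: MeasureWidth): string[] {
  return candidates.filter((font) => isFontInstalled(font, measure));
}
```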

Hardware Properties

A lot of information about the hardware used to run the browser can be accessed through JavaScript, and these values are often hard to change. For example, users are not able to manually replace the CPU inside their laptop. However, it is not impossible to spoof the results of these tests.

  1. Platform: Websites use navigator.platform to display appropriate content on certain devices. For example, a website might serve a simpler version of the UI for TV platforms, and a more advanced UI for desktop platforms. Examples of platform strings include:
    • MacIntel
    • Win32
    • Android
    • WebTV OS
  2. CPU Class: This property (navigator.cpuClass) is implemented in Internet Explorer only, and only recognizes these CPU classes:
    • 68K: Motorola processor
    • Alpha: DEC processor
    • PPC: Motorola processor
    • x86: Intel processor
    • Other: unknown processor type
  3. Hardware Concurrency: navigator.hardwareConcurrency returns the number of logical cores available on the system. Browsers may choose to report a lower number because the browser assumes it will occupy several logical cores by itself.
  4. Touch Compatibility: To take advantage of touch and multi-touch gestures, websites need to know whether the device supports them. Web applications handle such gestures by registering listeners for touch events such as touchstart and touchmove. The data collected includes:
    • navigator.maxTouchPoints: the maximum number of simultaneous touch points that the touch screen is able to detect
    • navigator.msMaxTouchPoints: same as above, but for Internet Explorer and legacy Edge browsers
    • "ontouchstart" in window: whether the handler for touchstart, the most basic touch event, is available in the browser; returns true or false
  5. WebGL: The Web Graphics Library (WebGL) is a JavaScript API used for rendering interactive 2D and 3D content. It allows GPU-accelerated image processing to take place inside a web browser without the use of plugins. Graphics are displayed in a <canvas> element, as specified by the HTML5 standard.
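A minimal sketch of collecting the hardware features listed above. All interface and function names are hypothetical. The WebGL part uses the WEBGL_debug_renderer_info extension, which, where the browser exposes it, reports the unmasked GPU vendor and renderer strings. In a browser, gl would come from document.createElement("canvas").getContext("webgl"), and hasTouchStart from "ontouchstart" in window.

```typescript
interface HardwareNavigator {
  platform?: string;
  cpuClass?: string;          // Internet Explorer only
  hardwareConcurrency?: number;
  maxTouchPoints?: number;
  msMaxTouchPoints?: number;  // Internet Explorer / legacy Edge
}

interface HardwareFeatures {
  platform: string;
  cpuClass: string;
  logicalCores: number;       // 0 when not reported
  maxTouchPoints: number;
  touchEvent: boolean;
}

// `hasTouchStart` is passed in so the function does not depend on a browser global.
function collectHardware(nav: HardwareNavigator, hasTouchStart: boolean): HardwareFeatures {
  return {
    platform: nav.platform || "unknown",
    cpuClass: nav.cpuClass || "unknown",
    logicalCores: nav.hardwareConcurrency || 0,
    maxTouchPoints: nav.maxTouchPoints || nav.msMaxTouchPoints || 0,
    touchEvent: hasTouchStart,
  };
}

// The subset of a WebGL rendering context used below.
interface WebGLLike {
  getExtension(name: string): { UNMASKED_VENDOR_WEBGL: number; UNMASKED_RENDERER_WEBGL: number } | null;
  getParameter(pname: number): unknown;
}

// Returns "vendor / renderer", or a marker when WebGL or the extension is unavailable.
function webglRenderer(gl: WebGLLike | null): string {
  if (!gl) return "no-webgl";
  const info = gl.getExtension("WEBGL_debug_renderer_info");
  if (!info) return "masked";
  return String(gl.getParameter(info.UNMASKED_VENDOR_WEBGL)) + " / " +
    String(gl.getParameter(info.UNMASKED_RENDERER_WEBGL));
}
```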

HTTP Headers

HTTP headers are pieces of additional information sent with each HTTP request between the server and the client (in our situation, the client is a web browser), used to communicate the operating parameters of HTTP transactions. This is the only category of browser fingerprint features that is collected on the server side. The headers collected are:

Accept

Accept-Encoding

Accept-Language

User-Agent
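On the server, these four headers only need to be read from the incoming request. The original implementation used PHP on the server; the sketch below swaps in TypeScript on Node.js instead, where req.headers from the built-in http module has lower-cased names and values that may be strings, string arrays, or undefined. The function name is hypothetical.

```typescript
// Shape of Node's req.headers (IncomingHttpHeaders).
type HeaderBag = { [name: string]: string | string[] | undefined };

const FINGERPRINT_HEADERS = ["accept", "accept-encoding", "accept-language", "user-agent"];

// Missing headers become "" so that their absence is recorded consistently.
function headerFeatures(headers: HeaderBag): { [name: string]: string } {
  const out: { [name: string]: string } = {};
  for (const name of FINGERPRINT_HEADERS) {
    const value = headers[name];
    out[name] = Array.isArray(value) ? value.join(", ") : value || "";
  }
  return out;
}

// Usage on Node.js:
//   http.createServer((req, res) => { const features = headerFeatures(req.headers); ... });
```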

Orthogonal features

All previous features were quite straightforward: most of them simply required us to read and store a single piece of information. If we used all features except the orthogonal ones, we would join them into a single string and generate its hash; comparing two such hashes tells us whether two browser fingerprints are the same or not. Orthogonal features are hashed the same way, but each one works as a single package: we exercise a variety of behaviors of different parts of one API to bring as much entropy into a single feature as possible.
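The join-and-hash step can be sketched as follows. The article does not name a hash function; 32-bit FNV-1a is swapped in here as a simple, well-known non-cryptographic choice, applied to UTF-16 code units. Keys are sorted so the result does not depend on the order in which features were collected. The feature names and values in the usage lines are hypothetical.

```typescript
type FeatureValue = string | number | boolean | null;
type FeatureSet = { [feature: string]: FeatureValue };

// 32-bit FNV-1a. Not cryptographic: it only has to map equal strings to equal hashes.
function fnv1a32(input: string): number {
  let hash = 0x811c9dc5; // FNV offset basis
  for (let i = 0; i < input.length; i++) {
    hash ^= input.charCodeAt(i);
    // Multiply by the FNV prime 16777619 (= 2^24 + 2^8 + 2^7 + 2^4 + 2^1 + 1) modulo 2^32.
    hash = (hash + (hash << 1) + (hash << 4) + (hash << 7) + (hash << 8) + (hash << 24)) >>> 0;
  }
  return hash >>> 0;
}

// Joins "key=value" pairs in sorted key order, then hashes into 8 hex characters.
function fingerprintHash(features: FeatureSet): string {
  const keys = Object.keys(features).sort();
  const parts: string[] = [];
  for (const key of keys) {
    parts.push(key + "=" + String(features[key]));
  }
  const hex = fnv1a32(parts.join("|")).toString(16);
  return ("00000000" + hex).slice(-8);
}

const hashA = fingerprintHash({ timezoneOffset: -540, platform: "Win32", colorDepth: 24 });
const hashB = fingerprintHash({ colorDepth: 24, platform: "Win32", timezoneOffset: -540 });
const hashC = fingerprintHash({ timezoneOffset: -541, platform: "Win32", colorDepth: 24 });
```

hashA and hashB are equal because only key order differs, while hashC differs. A 32-bit hash is enough to compare fingerprints in a small dataset; a larger one would warrant a longer hash to keep collisions negligible.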

  1. Canvas Fingerprinting: Canvas fingerprinting works by generating images with the same drawing instructions on different browsers and comparing the results. Rather than comparing images pixel by pixel, we compare the hashes of their bitmaps, exported in base64 format. This allows us to determine whether they came from the same browser or not.

The different results of the canvas fingerprinting method are due to inconsistencies between different systems, browsers, and implementations. We reviewed existing research and included our own findings to get as much entropy from canvas fingerprints as possible. The following text summarizes the key causes of these differences.

Typeface inconsistency

Several typefaces (or fonts), such as Arial, Times, Helvetica, or Georgia, can be found
on almost every system because they are usually part of the operating system itself. On
different systems, however, these typefaces may differ slightly.

Figure 3.5: 13 ways to render 20px Arial [25]

Typeface Fallback

When a font is not available on a system, a fallback font is used instead. Attempting to render a text with a fictional font allows us to guarantee that the system’s fallback font will be applied to this text instead. Fallback fonts are OS and browser specific. Using this method therefore enables us to increase the entropy of the canvas fingerprint.

Subpixel font smoothing

Displaying a font on a computer display means using a grid of square pixels to represent a vector image visually. No standard definition of how this should be achieved exists. Companies like Apple, Microsoft, Adobe, and many others thus use different font rendering engines with different algorithms for this task.

Anti Aliasing

Anti-aliasing is a method of minimizing the distortion of shapes when representing vector graphics or high-resolution images as smaller images. It is similar to font smoothing, but can be applied to any graphic, not just fonts. Anti-aliasing in the HTML5 canvas element is controlled by the browser, and is turned on on some browsers and turned off on others. Implementations of anti-aliasing algorithms may differ slightly across browsers. This accounts for the differences on images drawn on different browsers and systems.

Canvas Winding

Winding and even-odd rules are algorithms for filling vector shapes. Since not all browsers support winding and even-odd fill rules in the HTML5 canvas, this is included in the browser fingerprint.

Emojis

Since emojis are transmitted in a non-graphical way, as Unicode characters, it is up to the browser or the operating system that is running the browser to decide what design an emoji will have.
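The sources of variation above can be combined in one drawing routine. A minimal sketch; the interfaces are hypothetical subsets of HTMLCanvasElement and CanvasRenderingContext2D, and the text, colors, sizes and fictional font name are arbitrary choices. The winding check follows a common pattern: with the "evenodd" rule, a point inside two nested rectangles counts as outside the path, so isPointInPath returns false only where the even-odd rule is supported.

```typescript
// Subset of CanvasRenderingContext2D used below.
interface Canvas2DSubset {
  textBaseline: string;
  font: string;
  fillStyle: string;
  fillRect(x: number, y: number, w: number, h: number): void;
  fillText(text: string, x: number, y: number): void;
  rect(x: number, y: number, w: number, h: number): void;
  isPointInPath(x: number, y: number, fillRule?: string): boolean;
}

// Subset of HTMLCanvasElement used below.
interface CanvasSubset {
  width: number;
  height: number;
  getContext(kind: "2d"): Canvas2DSubset | null;
  toDataURL(): string;
}

interface CanvasFingerprint {
  winding: boolean; // whether the "evenodd" fill rule is supported
  image: string;    // base64 data URL of the rendered bitmap, to be hashed
}

// Pangram plus an emoji (U+1F603) written as a surrogate pair.
const CANVAS_TEXT = "Cwm fjordbank glyphs vext quiz, \uD83D\uDE03";

function canvasFingerprint(canvas: CanvasSubset): CanvasFingerprint | null {
  const ctx = canvas.getContext("2d");
  if (!ctx) return null;
  canvas.width = 240;
  canvas.height = 60;

  // Canvas winding: (5, 5) lies inside both nested rectangles.
  ctx.rect(0, 0, 10, 10);
  ctx.rect(2, 2, 6, 6);
  const winding = ctx.isPointInPath(5, 5, "evenodd") === false;

  ctx.textBaseline = "alphabetic";
  ctx.fillStyle = "#f60";
  ctx.fillRect(100, 1, 62, 20);
  // Typeface fallback: a font name that should not exist forces the system fallback font.
  ctx.fillStyle = "#069";
  ctx.font = "11pt no-such-font-123";
  ctx.fillText(CANVAS_TEXT, 2, 15);
  // Typeface inconsistency, smoothing and anti-aliasing: a common font, semi-transparent.
  ctx.fillStyle = "rgba(102, 204, 0, 0.7)";
  ctx.font = "18pt Arial";
  ctx.fillText(CANVAS_TEXT, 4, 45);

  return { winding: winding, image: canvas.toDataURL() };
}
```

In a browser, canvasFingerprint(document.createElement("canvas")) returns the data URL, which would then be hashed (for example with the string hash used for the other features) before comparison.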

2. Audio Fingerprint

Audio fingerprinting is similar to canvas fingerprinting, but here an audio signal is generated instead of an image. To generate it, the AudioContext interface is used, which works by linking modules, called AudioNodes, together into a graph. These modules can generate, process, play, or store an audio signal. This is, in many ways, similar to real-life musical instruments where, for example, an electric guitar generates an audio signal, which is then processed by effects like an echo or a phaser, and finally played through speakers.
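A minimal sketch of such a graph, rendered offline so nothing is actually played: an oscillator (the "guitar") feeds a dynamics compressor (the "effect"), and the rendered samples are reduced to one number. The interfaces are hypothetical subsets of OfflineAudioContext and its nodes. The oscillator type, frequency, compressor settings, buffer length and sample slice are assumptions modeled on commonly published examples, not values from the article.

```typescript
interface ParamLike { value: number }
interface NodeLike { connect(destination: unknown): unknown }
interface OscillatorLike extends NodeLike {
  type: string;
  frequency: ParamLike;
  start(when?: number): void;
}
interface CompressorLike extends NodeLike {
  threshold: ParamLike;
  knee: ParamLike;
  ratio: ParamLike;
  attack: ParamLike;
  release: ParamLike;
}
interface RenderedAudio { getChannelData(channel: number): ArrayLike<number> }
interface OfflineAudioLike {
  destination: unknown;
  createOscillator(): OscillatorLike;
  createDynamicsCompressor(): CompressorLike;
  oncomplete: ((event: { renderedBuffer: RenderedAudio }) => void) | null;
  startRendering(): unknown;
}

// In a browser: audioFingerprint(() => new OfflineAudioContext(1, 5000, 44100), callback)
function audioFingerprint(makeContext: () => OfflineAudioLike, done: (value: number) => void): void {
  const ctx = makeContext();
  const oscillator = ctx.createOscillator(); // signal generator
  oscillator.type = "triangle";
  oscillator.frequency.value = 10000;

  const compressor = ctx.createDynamicsCompressor(); // processing "effect"
  compressor.threshold.value = -50;
  compressor.knee.value = 40;
  compressor.ratio.value = 12;
  compressor.attack.value = 0;
  compressor.release.value = 0.25;

  oscillator.connect(compressor);
  compressor.connect(ctx.destination);

  ctx.oncomplete = (event) => {
    const samples = event.renderedBuffer.getChannelData(0);
    // Tiny floating-point differences between audio stacks change this sum.
    let sum = 0;
    for (let i = 4500; i < samples.length; i++) sum += Math.abs(samples[i]);
    done(sum);
  };
  oscillator.start(0);
  ctx.startRendering();
}
```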

A lot of powerful fingerprint data can be obtained from hardware sensors, such as the accelerometer, the GPS, the camera, or the microphone. Access to most of these sensors, in particular the GPS, the camera, and the microphone, requires user consent. Most web applications have no real use for such information, and a pop-up window asking the user to allow access to their GPS data would be quite suspicious.

Implementation

We used a combination of JavaScript, PHP, and MySQL to collect fingerprints from our promotional website and from the WordPress website. To collect fingerprints from the web app, our implementation had to respect its internal code standards, terms of service, and privacy policy.

Errors as a source of additional entropy

Browser ID

To pair all the browser fingerprints collected from the same browser, we assigned each browser an ID. Doing this required storing the ID in the browser, on the client side, and sending it to our API endpoint together with each fingerprint.
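A minimal sketch of keeping such a browser ID in several client-side stores, so that clearing only one of them does not lose it. IdStore, the key name, and getOrCreateBrowserId are hypothetical; in a browser, one store would wrap document.cookie and another window.localStorage, and newId could be () => crypto.randomUUID().

```typescript
// Minimal key-value store; in a browser, adapters over cookies and localStorage.
interface IdStore {
  get(key: string): string | null;
  set(key: string, value: string): void;
}

const BROWSER_ID_KEY = "browser_id"; // hypothetical key name

// Returns the first ID found in any store (re-saving it to stores that lost it),
// or creates a new one and saves it everywhere.
function getOrCreateBrowserId(stores: IdStore[], newId: () => string): string {
  let existing: string | null = null;
  for (const store of stores) {
    const value = store.get(BROWSER_ID_KEY);
    if (value) {
      existing = value;
      break;
    }
  }
  const id = existing || newId();
  for (const store of stores) {
    if (store.get(BROWSER_ID_KEY) !== id) store.set(BROWSER_ID_KEY, id);
  }
  return id;
}
```

If every store has been cleared, or the browser is in private mode, a new ID is generated, so the same browser then appears as a new one.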

Preventing Device Fingerprinting

While it is not possible to avoid being fingerprinted altogether, there are nevertheless ways to avoid being identified. As explained above, browser fingerprint identification works by collecting a predefined set of features from a browser and comparing these values with previously collected ones. If it finds a match, or an algorithm identifies a fingerprint as belonging to a specific user, it assumes that the two fingerprints came from the same browser.

Fingerprint with common values

Randomizing browser values

Blocking fingerprinting scripts

It is also possible to block fingerprinting scripts completely, using privacy extensions such as Ghostery, Privacy Badger, and others. These extensions use a list of unwanted scripts that are blocked upon detection. However, if only a small share of browsers block a given script, the blocking itself makes those browsers easier to identify.

Response of browser developers

Firefox is a good example of how browser developers can fight for the privacy of browser users. It has a built-in setting called "Resist fingerprinting". When it is enabled, privacy measures such as the following are applied:

• User is notified when a script is trying to extract bitmap from HTML5 canvas,
and the latter will not be able to do so unless the user agrees.
• Both navigator.plugins and navigator.mimeTypes are hidden. They cannot be accessed as iterable lists. Instead, they have to be queried for when a script wants to check if a certain plugin or mimeType is supported.
• Third-party cookies are disabled.
• Time precision is reduced.

GDPR in context of browser fingerprinting

Another way of fighting for online privacy is enforcing rules and standards by law. One recent example of such an approach is the General Data Protection Regulation (GDPR), adopted by the EU on 27 April 2016. This regulation aims to give privacy back to citizens by regulating how personal data can be collected and processed.

Personal data is any information relating to an individual, whether it relates to his or her private, professional or public life. It can be anything from a name, a home address, a photo, an email address, bank details, posts on social networking websites, medical information, or a computer’s IP address.

Since the IP address is basically a subset of the fingerprint, browser fingerprints can be considered personal data, and GDPR also applies to them. This means, in short, that websites have to ask for explicit user consent prior to collecting and storing their browser fingerprint.

Results and discussion

The only way we can determine whether two fingerprints come from the same browser is by storing a unique ID in the user's browser. Whenever a browser without such an ID visited the web app, an ID was generated and stored in its cookies, as well as in local storage, to make it more robust. However, this browser ID is deleted every time a user deletes their cookies and local storage, and it is hidden when the user browses in private mode (also known as "incognito mode"). This means that an unknown error is present in our results. However, it also means that the real entropy of the dataset can only be higher.
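The article does not say which entropy measure it uses. A common choice is Shannon entropy over the observed fingerprints, H = -Σ p_i · log2(p_i), where p_i is the relative frequency of the i-th distinct fingerprint. A minimal sketch under that assumption, with hypothetical data:

```typescript
// Shannon entropy, in bits, of a list of observed fingerprint hashes.
function shannonEntropy(observations: string[]): number {
  if (observations.length === 0) return 0;
  const counts: { [value: string]: number } = Object.create(null);
  for (const value of observations) {
    counts[value] = (counts[value] || 0) + 1;
  }
  let entropy = 0;
  for (const value of Object.keys(counts)) {
    const p = counts[value] / observations.length;
    entropy -= p * (Math.log(p) / Math.LN2); // log2(p) via the natural logarithm
  }
  return entropy;
}

// Hypothetical data: four visits, three distinct fingerprints.
const entropyExample = shannonEntropy(["a1f3", "a1f3", "9c0e", "77b2"]);
```

Here the frequencies are 1/2, 1/4 and 1/4, giving 0.5·1 + 0.25·2 + 0.25·2 = 1.5 bits. The maximum for N observations is log2(N), reached when every fingerprint is unique.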


Bhavya Swaroop

Product Manager. Interested in Design Teardown| Product Deconstruct| Strategy | OTT & E-Commerce