From 917339d922caaa8a65f582a602c32bf7a9bca5e4 Mon Sep 17 00:00:00 2001
From: nielsbasjes
Date: Sun, 12 Jan 2025 12:11:28 +0000
Subject: [PATCH] deploy: 546f5147dba7f8101b33c3ea20ef998b5d32a04b

---
 404.html | 2 +-
 categories/index.html | 6 +++---
 css/format-print.css | 4 ++--
 css/print.css | 2 +-
 css/swagger.css | 4 ++--
 css/theme.css | 2 +-
 developer/basedesign/index.html | 8 ++++----
 developer/building/index.html | 8 ++++----
 developer/index.html | 6 +++---
 developer/makingnewrules/index.html | 8 ++++----
 developer/reportingissues/index.html | 8 ++++----
 developer/shadingdependencies/index.html | 8 ++++----
 expect/fieldvalues/index.html | 8 ++++----
 expect/index.html | 6 +++---
 expect/limitations/index.html | 8 ++++----
 expect/manipulations/index.html | 8 ++++----
 expect/performance/index.html | 8 ++++----
 expect/tryit/index.html | 8 ++++----
 index.html | 8 ++++----
 other/article/index.html | 6 +++---
 other/index.html | 6 +++---
 other/relatedprojects/index.html | 8 ++++----
 search/index.html | 8 ++++----
 tags/index.html | 6 +++---
 udf/apache-beam-sql/index.html | 8 ++++----
 udf/apache-beam/index.html | 8 ++++----
 udf/apache-drill/index.html | 8 ++++----
 udf/apache-flink-table/index.html | 8 ++++----
 udf/apache-flink/index.html | 8 ++++----
 udf/apache-hive/index.html | 8 ++++----
 udf/apache-nifi/index.html | 22 +++++++++++-----------
 udf/apache-pig/index.html | 8 ++++----
 udf/commandline/index.html | 8 ++++----
 udf/elastic-logstash/index.html | 8 ++++----
 udf/elastic-search/index.html | 8 ++++----
 udf/index.html | 8 ++++----
 udf/logparser/index.html | 8 ++++----
 udf/snowflake/index.html | 18 +++++++++---------
 udf/snowplow/index.html | 6 +++---
 udf/trino/index.html | 8 ++++----
 using/clienthints/index.html | 8 ++++----
 using/index.html | 8 ++++----
 using/kubernetes/index.html | 8 ++++----
 using/license/index.html | 8 ++++----
 using/memoryusage/index.html | 8 ++++----
 using/webservlet/index.html | 8 ++++----
 46 files changed, 176 insertions(+), 176 deletions(-)

diff --git a/404.html b/404.html
index 59bf01173..4e25e2b5b 100644
--- a/404.html
+++ b/404.html
@@ -1,2 +1,2 @@
404 Page not found | Yauaa - Yet Another UserAgent Analyzer -

404

Not found

Whoops. Looks like this page doesn't exist ¯\_(ツ)_/¯.

Go to homepage

\ No newline at end of file +

\ No newline at end of file diff --git a/categories/index.html b/categories/index.html index 3ab25a747..844a198c1 100644 --- a/categories/index.html +++ b/categories/index.html @@ -1,10 +1,10 @@ Categories | Yauaa - Yet Another UserAgent Analyzer -

Categories

\ No newline at end of file diff --git a/css/format-print.css b/css/format-print.css index dfdec0ec9..4745f80d2 100644 --- a/css/format-print.css +++ b/css/format-print.css @@ -1,5 +1,5 @@ -@import "theme-relearn-light.css?1736682538"; -@import "chroma-relearn-light.css?1736682538"; +@import "theme-relearn-light.css?1736683886"; +@import "chroma-relearn-light.css?1736683886"; #R-sidebar { display: none; diff --git a/css/print.css b/css/print.css index 7e9743798..ba17ea7d7 100644 --- a/css/print.css +++ b/css/print.css @@ -1 +1 @@ -@import "format-print.css?1736682538"; +@import "format-print.css?1736683886"; diff --git a/css/swagger.css b/css/swagger.css index ff90a534f..2d0ab2c48 100644 --- a/css/swagger.css +++ b/css/swagger.css @@ -1,7 +1,7 @@ /* Styles to make Swagger-UI fit into our theme */ -@import "fonts.css?1736682538"; -@import "variables.css?1736682538"; +@import "fonts.css?1736683886"; +@import "variables.css?1736683886"; body{ line-height: 1.574; diff --git a/css/theme.css b/css/theme.css index 7cb381630..545eba18e 100644 --- a/css/theme.css +++ b/css/theme.css @@ -1,4 +1,4 @@ -@import "variables.css?1736682538"; +@import "variables.css?1736683886"; @charset "UTF-8"; diff --git a/developer/basedesign/index.html b/developer/basedesign/index.html index bc5ad7b2c..ffbd2cea3 100644 --- a/developer/basedesign/index.html +++ b/developer/basedesign/index.html @@ -7,7 +7,7 @@ The reason this system (historically) works is because a lot of website builders do a very simple check to see if they can use a specific feature.">Base Design | Yauaa - Yet Another UserAgent Analyzer -

Base Design

Parsing Useragents

Parsing useragents is considered by many to be a ridiculously hard problem. The main problems are:

  • Although there seems to be a specification, many do not follow it.
  • Useragents LIE: they claim to be their competing predecessor, with an extra flag added.

The pattern that the ‘normal’ browser builders follow is that they all LIE about being the ancestor they are trying to improve upon.

The reason this system (historically) works is that a lot of website builders do a very simple check to see if they can use a specific feature:

if (useragent.contains("Chrome")) {
    // Use the Chrome feature we need.
}

Some improve on this and actually check the (major) version that follows.

A good example of this is the Edge browser:

Mozilla/5.0 (Windows NT 10.0) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/42.0.2311.135 Safari/537.36 Edge/12.10136

It says it:

  • is Mozilla/5.0
  • uses AppleWebKit/537.36
  • for “compatibility” the AppleWebKit lies about being “KHTML” and being “like Gecko” are also copied
  • is Chrome 42
  • is Safari 537
  • is Edge 12

So any website looking for the word it triggers upon will find it and enable the right features.
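To make this concrete, here is a minimal Java sketch of the naive substring check from above, run against that Edge useragent. The class name NaiveSniffing is hypothetical and this is not Yauaa code; it only shows that every copied “compatibility” token makes such a check succeed.

```java
import java.util.List;

// Hypothetical illustration (not Yauaa code): the naive "does the string
// contain X" test that many websites historically used for feature checks.
class NaiveSniffing {

    static boolean claims(String userAgent, String token) {
        return userAgent.contains(token);
    }

    public static void main(String[] args) {
        String edge = "Mozilla/5.0 (Windows NT 10.0) AppleWebKit/537.36 "
            + "(KHTML, like Gecko) Chrome/42.0.2311.135 Safari/537.36 Edge/12.10136";
        // Each token below is found in this single Edge useragent,
        // so a site checking for any one of them enables its feature.
        for (String token : List.of("Mozilla", "AppleWebKit", "KHTML", "Gecko", "Chrome", "Safari", "Edge")) {
            System.out.println(token + " -> " + claims(edge, token));
        }
    }
}
```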

In 2014 an RFC for HTTP was released (RFC 7231 section 5.5.3) which now explicitly states:

... implementations are encouraged not to use the product
@@ -21,12 +21,12 @@
Such a matcher then tells this class it has found a match for a certain attribute with a certain confidence level (0-10000000).
 In the end the matcher that has found a match with the highest confidence for a value ‘wins’.

High level implementation overview

The main concept of this useragent parser is that we have two things:

  1. A Parser (ANTLR4) that converts the useragent into a nice tree through which we can walk along.
  2. A collection of matchers.
  • A matcher triggers if a set of patterns is present in the tree.
  • Each pattern is detected by a “matcher action” that triggers and can fill a single attribute. If a matcher triggers, a set of attributes is set with a value and a confidence level.
  • All results from all triggered matchers (and actions) are combined, and for each individual attribute the value with the highest confidence wins.
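The “highest confidence wins” combination step can be sketched in a few lines of Java. This is a minimal illustration under assumptions: the class and method names and the confidence numbers are hypothetical and are not Yauaa's internal API, and keeping the first value on a tie is also an assumption.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of "per attribute, the highest confidence wins".
// Names and numbers are illustrative, not Yauaa's internal classes.
class ConfidenceMerge {

    record Candidate(String value, long confidence) {}

    private final Map<String, Candidate> best = new HashMap<>();

    // Called whenever a triggered matcher action proposes a value for an attribute.
    void offer(String attribute, String value, long confidence) {
        Candidate current = best.get(attribute);
        // Replace only on a strictly higher confidence (ties keep the first value).
        if (current == null || confidence > current.confidence()) {
            best.put(attribute, new Candidate(value, confidence));
        }
    }

    String valueOf(String attribute) {
        Candidate c = best.get(attribute);
        return c == null ? null : c.value();
    }

    public static void main(String[] args) {
        ConfidenceMerge merge = new ConfidenceMerge();
        // Several matchers fire on the Edge useragent; the most specific one wins.
        merge.offer("AgentName", "Safari", 100);
        merge.offer("AgentName", "Chrome", 1000);
        merge.offer("AgentName", "Edge", 2000);
        System.out.println(merge.valueOf("AgentName")); // Edge
    }
}
```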

As a performance optimization we walk along the parsed tree once and fire everything we find into a precomputed hashmap that points to all the applicable matcher actions. As a consequence:

  • the matching is relatively fast even though the number of matchers already runs into the hundreds.
  • the startup is “slow”
  • the memory footprint is pretty big due to the number of matchers, the size of the hashmap and the cache of the parsed useragents.
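The “walk once, look up in a precomputed hashmap” idea can be illustrated with the Java sketch below. It assumes that each matcher action can be reduced to a string key describing the tree node it waits for; the key format and all names are hypothetical and far simpler than what Yauaa really does. Registration happens once (the slow startup), while analyzing a useragent costs one map lookup per node found.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

// Hypothetical sketch (not Yauaa's implementation): matcher actions register
// the keys they wait for once, so analyzing a useragent is a single walk over
// the tree with one hashmap lookup per node found.
class ActionIndex {

    private final Map<String, List<Consumer<String>>> actionsByKey = new HashMap<>();

    // Startup cost: done once per matcher action, for every key it cares about.
    void register(String key, Consumer<String> action) {
        actionsByKey.computeIfAbsent(key, k -> new ArrayList<>()).add(action);
    }

    // Per-useragent cost: called for each node during the single tree walk.
    // Returns how many actions were triggered by this node.
    int fire(String key, String matchedText) {
        List<Consumer<String>> actions = actionsByKey.getOrDefault(key, List.of());
        for (Consumer<String> action : actions) {
            action.accept(matchedText);
        }
        return actions.size();
    }

    public static void main(String[] args) {
        ActionIndex index = new ActionIndex();
        List<String> triggered = new ArrayList<>();
        index.register("agent.product.name=\"Chrome\"", triggered::add);
        index.register("agent.product.name=\"Edge\"", triggered::add);
        // Simulated single walk over the product names of the Edge useragent.
        for (String name : List.of("Mozilla", "AppleWebKit", "Chrome", "Safari", "Edge")) {
            index.fire("agent.product.name=\"" + name + "\"", name);
        }
        System.out.println(triggered); // [Chrome, Edge]
    }
}
```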

A much more in-depth explanation can be found in the documentation on how to create new rules.

\ No newline at end of file + 
\ No newline at end of file diff --git a/developer/building/index.html b/developer/building/index.html index 8156756ed..8e4de7c11 100644 --- a/developer/building/index.html +++ b/developer/building/index.html @@ -3,7 +3,7 @@ A Linux class machine (can be a VM). Some of the build scripts rely in bash/sed/grep and related tools, so it will not build on a Windows machine. I’m unsure if it will build on a Mac. The normal build tools for a Java project JDK 8, 11, 17, 21 and 23 all need to be installed and defined in the ~/.m2/toolchains.xml All of these are needed to ensure the code works in all UDFs. Some of them only run on Java 8 (like Hive), some (like Flink) only work on Java 11 and some UDFs (like ElasticSearch and Trino) only work on Java 17. The ./start-docker.sh script launches a docker based build environment with all needed tools and configs.">Building from source | Yauaa - Yet Another UserAgent Analyzer -

Building from source

Building

Requirements:

  • A Linux class machine (can be a VM).
    • Some of the build scripts rely on bash/sed/grep and related tools, so it will not build on a Windows machine. I’m unsure if it will build on a Mac.
  • The normal build tools for a Java project
    • JDK 8, 11, 17, 21 and 23 all need to be installed and defined in the ~/.m2/toolchains.xml
      • All of these are needed to ensure the code works in all UDFs. Some of them only run on Java 8 (like Hive), some (like Flink) only work on Java 11 and some UDFs (like ElasticSearch and Trino) only work on Java 17.

The ./start-docker.sh script launches a Docker-based build environment with all needed tools and configs.

And then simply run:

mvn clean package

Toolchains

This is the content of my ~/.m2/toolchains.xml on my Ubuntu 22.04 LTS machine.

<?xml version="1.0" encoding="UTF-8"?>
<toolchains>
  <toolchain>
@@ -42,12 +42,12 @@
      <jdkHome>/usr/lib/jvm/java-21-openjdk-amd64</jdkHome>
    </configuration>
  </toolchain>
</toolchains>
\ No newline at end of file + 
\ No newline at end of file diff --git a/developer/index.html b/developer/index.html index c8fe54f76..1d754ee22 100644 --- a/developer/index.html +++ b/developer/index.html @@ -3,12 +3,12 @@ Building from source Base Design Making new rules Shading dependencies Reporting issues">Development | Yauaa - Yet Another UserAgent Analyzer -
\ No newline at end of file diff --git a/developer/makingnewrules/index.html b/developer/makingnewrules/index.html index c372ca5ea..2774903dd 100644 --- a/developer/makingnewrules/index.html +++ b/developer/makingnewrules/index.html @@ -3,7 +3,7 @@ Base problem: They all lie When looking at useragents it is clear that almost all of them include the name of predecessors/competitors with which they are supposed to be compatible with.">Making new rules | Yauaa - Yet Another UserAgent Analyzer -

Making new rules

Detecting new useragent patterns

When you find a useragent for which one or more of the fields are wrong, the patterns and rules that this system uses to classify these attributes need to be changed. To help with writing such rules, this page first describes how the system works and what tools have been created to make writing new rules easier.

Base problem: They all lie

When looking at useragents it is clear that almost all of them include the names of predecessors/competitors with which they are supposed to be compatible.

So in general there is a ranking in the patterns; some are more true than others.

Solution overview

The way this system solves all of this is by employing several steps:

  1. The user agent string is parsed into a tree using Antlr4.
  2. This tree is matched against a set of “Matchers”

@@ -331,12 +331,12 @@

- 'Something: 1: agent.product.(1)name="AppleWebKit"^.version'

it may find after backtracking that the 5th product matches

agent.(5)product.(1)version

Yet when defined like this

variable:
- 'productname: agent.product.(1)name'
extract:
- 'Something: 1: @productname="AppleWebKit"^.version'

it will stay at the first product and never find the 5th product at all.
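A hypothetical Java model of the difference described above, in which a useragent is reduced to a list of (name, version) products with made-up names. The direct path behaves like a search over all products (backtracking), while the variable is bound once to the first product's name. This only mimics the documented behaviour; it is not how Yauaa evaluates its rules.

```java
import java.util.List;
import java.util.Optional;

// Hypothetical model of the two rule styles; it mimics the documented
// behaviour only and is not how Yauaa evaluates its walk language.
class VariableVsBacktrack {

    record Product(String name, String version) {}

    // Like agent.product.(1)name="AppleWebKit"^.version :
    // every product is tried until one passes the filter (backtracking).
    static Optional<String> withBacktracking(List<Product> products, String wanted) {
        return products.stream()
            .filter(p -> p.name().equals(wanted))
            .map(Product::version)
            .findFirst();
    }

    // Like @productname="AppleWebKit"^.version with productname: agent.product.(1)name :
    // the variable is fixed to the first product, so the filter is checked only once.
    static Optional<String> withVariable(List<Product> products, String wanted) {
        if (products.isEmpty()) {
            return Optional.empty();
        }
        Product first = products.get(0);
        return first.name().equals(wanted) ? Optional.of(first.version()) : Optional.empty();
    }

    public static void main(String[] args) {
        List<Product> agent = List.of(
            new Product("Mozilla", "5.0"),
            new Product("Product2", "1"),
            new Product("Product3", "2"),
            new Product("Product4", "3"),
            new Product("AppleWebKit", "537.36"));  // the 5th product
        System.out.println(withBacktracking(agent, "AppleWebKit")); // Optional[537.36]
        System.out.println(withVariable(agent, "AppleWebKit"));     // Optional.empty
    }
}
```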

\ No newline at end of file diff --git a/developer/reportingissues/index.html b/developer/reportingissues/index.html index 03b04b39c..aa4176098 100644 --- a/developer/reportingissues/index.html +++ b/developer/reportingissues/index.html @@ -19,7 +19,7 @@ However… These are not bugs I get quite a few bug reports and questions that Yauaa is not extracting the right version number from the provided User-Agent. Key thing to know There are so many manipulations and lies in the User-Agents that simply looking at the User-Agent will yield the wrong answer. Yauaa will try to give the best possible answer and some classes of lies are reported as such.">Reporting issues | Yauaa - Yet Another UserAgent Analyzer -

Reporting issues

Introduction

All software has bugs and things that it should do better.

Yauaa is no exception; there are bugs, inaccuracies and there is lots of room for improvement.

So if you find something, please report it via the issue tracker.

However…

These are not bugs

I get quite a few bug reports and questions claiming that Yauaa is not extracting the right version number from the provided User-Agent.

Key thing to know

There are so many manipulations and lies in the User-Agents that simply looking at the User-Agent will yield the wrong answer. Yauaa will try to give the best possible answer and some classes of lies are reported as such.

So in addition to simply looking at the User-Agent, it will also overrule these values if a documented manipulation is detected.

Most incorrect reports are about a Chromium/Chrome/Edge/… browser that shows ?? as the version of the Operating System, while just looking at it you can clearly read a version. The Chromium team has clearly documented that they are removing information from the User-Agent header and replacing parts with fixed values that are almost meaningless.

See: https://www.chromium.org/updates/ua-reduction/

Frozen Windows versions

Take for example this User-Agent:

Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/100.0.0.0 Safari/537.36
 

Most people expect to get

OperatingSystemNameVersion           : 'Windows 10.0'
AgentNameVersion                     : 'Chrome 100.0.0.0'
@@ -30,12 +30,12 @@
 

but instead they get

OperatingSystemNameVersion           : 'Android ??'
 

Again: This is not a bug.

This example was recorded on an Android 11 system and there is nothing in the User-Agent to extract this anymore.
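The Chromium UA-reduction page linked above describes the reduced format: the Chrome version is sent as &lt;major&gt;.0.0.0 and the platform part is replaced by fixed values such as “Windows NT 10.0; Win64; x64” or, on Android, “Linux; Android 10; K”. Below is a minimal Java sketch of how such a frozen value could be recognized, assuming only those two frozen platform strings. It is a hypothetical heuristic, far simpler than Yauaa's actual rules.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical heuristic (not Yauaa's rules): a Chrome version of the form
// <major>.0.0.0 suggests a reduced useragent, in which case the frozen
// platform values carry no real version information.
class ReducedUaCheck {

    private static final Pattern REDUCED_CHROME = Pattern.compile("Chrome/(\\d+)\\.0\\.0\\.0\\b");
    private static final Pattern WINDOWS_NT = Pattern.compile("Windows NT (\\d+\\.\\d+)");

    static boolean looksReduced(String userAgent) {
        return REDUCED_CHROME.matcher(userAgent).find();
    }

    static String osVersionGuess(String userAgent) {
        if (looksReduced(userAgent)
            && (userAgent.contains("Windows NT 10.0") || userAgent.contains("Android 10; K"))) {
            return "??"; // frozen value: the real version cannot be read from the string
        }
        Matcher m = WINDOWS_NT.matcher(userAgent);
        return m.find() ? m.group(1) : "Unknown";
    }

    public static void main(String[] args) {
        String reduced = "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
            + "(KHTML, like Gecko) Chrome/100.0.0.0 Safari/537.36";
        System.out.println(osVersionGuess(reduced)); // ??
    }
}
```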

Your best workaround

At this point in time (mid 2022) the best way around many of these manipulations and lies is to ask for and record the User-Agent Client Hints on your website.

If you ask for these User-Agent Client Hints, the browser can send extra request headers like these in addition to the User-Agent:

| Header | Value |
|---|---|
| Sec-Ch-Ua | " Not A;Brand";v="99", "Chromium";v="100", "Google Chrome";v="100" |
| Sec-Ch-Ua-Arch | "x86" |
| Sec-Ch-Ua-Full-Version-List | " Not A;Brand";v="99.0.0.0", "Chromium";v="100.0.4896.75", "Google Chrome";v="100.0.4896.75" |
| Sec-Ch-Ua-Mobile | ?0 |
| Sec-Ch-Ua-Model | "" |
| Sec-Ch-Ua-Platform | "Windows" |
| Sec-Ch-Ua-Platform-Version | "0.1.0" |
| Sec-Ch-Ua-Wow64 | ?0 |
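Brand names in these headers are quoted and can themselves contain ';' (see the “ Not A;Brand” entry), so naively splitting the value on ';' or ',' breaks. A minimal Java sketch that extracts the brand/version pairs from a Sec-Ch-Ua-Full-Version-List value like the one above; it is a simplification that only handles this "brand";v="version" form, not a complete RFC 8941 Structured Field Values parser, and not Yauaa's implementation.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Minimal sketch, not a full Structured Field Values parser: it only handles
// the simple "brand";v="version" entries shown in the table above.
class BrandListParser {

    // Brand names are quoted and may contain ';', so match whole quoted entries.
    private static final Pattern ENTRY = Pattern.compile("\"([^\"]*)\"\\s*;\\s*v=\"([^\"]*)\"");

    // Returns brand -> version in header order; brand names are kept as sent
    // (including the deliberate leading space of " Not A;Brand").
    static Map<String, String> parse(String headerValue) {
        Map<String, String> brands = new LinkedHashMap<>();
        Matcher m = ENTRY.matcher(headerValue);
        while (m.find()) {
            brands.put(m.group(1), m.group(2));
        }
        return brands;
    }

    public static void main(String[] args) {
        String header = "\" Not A;Brand\";v=\"99.0.0.0\", \"Chromium\";v=\"100.0.4896.75\", "
            + "\"Google Chrome\";v=\"100.0.4896.75\"";
        System.out.println(parse(header));
    }
}
```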

With all of this extra information, Yauaa can now correctly report the Windows 7 example mentioned above as:

OperatingSystemNameVersion           : 'Windows 7'
AgentNameVersion                     : 'Chrome 100.0.4896.75'
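The “Windows 7” result comes from Sec-Ch-Ua-Platform-Version ("0.1.0" in the table). To my understanding, Chromium-based browsers on Windows send "0.1.0", "0.2.0" and "0.3.0" for Windows 7, 8 and 8.1, a major version from 1 to 10 for Windows 10, and 13 or higher for Windows 11; treat that mapping as an assumption to verify against the current Microsoft documentation. A hypothetical Java sketch of the lookup (not Yauaa's code):

```java
// Hypothetical helper, not Yauaa's code. The mapping is an assumption based on
// the values Chromium-based browsers are documented to send on Windows.
class WindowsPlatformVersion {

    static String windowsName(String platformVersion) {
        switch (platformVersion) {
            case "0.1.0": return "Windows 7";
            case "0.2.0": return "Windows 8";
            case "0.3.0": return "Windows 8.1";
            default: break;
        }
        // For newer systems only the major number matters.
        int major;
        try {
            major = Integer.parseInt(platformVersion.split("\\.")[0]);
        } catch (NumberFormatException e) {
            return "Windows ??";
        }
        if (major >= 13) {
            return "Windows 11";
        }
        if (major >= 1 && major <= 10) {
            return "Windows 10";
        }
        return "Windows ??";
    }

    public static void main(String[] args) {
        System.out.println(windowsName("0.1.0")); // Windows 7
    }
}
```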
-
\ No newline at end of file + 
\ No newline at end of file diff --git a/developer/shadingdependencies/index.html b/developer/shadingdependencies/index.html index 4b36f67ee..26b960e91 100644 --- a/developer/shadingdependencies/index.html +++ b/developer/shadingdependencies/index.html @@ -7,7 +7,7 @@ The base structure of this project is we have a module with the functionality and a set of ‘UDFs’ that wrap this functionality so that it can be used in external processing frameworks (like Flink, Hive, etc.)">Shading dependencies | Yauaa - Yet Another UserAgent Analyzer -

Shading dependencies

Introduction

This is a summary of the reasons WHY I have done the shading in this project the way it is now.

If someone has suggestions/hints on how this can be done better, I’m really curious what the ‘right’ way of doing this is.

The base structure of this project is that we have a module with the functionality and a set of ‘UDFs’ that wrap this functionality so that it can be used in external processing frameworks (like Flink, Hive, etc.).

Base goal

This library and the UDFs should be easy to use for all downstream users that want to use this in their projects.

Problem 1: Problematic dependencies

Some of the dependencies (Antlr4, Spring and SnakeYaml) have proven to be problematic for downstream users who need different versions of these in the same application.

Solution 1: Shade and relocate

So only for these dependencies we include and relocate the used classes into the main jar.

In the pom.xml

<plugin>
   <groupId>org.apache.maven.plugins</groupId>
@@ -60,12 +60,12 @@
   494  2019-08-23 12:26 nl/basjes/shaded/org/springframework/core/io/ResourceLoader.class
   487  2019-02-13 05:32 org/springframework/core/io/ResourceLoader.class

I filed a bug report/missing feature request for this in the Maven shade plugin: https://issues.apache.org/jira/browse/MSHADE-326

For which I’ve put up a pull request: https://github.com/apache/maven-shade-plugin/pull/26

Solution 3: Manually exclude them

So we exclude these 4 shaded dependencies in all modules in this project, so they are no longer included twice in the final jars.
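To verify that a final jar no longer contains these classes twice, a small check like the hypothetical Java sketch below can help. It is not part of the Yauaa build; it assumes the shaded classes live under the nl/basjes/shaded/ prefix seen in the jar listing above and reports every class that exists both there and at its original path.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Collection;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.jar.JarFile;

// Hypothetical build check (not part of the Yauaa build): list classes present
// both under the shaded prefix and at their original, unrelocated location.
class DoubleIncludedClasses {

    static List<String> findDuplicates(Collection<String> entryNames, String shadedPrefix) {
        Set<String> names = new HashSet<>(entryNames);
        List<String> duplicates = new ArrayList<>();
        for (String name : entryNames) {
            if (name.startsWith(shadedPrefix) && name.endsWith(".class")) {
                String original = name.substring(shadedPrefix.length());
                if (names.contains(original)) {
                    duplicates.add(original);
                }
            }
        }
        Collections.sort(duplicates);
        return duplicates;
    }

    // Usage: java DoubleIncludedClasses.java some-udf.jar
    public static void main(String[] args) throws IOException {
        try (JarFile jar = new JarFile(args[0])) {
            List<String> entries = new ArrayList<>();
            jar.stream().forEach(e -> entries.add(e.getName()));
            findDuplicates(entries, "nl/basjes/shaded/").forEach(System.out::println);
        }
    }
}
```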

Problem 4: No such classfile …

This gives rise to a new problem: when building/developing these modules, the code will complain about missing dependencies. The dependencies have been shaded, relocated and excluded … which means that any code looking for the ‘original’ class name will find it to be missing.

Solution 4: Include as ‘provided’

The final step I had to take was to include these 4 dependencies again as ‘provided’ in all modules in this project.

Additional notes

  • Immediately setting these dependencies to ‘provided’ causes them not to be included by the shade plugin.
  • Using the optional setting on the dependency caused “missing classes” errors in IntelliJ
  • The open issue at the maven/maven-shade-plugin end for problems 3 and 4: https://issues.apache.org/jira/browse/MSHADE-326
\ No newline at end of file + 
\ No newline at end of file diff --git a/expect/fieldvalues/index.html b/expect/fieldvalues/index.html index 153ff236e..de247d805 100644 --- a/expect/fieldvalues/index.html +++ b/expect/fieldvalues/index.html @@ -3,7 +3,7 @@ The Device: The hardware that was used. The Operating System: The base software that runs on the hardware The Layout Engine: The underlying core that converts the ‘HTML’ into a visual/interactive The Agent: The actual “Browser” that was used. Extra fields: In some cases we have additional fields to describe the agent. These fields are among others specific fields for the Facebook and Kobo apps, and fields to describe deliberate useragent manipulation situations (Anonymization, Hackers, etc.) Note that not all fields are always available. So if you look at a specific field you will in general find null values and “Unknown” in there as well.">Field values | Yauaa - Yet Another UserAgent Analyzer -

Field values

Output fields

The resulting output fields can be classified into several categories:

  • The Device: The hardware that was used.
  • The Operating System: The base software that runs on the hardware
  • The Layout Engine: The underlying core that converts the ‘HTML’ into a visual/interactive
  • The Agent:

@@ -15,12 +15,12 @@

My reasoning is that it is a system that someone uses to find security problems. With tools like this it is clear it is not a normal visitor that is interested in the content of the website. What is uncertain is the reason behind this: is it to fix problems or to abuse them?
So I classify all of them as hacking oriented tools.

DeviceClass

| Value | Meaning |
|---|---|
| Desktop | The device is assessed as a Desktop/Laptop class device |
| Anonymized | In some cases the useragent has been altered by anonymization software |
| Unknown | We really don’t know; these are usually useragents that look normal yet contain almost no information about the device |
| Mobile | A device that is mobile yet we do not know if it is an eReader/Tablet/Phone or Watch |
| Tablet | A mobile device with a rather large screen (commonly > 7") |
| Phone | A mobile device with a small screen (commonly < 7") |
| Watch | A mobile device with a tiny screen (commonly < 2"). Normally these are an additional screen for a phone/tablet type device. |
| Augmented Reality | A mobile device with AR capabilities (like Google Glass) |
| Virtual Reality | A mobile device with VR capabilities |
| eReader | Similar to a Tablet yet in most cases with an eInk screen |
| Set-top box | A connected device that allows interacting via a TV sized screen |
| TV | Similar to Set-top box yet here this is built into the TV |
| Home Appliance | A (usually large) home appliance (like a Fridge) |
| Game Console | ‘Fixed’ game systems like the PlayStation and Xbox |
| Handheld Game Console | ‘Mobile’ game systems like the 3DS |
| Voice | A voice driven device (i.e. ask a question and the page is read aloud). Like Alexa and Google Home. |
| Smart Display | A smart speaker kind of device with a tablet sized screen built in (like Google Nest and Amazon Echo Home) |
| Car | A Car based browser as found in for example the Tesla vehicles |
| Robot | Robots that visit the site |
| Robot Mobile | Robots that visit the site indicating they want to be seen as a Mobile visitor |
| Robot Imitator | Robots that visit the site pretending they are robots like Google, but they are not. Note that in most cases they ARE Robots. |
| Cloud | A cloud based application. Not a Robot or Hacker but a normal application that needs to connect. This includes for example Mastodon servers. |
| Hacker | In case scripting is detected in the useragent string, also fallback in really broken situations |
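Downstream code often only needs coarse groups of these DeviceClass values. The Java sketch below shows one possible grouping; the grouping itself (and the class name) is a hypothetical choice made for this example, not something Yauaa outputs. The “mobile” group contains the values the table above describes as mobile devices.

```java
import java.util.Set;

// Hypothetical grouping of the DeviceClass values listed above; the groups are
// an assumption for this example, not part of Yauaa's output.
class DeviceClassGroups {

    private static final Set<String> ROBOTS =
        Set.of("Robot", "Robot Mobile", "Robot Imitator");

    // Values the table describes as (some kind of) mobile device.
    private static final Set<String> MOBILE =
        Set.of("Mobile", "Tablet", "Phone", "Watch", "Augmented Reality", "Virtual Reality");

    static boolean isRobot(String deviceClass) {
        return ROBOTS.contains(deviceClass);
    }

    static boolean isMobile(String deviceClass) {
        return MOBILE.contains(deviceClass);
    }

    public static void main(String[] args) {
        System.out.println(isRobot("Robot Imitator") + " " + isMobile("Desktop")); // true false
    }
}
```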

OperatingSystemClass

| Value | Meaning |
|---|---|
| Desktop | The type of OS you would run on a Desktop or Laptop |
| Mobile | The type of OS you would run on a Phone, Tablet or Watch |
| Cloud | Looks like a thing that runs in a cloud environment |
| Embedded | Apparently embedded into something like a TV |
| Game Console | A game console like PS4, Xbox |
| Hacker | A hacker, so it can really be anything. |
| Anonymized | It was explicitly hidden |
| Unknown | We don’t know |

LayoutEngineClass

| Value | Meaning |
|---|---|
| Browser | A regular browser |
| Desktop App | A desktop app (often a PWA) |
| Mobile App | A mobile app which probably includes a regular web browser |
| Hacker | A hacker, so it can really be anything. |
| Robot | A robot spidering the site |
| Cloud | A cloud based application where it is unclear what kind of layout engine is really used |
| Special | Something special we cannot fully classify |
| Unknown | We don’t know |

AgentClass

| Value | Meaning |
|---|---|
| Browser | A regular browser |
| Browser Webview | A regular browser being used as part of a mobile app |
| Desktop App | A desktop app (often a PWA) |
| Mobile App | A mobile app |
| Robot | A robot that wants to be treated as a desktop device |
| Robot Mobile | A robot that wants to be treated as a mobile device |
| Cloud Application | Something running in a cloud that is intended to be Human facing (so not a regular robot) |
| Server | Something running in a cloud that is intended to be Server-to-Server (so not a regular robot) |
| Email Client | This is an email application that did the request |
| Voice | A voice driven ‘browser’ (i.e. ask a question and the page is read aloud). Like Alexa and Google Home. |
| Special | Something special we cannot fully classify |
| Testclient | A website testing tool |
| Hacker | A hacker, so it can really be anything. |
| Unknown | We don’t know |

AgentSecurity

| Value | Meaning |
|---|---|
| Weak security | Indicated to use deliberately weakened encryption (usually due to export restrictions or local laws). |
| Strong security | Indicated to use strong (normal) encryption. |
| Unknown | It was not specified (very common) |
| Hacker | A hacker, so it can really be anything. |
\ No newline at end of file + 
\ No newline at end of file diff --git a/expect/index.html b/expect/index.html index 702a3a534..2b39da77a 100644 --- a/expect/index.html +++ b/expect/index.html @@ -11,12 +11,12 @@ As an example the useragent of my phone (from a while ago): Mozilla/5.0 (Linux; Android 7.0; Nexus 6 Build/NBD90Z) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/53.0.2785.124 Mobile Safari/537.36 is converted into this set of fields: Field name Value Device Class Phone Device Name Google Nexus 6 Device Brand Google Operating System Class Mobile Operating System Name Android Operating System Version 7.0 Operating System Name Version Android 7.0 Operating System Version Build NBD90Z Layout Engine Class Browser Layout Engine Name Blink Layout Engine Version 53.0 Layout Engine Version Major 53 Layout Engine Name Version Blink 53.0 Layout Engine Name Version Major Blink 53 Agent Class Browser Agent Name Chrome Agent Version 53.0.2785.124 Agent Version Major 53 Agent Name Version Chrome 53.0.2785.124 Agent Name Version Major Chrome 53">What to expect | Yauaa - Yet Another UserAgent Analyzer -

What to expect

This library extracts as many fields as possible from the provided User-Agent value and (if available) the provided Client Hints.

As an example the useragent of my phone (from a while ago):

Mozilla/5.0 (Linux; Android 7.0; Nexus 6 Build/NBD90Z) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/53.0.2785.124 Mobile Safari/537.36

is converted into this set of fields:

| Field name | Value |
|---|---|
| Device Class | Phone |
| Device Name | Google Nexus 6 |
| Device Brand | Google |
| Operating System Class | Mobile |
| Operating System Name | Android |
| Operating System Version | 7.0 |
| Operating System Name Version | Android 7.0 |
| Operating System Version Build | NBD90Z |
| Layout Engine Class | Browser |
| Layout Engine Name | Blink |
| Layout Engine Version | 53.0 |
| Layout Engine Version Major | 53 |
| Layout Engine Name Version | Blink 53.0 |
| Layout Engine Name Version Major | Blink 53 |
| Agent Class | Browser |
| Agent Name | Chrome |
| Agent Version | 53.0.2785.124 |
| Agent Version Major | 53 |
| Agent Name Version | Chrome 53.0.2785.124 |
| Agent Name Version Major | Chrome 53 |
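Several fields in this table are combinations of others: “Name Version” joins name and version, and “Version Major” is the part of the version before the first dot. The Java sketch below reproduces that relationship for the example values only; the helper names are hypothetical, and Yauaa computes these fields itself (possibly with more elaborate rules for unusual version strings).

```java
// Illustration of how the combined fields in the table relate to the base
// fields; hypothetical helpers, derived only from the example values above.
class DerivedFields {

    // "53.0.2785.124" -> "53"
    static String versionMajor(String version) {
        int dot = version.indexOf('.');
        return dot < 0 ? version : version.substring(0, dot);
    }

    // ("Chrome", "53.0.2785.124") -> "Chrome 53.0.2785.124"
    static String nameVersion(String name, String version) {
        return name + " " + version;
    }

    // ("Chrome", "53.0.2785.124") -> "Chrome 53"
    static String nameVersionMajor(String name, String version) {
        return nameVersion(name, versionMajor(version));
    }

    public static void main(String[] args) {
        System.out.println(nameVersionMajor("Blink", "53.0")); // Blink 53
        System.out.println(nameVersion("Android", "7.0"));     // Android 7.0
    }
}
```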
\ No newline at end of file diff --git a/expect/limitations/index.html b/expect/limitations/index.html index 88499af5f..c835f633f 100644 --- a/expect/limitations/index.html +++ b/expect/limitations/index.html @@ -1,5 +1,5 @@ Limitations | Yauaa - Yet Another UserAgent Analyzer -

Limitations

It only analyzes the provided string

This system is based on analyzing the useragent string and looking for the patterns in the useragent string as they have been defined by parties like Google, Microsoft, Samsung and many others. These have been augmented with observations of how developers apparently do things. There are really no (ok, very limited) lookup tables that define if a certain device name is a Phone or a Tablet. This makes this system very maintainable because there is no need to have a list of all possible devices.

As a consequence, if a useragent does not follow these patterns, the analysis will yield the ‘wrong’ answer. Take for example these two (both were found exactly as shown here in the logs of a live website):

Mozilla/5.0 (Linux; Android 5.1; SAMSUNG-T805s Build/KOT49H) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/45.0.2454.94 Mobile Safari/537.36
Mozilla/5.0 (Linux; Android 4.4.2; SAMSUNG-T805S Build/KOT49H) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/50.0.2661.89 Safari/537.36
 

The difference between “Mobile Safari” and “Safari” has been defined for Google Chrome as the difference between “Phone” and “Tablet” (see the Chrome documentation on this).
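A simplified Java sketch of that Chrome-on-Android convention, applied to the two useragents above. It is a hypothetical reduction for illustration, not Yauaa's rule set; it shows why the first T805s string ends up as a Phone.

```java
// Simplified, hypothetical version of the Chrome-on-Android convention:
// "Mobile Safari" means Phone, plain "Safari" means Tablet. Not Yauaa's rules.
class ChromeAndroidFormFactor {

    static String deviceClass(String userAgent) {
        if (!userAgent.contains("Android")) {
            return "Unknown";
        }
        return userAgent.contains("Mobile Safari/") ? "Phone" : "Tablet";
    }

    public static void main(String[] args) {
        String t805s = "Mozilla/5.0 (Linux; Android 5.1; SAMSUNG-T805s Build/KOT49H) AppleWebKit/537.36 "
            + "(KHTML, like Gecko) Chrome/45.0.2454.94 Mobile Safari/537.36";
        // Prints Phone, although the T805s is really a tablet.
        System.out.println(deviceClass(t805s));
    }
}
```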

And as you can see in this example: we sometimes get it wrong (The Samsung T805s is in reality a Tablet). @@ -9,12 +9,12 @@

As you can see this browser assumes it is only installed on Samsung devices so they ‘force’ the word Samsung in there. In this case you will see this being reported as a “Samsung Nexus 6”, which is obviously wrong.

Note that this specific case with the SamsungBrowser has now been fixed with a set of additional rules so this one is now correctly reported as a Google Nexus 6.

Device Name and Device Brand

The detection of the brand and name of the device are the most brittle and unreliable part of the output.

There are a few reasons for this:

  1. There are a LOT of different devices from a LOT of vendors. To give you an impression: in November 2018 I did a count on the index page of DeviceAtlas and found over 4100 brands and a total of over 55000 different devices.
  2. In most cases the useragent string ONLY includes the model of the device and NOT the brand. So you need to be able to determine JUST from the model what brand it was.
  3. This system tries to limit the number of lookup tables and rely on patterns as much as possible. As a consequence I really do not want to have a complete list of all devices in here. So what are the patterns in the mapping from model to brand?
  4. I did an analysis on some of these brands and found that Acer, Lenovo, LG and QMobile all have a device called ‘A200’. So if the useragent only contains ‘A200’, there is no way to determine what device it really was.
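The ‘A200’ case can be shown with a tiny, hypothetical model-to-brand lookup in Java. The single entry comes from the text above; everything else (names, the choice to answer “Unknown” when a model maps to several brands) is an assumption made for illustration, not how Yauaa stores or resolves brands.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical model-to-brand lookup, only to show the ambiguity; the single
// entry comes from the text, a real table would need tens of thousands.
class ModelToBrand {

    private static final Map<String, Set<String>> BRANDS_BY_MODEL = Map.of(
        "A200", new TreeSet<String>(List.of("Acer", "Lenovo", "LG", "QMobile")));

    // All brands known to use this model name (sorted copy).
    static Set<String> candidates(String model) {
        return new TreeSet<>(BRANDS_BY_MODEL.getOrDefault(model, Set.of()));
    }

    // Only report a brand when the model name points to exactly one brand.
    static String brandFor(String model) {
        Set<String> brands = candidates(model);
        return brands.size() == 1 ? brands.iterator().next() : "Unknown";
    }

    public static void main(String[] args) {
        System.out.println(candidates("A200")); // [Acer, LG, Lenovo, QMobile]
        System.out.println(brandFor("A200"));   // Unknown
    }
}
```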

So as a consequence I have chosen to limit this detection to:

  1. Brands that are included as the first word in the appropriate field.
  2. Special cases (like robots)
  3. The “most used brands”, as well as possible.

WARNING: The detection of DeviceBrand will therefore never be complete and accurate.

Manipulations

Privacy

Useragents have always contained a lot of information about the device and the browser. In the past this was so detailed that in many situations the useragent could be used to track visitors very reliably.

Reducing/Freezing the UserAgent

So a few years ago several browser projects started to reduce the level of information in the UserAgent. As a direct consequence the analysis results will become less useful over time as browsers take away more and more information.

The User-Agent Client Hints (a DRAFT proposal as of Q1 2022) are intended to provide the needed information in a cleaner way.

At this point in time (Q1 2022):

  • This is not a standard yet
  • Not all browsers support this
    • Chromium based browsers support this.
    • Firefox 97 does not.

Compatibility

Also (as I wrote a long time ago in this article) UserAgents contain values that tell websites what they are compatible with.

In 2021 several browsers stopped updating the version number of the underlying operating system because of compatibility problems with poorly written websites.

Also several browsers are reaching version 100, which makes the major version 3 digits long; this leads to parsing problems if a website expects a 2-digit version.

This has led to some testing flags for website builders like chrome://flags/#force-major-version-to-100

Force major version to 100 in User-Agent

Force the Chrome major version in the User-Agent string to 100, which allows testing the 3-digit major version number before the actual M100 release. This flag is only available from M96-M99. – Mac, Windows, Linux, Chrome OS, Android, Fuchsia

and also chrome://flags/#force-minor-version-to-100

Force the minor version to 100 in the User-Agent string

Force the Chrome minor version in the User-Agent string to 100, which allows testing a 3-digit minor version number. Currently, the minor version is always reported as 0, but due to potential breakage with the upcoming major version 100, this flag allows us to test whether setting the major version in the minor version part of the User-Agent string would be an acceptable alternative. If force-major-version-to-100 is set, then this flag has no effect. See crbug.com/1278459 for details. – Mac, Windows, Linux, Chrome OS, Android, Fuchsia

Hold on: they are testing if they can set the major version in the minor version position … to work around broken websites?

Chrome/Edge 99 actually implemented this flag:

chrome://flags/#force-major-version-to-minor

Put major version in minor version position in User-Agent

Lock the Microsoft Edge major version in the User-Agent string to 99, and force the major version number to the minor version position. This flag is a backup plan for unexpected site-compatibility breakage with a three digit major version. – Mac, Windows, Linux, Chrome OS, Android, Fuchsia

So you get the effect:

  • Chrome/99.0.1150.25 = Chrome 99
  • Chrome/99.123.1150.25 = Chrome 123

Don’t worry: Yauaa actually detects and handles this and reports the correct version.
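The two examples above can be captured in a few lines of Java. This minimal sketch assumes one rule: a major version frozen at 99, combined with a non-zero minor, means the real major version sits in the minor position.

It is not Yauaa's actual rule, which also has to take other parts of the useragent into account. The class and method names are hypothetical.

```java
// Hypothetical sketch of decoding "major version in the minor position".
final class FrozenChromeVersion {
    static int effectiveMajor(String version) {
        String[] parts = version.split("\\.");
        int major = Integer.parseInt(parts[0]);
        int minor = parts.length > 1 ? Integer.parseInt(parts[1]) : 0;
        // Assumption: a frozen major of 99 with a non-zero minor carries the real major in the minor.
        return (major == 99 && minor > 0) ? minor : major;
    }

    public static void main(String[] args) {
        System.out.println(effectiveMajor("99.0.1150.25"));   // 99
        System.out.println(effectiveMajor("99.123.1150.25")); // 123
    }
}
```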

There is no MacOS 11

Both Chromium (=Chrome, Edge, …) and Firefox have frozen the reported version of MacOS X at 10.15(.7), and as a consequence MacOS 11 … simply does not appear in any of their UserAgents. These specific versions are therefore reported as an unknown version (??).

Background information:

  • Always 10_15_7 since Chrome 90.
  • Always 10.15 since Firefox 87.
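In practice a parser has to treat these exact values as carrying no version information. This minimal Java sketch assumes the agent name and its major version have already been extracted.

The method name is hypothetical. The sketch covers only the Chrome and Firefox cases listed above, not every Chromium-based browser.

```java
// Hypothetical sketch: report "??" when the macOS version is one of the frozen values.
final class FrozenMacOsVersion {
    static String effectiveMacOsVersion(String agentName, int agentMajor, String reportedVersion) {
        boolean frozen =
            ("Chrome".equals(agentName) && agentMajor >= 90 && "10_15_7".equals(reportedVersion))
         || ("Firefox".equals(agentName) && agentMajor >= 87 && "10.15".equals(reportedVersion));
        // A frozen value may mean 10.15 or any newer macOS, so it says nothing about the real version.
        return frozen ? "??" : reportedVersion;
    }

    public static void main(String[] args) {
        System.out.println(effectiveMacOsVersion("Chrome", 120, "10_15_7")); // ??
        System.out.println(effectiveMacOsVersion("Chrome", 85, "10_15_7"));  // 10_15_7
    }
}
```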

There is no Windows 11 in Chromium/Chrome/Edge/…

Microsoft has documented this here: https://docs.microsoft.com/en-us/microsoft-edge/web-platform/how-to-detect-win11

There are two approaches for sites to access user agent information:

  • User-Agent strings (legacy).
  • User-Agent Client Hints (recommended).

Websites can differentiate between users on Windows 11 and Windows 10 by using User-Agent Client Hints (UA-CH).

and

User-Agent strings won’t be updated to differentiate between Windows 11 and Windows 10. We don’t recommend using User-Agent strings to retrieve user agent data. Browsers that don’t support User-Agent Client Hints won’t be able to differentiate between Windows 11 and Windows 10.

And thus the Chrome UserAgent on Windows 11 looks like this:

Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/99.0.4844.51 Safari/537.36

Which clearly says: Windows 10.0
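With User-Agent Client Hints the distinction is made through the Sec-CH-UA-Platform-Version request header. This is a high-entropy hint that the server has to request, for example via Accept-CH.

The minimal Java sketch below follows the ranges on the Microsoft page linked above:

- 13 and above: Windows 11 or later.
- 1 to 10: Windows 10.
- 0: before Windows 10.

The class and method names are hypothetical. Values 11 and 12 are not listed on that page; this sketch simply treats them as Windows 10.

```java
// Hypothetical sketch of the Windows version ranges documented by Microsoft
// for the Sec-CH-UA-Platform-Version client hint (when Sec-CH-UA-Platform is "Windows").
final class WindowsFromClientHints {
    static String windowsVersion(String platformVersionHeader) {
        // The header value is a quoted string such as "15.0.0".
        String value = platformVersionHeader.trim().replace("\"", "");
        int major = Integer.parseInt(value.split("\\.")[0]);
        if (major >= 13) {
            return "Windows 11 (or later)";
        }
        if (major >= 1) {
            return "Windows 10"; // 11 and 12 are not listed by Microsoft; treated as Windows 10 here
        }
        return "Before Windows 10";
    }

    public static void main(String[] args) {
        System.out.println(windowsVersion("\"15.0.0\"")); // Windows 11 (or later)
        System.out.println(windowsVersion("\"10.0.0\"")); // Windows 10
    }
}
```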

There is no Windows 11 in Firefox

Firefox has an explicit issue called Cap the Windows version in the User-Agent to 10.0.

Their code comment clearly describes why:

Cap the reported Windows version to 10.0. This way, Microsoft doesn’t get to change Web compat-sensitive values without our veto. The […] for longer and longer. If the system-reported version ever changes, we’ll be able to take our time to evaluate the Web compat impact instead of having to scramble to react like happened with macOS changing from 10.x to 11.x.

Performance

On my systems I see a speed ranging from 500 to 4000 useragents per second (depending on the length and ambiguities in the useragent). On average the speed is around 2000 per second or ~0.5ms each. An LRU cache is in place that handles over 1M per second for useragents that are already in the cache.

Please note that the current system takes approximately 220MiB of RAM just for the engine (without any caching!!).

In the canonical use case of analysing clickstream data you will see a <1ms hit per visitor (or better: per new non-cached useragent), and for all the other clicks the values are retrieved from this cache at a speed of < 1 microsecond (i.e. close to 0).
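The effect of such a cache can be shown with a minimal Java sketch: an LRU cache in front of an expensive parse call, built on `LinkedHashMap` in access order. Only new, non-cached useragents pay the full parsing cost.

This is not the cache implementation Yauaa uses. The `parser` function is a stand-in for the real analyzer, and this sketch is not thread-safe.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical sketch of an LRU cache in front of an expensive useragent parser.
final class LruParseCache<V> {
    private final Map<String, V> cache;
    private final Function<String, V> parser;
    private int misses = 0;

    LruParseCache(int maxEntries, Function<String, V> parser) {
        this.parser = parser;
        // accessOrder = true: iteration order runs from least to most recently used.
        this.cache = new LinkedHashMap<String, V>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, V> eldest) {
                return size() > maxEntries; // evict the least recently used entry
            }
        };
    }

    V parse(String userAgent) {
        V cached = cache.get(userAgent); // a hit also refreshes the entry's recency
        if (cached != null) {
            return cached;
        }
        misses++; // only new, non-cached useragents pay the full parsing cost
        V result = parser.apply(userAgent);
        cache.put(userAgent, result);
        return result;
    }

    int misses() {
        return misses;
    }

    public static void main(String[] args) {
        LruParseCache<Integer> cache = new LruParseCache<>(10_000, String::length);
        cache.parse("Mozilla/5.0 (X11; Linux x86_64)");
        cache.parse("Mozilla/5.0 (X11; Linux x86_64)");
        System.out.println(cache.misses()); // 1
    }
}
```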

The graph below gives you some insight into how the performance of Yauaa has progressed over time.

You can clearly see the increase in the time needed when a lot more rules were added. The periodic drops in the time needed are also clearly visible whenever a performance improvement was found.

Between versions 5.5 and 5.6 a lot of extra rules were added to detect more brands of mobile devices on Android (at one point during development the time needed reached ~3ms). This was followed by a few steps in a rewrite of that part, resulting in effectively the fastest versions to date.

Output from the benchmark (using this code) on an Intel(R) Core(TM) Ultra 5 125H @ 3GHz from version 4.0 onwards:


Try it!

You can try it online with your own browser here: https://try.yauaa.basjes.nl/.

NOTES

  1. This runs on a very slow and rate limited machine.
  2. If you really like this then run it on your local systems. It’s much faster that way. A Kubernetes-ready Docker image is provided. See this page about the WebServlet for more information.
Yauaa: Yet Another UserAgent Analyzer

This is a Java library that tries to parse and analyze the useragent string (and when available the User-Agent Client Hints) and extract as many relevant attributes as possible.

It works with Java, Scala and Kotlin, and provides ready-to-use UDFs for several processing systems.

The full documentation can be found here https://yauaa.basjes.nl

If you just want to give it a quick try then you can do that with your own browser here: https://try.yauaa.basjes.nl/ (runs on a very slow and rate limited machine).


HIGH Profile release notes:

These are only the highlights for the last few releases, the full changelog can be found here.

NEXT RELEASE

  • New/improved detections:
    • Updated the list of Amazon devices (2023, 2024 models)
    • Fix phones with real browser name at the end (like AAB does)
    • Presearch browser, Citrix WorxWeb, Klarna, Budbee, MAGAPPX, Yandex, Albert Heijn App, Ghostery, Dalvik, Nu.nl (iOS)
    • ZTE Nubia
    • Pico 3 and Pico 4 VR Headset
    • Very old Samsung Browser is a webview
    • Whitelabel “Safe” Browser apps (iOS): Ziggo, KPN, VandenBorre, F-Secure
    • Handle “Windows 11.0” (the ‘.0’ is very rare)
    • CPU tag arm_64
    • Handle URLs better with Robots/Hackers/Spammers
    • UltraBlock useragent randomizer
    • Improve DuckDuckGo
    • Mapping ClientHint value for AgentName (i.e. ClientHint “YaBrowser” –> “Yandex Browser”)
    • Handle edgecases:
      • ‘OpenBSD != Linux amd64’
      • ‘Linux x86_64:108.0’
    • Robots (Generic, Fediverse and AI Related):
      • AmazonBot, Bravebot, PetalBot
      • FediIndex, vmcrawl, Nonsensebot, Caveman-hunter, …
      • OpenAI/ChatGPT, Claudebot (Anthropic), PerplexityBot
    • Codeberg.org is a code hosting site (not a brand for a bot)
    • Updated the ISO 639-3 language code table

Version v7.29.0

  • Build
    • Require JDK 23 installed for Trino support.
    • Leverage new toolchains plugin: no longer needs toolchains.xml.
  • New/improved detections:
    • Do tag lookups for Webviews (Yandex showed wrong)
    • SamsungBrowser with a newer “reduced” version on a Phone doing DEX.
    • Snorlax useragent with BASE64 encoded part
    • Devices from OX Tab, Xiaomi
    • Partially handle broken: Safari “Mobile” on Mac OS X
    • Gitlab CI Runner
    • HUAWEI Quick App Center (+ false positive of it being a Hacker)
    • TV Bro
  • Analyzer:
    • Renamed Sec-CH-UA-Form-Factor to Sec-CH-UA-Form-Factors (no rules yet)

Donations

If this project has business value for you then don’t hesitate to support me with a small donation either via GitHub Sponsors or PayPal.


License

Yet Another UserAgent Analyzer
Copyright (C) 2013-2025 Niels Basjes

Licensed under the Apache License, Version 2.0 (the "License");
…
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.

Related projects

.NET port

Stefano Balzarotti is putting a lot of effort into porting Yauaa to .NET Standard.

You can track his efforts here on GitHub: Yauaa .NET standard and download his releases via NuGet.

Apache Beam SQL

Introduction

This is a User Defined Function for Apache Beam SQL.

Getting the UDF

You can get the prebuilt UDF from maven central.

If you use a maven based project simply add this dependency to your project.

<dependency>
   <groupId>nl.basjes.parse.useragent</groupId>
   <artifactId>yauaa-beam-sql</artifactId>
   <version>7.29.0</version>
</dependency>

…

       .registerUdf("ParseUserAgentJson",  ParseUserAgentJson.class)
       .registerUdf("ParseUserAgentField", ParseUserAgentField.class)
     );

Limitations / Future

The ParseUserAgent and ParseUserAgentJson functions have a limit of at most 10 field names because Calcite does not yet support variable arguments for UDFs. If you need more than 10 fields you currently need to get all fields and then extract the fields you need from there.

Apache Beam

Introduction

This is a User Defined Function for Apache Beam.

Getting the UDF

You can get the prebuilt UDF from maven central.

If you use a maven based project simply add this dependency to your project.

<dependency>
   <groupId>nl.basjes.parse.useragent</groupId>
   <artifactId>yauaa-beam</artifactId>
   <version>7.29.0</version>
</dependency>

…

     record.agentNameVersion = value;
   }
 }

And then in the topology simply do this:

.apply("Extract Elements from Useragent",
  ParDo.of(new MyUserAgentAnalysisDoFn()));
Apache Drill

Introduction

This is a UDF for Apache Drill. This function was originally created by Charles S. Givre and was imported into the main Yauaa project to ensure users would have a prebuilt and up-to-date version available.

This function is now also packaged as part of Apache Drill itself: documentation.

Notable changes

With Yauaa 7.0.0

  • the code for this UDF has been copied back from the Drill project to ensure it keeps working as expected.
  • the parse_user_agent_field function has been removed; parse_user_agent now supports the same input/output.

Usage

I have copied/implemented the functions:

parse_user_agent ( <useragent> )
parse_user_agent ( <useragent> , <desired fieldname> )

…

| Phone       | Chrome 101            | Android 11.0.0             |
+-------------+-----------------------+----------------------------+
2 rows selected (0.275 seconds)

The improvement after adding the Client Hints is evident.

Apache Flink Table/SQL

Introduction

This is a User Defined Function for Apache Flink Table.

Getting the UDF

You can get the prebuilt UDF from maven central.

If you use a maven based project simply add this dependency to your project.

<dependency>
   <groupId>nl.basjes.parse.useragent</groupId>
   <artifactId>yauaa-flink-table</artifactId>
   <version>7.29.0</version>
</dependency>

…

// 3 Strings
TypeInformation<Row> tupleType = new RowTypeInfo(STRING, STRING, STRING);
DataStream<Row> resultSet = tableEnv.toAppendStream(resultTable, tupleType);
Apache Flink

Introduction

This is a User Defined Function for Apache Flink.

Getting the UDF

You can get the prebuilt UDF from maven central.

If you use a maven based project simply add this dependency to your project.

<dependency>
   <groupId>nl.basjes.parse.useragent</groupId>
   <artifactId>yauaa-flink</artifactId>
   <version>7.29.0</version>
</dependency>

…

    public void setOSNV(TestRecord record, String value) {
        record.operatingSystemNameVersion = value;
    }
}

And then in the topology simply do this:

.map(new MyUserAgentAnalysisMapper())

Apache Hive

Introduction

This is a User Defined Function for Apache Hive.

Getting the UDF

You can get the prebuilt UDF from maven central (yauaa-hive-7.29.0-udf.jar).

NOTE: You MUST use the -udf.jar: yauaa-hive-7.29.0-udf.jar

If you use a maven based project simply add this dependency:

<dependency>
   <groupId>nl.basjes.parse.useragent</groupId>
   <artifactId>yauaa-hive</artifactId>
   <classifier>udf</classifier>
   <version>7.29.0</version>
</dependency>

…

    ) AS parsedUseragentAllFields
    FROM   clickLogs
) ParsedSubSelect;
Apache Nifi

Introduction

This is an Apache Nifi Processor for parsing User Agent Strings.

Getting the Processor

You can get the prebuilt NAR file from maven central.

If you use a maven based project simply add this dependency:

<dependency>
   <groupId>nl.basjes.parse.useragent</groupId>
   <artifactId>yauaa-nifi</artifactId>
   <type>nar</type>
   <version>7.29.0</version>
 </dependency>

Installation

To install this processor, put the nar file in the <nifi-path>/lib directory.

cp ./udfs/nifi/nifi-nar/target/yauaa-nifi-<version>.nar <nifi-path>/lib
 

Make sure you replace <nifi-path> with the actual path to your nifi installation.

After you have added this nar file you will find the ParseUserAgent processor in the list.

(Screenshot: Add Processor dialog)

Usage and examples

  1. First make sure that the FlowFile going into this processor has the attributes needed as input.

  2. In the configuration specify which attributes contain the values of the Request Headers that were logged. The only mandatory one is RequestHeader.UserAgent. The other properties refer to the original User-Agent Client Hints request header names.
     (Screenshot: Configure Processor dialog)

  3. In the configuration enable the fields you need for analysis. By default none have been selected.
     (Screenshot: Configure Processor dialog)

  4. The output FlowFile will now have additional attributes for all of the selected fields, named Useragent.SelectedField.

     Key: 'Useragent.DeviceClass'
             Value: 'Phone'
     Key: 'Useragent.OperatingSystemNameVersion'
             Value: 'Android 4.1.2'

     In this log example the XXsomethingXX attributes are the input values and the Useragent.something attributes are the outputs:
     (Screenshot: Output from LogAttributes)
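The naming convention for the output attributes can be summarised in a few lines of Java. This hypothetical sketch only illustrates how selected fields map to `Useragent.<FieldName>` attributes.

It is not the processor's source code. The parse result is passed in as a plain map, and fields missing from that result are skipped, which is an assumption of this sketch.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration of how selected fields become FlowFile attributes.
final class OutputAttributes {
    static Map<String, String> toAttributes(Map<String, String> parseResult, List<String> selectedFields) {
        Map<String, String> attributes = new LinkedHashMap<>(); // keeps the order of selectedFields
        for (String field : selectedFields) {
            String value = parseResult.get(field);
            if (value != null) {
                attributes.put("Useragent." + field, value);
            }
        }
        return attributes;
    }

    public static void main(String[] args) {
        Map<String, String> parsed = Map.of(
            "DeviceClass", "Phone",
            "OperatingSystemNameVersion", "Android 4.1.2",
            "AgentName", "Chrome");
        System.out.println(toAttributes(parsed, List.of("DeviceClass", "OperatingSystemNameVersion")));
        // {Useragent.DeviceClass=Phone, Useragent.OperatingSystemNameVersion=Android 4.1.2}
    }
}
```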

Apache Pig

DEPRECATED

Apache Pig is no longer used, so this UDF has been dropped in Yauaa 7. Version 6.12 is the last released version that still includes the Apache Pig UDF.

Introduction

This is a User Defined Function for Apache Pig.

Getting the UDF

You can get the prebuilt UDF from maven central.

If you use a maven based project simply add this dependency

<dependency>
   <groupId>nl.basjes.parse.useragent</groupId>
   <artifactId>yauaa-pig</artifactId>
@@ -47,12 +47,12 @@
     FOREACH  rawData
     GENERATE useragent,
              -- Do NOT specify a type for this field as the UDF provides the definitions
             ParseUserAgent(useragent) AS parsedAgent;
\ No newline at end of file diff --git a/udf/commandline/index.html b/udf/commandline/index.html index 390d68fe2..d792db58e 100644 --- a/udf/commandline/index.html +++ b/udf/commandline/index.html @@ -7,7 +7,7 @@ So if you have the need to use Yauaa from a commandline perspective the easiest way to do this is by starting the docker based webservlet locally (and leave it running “for a long time”) and use something like curl to get the information you are looking for.">Commandline usage | Yauaa - Yet Another UserAgent Analyzer -

Commandline usage

Introduction

With version 6.0 the dedicated commandline tool was removed.

The primary reason is that it was not getting any attention and it did not perform well (mainly due to the relatively big startup overhead).

So if you need to use Yauaa from the command line, the easiest way is to start the Docker-based webservlet locally (and leave it running “for a long time”) and use something like curl to get the information you are looking for.

Initial startup

Simply start the webservlet using Docker and run it in the background (this takes a few seconds):

docker pull nielsbasjes/yauaa:7.29.0
@@ -44,12 +44,12 @@
       AgentNameVersionMajor                : 'Chrome 53'

Doing a large number of values

You can upload a file with useragents to the REST interface (there is a size limit for this).

curl -X POST \
     -H "Content-Type: text/plain" \
     http://localhost:8080/yauaa/v1/analyze/yaml \
     --data-binary "@useragents.txt"
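You may want to call the running webservlet from a JVM program instead of from curl. Here is a minimal sketch that uses the JDK's built-in `java.net.http.HttpClient` (Java 11+) to send the same POST as the curl command above. The class and method names are made up for illustration. The sketch assumes the container is reachable on `localhost:8080`, as in the curl example.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.file.Files;
import java.nio.file.Path;

/** Sketch: the batch curl call, done from Java (hypothetical class name). */
class YauaaBatchClient {
    // Endpoint copied from the curl example; assumes the Docker based
    // webservlet is reachable on localhost:8080.
    static final URI ANALYZE_YAML =
        URI.create("http://localhost:8080/yauaa/v1/analyze/yaml");

    // Equivalent of: curl -X POST -H "Content-Type: text/plain" --data-binary @file
    static HttpRequest buildRequest(byte[] fileContent) {
        return HttpRequest.newBuilder(ANALYZE_YAML)
            .header("Content-Type", "text/plain")
            .POST(HttpRequest.BodyPublishers.ofByteArray(fileContent))
            .build();
    }

    public static void main(String[] args) throws IOException, InterruptedException {
        // The same file you would pass to curl with --data-binary.
        Path file = Path.of(args.length > 0 ? args[0] : "useragents.txt");
        HttpResponse<String> response = HttpClient.newHttpClient().send(
            buildRequest(Files.readAllBytes(file)),
            HttpResponse.BodyHandlers.ofString());
        System.out.println("HTTP " + response.statusCode());
        System.out.println(response.body()); // the YAML returned by the webservlet
    }
}
```

The file is sent as raw bytes, without any re-encoding, which matches what `--data-binary` does.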
\ No newline at end of file diff --git a/udf/elastic-logstash/index.html b/udf/elastic-logstash/index.html index 26a921768..ef3f31609 100644 --- a/udf/elastic-logstash/index.html +++ b/udf/elastic-logstash/index.html @@ -11,7 +11,7 @@ STATUS: … DROPPED … With Yauaa 7.18.0 the logstash UDF has been dropped. The primary reason is that 3 years after Elastic announced Java UDF support as “GA” they have not published the needed dependencies. The workaround I came up with is starting to cause more and more problems so I’m dropping it. Still want it? Get the sources from the latest tag that still had it and build it yourself. https://github.com/nielsbasjes/yauaa/tree/v7.17.1/udfs/elastic/logstash">Elastic LogStash | Yauaa - Yet Another UserAgent Analyzer -

Elastic LogStash

Introduction

User Defined Function (Filter plugin) for Elastic Logstash

STATUS: … DROPPED …

With Yauaa 7.18.0 the Logstash UDF has been dropped.

The primary reason is that, 3 years after Elastic announced Java UDF support as “GA”, they have still not published the needed dependencies. The workaround I came up with is starting to cause more and more problems, so I’m dropping it.

Still want it?

Get the sources from the latest tag that still had it and build it yourself: https://github.com/nielsbasjes/yauaa/tree/v7.17.1/udfs/elastic/logstash

See for more information:

Installing the filter

You only need to install it into your Logstash once per installation:

logstash-plugin remove logstash-filter-yauaa
logstash-plugin install ./udfs/logstash/target/logstash-filter-yauaa-7.29.0.gem

Example usage

You need to specify

  1. The source field which maps the fields in the record to their original request headers.
  2. For each Yauaa field you need the logstash field in which it needs to be placed.
filter {
   yauaa {
@@ -53,12 +53,12 @@
           "uaAN" => "Chrome",
         "uaANVM" => "Chrome 100",
         "uaAANV" => "Chrome 100.0.4896.60"
}
\ No newline at end of file diff --git a/udf/elastic-search/index.html b/udf/elastic-search/index.html index 2be1cd74c..d6ccbc661 100644 --- a/udf/elastic-search/index.html +++ b/udf/elastic-search/index.html @@ -23,7 +23,7 @@ Elastic Search 7.x. Elastic Search 8.x. Installing the plugin You only need to install it into your Elastic Search once On Elastic Search 7.x: bin/elasticsearch-plugin install file:///path/to/yauaa-elasticsearch-7.29.0.zip On Elastic Search 8.x">Elastic Search | Yauaa - Yet Another UserAgent Analyzer -

Elastic Search

Introduction

User Defined Function (ingest processor) for Elastic Search

STATUS: … EXPERIMENTAL …

The ElasticSearch ingest plugin is very new.

And yes, it is similar to https://www.elastic.co/guide/en/elasticsearch/reference/master/user-agent-processor.html

Getting the UDF

You can get the prebuilt ingest plugin from Maven Central for:

  • Elastic Search 7.x
  • Elastic Search 8.x

Installing the plugin

You only need to install it into your Elastic Search once.

On Elastic Search 7.x:

bin/elasticsearch-plugin install file:///path/to/yauaa-elasticsearch-7.29.0.zip

On Elastic Search 8.x:

bin/elasticsearch-plugin install file:///path/to/yauaa-elasticsearch-8-7.29.0.zip

Usage

This plugin is intended to be used in an ingest pipeline.

You have to specify the name of the input field and where the results should be placed. The possible configuration flags are:

Name | Mandatory/Optional | Description | Default | Example
field_to_header_mapping | M | The mapping from the input field name to the original request header name of this field | - | "field_to_header_mapping" : { "ua": "User-Agent" }
field (deprecated) | M | The name of the input field that contains the UserAgent string | - | "useragent"
target_field | M | The name of the output structure that will be filled with the parse results | "user_agent" | "parsed_ua"
fieldNames | O | A list of the desired Yauaa field names. When specified, the system limits processing to what is needed to get these, which means faster processing and less memory usage. | All possible fields | [ "DeviceClass", "DeviceBrand", "DeviceName", "AgentNameVersionMajor" ]
cacheSize | O | The number of entries in the LRU cache of the parser | 10000 | 100
preheat | O | How many testcases are put through the parser at startup to warm up the JVM | 0 | 1000
extraRules | O | A yaml expression that is a set of extra rules and testcases. | - | "config:\n- matcher:\n extract:\n - '"'"'FirstProductName : 1 :agent.(1)product.(1)name'"'"'\n"

Example usage

Basic pipeline

Create a pipeline that just extracts everything using the default settings:

curl -H 'Content-Type: application/json' -X PUT 'localhost:9200/_ingest/pipeline/yauaa-test-pipeline_basic' -d '
 {
   "description": "A pipeline to do whatever",
@@ -105,12 +105,12 @@
     "_version": 1,
     "found": true
 }

NOTES for developers

The ElasticSearch testing tools are quick to complain about jar classloading issues: “jar hell”.

To make it possible to test this in IntelliJ you’ll need to set a custom property:

  1. Help -> Edit Custom Properties
  2. Make sure there is a line with idea.no.launcher=true
  3. Restart IntelliJ

See also https://stackoverflow.com/questions/51045201/using-the-elasticsearch-test-framework-in-intellij-how-to-resolve-the-jar-hell/51045272

\ No newline at end of file diff --git a/udf/index.html b/udf/index.html index cf8484701..72fb97282 100644 --- a/udf/index.html +++ b/udf/index.html @@ -7,16 +7,16 @@ Apache Beam Apache Beam SQL Apache Drill Apache Flink Apache Flink Table/SQL Apache Hive Apache Nifi Apache Pig Commandline usage Elastic LogStash Elastic Search LogParser Snowflake Snowplow Trino">User Defined Functions | Yauaa - Yet Another UserAgent Analyzer -

User Defined Functions

Several external computation systems support the concept of a User Defined Function (UDF). A UDF is simply a way of making functionality (in this case the analysis of useragents) available in such a system.

For several systems (tools used within bol.com, where I work) I have written such a UDF; these are all part of this project.

\ No newline at end of file diff --git a/udf/logparser/index.html b/udf/logparser/index.html index 79af61923..6ff746fe8 100644 --- a/udf/logparser/index.html +++ b/udf/logparser/index.html @@ -15,17 +15,17 @@ NOTE: You MUST use the -udf.jar: yauaa-logparser-7.29.0-udf.jar If you use a maven based project simply add this dependency nl.basjes.parse.useragent yauaa-logparser udf 7.29.0 Client hints Because the logparser can only dissect a single field into multiple pieces it is impossible to extend this to support User-Agent Client Hints.">LogParser | Yauaa - Yet Another UserAgent Analyzer -

LogParser

Introduction

This is a User Defined Function for LogParser.

Getting the UDF

You can get the prebuilt UDF from Maven Central (yauaa-logparser-7.29.0-udf.jar).

NOTE: You MUST use the -udf.jar: yauaa-logparser-7.29.0-udf.jar

If you use a Maven-based project, simply add this dependency:

<dependency>
   <groupId>nl.basjes.parse.useragent</groupId>
   <artifactId>yauaa-logparser</artifactId>
   <classifier>udf</classifier>
   <version>7.29.0</version>
</dependency>

Client hints

Because the logparser can only dissect a single field into multiple pieces, it is impossible to extend it to support User-Agent Client Hints.
\ No newline at end of file diff --git a/udf/snowflake/index.html b/udf/snowflake/index.html index 128a46b3a..9857be3ae 100644 --- a/udf/snowflake/index.html +++ b/udf/snowflake/index.html @@ -15,14 +15,14 @@ Snowflake has marked (last checked on 2021-11-07) Java based UDFs as a Preview Feature. I do not have Snowflake so I do not have any way of testing this other than getting feedback from you. Thanks to Luke Ambrosetti for helping out here! See for more information: https://docs.snowflake.com/en/developer-guide/udf/java/udf-java.html Installation and usage Download the UDF jar to the local file system and upload into a Snowflake internal or external stage.">Snowflake | Yauaa - Yet Another UserAgent Analyzer -

Snowflake

Introduction

User Defined Function for Snowflake.

STATUS: … EXPERIMENTAL …

The Snowflake UDF is very experimental for two reasons:

  • Snowflake has marked (last checked on 2021-11-07) Java based UDFs as a Preview Feature.
  • I do not have Snowflake, so I have no way of testing this other than getting feedback from you.

Thanks to Luke Ambrosetti for helping out here!

See for more information: https://docs.snowflake.com/en/developer-guide/udf/java/udf-java.html

Installation and usage

  1. Download the UDF jar to the local file system and upload it into a Snowflake internal or external stage.

    You can get the prebuilt UDF from Maven Central (yauaa-snowflake-7.29.0-udf.jar).

    NOTE: You MUST use the -udf.jar: yauaa-snowflake-7.29.0-udf.jar

  2. Register the function in Snowflake with something like this:

create or replace function parse_useragent(useragent ARRAY)
 returns object
 language java
 imports = ('@cs_stage/yauaa-snowflake-7.29.0-udf.jar')
 handler='nl.basjes.parse.useragent.snowflake.ParseUserAgent.parse';

NOTE: In Yauaa 6 the argument of the UDF was defined as a VARCHAR; it must now be defined as an ARRAY!

  3. From there you can use it as a function in your SQL statements:
select parse_useragent(
     'Mozilla/5.0 (Linux; Android 7.0; Nexus 6 Build/NBD90Z) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/53.0.2785.124 Mobile Safari/537.36'
) as ua_obj, ua_obj:AgentClass::string as agent_class;

Screenshot: Using Yauaa in Snowflake with just a UserAgent

Using User-Agent Client Hints

With version 7.0.0 you are now able to analyze the Client Hints as well.

Note: The arguments to the function are a single array of values!

select parse_useragent(
    ['User-Agent',                   'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/100.0.4896.127 Safari/537.36',
     'Sec-Ch-Ua',                    '\" Not A;Brand\";v=\"99\", \"Chromium\";v=\"100\", \"Google Chrome\";v=\"100\"',
     'Sec-Ch-Ua-Arch',               '\"x86\"',
@@ -34,8 +34,8 @@
     'Sec-Ch-Ua-Platform',           '\"Linux\"',
     'Sec-Ch-Ua-Platform-Version',   '\"5.13.0\"',
     'Sec-Ch-Ua-Wow64',              '?0']
) as ua_obj, ua_obj:OperatingSystemNameVersion::string as operating_system_name_version;

Screenshot: Using Yauaa in Snowflake with all Headers

When only examining the User-Agent this returns Linux ??; with the added information in the Client Hints you should get Linux 5.13.0 instead.

Note that this next form is also supported (the first element is the User-Agent, and from there it is a list of “header name” and “value” pairs):

select parse_useragent(
     ['Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/100.0.4896.127 Safari/537.36',
     'Sec-Ch-Ua',                    '\" Not A;Brand\";v=\"99\", \"Chromium\";v=\"100\", \"Google Chrome\";v=\"100\"',
     'Sec-Ch-Ua-Arch',               '\"x86\"',
@@ -47,13 +47,13 @@
     'Sec-Ch-Ua-Platform',           '\"Linux\"',
     'Sec-Ch-Ua-Platform-Version',   '\"5.13.0\"',
     'Sec-Ch-Ua-Wow64',              '?0']
) as ua_obj, ua_obj:OperatingSystemNameVersion::string as operating_system_name_version;

Screenshot: Using Yauaa in Snowflake with all Headers
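Both forms pack all headers into one flat array. The Java sketch below turns such an array back into a header map, for illustration only. `HeaderArraySketch.toHeaders` is a hypothetical helper, not the code inside the Yauaa Snowflake UDF. It assumes that an odd number of elements means the short form, where the first element is the bare User-Agent.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper: reads the flat array argument back as request headers. */
class HeaderArraySketch {
    static Map<String, String> toHeaders(List<String> values) {
        Map<String, String> headers = new LinkedHashMap<>();
        int i = 0;
        // Assumption: an odd number of elements means the short form where the
        // first element is the bare User-Agent value.
        if (values.size() % 2 == 1) {
            headers.put("User-Agent", values.get(0));
            i = 1;
        }
        // The rest is a flat list: header name, value, header name, value, ...
        for (; i + 1 < values.size(); i += 2) {
            headers.put(values.get(i), values.get(i + 1));
        }
        return headers;
    }

    public static void main(String[] args) {
        List<String> shortForm = List.of(
            "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/100.0.4896.127 Safari/537.36",
            "Sec-Ch-Ua-Platform", "\"Linux\"",
            "Sec-Ch-Ua-Platform-Version", "\"5.13.0\"");
        toHeaders(shortForm).forEach((name, value) -> System.out.println(name + ": " + value));
    }
}
```

The SQL examples escape the double quotes as `\"` inside SQL string literals. The Java strings above hold the resulting header values directly.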
\ No newline at end of file diff --git a/udf/snowplow/index.html b/udf/snowplow/index.html index fda24017f..92694dd95 100644 --- a/udf/snowplow/index.html +++ b/udf/snowplow/index.html @@ -7,12 +7,12 @@ The official documentation: Snowplow Yauaa Enrichment">Snowplow | Yauaa - Yet Another UserAgent Analyzer -

Snowplow

Introduction

If you are a user of the Snowplow Analytics system and would like to use Yauaa in your analysis, you are in luck.

The people at Snowplow have included Yauaa as a readily available feature in their system.

The official documentation: Snowplow Yauaa Enrichment

\ No newline at end of file diff --git a/udf/trino/index.html b/udf/trino/index.html index 25131f235..cd6d40287 100644 --- a/udf/trino/index.html +++ b/udf/trino/index.html @@ -7,7 +7,7 @@ Trino now requires Java 23 (which is non-LTS) which is not readily available for installation in Ubuntu using a normal package manager. This means that I have chosen to no longer let the build fail if you do not have Java 22 installed. The CI build does do Java 23 so any breaking API changes should be detected there.">Trino | Yauaa - Yet Another UserAgent Analyzer -

Trino

Introduction

This is a User Defined Function for Trino (a.k.a. Presto SQL).

STATUS: … EXPERIMENTAL …

The Trino plugin is very new. Please let me know whether or not it works in your case.

Trino now requires Java 23 (which is non-LTS), which is not readily available for installation in Ubuntu using a normal package manager. This means that I have chosen to no longer let the build fail if you do not have Java 23 installed. The CI build does use Java 23, so any breaking API changes should be detected there.

This UDF will simply not be built if Java 23 is missing (and thus may go missing in some releases).

If you have Java 23 installed (and added it to your toolchains.xml) you can still build it.

Installation

You can get the prebuilt UDF from maven central (yauaa-trino-7.29.0-udf.jar).

NOTE: You MUST use the -udf.jar: yauaa-trino-7.29.0-udf.jar

In the plugin directory of your Trino server create a subdirectory and copy the yauaa-trino-7.29.0-udf.jar to that new directory.

In the trino docker image this is /usr/lib/trino/plugin/ so putting the jar in something like /usr/lib/trino/plugin/yauaa is a fine choice.

Important note: This directory may only contain this jar file; no other files may be present!

Usage

This UDF provides two variants of a new function: parse_user_agent(<useragent>) and parse_user_agent(array(<parameters>)).

The first variant needs one input, which is the UserAgent string that needs to be analyzed.

The return value is a map(varchar, varchar), which is a key value map of all possible properties.

The second variant needs a list of header name, value pairs that define the headers on which the provided values were originally received.

Example: Just the User-Agent string.

SELECT parsedUseragent['DeviceClass']                   AS DeviceClass,
@@ -48,12 +48,12 @@
 );

Outputs:

 DeviceClass | AgentNameVersionMajor | OperatingSystemNameVersion
-------------+-----------------------+----------------------------
 Desktop     | Chrome 100            | Mac OS 12.3.1
(1 row)
\ No newline at end of file diff --git a/using/clienthints/index.html b/using/clienthints/index.html index 2f2f66d2e..ee8aa19c2 100644 --- a/using/clienthints/index.html +++ b/using/clienthints/index.html @@ -3,7 +3,7 @@ In addition, steps are taken to provide information to website builders that is intended to be sufficient for running a website and less prone to tracking people.">User-Agent Client Hints | Yauaa - Yet Another UserAgent Analyzer -

User-Agent Client Hints

The User-Agent and the User-Agent Client Hints

From about 2019 onward several of the main browsers (Firefox/Chromium/Chrome/Edge/…) have been taking steps to reduce the information in the User-Agent. The main reason is that the User-Agents contained so much detailed information that some of them were unique enough to be used as a device id for tracking purposes.

In addition, steps are being taken to provide website builders with information that is intended to be sufficient for running a website and less suitable for tracking people.

As part of this, an extension to the Client Hints has been documented and implemented in the Chromium based browsers to provide the User-Agent Client Hints via HTTP request headers.

See:

Getting the browser to send User-Agent Client Hints

Now the User-Agent Client Hints are provided by the browser in each request to the server via additional request headers.

The first important thing is that they will only be sent if the server is localhost or the connection is secured (https).

If you try a remote server over plain http you will see no User-Agent Client Hints at all.

By default, the browsers that support this will send the “low entropy” values without the need to do anything special (other than going over https).

These headers are:

Request header | Example value
Sec-Ch-Ua | " Not A;Brand";v="99", "Chromium";v="100", "Google Chrome";v="100"
Sec-Ch-Ua-Mobile | ?0
Sec-Ch-Ua-Platform | "Windows"
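The Sec-Ch-Ua value (and the Sec-Ch-Ua-Full-Version-List value) is a structured header. It is a comma separated list of quoted brand names, each with a quoted `v` parameter. Below is a minimal Java sketch that splits such a value into brand → version pairs. It handles quoted strings with escapes, such as the `" Not A;Brand"` entry that itself contains a semicolon. It is not a complete RFC 8941 parser, and the class name is made up for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Sketch: split a Sec-Ch-Ua style value into brand -> version (insertion order kept). */
class BrandListSketch {
    // A quoted brand, then ;v= and a quoted version. Inside the quotes any
    // character except " and \ is allowed, plus backslash escapes like \".
    private static final Pattern ENTRY = Pattern.compile(
        "\"((?:[^\"\\\\]|\\\\.)*)\"\\s*;\\s*v\\s*=\\s*\"((?:[^\"\\\\]|\\\\.)*)\"");

    static Map<String, String> parseBrands(String headerValue) {
        Map<String, String> brands = new LinkedHashMap<>();
        Matcher m = ENTRY.matcher(headerValue);
        while (m.find()) {
            brands.put(unescape(m.group(1)), unescape(m.group(2)));
        }
        return brands;
    }

    // Remove the backslash escapes allowed inside structured-field strings.
    private static String unescape(String s) {
        return s.replaceAll("\\\\(.)", "$1");
    }

    public static void main(String[] args) {
        String value = "\" Not A;Brand\";v=\"99\", \"Chromium\";v=\"100\", \"Google Chrome\";v=\"100\"";
        parseBrands(value).forEach((brand, version) ->
            System.out.println("[" + brand + "] -> " + version));
    }
}
```

The leading space in `" Not A;Brand"` is kept on purpose. Such "GREASE" brands are deliberately odd, and a parser should not trim them into something that looks like a real brand.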

If additional headers are desired then the service should send an Accept-CH response header with the first response and then any subsequent requests will (if allowed) send the requested additional headers.

Accept-CH: Sec-CH-UA, Sec-CH-UA-Arch, Sec-CH-UA-Bitness, Sec-CH-UA-Form-Factors, Sec-CH-UA-Full-Version, Sec-CH-UA-Full-Version-List, Sec-CH-UA-Mobile, Sec-CH-UA-Model, Sec-CH-UA-Platform, Sec-CH-UA-Platform-Version, Sec-CH-UA-WoW64
 

If the additional headers are critical to your application, you can send Critical-CH in addition to Accept-CH to indicate which ones are critical:

Critical-CH: Sec-CH-UA, Sec-CH-UA-Arch, Sec-CH-UA-Bitness, Sec-CH-UA-Form-Factors, Sec-CH-UA-Full-Version, Sec-CH-UA-Full-Version-List, Sec-CH-UA-Mobile, Sec-CH-UA-Model, Sec-CH-UA-Platform, Sec-CH-UA-Platform-Version, Sec-CH-UA-WoW64
 

See:

Depending on the situation you may need to also set a Permissions-Policy HTTP header to actually get the desired headers.
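On the server side, requesting the extra hints only takes a couple of response headers. Below is a minimal sketch using the JDK's built-in `com.sun.net.httpserver` server. The port (8000) and the class name are arbitrary choices for illustration, and a real web framework would set the same Accept-CH and Critical-CH headers in its own way. The sketch does not set any Permissions-Policy header.

```java
import com.sun.net.httpserver.Headers;
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.nio.charset.StandardCharsets;

class ClientHintsServerSketch {
    // The same list of hints as in the Accept-CH example above.
    static final String REQUESTED_HINTS = String.join(", ",
        "Sec-CH-UA", "Sec-CH-UA-Arch", "Sec-CH-UA-Bitness", "Sec-CH-UA-Form-Factors",
        "Sec-CH-UA-Full-Version", "Sec-CH-UA-Full-Version-List", "Sec-CH-UA-Mobile",
        "Sec-CH-UA-Model", "Sec-CH-UA-Platform", "Sec-CH-UA-Platform-Version",
        "Sec-CH-UA-WoW64");

    static void addClientHintHeaders(Headers responseHeaders) {
        // Ask the browser to include these hints on subsequent requests.
        responseHeaders.set("Accept-CH", REQUESTED_HINTS);
        // Optional: mark them as critical so a supporting browser retries the
        // request with these hints if they were missing.
        responseHeaders.set("Critical-CH", REQUESTED_HINTS);
    }

    public static void main(String[] args) throws IOException {
        // localhost is treated like a secure origin, so plain http works for local tests.
        HttpServer server = HttpServer.create(new InetSocketAddress("localhost", 8000), 0);
        server.createContext("/", exchange -> {
            addClientHintHeaders(exchange.getResponseHeaders());
            byte[] body = ("Received Sec-CH-UA-Platform-Version: "
                + exchange.getRequestHeaders().getFirst("Sec-CH-UA-Platform-Version") + "\n")
                .getBytes(StandardCharsets.UTF_8);
            exchange.sendResponseHeaders(200, body.length);
            try (OutputStream out = exchange.getResponseBody()) {
                out.write(body);
            }
        });
        server.start();
        System.out.println("Open http://localhost:8000/ in a Chromium based browser");
    }
}
```

On the very first visit the echoed value will typically be `null`, because the browser only learns from the Accept-CH response which extra hints to send.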

The headers Yauaa can handle are shown in this table.

The shown example values are the real values recorded when running Chrome 100.0.4896.75 with the reduced User-Agent setting enabled on Windows 7.

Request header | Example value
User-Agent | Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/100.0.0.0 Safari/537.36
Sec-Ch-Ua | " Not A;Brand";v="99", "Chromium";v="100", "Google Chrome";v="100"
Sec-Ch-Ua-Arch | "x86"
Sec-Ch-Ua-Bitness | "64"
Sec-Ch-Ua-Full-Version | "100.0.4896.75"
Sec-Ch-Ua-Full-Version-List | " Not A;Brand";v="99.0.0.0", "Chromium";v="100.0.4896.75", "Google Chrome";v="100.0.4896.75"
Sec-Ch-Ua-Mobile | ?0
Sec-Ch-Ua-Model | ""
Sec-Ch-Ua-Platform | "Windows"
Sec-Ch-Ua-Platform-Version | "0.1.0"
Sec-Ch-Ua-Wow64 | ?0

Logging the User-Agent Client Hints

If you happen to be using the Apache HTTPD webserver, you can record these values with a LogFormat configuration like this:

LogFormat "%a %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\" \"%{Sec-CH-UA}i\" \"%{Sec-CH-UA-Arch}i\" \"%{Sec-CH-UA-Bitness}i\" \"%{Sec-CH-UA-Form-Factors}i\" \"%{Sec-CH-UA-Full-Version}i\" \"%{Sec-CH-UA-Full-Version-List}i\" \"%{Sec-CH-UA-Mobile}i\" \"%{Sec-CH-UA-Model}i\" \"%{Sec-CH-UA-Platform}i\" \"%{Sec-CH-UA-Platform-Version}i\" \"%{Sec-CH-UA-WoW64}i\" %V" combinedhintsvhost
 

Behind this Apache Httpd webserver is a website that returns the header

Accept-CH: Sec-CH-UA, Sec-CH-UA-Arch, Sec-CH-UA-Bitness, Sec-CH-UA-Form-Factors, Sec-CH-UA-Full-Version, Sec-CH-UA-Full-Version-List, Sec-CH-UA-Mobile, Sec-CH-UA-Model, Sec-CH-UA-Platform, Sec-CH-UA-Platform-Version, Sec-CH-UA-WoW64
@@ -28,12 +28,12 @@
 UserAgent userAgent = uaa.parse(requestHeaders);
 

This results in (among other things):

  OperatingSystemNameVersion           : 'Windows 7'
  AgentNameVersion                     : 'Chrome 100.0.4896.75'

Although the User-Agent contains Windows NT 10.0 the (correct) answer provided by Yauaa is Windows 7 because this is the reduced User-Agent (all minor versions of Chrome are 0: 100.0.0.0) and the Client hints indicate Windows and 0.1.0.
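Here is a minimal Java sketch of how a Sec-Ch-Ua-Platform-Version value could be mapped to a Windows release when Sec-Ch-Ua-Platform is "Windows". Only the 0.1.0 → Windows 7 case comes from this page. The other ranges are assumptions based on Microsoft's and Chromium's published conventions: 0.2.0/0.3.0 for Windows 8/8.1, majors 1 to 10 for Windows 10, and 13 and up for Windows 11. This is not Yauaa's actual rule set, and the class name is hypothetical.

```java
class WindowsPlatformVersionSketch {
    /**
     * Maps a Sec-Ch-Ua-Platform-Version value (only meaningful when
     * Sec-Ch-Ua-Platform is "Windows") to a Windows release name.
     * Returns "Windows ??" when the value is not recognized.
     */
    static String windowsRelease(String platformVersion) {
        // Header values arrive quoted, e.g. "0.1.0"
        String[] parts = platformVersion.replace("\"", "").trim().split("\\.");
        int major;
        int minor;
        try {
            major = Integer.parseInt(parts[0]);
            minor = parts.length > 1 ? Integer.parseInt(parts[1]) : 0;
        } catch (NumberFormatException e) {
            return "Windows ??";
        }
        if (major == 0) {
            // Pre Windows 10 releases are encoded in the minor number.
            switch (minor) {
                case 1:  return "Windows 7";   // value shown on this page
                case 2:  return "Windows 8";   // assumption
                case 3:  return "Windows 8.1"; // assumption
                default: return "Windows ??";
            }
        }
        if (major >= 1 && major <= 10) {
            return "Windows 10"; // assumption: majors 1..10
        }
        if (major >= 13) {
            return "Windows 11"; // assumption: majors 13 and up
        }
        return "Windows ??";
    }

    public static void main(String[] args) {
        System.out.println(windowsRelease("\"0.1.0\""));
    }
}
```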

Which client hints do I really need?

A deeper dive into the fields, their “added value”, and why to request and keep them (or not).

General considerations:

  • The first request only has the low entropy headers Sec-Ch-Ua, Sec-Ch-Ua-Mobile and Sec-Ch-Ua-Platform. Only later requests can have more headers, if requested and allowed.
  • With the reduction/freeze of the useragents, several parts are now static and no longer show correct values. This is most obvious in the version of the browser (only the major version), the device brand & name (absent on mobile) and the version of the operating system (e.g. always Windows 10.0, Android 10 or Mac OS X 10_15_7). This is where client hints do contain the correct values.
Client hint | Example | Keep it? | Why
Sec-Ch-Ua | " Not A;Brand";v="99", "Chromium";v="100", "Google Chrome";v="100" | Yes | You may not have the Sec-Ch-Ua-Full-Version-List with the exact versions.
Sec-Ch-Ua-Full-Version-List | " Not A;Brand";v="99.0.0.0", "Chromium";v="100.0.4896.75", "Google Chrome";v="100.0.4896.75" | Yes | The better variant of Sec-Ch-Ua, but it may not be present.
Sec-Ch-Ua-Full-Version | "100.0.4896.75" | No | This field is deprecated in the standard, and this info is also present in the Sec-Ch-Ua-Full-Version-List.
Sec-Ch-Ua-Mobile | ?0 | Yes | In the (very rare) case where we cannot determine if it is a phone or tablet, this flag determines the end result.
Sec-Ch-Ua-Platform | "Windows" | Yes | Needed in the very common case of bad version info in the useragent.
Sec-Ch-Ua-Platform-Version | "0.1.0" | Yes | Needed in the very common case of bad version info in the useragent. This "0.1.0" means "Windows 7" because the Platform says "Windows".
Sec-Ch-Ua-Arch | "x86" | Yes | The only way to determine whether a MacOS system is running on an M1/M2 (ARM) instead of an Intel CPU.
Sec-Ch-Ua-Bitness | "64" | Yes | Often not present in the useragent.
Sec-Ch-Ua-Form-Factors | "Mobile" | Yes | New in the specification; in July 2023 no browsers supported this yet.
Sec-Ch-Ua-Model | "Nokia 7.2" | Yes | Often not present in the useragent (brand and device info).
Sec-Ch-Ua-Wow64 | ?0 | No | The only thing this says is that this is Windows (use Platform) and that it is 32 bit software running on a 64 bit system.

To simplify it all, I would:

  • Ask for all of them (regardless of whether I say No above) because the browsers are currently in flux in what they put in these fields.
  • Persist all headers starting with Sec-Ch-Ua along with the User-Agent. The assumption is that later versions of browsers will change what they put in these fields which may yield possible analysis improvements.
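Here is a minimal Java sketch of the "persist all Sec-Ch-Ua headers along with the User-Agent" advice. Given the request headers as a map, it keeps only the User-Agent and every header whose name starts with Sec-Ch-Ua. How you obtain that map depends on your server and is assumed here, and the class name is hypothetical.

```java
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

class ClientHintsFilterSketch {
    /**
     * Returns the headers worth storing next to each request: the User-Agent
     * and every header whose name starts with Sec-Ch-Ua. Names are matched
     * case-insensitively because HTTP header names are case-insensitive.
     */
    static Map<String, String> headersToPersist(Map<String, String> requestHeaders) {
        Map<String, String> kept = new TreeMap<>(String.CASE_INSENSITIVE_ORDER);
        for (Map.Entry<String, String> header : requestHeaders.entrySet()) {
            String name = header.getKey().toLowerCase(Locale.ROOT);
            if (name.equals("user-agent") || name.startsWith("sec-ch-ua")) {
                kept.put(header.getKey(), header.getValue());
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        Map<String, String> request = Map.of(
            "User-Agent", "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/100.0.0.0 Safari/537.36",
            "Sec-Ch-Ua-Platform", "\"Windows\"",
            "Sec-Ch-Ua-Platform-Version", "\"0.1.0\"",
            "Accept-Language", "en-US",
            "Cookie", "session=abc");
        headersToPersist(request).forEach((name, value) -> System.out.println(name + ": " + value));
    }
}
```

The result is a case-insensitive `TreeMap`. Later lookups such as `get("sec-ch-ua-platform")` therefore work regardless of how the browser capitalized the header name.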

Although the User-Agent contains Windows NT 10.0 the (correct) answer provided by Yauaa is Windows 7 because this is the reduced User-Agent (all minor versions of Chrome are 0: 100.0.0.0) and the Client hints indicate Windows and 0.1.0.

Which client hints do I really need?

A bit deeper dive into the fields and their “added value” and why to request and keep them (or not).

General considerations:

  • The first request only has the low entropy headers Sec-Ch-Ua, Sec-Ch-Ua-Mobile and Sec-Ch-Ua-Platform. Only later requests can have more headers if requested and allowed.
  • With the reduction/freeze of the useragents several parts are now static and no longer show correct values. This is most obvious are the version of the browser (only the major version), the device brand & name (absent on mobile) and the version of the operating system (i.e. always Windows 10.0, Android 10 or Mac OS X 10_15_7 ). This is where client hints do contain the correct values.
Client hint | Example | Keep it? | Why
Sec-Ch-Ua | " Not A;Brand";v="99", "Chromium";v="100", "Google Chrome";v="100" | Yes | The Sec-Ch-Ua-Full-Version-List with the exact versions may not be available.
Sec-Ch-Ua-Full-Version-List | " Not A;Brand";v="99.0.0.0", "Chromium";v="100.0.4896.75", "Google Chrome";v="100.0.4896.75" | Yes | The better variant of Sec-Ch-Ua, but it may not be present.
Sec-CH-Ua-Full-Version | "100.0.4896.75" | No | This field is deprecated in the standard, and the same information is also present in the Sec-Ch-Ua-Full-Version-List.
Sec-Ch-Ua-Mobile | ?0 | Yes | In the (very rare) case where we cannot determine whether it is a phone or a tablet, this flag determines the end result.
Sec-Ch-Ua-Platform | "Windows" | Yes | Needed in the very common case of bad version info in the useragent.
Sec-Ch-Ua-Platform-Version | "0.1.0" | Yes | Needed in the very common case of bad version info in the useragent. This "0.1.0" means "Windows 7" because the Platform says "Windows".
Sec-Ch-Ua-Arch | "x86" | Yes | The only way to determine whether a MacOS system is running on an M1/M2 (ARM) rather than an Intel CPU.
Sec-Ch-Ua-Bitness | "64" | Yes | Often not present in the useragent.
Sec-CH-UA-Form-Factors | "Mobile" | Yes | New in the specification; as of July 2023 no browsers supported it yet.
Sec-Ch-Ua-Model | "Nokia 7.2" | Yes | Often not present in the useragent (brand and device info).
Sec-Ch-Ua-Wow64 | ?0 | No | All this says is that the system runs Windows (use Platform for that) and that the software is 32-bit running on a 64-bit system.
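Sec-Ch-Ua and Sec-Ch-Ua-Full-Version-List share the same shape: a list of quoted brand names, each with a v parameter holding the version. The following is a minimal Java sketch that reads both example values into a brand-to-version map. It makes these assumptions:

  • Brand names contain no escaped quotes.
  • v is the only parameter.
  • A simplified regex reader is good enough. It is not a full RFC 8941 Structured Fields parser, and it is not how Yauaa itself handles these headers.
  • The class name SecChUaBrands is hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SecChUaBrands {

    // One list member: a quoted brand name followed by ;v="<version>".
    // Simplification: no escaped quotes inside brand names and no other parameters.
    private static final Pattern MEMBER =
        Pattern.compile("\"([^\"]*)\"\\s*;\\s*v\\s*=\\s*\"([^\"]*)\"");

    /** Maps each brand (kept exactly as sent, including GREASE brands) to its version. */
    public static Map<String, String> parse(String headerValue) {
        Map<String, String> brands = new LinkedHashMap<>();
        Matcher matcher = MEMBER.matcher(headerValue);
        while (matcher.find()) {
            brands.put(matcher.group(1), matcher.group(2));
        }
        return brands;
    }

    public static void main(String[] args) {
        String secChUa =
            "\" Not A;Brand\";v=\"99\", \"Chromium\";v=\"100\", \"Google Chrome\";v=\"100\"";
        String fullVersionList =
            "\" Not A;Brand\";v=\"99.0.0.0\", \"Chromium\";v=\"100.0.4896.75\", "
            + "\"Google Chrome\";v=\"100.0.4896.75\"";
        System.out.println(parse(secChUa).get("Google Chrome"));         // 100
        System.out.println(parse(fullVersionList).get("Google Chrome")); // 100.0.4896.75
    }
}
```

On the two examples it prints 100 and then 100.0.4896.75. That difference is why the Full-Version-List is the better variant when it is present.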

To keep it simple, I would:

  • Ask for all of them (even the ones marked No above), because browsers are currently in flux about what they put in these fields.
  • Persist all headers starting with Sec-Ch-Ua along with the User-Agent. The assumption is that later browser versions will change what they put in these fields, so keeping the raw values makes it possible to benefit from future improvements in the analysis.
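The two recommendations above could be wired into a Java web application roughly as shown below. This is a sketch under several assumptions:

  • The class and method names are hypothetical.
  • The request headers are already available as a Map.
  • The Accept-CH value simply lists every hint from the table, including the ones marked No.

How you set the response header and where you store the result depends on your framework.

```java
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

public class ClientHintCapture {

    // Every hint from the table above, including the ones marked "No".
    public static final List<String> ALL_HINTS = List.of(
        "Sec-Ch-Ua", "Sec-Ch-Ua-Full-Version-List", "Sec-CH-Ua-Full-Version",
        "Sec-Ch-Ua-Mobile", "Sec-Ch-Ua-Platform", "Sec-Ch-Ua-Platform-Version",
        "Sec-Ch-Ua-Arch", "Sec-Ch-Ua-Bitness", "Sec-CH-UA-Form-Factors",
        "Sec-Ch-Ua-Model", "Sec-Ch-Ua-Wow64");

    /** Value for an Accept-CH response header asking for all hints on later requests. */
    public static String acceptChHeaderValue() {
        return String.join(", ", ALL_HINTS);
    }

    /** Keeps the User-Agent plus every header whose name starts with Sec-Ch-Ua. */
    public static Map<String, String> headersToPersist(Map<String, String> requestHeaders) {
        // HTTP header names are case-insensitive, so compare lower-cased names
        // and return a case-insensitive map.
        Map<String, String> kept = new TreeMap<>(String.CASE_INSENSITIVE_ORDER);
        for (Map.Entry<String, String> header : requestHeaders.entrySet()) {
            String name = header.getKey().toLowerCase(Locale.ROOT);
            if (name.equals("user-agent") || name.startsWith("sec-ch-ua")) {
                kept.put(header.getKey(), header.getValue());
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        System.out.println("Accept-CH: " + acceptChHeaderValue());
        Map<String, String> request = Map.of(
            "User-Agent", "Mozilla/5.0 (Windows NT 10.0; Win64; x64)",
            "sec-ch-ua-platform", "\"Windows\"",
            "Sec-Ch-Ua-Platform-Version", "\"0.1.0\"",
            "Accept-Language", "en-US");
        System.out.println(headersToPersist(request).keySet());
    }
}
```

Because the returned map is case-insensitive, a lookup for Sec-Ch-Ua-Model also finds a header that arrived as sec-ch-ua-model.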

Using the analyzer

You can use this analyzer either directly in your Java-based applications or through one of the User Defined Functions that are available for many Apache big data tools (Hive, Flink, Beam, …) as described here.

Using in Java applications

To use the library you must first add it as a dependency to your application. The library has been published to Maven Central, so that should work in almost any environment.

If you use a Maven-based project, simply add this dependency to your project:

<dependency>
     <groupId>nl.basjes.parse.useragent</groupId>
…
     <releases><enabled>false</enabled></releases>
     <snapshots><enabled>true</enabled></snapshots>
   </repository>
</repositories>

Kubernetes

I’ve been playing around with Kubernetes and the code below “works on my cluster”.

Basic Service

First create a dedicated namespace and a very basic deployment that runs this image 3 times, and expose it as a Service that simply serves plain HTTP.

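apiVersion: v1
kind: Namespace
metadata:
  name: yauaa
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: yauaa
  namespace: yauaa
spec:
  selector:
    matchLabels:
      app: yauaa
  replicas: 3
  template:
    metadata:
      labels:
        app: yauaa
    spec:
      containers:
      - name: yauaa
        image: nielsbasjes/yauaa:7.29.0
        ports:
        - containerPort: 8080
          name: yauaa
          protocol: TCP
        livenessProbe:
          httpGet:
            path: /liveness
            port: yauaa
          initialDelaySeconds: 2
          periodSeconds: 3
        readinessProbe:
          httpGet:
            path: /readiness
            port: yauaa
          initialDelaySeconds: 10
          periodSeconds: 10
---
apiVersion: v1
kind: Service
metadata:
  name: yauaa
  namespace: yauaa
spec:
  selector:
    app: yauaa
  ports:
  - name: default
    protocol: TCP
    port: 80
    targetPort: 8080
  type: ClusterIP

…

              service:
                name: yauaa
                port:
                  number: 80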
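Other pods in the cluster reach the yauaa Service (namespace yauaa, port 80, forwarded to 8080 in the pods) through its cluster DNS name. Below is a small Java 11+ sketch that builds that URL and performs a request. A request counts as successful under the same rule as the httpGet probes in the Deployment: any status from 200 to 399. It assumes the default cluster domain cluster.local, and the class name YauaaServiceCheck is hypothetical.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;

public class YauaaServiceCheck {

    /** In-cluster URL of a Service, assuming the default cluster domain "cluster.local". */
    public static URI serviceUri(String service, String namespace, int port, String path) {
        return URI.create("http://" + service + "." + namespace + ".svc.cluster.local:" + port + path);
    }

    /** Same success rule as a Kubernetes httpGet probe: any status from 200 up to 399. */
    public static boolean isHealthy(HttpClient client, URI uri) throws Exception {
        HttpRequest request = HttpRequest.newBuilder(uri)
            .timeout(Duration.ofSeconds(5))
            .GET()
            .build();
        HttpResponse<Void> response = client.send(request, HttpResponse.BodyHandlers.discarding());
        int status = response.statusCode();
        return status >= 200 && status < 400;
    }

    public static void main(String[] args) throws Exception {
        // Service "yauaa" in namespace "yauaa" listens on port 80 and forwards to 8080 in the pods.
        URI readiness = serviceUri("yauaa", "yauaa", 80, "/readiness");
        System.out.println(readiness);
        if (args.length > 0 && args[0].equals("--check")) {
            // The name only resolves from a pod inside the cluster.
            System.out.println("healthy: " + isHealthy(HttpClient.newHttpClient(), readiness));
        }
    }
}
```

Without arguments it only prints the URL. Pass --check from a pod inside the cluster to actually send the request.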

Licence

Apache License

Version 2.0, January 2004 https://www.apache.org/licenses/

TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION

Definitions.

License shall mean the terms and conditions for use, reproduction, and distribution as defined by Sections 1 through 9 of this document.

Licensor shall mean the copyright owner or entity authorized by the copyright owner that is granting the License.

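Legal Entity shall mean the union of the acting entity and all other entities that control, are controlled by, or are under common control with that entity. For the purposes of this definition, control means (i) the power, direct or indirect, to cause the direction or management of such entity, whether by contract or otherwise, or (ii) ownership of fifty percent (50%) or more of the outstanding shares, or (iii) beneficial ownership of such entity.

…

distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.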


Memory usage

The system relies heavily on HashMaps to quickly find the rules that need to be fired.

Some fields only require a handful of rules, while others need a lot of them. This means that the number of rules kept in the system, and thus the amount of memory used to store them, depends on which fields have been requested. The table below shows the relative memory impact of the rules needed for each specific field.

This table was constructed by running all test cases against the engine while requesting only one field. The DeviceClass field is always extracted, so its figure can be seen as the baseline cost compared to not running this engine at all.

Because most rules determine several fields, there is a lot of overlap in the rules used. If you keep all rules, version 5.6 uses about 37 MiB of memory for all rules on top of the rules for the DeviceClass (which is always extracted).

Extracting everything currently has a memory impact (without caching!) of about 114 MiB.

Field | Relative memory usage (MiB)
DeviceClass (required) | 90.8
DeviceName | 10.0
DeviceBrand | 9.1
DeviceCpu | 0.7
DeviceCpuBits | 0.5
DeviceFirmwareVersion | 1.1
DeviceVersion | 0.4
OperatingSystemClass | 1.2
OperatingSystemName | 1.3
OperatingSystemVersion | 1.3
OperatingSystemVersionMajor | 1.5
OperatingSystemNameVersion | 2.0
OperatingSystemNameVersionMajor | 2.2
OperatingSystemVersionBuild | 0.4
LayoutEngineClass | 2.8
LayoutEngineName | 2.8
LayoutEngineVersion | 2.8
LayoutEngineVersionMajor | 3.0
LayoutEngineNameVersion | 3.2
LayoutEngineNameVersionMajor | 3.4
LayoutEngineBuild | 0.6
AgentClass | 5.0
AgentName | 5.2
AgentVersion | 5.1
AgentVersionMajor | 5.3
AgentNameVersion | 5.7
AgentNameVersionMajor | 5.8
AgentBuild | 0.5
AgentLanguage | 0.4
AgentLanguageCode | 0.4
AgentInformationEmail | 0.1
AgentInformationUrl | 0.1
AgentSecurity | 0.2
AgentUuid | 0.3
WebviewAppName | 1.0
WebviewAppVersion | 1.0
WebviewAppVersionMajor | 1.0
WebviewAppNameVersionMajor | 1.1
FacebookCarrier | 0.2
FacebookDeviceClass | 0.2
FacebookDeviceName | 0.2
FacebookDeviceVersion | 0.2
FacebookFBOP | 0.2
FacebookFBSS | 0.5
FacebookOperatingSystemName | 0.5
FacebookOperatingSystemVersion | 0.5
Anonymized | 0.1
HackerAttackVector | 0.1
HackerToolkit | 0.1
KoboAffiliate | 0.1
KoboPlatformId | 0.1
IECompatibilityVersion | 0.4
IECompatibilityVersionMajor | 0.4
IECompatibilityNameVersion | 0.4
IECompatibilityNameVersionMajor | 0.4
Carrier | 0.2
GSAInstallationID | 0.1
NetworkType | 0.1
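Because the rules overlap, simply adding up the per-field figures overestimates the cost of a set of fields, but it does give a quick upper bound. Below is a minimal Java sketch of that arithmetic. It assumes each figure in the table is the extra memory on top of the always-loaded DeviceClass rules, and it copies only a few rows. The class name RuleMemoryBound is hypothetical.

```java
import java.util.List;
import java.util.Map;

public class RuleMemoryBound {

    // A few "relative memory usage" values (MiB) copied from the table above.
    // Assumption: each value is the extra memory on top of the always-loaded DeviceClass rules.
    private static final Map<String, Double> RELATIVE_MIB = Map.of(
        "DeviceClass", 0.0, // baseline, always loaded anyway
        "DeviceName", 10.0,
        "DeviceBrand", 9.1,
        "OperatingSystemNameVersion", 2.0,
        "AgentNameVersionMajor", 5.8);

    /**
     * Upper bound for the extra rule memory when requesting these fields.
     * Rules shared by several fields are counted once per field here,
     * so the real number is lower whenever the requested fields overlap.
     */
    public static double extraMibUpperBound(List<String> fields) {
        double total = 0.0;
        for (String field : fields) {
            Double mib = RELATIVE_MIB.get(field);
            if (mib == null) {
                throw new IllegalArgumentException("No table value included for " + field);
            }
            total += mib;
        }
        return total;
    }

    public static void main(String[] args) {
        double bound = extraMibUpperBound(
            List.of("DeviceName", "DeviceBrand", "OperatingSystemNameVersion"));
        System.out.printf("At most about %.1f MiB on top of DeviceClass%n", bound);
    }
}
```

For DeviceName, DeviceBrand and OperatingSystemNameVersion this prints an upper bound of 21.1 MiB. The real extra usage is lower because those fields share many rules.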

The demonstration webservlet

Part of the distribution is a war file containing a servlet with a web interface and some APIs that allow you to try things out.

This servlet can be downloaded via

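<dependency>
   <groupId>nl.basjes.parse.useragent</groupId>
   <artifactId>yauaa-webapp</artifactId>
   <version>7.29.0</version>
   <type>war</type>
</dependency>

NOTE that this is a DEMONSTRATION servlet! It is simply the library in a servlet; no optimizations or smart memory settings have been applied at all.

Docker

Starting with version 5.14.1 the webservlet is also published to the central docker registry.

…

      "targetJREVersion": "1.8"
    }
  }
}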