
Ads.txt by the Numbers

Posted 5 months ago
Written and published by
Rocky Moss
Chief Executive Officer

Ads.txt is an industry-wide initiative, championed by the Interactive Advertising Bureau (commonly known as the IAB), which aims to reduce the prevalence of domain spoofing.

Domain spoofing is a type of fraud in which an unauthorized entity claims to have ad space for sale on a particular site, but the ad is actually rendered on a different site (or not at all!). This exploits a technical limitation of impression measurement: it is not always possible to verify which page an ad was loaded on.

To combat this risk, publishers host a file at the “{domain}/ads.txt” path, and this file lets buyers know exactly which combinations of ad system & publisher ID may offer their inventory for sale.

What Does an Ads.txt File Look Like?

To understand the schema for the file, let's look at an example file obtained from cnn.com/ads.txt today:

Each row of the file represents a single relationship with a seller, with a few comma-separated fields giving us insight into that relationship. There are 3 mandatory fields and 1 optional field. Those are provided below, with the definition of each field from the official spec:

  1. Domain name of the ad system (REQUIRED): “The canonical domain name of the SSP, Exchange, Header Wrapper, etc system that bidders connect to. This may be the operational domain of the system, if that is different than the parent corporate domain, to facilitate WHOIS and reverse IP lookups to establish clear ownership of the delegate system.”
  2. Publisher’s Account ID (REQUIRED): “The identifier associated with the seller or reseller account within the advertising system in field #1. This must contain the same value used in transactions (i.e. OpenRTB bid requests) in the field specified by the SSP/exchange. Typically, in OpenRTB, this is publisher.id. For OpenDirect it is typically the publisher’s organization ID.”
  3. Type of Account/Relationship (REQUIRED): “An enumeration of the type of account. A value of ‘DIRECT’ indicates that the Publisher (content owner) directly controls the account indicated in field #2 on the system in field #1. This tends to mean a direct business contract between the Publisher and the advertising system. A value of ‘RESELLER’ indicates that the Publisher has authorized another entity to control the account indicated in field #2 and resell their ad space via the system in field #1”
  4. Certification Authority ID (OPTIONAL): “An ID that uniquely identifies the advertising system within a certification authority (this ID maps to the entity listed in field #1). A current certification authority is the Trustworthy Accountability Group (aka TAG), and the TAGID would be included here”
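As a rough illustration of the four fields above, here is a minimal Python sketch that splits one row into its parts. The `AdsTxtRecord` class, the `parse_ads_txt_line` function, and the sample row are all hypothetical names made up for this example. The sketch also simplifies: it treats everything after a `#` as a comment and skips any row that does not have the three required fields, which may not cover every case in the official spec.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class AdsTxtRecord:
    """One seller relationship, mirroring the four fields listed above."""
    ad_system: str                             # field 1 (required)
    publisher_id: str                          # field 2 (required)
    relationship: str                          # field 3 (required): DIRECT or RESELLER
    cert_authority_id: Optional[str] = None    # field 4 (optional)


def parse_ads_txt_line(line: str) -> Optional[AdsTxtRecord]:
    """Parse one row; return None for blank, comment-only, or malformed rows."""
    content = line.split("#", 1)[0].strip()    # drop any trailing comment
    if not content:
        return None
    fields = [field.strip() for field in content.split(",")]
    if len(fields) < 3 or not all(fields[:3]):
        return None                            # a required field is missing
    relationship = fields[2].upper()
    if relationship not in ("DIRECT", "RESELLER"):
        return None
    cert = fields[3] if len(fields) > 3 and fields[3] else None
    # Domain names are case-insensitive, so normalize field 1 to lowercase.
    return AdsTxtRecord(fields[0].lower(), fields[1], relationship, cert)


# Hypothetical row, not taken from any real publisher's file.
record = parse_ads_txt_line("adsystem.example, pub-1234, DIRECT, abc123")
```

Rows that return `None` are simply skipped here; a production parser would likely want to log them, since malformed rows are a common data quality problem.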

An opportunity is only considered valid if its combination of ad system & publisher ID appears in the relevant publisher’s ads.txt file.
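The check a buyer performs can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: the function names `authorized_pairs` and `is_authorized` and the sample domains are hypothetical, and the parsing ignores everything except the first two fields of each row. The point it shows is that the ad system and the publisher ID must appear together on the same row, not just somewhere in the file.

```python
from typing import Set, Tuple


def authorized_pairs(ads_txt_text: str) -> Set[Tuple[str, str]]:
    """Collect the (ad system, publisher ID) pairs listed in an ads.txt file."""
    pairs = set()
    for raw_line in ads_txt_text.splitlines():
        content = raw_line.split("#", 1)[0].strip()   # drop comments
        fields = [field.strip() for field in content.split(",")]
        if len(fields) >= 3 and fields[0] and fields[1]:
            pairs.add((fields[0].lower(), fields[1]))
    return pairs


def is_authorized(ad_system: str, publisher_id: str,
                  pairs: Set[Tuple[str, str]]) -> bool:
    """True only if this exact combination is listed by the publisher."""
    return (ad_system.strip().lower(), publisher_id.strip()) in pairs


# Hypothetical file contents for illustration.
sample = """adsystem-a.example, 1001, DIRECT
adsystem-b.example, 2002, RESELLER
"""
pairs = authorized_pairs(sample)
listed = is_authorized("adsystem-a.example", "1001", pairs)       # pair is listed
mismatched = is_authorized("adsystem-a.example", "2002", pairs)   # ID belongs to another system
```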

 

Ads.txt Statistics

All the charts & data are based on DeepSee’s analysis of over 20 million subdomains, and cover the 90 days leading up to August 15th, 2020.

Adoption Rate vs Tranco Rank

DeepSee provides site rank statistics in the form of the Tranco rank, which is a “research-oriented top sites ranking hardened against manipulation.”

We grouped sites by their ranks to better understand if there are differences in adoption for high vs low ranking sites.

We found that about a quarter of the top 10,000 most visited sites host ads.txt files. Even sites in the long tail host ads.txt files at a fairly high rate relative to the top-ranked sites.

As more and more buyers & SSPs require ads.txt files to be hosted as part of normal business, it’s likely we will see higher adoption regardless of rank.
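For readers who want to reproduce this kind of rank grouping on their own data, here is a minimal Python sketch. The bucket boundaries in `BUCKET_UPPER_BOUNDS`, the function names, and the input shape (a rank plus a flag for whether an ads.txt file was found) are assumptions made for this example; they are not the groupings or the pipeline used for the chart.

```python
from collections import defaultdict
from typing import Dict, Iterable, Sequence, Tuple

# Hypothetical rank bucket upper bounds; the actual groupings may differ.
BUCKET_UPPER_BOUNDS = (10_000, 100_000, 1_000_000)


def bucket_for(rank: int, bounds: Sequence[int] = BUCKET_UPPER_BOUNDS) -> str:
    """Return a label such as '1-10,000' for the bucket containing `rank`."""
    lower = 1
    for upper in bounds:
        if rank <= upper:
            return f"{lower:,}-{upper:,}"
        lower = upper + 1
    return f"{lower:,}+"


def adoption_by_bucket(sites: Iterable[Tuple[int, bool]]) -> Dict[str, float]:
    """Share of sites hosting ads.txt per rank bucket.

    `sites` yields (rank, has_ads_txt) pairs.
    """
    totals: Dict[str, int] = defaultdict(int)
    hosting: Dict[str, int] = defaultdict(int)
    for rank, has_ads_txt in sites:
        label = bucket_for(rank)
        totals[label] += 1
        hosting[label] += int(has_ads_txt)
    return {label: hosting[label] / totals[label] for label in totals}
```

Calling `adoption_by_bucket` on a list of `(rank, has_ads_txt)` pairs returns one adoption rate per bucket, which is the quantity this kind of chart plots.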

 

Number of unique files identified vs the number of sites hosting them

What this shows us is that there are fewer unique files than hosts of files, which means that some files are being shared between publishers.

This is helpful in identifying entities that share payment channels but, for whatever reason, may not reveal the association. This is the core of our “Shared Ads Txt File” similarity type identified in our e-book.
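One simple way to spot shared files is to hash each file's contents and group hosts by hash. The Python sketch below assumes that approach; it is not necessarily how DeepSee detects shared files. The function names and sample hosts are hypothetical, and the normalization (trimming whitespace and dropping blank lines) is an illustrative choice.

```python
import hashlib
from collections import defaultdict
from typing import Dict, List


def fingerprint(ads_txt_text: str) -> str:
    """Hash the file after trimming whitespace and dropping blank lines."""
    lines = [line.strip() for line in ads_txt_text.splitlines() if line.strip()]
    return hashlib.sha256("\n".join(lines).encode("utf-8")).hexdigest()


def group_hosts_by_file(files: Dict[str, str]) -> Dict[str, List[str]]:
    """Map each unique file fingerprint to the hosts serving that file."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for host, text in files.items():
        groups[fingerprint(text)].append(host)
    return dict(groups)


def shared_file_groups(files: Dict[str, str]) -> List[List[str]]:
    """Return only the groups where two or more hosts serve the same file."""
    return [sorted(hosts)
            for hosts in group_hosts_by_file(files).values()
            if len(hosts) > 1]
```

The number of keys returned by `group_hosts_by_file` is the count of unique files, while the number of entries in `files` is the count of hosts, so the gap between the two is what the chart visualizes.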

 

Unique publisher IDs observed, broken down by RESELLER vs DIRECT

We have identified over 800k unique publisher IDs, with the vast majority of them being direct. This makes sense, because:

  1. Publishers are encouraged to have DIRECT connections over RESELLER connections.
  2. Direct relationships represent an individual relationship with an ad system, while reseller IDs represent more general pools of inventory. Reseller IDs basically represent an agreement a seller has with another seller, and there are many fewer such entities than there are individual publishers.
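A count like this can be sketched in Python as follows. One assumption is baked in and worth flagging: because a publisher ID only has meaning within a given ad system, the sketch treats the (ad system, publisher ID) pair as the unique key. The function name and input shape are hypothetical, and a pair that appears as DIRECT in one file and RESELLER in another is counted under both types.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple


def unique_ids_by_relationship(
    records: Iterable[Tuple[str, str, str]]
) -> Dict[str, int]:
    """Count unique (ad system, publisher ID) pairs per relationship type.

    `records` yields (ad_system, publisher_id, relationship) tuples
    gathered from many ads.txt files.
    """
    seen: Dict[str, Set[Tuple[str, str]]] = defaultdict(set)
    for ad_system, publisher_id, relationship in records:
        seen[relationship.strip().upper()].add(
            (ad_system.strip().lower(), publisher_id.strip())
        )
    return {relationship: len(ids) for relationship, ids in seen.items()}
```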

Ad System Reach Statistics

We counted 2,761 unique ad systems contained in ads.txt files as part of our analysis. The majority of those appeared on fewer than 100 root domains. Many of these “ad systems” are likely the result of poor data quality / human error.

There are many near-misspellings of well-known ad systems, or entries which could not possibly be an ad system. The analysis is agnostic with regard to exactly why something appears as an ad system.
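Although the analysis above does not try to classify these entries, near-misspellings can be flagged with fuzzy string matching. The Python sketch below uses the standard library's `difflib.get_close_matches`. The list of known ad systems, the `suggest_correction` name, and the 0.85 similarity cutoff are all assumptions chosen for illustration.

```python
import difflib
from typing import List, Optional


def suggest_correction(candidate: str, known_systems: List[str],
                       cutoff: float = 0.85) -> Optional[str]:
    """Return the known ad system that `candidate` most likely misspells.

    Returns None if the candidate is already a known system or if
    nothing is similar enough.
    """
    name = candidate.strip().lower()
    if name in known_systems:
        return None   # exact match, so not a misspelling
    matches = difflib.get_close_matches(name, known_systems, n=1, cutoff=cutoff)
    return matches[0] if matches else None


# Illustrative reference list; a real one would be much longer.
known = ["google.com", "exchange.example", "ssp.example"]
suggestion = suggest_correction("gooogle.com", known)
```

A high cutoff keeps false positives down, at the cost of missing heavier typos; the right value depends on how long and how similar the legitimate domain names are.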

On the other side of the equation, we found there are fewer than 500 ad systems which can boast access to over 1,000 root domains.

We delve into some of the most popular ad systems in the next chart.

Top 25 Ad Systems by Reach (calculated by # of unique root domains where the ad system is encountered)

This chart shows the top 25 ad systems ranked by the number of unique root domains whose ads.txt files list them.

Unsurprisingly, Google has a huge number of publishers in its network. However, it is not the only one operating at a grand scale.

There are 18 other ad systems that can boast access to over 100,000 root domains across the web.
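The reach metric used in this chart (the number of unique root domains on which an ad system is encountered) is straightforward to compute once each ads.txt row has been tied to a root domain. The Python sketch below assumes the input is already a stream of (root domain, ad system) observations. The function names are hypothetical, and extracting the root domain from a subdomain is left out because it usually requires a public suffix list.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple


def ad_system_reach(observations: Iterable[Tuple[str, str]]) -> Dict[str, int]:
    """Count the unique root domains on which each ad system appears.

    `observations` yields (root_domain, ad_system) pairs; duplicates and
    case differences are collapsed.
    """
    domains: Dict[str, Set[str]] = defaultdict(set)
    for root_domain, ad_system in observations:
        domains[ad_system.strip().lower()].add(root_domain.strip().lower())
    return {system: len(roots) for system, roots in domains.items()}


def top_by_reach(observations: Iterable[Tuple[str, str]],
                 n: int = 25) -> List[Tuple[str, int]]:
    """Return the `n` ad systems with the highest reach, largest first."""
    reach = ad_system_reach(observations)
    # Sort by reach descending, then by name so ties are deterministic.
    return sorted(reach.items(), key=lambda item: (-item[1], item[0]))[:n]
```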
