Research & Development

Evaluating the Ecosystem: What We’ve Learned by Matching Ads.txt Entries to Sellers.json Files

Posted 2 years ago   |   5 min read
Written and published by
Rocky Moss
Chief Executive Officer

An understanding of Ads.txt and Sellers.json is key to understanding this article. For those needing a refresher:

  • Ads.txt is an industry-wide initiative, championed by the IAB, which aims to reduce the prevalence of domain spoofing. To combat domain spoofing, publishers host a file at “{domain}/ads.txt”, and this file lets buyers know exactly which combinations of ad system & publisher ID may announce their inventory for sale.
  • Sellers.json is an IAB initiative aimed at bringing even greater transparency to each advertising opportunity. It allows buyers to map publisher/seller IDs to the human-readable name of the seller making inventory available during the course of an auction.
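
To make the two files concrete, here is a minimal Python sketch that parses one ads.txt record and looks up its publisher ID in a sellers.json document. The field layout follows the public IAB specs; the domains, IDs, seller name, and the helper names parse_ads_txt_line and index_sellers are made up for illustration and are not part of our pipeline.

```python
import json
from typing import NamedTuple, Optional


class AdsTxtEntry(NamedTuple):
    ad_system: str
    publisher_id: str
    account_type: str


def parse_ads_txt_line(line: str) -> Optional[AdsTxtEntry]:
    """Parse one ads.txt data record; return None for comments, blanks, and variables."""
    line = line.split("#", 1)[0].strip()  # drop trailing comments
    if not line:
        return None
    fields = [f.strip() for f in line.split(",")]
    if len(fields) < 3:  # e.g. "contact=..." variable lines
        return None
    ad_system, publisher_id, account_type = fields[:3]
    return AdsTxtEntry(ad_system.lower(), publisher_id, account_type.upper())


def index_sellers(raw: str) -> dict:
    """Index a sellers.json document by seller_id."""
    return {str(s["seller_id"]): s for s in json.loads(raw)["sellers"]}


# Hypothetical sellers.json content for a hypothetical ad system.
SELLERS_JSON = """
{"sellers": [
  {"seller_id": "12345", "name": "Example Media LLC",
   "domain": "example-media.test", "seller_type": "PUBLISHER"}
]}
"""

entry = parse_ads_txt_line("adsystem.test, 12345, DIRECT, abc123  # hypothetical record")
sellers = index_sellers(SELLERS_JSON)
seller = sellers.get(entry.publisher_id)
print(entry, seller["name"] if seller else None)
```

A real crawler would also need to handle redirects, malformed rows, and sellers marked confidential, none of which are covered here.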

The data underlying our analysis

Much of this analysis will be based on these publicly available data sources.

We attempt to collect ads.txt files from every website we visit, and we visited over 16 million unique subdomains (associated with 10 million unique root domains) during the term of the study. Each day we also fetch sellers.json files for every ad system encountered in our ads.txt database.

The analysis ran over a ~45 day window from November 16th 2020. We found over 1 million publishers with valid ads.txt files.

From this data, we generated 2 ads.txt datasets which reflect slightly different views of the advertising ecosystem:

  • The Ads.txt Corpus: This is a complete dump of over 140 million rows of ads.txt data gathered from over a million sites. If an ads.txt entry appears on multiple sites, then it appears multiple times within this dataset. It is meant to be reflective of the ecosystem at large, because it accounts for how frequently a certain ID is encountered in the wild.
  • Unique Ads.txt Entries: This dataset contains nearly 700,000 unique combinations of: adsystem, publisherID, and account type. Each one only appears once, no matter how many times it appears within the corpus.
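
The relationship between the two datasets can be sketched in a few lines of Python. The rows below are invented for illustration; the point is only that the corpus keeps every (site, entry) occurrence while the unique set collapses repeats.

```python
# Hypothetical corpus rows: (site, ad_system, publisher_id, account_type)
corpus = [
    ("site-a.test", "adsystem.test", "111", "DIRECT"),
    ("site-b.test", "adsystem.test", "111", "DIRECT"),   # same entry, second site
    ("site-b.test", "adsystem.test", "222", "RESELLER"),
    ("site-c.test", "other-ads.test", "111", "DIRECT"),  # same ID, different ad system
]

# Unique entries ignore the site and keep each combination once.
unique_entries = {(ad, pid, typ) for _site, ad, pid, typ in corpus}

print(len(corpus), len(unique_entries))  # 4 3
```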

The Findings


What % of Ads.txt Publisher IDs Can Be Matched to a Sellers.json File?

Comparison of how many entries can be linked to a sellers.json file

When we ran this analysis on Unique IDs ~3 months ago, we found that ~40% of uniques could be matched to a sellers.json file. At that time, we excluded the adsystem from the analysis, because it had so few direct entries. At this time, that is no longer necessary; there is decent coverage of direct entries in the Google sellers.json file.

Currently, we find that ~60% of entries (depending on how you look at it) can be matched to sellers.json files. The exception is unique reseller entries: only 40% of those can be matched to a sellers.json file.

This shows an increase in coverage in the ecosystem overall! More of the IDs you will encounter are verifiable than ever before.
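
For readers who want to reproduce a match rate on their own data, here is a rough Python sketch. It assumes an entry counts as matched when its publisher ID appears in the sellers.json file of its ad system; the function name match_rate and all sample data are hypothetical.

```python
def match_rate(entries, sellers_by_system):
    """Share of entries whose publisher ID is listed in their ad system's sellers.json."""
    entries = list(entries)
    if not entries:
        return 0.0
    matched = sum(
        1
        for ad_system, publisher_id, _account_type in entries
        if publisher_id in sellers_by_system.get(ad_system, {})
    )
    return matched / len(entries)


# Hypothetical lookup: ad system -> {seller_id: seller_type}
sellers_by_system = {"adsystem.test": {"111": "PUBLISHER", "222": "INTERMEDIARY"}}

# Hypothetical corpus rows with the site column already dropped.
corpus_entries = [
    ("adsystem.test", "111", "DIRECT"),
    ("adsystem.test", "111", "DIRECT"),          # repeated on a second site
    ("adsystem.test", "333", "RESELLER"),        # ID missing from sellers.json
    ("no-sellers-json.test", "444", "DIRECT"),   # ad system publishes no file
]
unique_entries = set(corpus_entries)

corpus_rate = match_rate(corpus_entries, sellers_by_system)
unique_rate = match_rate(unique_entries, sellers_by_system)
print(f"corpus: {corpus_rate:.0%}, unique: {unique_rate:.0%}")
```

The two numbers differ because the corpus counts a popular ID once per site, which is why the coverage figure depends on which dataset you look at.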


Account Type Mismatch

Both Ads.txt and Sellers.json give us information to determine the nature of the seller's relationship with the publisher.

  • A DIRECT account type in an ads.txt file signals that “the Publisher (content owner) directly controls the account indicated in [the publisherID] field on the system in field #1. This tends to mean a direct business contract between the Publisher and the advertising system.”
  • In Sellers.json files, “a value of “PUBLISHER” indicates that the inventory sold through this account is on a site, app, or other medium owned by the named entity and the advertising system pays them directly.”

From these descriptions, it becomes clear that DIRECT entries in Ads.txt files should match to Sellers.json entries with seller types of PUBLISHER or BOTH.

  • A RESELLER account type in an ads.txt file signals that “the Publisher has authorized another entity to control the account indicated in [the publisherID] field and resell their ad space via the system in field #1.”
  • In Sellers.json files, “a value of “INTERMEDIARY” indicates that the inventory sold through this account is not owned by the [publisher] or the advertising system does not pay them directly.”

From these descriptions, it becomes clear that RESELLER entries in Ads.txt files should match to Sellers.json entries with seller types of INTERMEDIARY or BOTH.
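
The two rules above boil down to a small lookup table. The Python sketch below is one way to express them; the names EXPECTED_SELLER_TYPES and is_account_type_mismatch are illustrative, not taken from either spec.

```python
# Which sellers.json seller_type values agree with each ads.txt account type.
EXPECTED_SELLER_TYPES = {
    "DIRECT": {"PUBLISHER", "BOTH"},
    "RESELLER": {"INTERMEDIARY", "BOTH"},
}


def is_account_type_mismatch(account_type: str, seller_type: str) -> bool:
    """Return True when the ads.txt and sellers.json declarations disagree."""
    expected = EXPECTED_SELLER_TYPES.get(account_type.strip().upper())
    if expected is None:
        raise ValueError(f"unknown ads.txt account type: {account_type!r}")
    return seller_type.strip().upper() not in expected


print(is_account_type_mismatch("DIRECT", "INTERMEDIARY"))  # True
print(is_account_type_mismatch("DIRECT", "BOTH"))          # False
```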

An example of an account type mismatch is demonstrated by the following image:

In this example, the publisher declares that they directly control this account on the adsystem, but Rubicon Project declares that, actually, this account ID “is not owned by the [publisher] or the advertising system does not pay them directly.”

Why would someone do this? Perhaps it’s because buyers prefer DIRECT inventory, and shorter supply paths. If a buyer doesn’t check the Sellers.json declaration, they may not detect that the DIRECT inventory they are bidding on is not actually publisher direct.

Such a case would be labeled an “account type mismatch” in our analysis.

We define the mismatch rate as: (Entries where the account types do not match) / (Entries which can be matched to the sellers.json file by adsystem/publisherID)
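
As a rough Python sketch of that definition: entries that cannot be matched to a sellers.json file are skipped, so they never reach the denominator. The function name, the lookup structure, and the sample rows are assumptions made for illustration, and the account types are assumed to be already normalized to DIRECT or RESELLER.

```python
EXPECTED = {
    "DIRECT": {"PUBLISHER", "BOTH"},
    "RESELLER": {"INTERMEDIARY", "BOTH"},
}


def mismatch_rate(entries, sellers_by_system):
    """Mismatched entries divided by entries matchable by adsystem/publisherID."""
    matched = mismatched = 0
    for ad_system, publisher_id, account_type in entries:
        seller_type = sellers_by_system.get(ad_system, {}).get(publisher_id)
        if seller_type is None:
            continue  # unmatched entries are excluded from the denominator
        matched += 1
        if seller_type not in EXPECTED[account_type]:
            mismatched += 1
    return mismatched / matched if matched else 0.0


# Hypothetical data: ad system -> {seller_id: seller_type}
sellers_by_system = {
    "adsystem.test": {"111": "PUBLISHER", "222": "INTERMEDIARY", "333": "BOTH"}
}
entries = [
    ("adsystem.test", "111", "DIRECT"),  # agrees
    ("adsystem.test", "222", "DIRECT"),  # mismatch: sellers.json says INTERMEDIARY
    ("adsystem.test", "333", "DIRECT"),  # agrees (BOTH)
    ("adsystem.test", "999", "DIRECT"),  # not in sellers.json, ignored
]

rate = mismatch_rate(entries, sellers_by_system)
print(f"{rate:.1%}")
```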

We see results that vastly differ between the full corpus, and the unique IDs dataset.

For IDs which are declared as DIRECT in ads.txt files, the rate of mismatch in the entire corpus is 45%. This is many times higher than the 2.5% rate of DIRECT ID mismatch in the Unique IDs dataset.

This suggests that the small % of mismatched Unique IDs appears in a conspicuously high # of publishers' ads.txt files.
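
A toy calculation shows how this can happen. The numbers below are invented purely to illustrate the effect and are not drawn from our dataset: 40 unique DIRECT IDs, one of which is mismatched and has been copied into 32 different publishers' files.

```python
from collections import Counter

# Hypothetical: 39 well-behaved IDs, each appearing in exactly one ads.txt file.
appearances = Counter({f"ok-{i}": 1 for i in range(39)})
# Hypothetical: one mismatched ID that appears in 32 publishers' files.
appearances["shared-id"] = 32
mismatched_ids = {"shared-id"}

# Unique view: every ID counts once.
unique_rate = len(mismatched_ids) / len(appearances)

# Corpus view: every appearance counts.
corpus_rate = sum(appearances[i] for i in mismatched_ids) / sum(appearances.values())

print(f"unique: {unique_rate:.1%}, corpus: {corpus_rate:.1%}")
```

A single widely shared ID is enough to move the corpus rate by an order of magnitude while barely registering in the unique view.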

The following chart shows the top 20 ad systems that support sellers.json, ranked in order of how many DIRECT ads.txt entries they have in the corpus dataset.

It shows the rate of DIRECT ID mismatch based on both the corpus, and the unique IDs dataset.

A large gap between red and orange points suggests that certain mismatched IDs on this platform are used across multiple publishers, and repeated many times across the corpus.

You can find the underlying data for the top 50 adsystems in the set here

The Most Common Mismatched IDs

The differences between the Corpus and the Unique IDs datasets can be explained by IDs which appear in the ads.txt files of thousands of publishers, and the above image shows some examples.

The top 2500 Ads.txt Pub IDs which show a mismatch with the Sellers.json information can be found here for your further analysis.
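
If you want to build a similar ranking from your own crawl, one approach is to count the distinct sites on which each mismatched entry appears. The sketch below assumes you already have the set of mismatched entries; the function name top_mismatched_ids and all rows are hypothetical.

```python
from collections import defaultdict


def top_mismatched_ids(corpus, mismatched, n):
    """Rank mismatched entries by the number of distinct sites listing them."""
    sites = defaultdict(set)
    for site, ad_system, publisher_id, account_type in corpus:
        key = (ad_system, publisher_id, account_type)
        if key in mismatched:
            sites[key].add(site)
    # Sort by site count (descending), then by key for a stable order.
    ranked = sorted(sites.items(), key=lambda kv: (-len(kv[1]), kv[0]))
    return [(key, len(found_on)) for key, found_on in ranked[:n]]


# Hypothetical corpus rows: (site, ad_system, publisher_id, account_type)
corpus = [
    ("site-a.test", "adsystem.test", "222", "DIRECT"),
    ("site-b.test", "adsystem.test", "222", "DIRECT"),
    ("site-c.test", "adsystem.test", "222", "DIRECT"),
    ("site-a.test", "adsystem.test", "555", "DIRECT"),
    ("site-a.test", "adsystem.test", "111", "DIRECT"),  # not mismatched
]
mismatched = {
    ("adsystem.test", "222", "DIRECT"),
    ("adsystem.test", "555", "DIRECT"),
}

top = top_mismatched_ids(corpus, mismatched, n=2)
for key, site_count in top:
    print(key, site_count)
```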

Final Words

There are many unknowns when it comes to programmatic advertising, but these public initiatives provide a rare glimpse into the way that things are supposed to work.

Do they provide ground truth data? Not necessarily – both sellers & publishers can provide untrue information in Sellers.json & Ads.txt files respectively.

They DO, however, allow analysts to point out inconsistencies / risk signals based on adherence to standards & practices.

If you’re a buyer, do you check the account types between Ads.txt and Sellers.json for agreement? Does this mismatch have a different meaning to you? Is it harmless?

We’re open for discussion, and very interested in learning how you apply the data from ads.txt and sellers.json to your bid decisioning process.

Find us @deepsee_io on Twitter, or on LinkedIn, to continue the conversation!
