The term Dark Pool Sales House was introduced to the ad-tech lexicon by privacy & security analyst Zach Edwards (also the founder of the boutique analytics company Victory Medium), in his July 2020 piece: “Breitbart.com is Partnering with RT.com & Other Sites via Mislabeled Advertising Inventory.” Edwards collaborated with Nandini Jammi and Claire Atkin of Check My Ads for further research on the topic, which was released as part of their Substack newsletter, “BRANDED,” in the issue titled “So *that’s* how Breitbart is still making money.”
According to their research: “Any shared DIRECT account-bidding IDs across uniquely owned websites creates a Dark Pool Sales House that spreads money and data across all the websites participating in the architecture.”
Basically, this is resold inventory masquerading as direct inventory.
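To make the definition concrete, here is a minimal Python sketch of how shared DIRECT IDs could be surfaced from a handful of ads.txt files. The function names (`parse_direct_entries`, `shared_direct_ids`) and the example domains are hypothetical, and the parser only handles the basic comma-separated record format. It is an illustration, not the tooling used for this research. Note that ads.txt alone cannot tell you whether two sites are uniquely owned, so ownership would have to be verified separately.

```python
from collections import defaultdict


def parse_direct_entries(ads_txt: str) -> set[tuple[str, str]]:
    """Return the (ad system domain, account ID) pairs labeled DIRECT."""
    entries = set()
    for raw in ads_txt.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop trailing comments
        if not line or "=" in line.split(",", 1)[0]:
            continue  # skip blank lines and variable lines such as contact=...
        fields = [field.strip() for field in line.split(",")]
        if len(fields) >= 3 and fields[2].upper() == "DIRECT":
            entries.add((fields[0].lower(), fields[1]))
    return entries


def shared_direct_ids(files: dict[str, str]) -> dict[tuple[str, str], set[str]]:
    """Map each DIRECT entry to the sites listing it, keeping only shared ones."""
    sites_by_entry = defaultdict(set)
    for site, text in files.items():
        for entry in parse_direct_entries(text):
            sites_by_entry[entry].add(site)
    return {entry: sites for entry, sites in sites_by_entry.items() if len(sites) > 1}


if __name__ == "__main__":
    # Hypothetical ads.txt contents for two sites.
    example_files = {
        "site-a.example": "exchange.example, 1001, DIRECT\nother.example, 55, RESELLER",
        "site-b.example": "exchange.example, 1001, DIRECT, abc123",
    }
    for entry, sites in shared_direct_ids(example_files).items():
        print(entry, sorted(sites))
```

Any entry this returns is only a candidate for review: a shared DIRECT ID across sites with common ownership can be legitimate.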
This brings us to the research we are sharing today. We wanted to capture the extent of this problem, and find out how common it is for sites to have DIRECT ads.txt entries that appear in multiple other publishers’ ads.txt files.
Constructing & Testing a Hypothesis
As Edwards points out in his article: “Mislabeled advertising inventory is very similar to mislabeled securities. If a buyer doesn’t know what they are buying, that’s a buyer risk. But if the buyer is being told that an asset is of an asset category of greater value than the asset they are actually acquiring, there is an important line being crossed when the seller is purposefully mislabeling that asset, and coordinating with multiple other businesses to mislabel the assets.”
We can break this down into 2 separate, but important, points:
- Advertisers prefer direct inventory paths, so publishers want to offer those.
- Sellers can purposely mislabel assets in their sellers.json files to make them easier to pass along to buyers, including buyers who do everything the IAB recommends when it comes to cross-referencing SupplyChain Objects with ads.txt & sellers.json files.
Going into this study, we thought that this activity would correlate with sites that have a harder time monetizing (because many advertisers include their domain names in block lists).
To simulate those conditions, we started with a list of 30 sites flagged as not adhering to basic journalistic standards by NewsGuard, a reviewer of online news sources. Upon review, the list of sites seemed to skew right on the political spectrum (though it wasn’t 100% that way).
(The full list is available in the closing statements of this article)
Indeed, we saw that each domain had thousands of other domains that shared at least 1 DIRECT entry with it. The above graph shows thousands of domains (red nodes) which shared IDs with multiple entries in our NewsGuard set (blue nodes).
Since the NewsGuard set of domains skewed right, and in the interest of balancing our research, we set off to make a left-leaning set to compare with.
Media Bias Fact Check provides lists of sites by their politics, and so we constructed our left-leaning set based on which sites had ads.txt & which seemed to be active (content not too old). This set also included some sites which passed the standards of journalistic integrity set by NewsGuard.
(This list is also available in the closing statements of this article)
These sites were ALSO highly interconnected, even though many are mainstays of programmatic advertising campaigns.
Despite having more “PC” content, 15/20 (75%) of these domains had 10,000+ publishers linked to them by at least 1 DIRECT ads.txt entry. For the NewsGuard set, 24/30 (80%) met the same criteria (not a huge difference, though the sample is rather small).
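For readers who want to reproduce this kind of "linked publishers" count, the sketch below shows one way it could be computed. It assumes each site's DIRECT entries have already been extracted as (ad system domain, account ID) pairs. The function name `linked_publisher_counts` and the sample data are hypothetical and are not taken from our pipeline.

```python
from collections import defaultdict


def linked_publisher_counts(direct_entries, seeds):
    """For each seed site, count other sites sharing at least one DIRECT entry."""
    # Inverted index: DIRECT entry -> every site that lists it.
    sites_by_entry = defaultdict(set)
    for site, entries in direct_entries.items():
        for entry in entries:
            sites_by_entry[entry].add(site)

    counts = {}
    for seed in seeds:
        linked = set()
        for entry in direct_entries.get(seed, ()):
            linked |= sites_by_entry[entry]
        linked.discard(seed)  # a site is not linked to itself
        counts[seed] = len(linked)
    return counts


if __name__ == "__main__":
    # Hypothetical data: two publishers share entries with the seed site.
    sample = {
        "seed.example": {("x.example", "1"), ("y.example", "2")},
        "p1.example": {("x.example", "1")},
        "p2.example": {("y.example", "2")},
    }
    print(linked_publisher_counts(sample, ["seed.example"]))
```

A threshold such as 10,000+ linked publishers can then be applied to the resulting counts.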
Next, we looked at all publishers in our dataset with ads.txt files, and identified their DIRECT entries.
We calculated for each entry (1 row in an ads.txt file) how many other publishers’ files this entry can be found in. If it appears in 3 or more publishers’ files, it’s counted as a non-unique DIRECT entry.
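The counting rule can be sketched in a few lines of Python. This is a simplified illustration rather than our production code: the function name `non_unique_direct_counts` is hypothetical, entries are assumed to be pre-parsed (ad system domain, account ID) pairs, and the threshold of 3 is interpreted here as the total number of publishers' files containing the entry, which is an assumption you can change through `min_publishers`.

```python
from collections import Counter


def non_unique_direct_counts(direct_entries, min_publishers=3):
    """Count, per site, the DIRECT entries found in >= min_publishers files."""
    # Number of distinct publishers' files in which each entry appears.
    publishers_per_entry = Counter(
        entry for entries in direct_entries.values() for entry in set(entries)
    )
    return {
        site: sum(
            1 for entry in set(entries)
            if publishers_per_entry[entry] >= min_publishers
        )
        for site, entries in direct_entries.items()
    }


if __name__ == "__main__":
    # Hypothetical data: one entry is shared by three sites, one by two.
    sample = {
        "a.example": {("x.example", "1"), ("y.example", "2")},
        "b.example": {("x.example", "1")},
        "c.example": {("x.example", "1"), ("y.example", "2")},
    }
    print(non_unique_direct_counts(sample))
```

Each site's result is the number of its DIRECT entries that meet the threshold, which is the per-site figure summarized in the statistics that follow.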
Globally, 10% of ads.txt-enabled sites have 71 or more non-unique DIRECT entries.
“That’s pretty crazy,” we thought.
“Maybe this is an artifact of poorly ranked sites dragging us down(?)” we mused.
But the data showed quite the opposite.
We found that the median publisher in the Tranco top 10,000 has 37+ non-unique DIRECT ads.txt entries.
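Both summary figures are ordinary order statistics: "10% of sites have 71 or more" describes the 90th percentile of the per-site counts, and the Tranco figure is a median over a subset of sites. A minimal sketch using Python's standard library is below. The helper name `summarize` and the sample values are hypothetical, and the choice of the "inclusive" quantile method is an assumption, since different methods give slightly different cut points.

```python
import statistics


def summarize(non_unique_counts):
    """Return (median, 90th percentile) of per-site non-unique entry counts."""
    values = sorted(non_unique_counts)
    median = statistics.median(values)
    # quantiles(n=10) returns the 9 decile cut points; the last one is the 90th percentile.
    p90 = statistics.quantiles(values, n=10, method="inclusive")[-1]
    return median, p90


if __name__ == "__main__":
    # Hypothetical per-site counts, not real measurements.
    sample_counts = [0, 2, 5, 12, 37, 40, 58, 71, 90, 120]
    print(summarize(sample_counts))
```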
This raises the question: “Is anybody even paying attention to the account type labels in ads.txt files??”
As we’ve pointed out in previous research, “for IDs which are declared as DIRECT in ads.txt files, the rate of mismatch in the entire [ads.txt] corpus is 45%,” yet we haven’t read an article suggesting that anywhere close to that many supply paths have been closed.
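For anyone who wants to run this kind of mismatch check themselves, here is a hedged sketch. It assumes the standard sellers.json layout (a top-level `sellers` array whose items carry `seller_id` and `seller_type`) and a simple compatibility mapping in which DIRECT agrees with PUBLISHER or BOTH, and RESELLER agrees with INTERMEDIARY or BOTH. That mapping, the function name `find_mismatches`, and the sample IDs are our assumptions for illustration, not a description of how the 45% figure was computed.

```python
import json

# Assumed mapping: sellers.json seller_type values that agree with each ads.txt label.
COMPATIBLE = {
    "DIRECT": {"PUBLISHER", "BOTH"},
    "RESELLER": {"INTERMEDIARY", "BOTH"},
}


def find_mismatches(ads_entries, sellers_json_text):
    """Compare (account_id, relationship) pairs for one ad system to its sellers.json."""
    sellers = {
        str(seller["seller_id"]): str(seller.get("seller_type", "")).upper()
        for seller in json.loads(sellers_json_text).get("sellers", [])
    }
    mismatches = []
    for account_id, relationship in ads_entries:
        seller_type = sellers.get(account_id)
        if seller_type is None:
            continue  # ID missing from sellers.json; a separate signal, not counted here
        if seller_type not in COMPATIBLE.get(relationship.upper(), set()):
            mismatches.append((account_id, relationship.upper(), seller_type))
    return mismatches


if __name__ == "__main__":
    # Hypothetical sellers.json and ads.txt entries for a single ad system.
    sellers_file = json.dumps({"sellers": [
        {"seller_id": "1001", "seller_type": "INTERMEDIARY"},
        {"seller_id": "2002", "seller_type": "PUBLISHER"},
    ]})
    print(find_mismatches([("1001", "DIRECT"), ("2002", "DIRECT")], sellers_file))
```

Dividing the number of mismatched DIRECT IDs by the number of DIRECT IDs found in sellers.json would give a mismatch rate comparable in spirit to the one quoted above.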
Like Edwards in his original article, we worry that this weak application of transparency initiatives could lead us backwards on the path to a clean advertising ecosystem.
As it stands now, many of the publisher/seller relationships labeled as DIRECT in ads.txt files are more akin to reseller relationships.
For the benefit of anyone reading, we’re making available here the site lists used to conduct this research, along with the 15,000 most widely shared DIRECT ads.txt entries we encountered globally.
There are many unknowns when it comes to programmatic advertising, but ads.txt & sellers.json provide a rare glimpse into the way that things are supposed to work (ideally).
Do they provide ground truth data? Not necessarily – sellers and publishers can provide untrue information in their sellers.json & ads.txt files, respectively.
They DO, however, allow analysts to point out inconsistencies / risk signals based on adherence to standards & practices.
If you’re a buyer, do you check the account types between ads.txt and sellers.json for agreement? Does this mismatch have a different meaning to you? Is it harmless? How about the uniqueness of a DIRECT ads.txt entry; does that bear any weight when conducting supply path optimization?
We’re open for discussion, and very interested in learning how you apply the data from ads.txt and sellers.json to your bid decisioning process.