Two Tales of One Website: How Arbitrage Sites Manipulate Metrics Using Misleading Content Formats

Posted 3 weeks ago

Written and published by
Rocky Moss
Chief Executive Officer

Introduction to Ad Arbitrage

Ad arbitrage is the practice of buying web traffic, then selling ad space on your website for more than you paid to acquire the user. It’s not a new practice, not by any means, but we’ve detected a new strategy used to keep it profitable.

Many doubt that arbitrage is still a lucrative strategy; take this poster on BlackHatWorld (a forum for marketers to share how they game the system to make $$$):

Is Arbitrage Still Happening?

In short: yes, it is. You need significant volume to make it work, and the tools have changed a little from days past, but it is still happening at a staggering scale. In this article, we focus on arbitrage that starts from native ad placements, aka sponsored content boxes.

If you’re not familiar with the term, you’re probably familiar with the look of these placements (highlighted red below):

When buying traffic from native ads providers with the intention of performing ad arbitrage, the key factors you have to optimize are:

  1. CPC (cost-per-click) when acquiring a visitor
    • This goes down as CTR (Click-through rate) increases, because people buying space in these boxes pay on a per-impression basis.
    • There are tons of levers to pull here, including:
      • the geos you target
      • the sites your ads show up on
      • the user’s device make/model
      • the keywords you target
      • your creative / destination article headline
      • much more!
  2. RPM (revenue-per-thousand-impressions), or how much you make from your visitors
    • Someone focused on long-term stability might focus on organic user growth by writing unique & helpful content.
      • The more return users you have, the more predictable your revenues will be
      • Reputable content providers get access to better affiliate deals, and private programmatic marketplace opportunities
    • Someone focused on getting as much money as possible from a user who arrives on their site might:
      • Add more ad units, increasing the clutter of the page
        • Particularly, adding a bunch of placements above the fold guarantees you will have coveted viewable placements that you can charge for.
      • Paginate the content so that users are always triggering new page loads, and digital ad auctions. Paginated content is also known as “Slideshow” content.
      • Put ad placements on an aggressive refresh schedule.
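The two levers above can be tied together with some simple arithmetic. The sketch below is illustrative only: the function names and every number in it are assumptions rather than figures from our research, and it follows the framing above that space in sponsored content boxes is paid for per impression.

```python
def effective_cpc(cpm: float, ctr: float) -> float:
    """Cost per click when space is bought per thousand impressions.

    cpm: price paid per 1,000 impressions of the promoted link.
    ctr: click-through rate as a fraction (0.02 means 2%).
    """
    if ctr <= 0:
        raise ValueError("ctr must be positive")
    return cpm / (1000 * ctr)


def profit_per_visitor(cpc: float, rpm: float, ad_impressions_per_visit: float) -> float:
    """Ad revenue earned from one visitor minus the cost of acquiring them.

    rpm: revenue per 1,000 ad impressions served on the arbitrage site.
    """
    revenue = rpm / 1000 * ad_impressions_per_visit
    return revenue - cpc


# Hypothetical campaign: the same CPM gets cheaper per click as CTR rises.
cpc_low_ctr = effective_cpc(cpm=2.00, ctr=0.005)
cpc_high_ctr = effective_cpc(cpm=2.00, ctr=0.02)

# Hypothetical layouts: a single page with 4 ads vs. 20 slides with 10 ads each.
single_page = profit_per_visitor(cpc_high_ctr, rpm=5.00, ad_impressions_per_visit=4)
slideshow = profit_per_visitor(cpc_high_ctr, rpm=5.00, ad_impressions_per_visit=200)
```

With these made-up inputs, the single-page visit loses money and the slideshow visit turns a profit, which is the incentive behind adding ad units, paginating, and refreshing.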

In this article we seek to educate the reader on the lengths these publishers go to in order to milk readers for revenue, but we also want to answer a question that’s been tumbling around in our heads while writing: is the activity we are profiling even of concern to advertisers?

Anyone who claims that content quality correlates with campaign performance will be skeptical of arbitrage sites, and has probably already added such sites to their domain block list. However, there are many marketers who still prefer to buy inventory on these sites. After all, the arbitrage sites won’t make money if their users are found to be fakes; they specialize in bringing real humans to their sites with eye-catching headlines & common-denominator content.

The real question is: do advertisers get a good value for their ad-dollars when they buy space on these sites? We’d love to hear your answer to this after reading.

A Layman’s Approach to Identifying Arbitrage Sites

Arbitrage practitioners tend to choose some niche, or vertical, and design their site around that. It’s important to note that the content on these sites is often non-unique, and shared between any number of other sites. Because of this, the content rarely lines up with what you’d expect from the name of the site.

Take the site parentingfactor[.]com for instance:

This is the landing page, and you can see that they are trying to cultivate the image of being a source for family / parenting content. However, the links that are promoted do not necessarily fit that mold.

The top promoted link for this site that we detected in June led users to the following page:

There are quite a few alarming things about this image, but before we dive into that, let’s reflect on the content a bit.

We grabbed a couple random lines of text from the article above, and plugged them into Google:

It seems affluenttimes[.]com has an article sharing the same exact text.

It seems magellantimes[.]com ALSO has an article sharing the same exact text.

Let’s take a look at the design for those sites:

As we can see, the sites’ headers use the exact same font. It may be too small for you to see, but the articles even share the same author: David Rule.

Confirming what we can tell for ourselves based on the design & authorship, the footer of these sites shows they all share the same corporate owner.

Arbitrage sites tend to be created & managed this way; the owners either diversify their portfolio by creating site templates for a variety of verticals (which really share a lot of content between each other), or they specialize in a certain vertical that gets a lot of attention, like: pets, fitness, finance, pop-culture, etc…

How Does One End Up Visiting An Arbitrage Site?

Unless you just woke up with a burning desire to find out why Lily from the AT&T ads is causing a stir, chances are you clicked an ad in a sponsored content box, or on a social media platform.

It used to be that arbitrage experts would accomplish the whole feat using Google products. In the before times, you could cost-effectively buy traffic using the AdWords PPC advertising platform and direct users to your blog monetized with AdSense. These days, not so much. The Google Ads destination requirements state that “Destination content that is designed for the primary purpose of showing ads” is forbidden.

ibmjango, another user of BlackHatWorld, describes the process fairly succinctly:

There are some key statements here that are relevant to the rest of the article that we want to reiterate:

“most of people are buying native traffic for adsense website[s,] as it is profitable process. you can use networks like taboola, outbrain, even bing ads if you know how to buy cheap traffic on these networks”

This will be confirmed later based on our own research, and insights from Similarweb.

“you cant make profit with adsense only, you need to put more ad network ads[;] just visit any news site in your country, you will see how many sidebar or content ads they are posting along with adsense”

The sites we are exploring today are being monetized by many different platforms/networks. This isn’t a small-potatoes blog site; we’re talking billions of avails per-day across hundreds of well-ranked arbitrage sites.

“slide show content generate more revenue for arbitrage so if you are really going for this, create any engaging slide show content, then run FB traffic, thats how people still do it.”

Having a content format that exploits every available inch of space for advertising is key to bringing up revenue-per-page. This strategy is tried and true, but off-putting to SSPs (Supply-Side Platforms; publishers apply to these platforms to get access to programmatic demand).

It’s common for these sites to have articles on the front page that have a single-page format, and few ad placements. This gives the illusion of quality, while the most clicked links are actually stuffed to the gills with ad units that refresh aggressively (more on this later).

Let’s test these claims with the example family of sites we have identified: parentingfactor[.]com, magellantimes[.]com, and affluenttimes[.]com; there are a couple of ways we can go about determining how users end up there.

Using data from our crawlers, we checked for any sites that loaded a link to these 3 domains, and checked if those links were from paid sponsored content placements belonging to Revcontent, Taboola, or Outbrain:

We can see that there are hundreds of inbound links for each site that we’ve discovered in just the past month, but almost 100% of them are from sponsored content boxes. That is to say, they have no real reputational authority.
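As a rough illustration of the check described above, the sketch below computes the share of a site's inbound links that come from sponsored content boxes. The `InboundLink` record and its fields are hypothetical stand-ins for crawler output, not our actual data model, and the sample domains are made up.

```python
from dataclasses import dataclass
from typing import Iterable, Optional

# Providers named in this article; anything else is treated as a non-sponsored link.
SPONSORED_PROVIDERS = {"revcontent", "taboola", "outbrain"}


@dataclass(frozen=True)
class InboundLink:
    source_page: str          # page on which the link was observed
    target_domain: str        # domain the link points to
    provider: Optional[str]   # sponsored content provider, or None for an editorial link


def sponsored_share(links: Iterable[InboundLink], target_domain: str) -> float:
    """Fraction of inbound links to target_domain found in sponsored content boxes."""
    relevant = [link for link in links if link.target_domain == target_domain]
    if not relevant:
        return 0.0
    paid = sum(1 for link in relevant if link.provider in SPONSORED_PROVIDERS)
    return paid / len(relevant)


# Hypothetical observations for a made-up site.
observed = [
    InboundLink("news-a.example/story", "arb-site.example", "taboola"),
    InboundLink("news-b.example/story", "arb-site.example", "outbrain"),
    InboundLink("news-c.example/story", "arb-site.example", "revcontent"),
    InboundLink("blog.example/post", "arb-site.example", None),
]
share = sponsored_share(observed, "arb-site.example")
```

A share close to 1.0 would match the pattern we describe: plenty of inbound links, almost none of them editorial.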

Another way to go about this would be to analyze the referral values for all the users visiting these sites.

Analyzing Inbound Traffic Channels

Since we don’t base our insights on user-generated data at DeepSee, we sometimes turn to Similarweb for second opinions on these matters.

Echoing our own analysis, Similarweb shows the vast majority of parentingfactor[.]com’s traffic comes from display ads. Let’s see how it looks for the other two as well:

Further solidifying the point that ibmjango of BlackHatWorld made, we can see that all 3 sites rely heavily on paid display, and social traffic. Similarweb doesn’t separate out paid social from organic in this chart, but judging by the extremely high rate of paid display, it’s not a far leap to assume that the social is paid as well.

This is extremely relevant, because our research shows that many arbitrage sites behave VERY differently when accessed via paid link vs directly.

Two Tales of One Website

The same site visited directly vs visited by paid link

Now we get to the core of the “Misleading Content Formats” issue that we warn about in the title. Believe it or not, the image above shows the same page on the same site; the image on the left is what you’d see if you visited directly, and the one on the right is how it looks when visiting via paid link.

On the left we see zero ads above the fold. On the right, there are 6 display ads plainly visible, plus an out-stream video placement just outside the frame of our screenshot. It’s not clearly visible in the image, but the paid page is actually a slideshow (compared to the direct link, which is single-page format).

Let’s recall something we mentioned in the intro:

  • Someone focused on getting as much money from a user as soon as they arrive on their site might:
    • Add more ad units, increasing the clutter of the page 
      • Particularly, adding a bunch of placements above the fold guarantees you will have coveted viewable placements that you can charge for.
    • Paginate the content so that users are always triggering new page loads, and digital ad auctions. Paginated content is also known as “Slideshow” content. 
    • Put ad placements on an aggressive refresh schedule.

We know the paid version of the site has way more ad units, and paginated content, but it’s not clear from the image if there is an aggressive refresh schedule. We can clear that up with the following video, which shows the different ways this site presents content:

To summarize:

  • We saw ~30 second refresh timing on the slideshow ad placements.
    • Not particularly aggressive, but still, refresh on slideshows seems entirely unnecessary
      • Slideshow readers are expected to click to a page every 30 secs to 1 minute, and adding refresh on top of that is just another way to milk ~10 ad loads out of a user in between slide clicks.
    • This isn’t captured in the video, but out-of-view placements did not refresh on the single-page style version of the site.
  • The single-page format had 1-2 ads per content block, while the slideshow format had 10+ placements per content block
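The arithmetic behind the refresh point can be sketched as follows. The function is a simplification we are assuming for illustration: it treats every placement on the slide as refreshing on the same timer for as long as the reader stays, which will overstate things if some placements are out of view and do not refresh.

```python
def ad_loads_per_slide(placements: int, dwell_seconds: float, refresh_seconds: float) -> int:
    """Initial ad loads plus refreshes that fire while the reader stays on one slide."""
    if placements < 0 or dwell_seconds < 0 or refresh_seconds <= 0:
        raise ValueError("placements and dwell must be >= 0, refresh must be > 0")
    refreshes = int(dwell_seconds // refresh_seconds)
    return placements * (1 + refreshes)


# A reader who spends 45 seconds on a slide with 10 placements and a 30 second timer.
with_refresh = ad_loads_per_slide(placements=10, dwell_seconds=45, refresh_seconds=30)

# The same reader clicking through in 20 seconds never triggers a refresh.
without_refresh = ad_loads_per_slide(placements=10, dwell_seconds=20, refresh_seconds=30)

extra_loads = with_refresh - without_refresh
```

Under these assumptions, one refresh cycle across 10 placements accounts for the roughly 10 extra ad loads between slide clicks mentioned above.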

This video shows a single site, but the pattern is typical of SO many more sites.

Why Show Two Formats? Why Not Just Embrace the Slideshow?

This is a tough question to answer, and we’re going to venture into the realm of educated guesswork here while trying to answer.

There are certainly many sites which are upfront about their slideshow content, and manage to monetize it to some extent. Why hide the format that your content predominantly appears in?

Think back to what our spirit guide, ibmjango, says:

“you cant make profit with adsense only, you need to put more ad network ads”

As we mentioned earlier, slideshow content can be off-putting to SSPs, because their reputation is staked on the quality of their publisher network. Having content that makes SSPs uneasy limits the amount of demand you can get access to. For modern arbitrage operations to work, they need access to multiple SSPs / ad-networks, and perhaps this is the reason we find so many sites with a disconnect between direct & paid visits.

As part of the onboarding process at each SSP, there is likely to be a human reviewing the sites of each publisher who applies. While each SSP has different publisher requirements, it hardly seems a stretch to imagine that reputable SSPs don’t want sites with slideshows that are 80% ads / 20% content.

So, in order to protect themselves, the arbitrage specialists design sites in such a way that advertising analysts who click around their home page wouldn’t find anything objectionable.

Once they do make it into a reputable ad-network, that’s when the problems begin. We spoke off the record to someone responsible for ensuring publisher quality at a major SSP, and they told us “that’s the problem we have; it’s not straight fraud, and there’s a huge demand for this inventory.”

This is confirmed by a contact of ours on the demand side, who described the inventory as “crack for advertisers” due to the high viewability, and the high availability of inventory (due to the inflated placement count per-page).

Thus back to our initial question: do advertisers get a good value for their ad-dollars when they buy space on these sites?

One might venture a guess to say these ones did not:

Scaling Our Insights With Data

The following research is based on analysis of 111 domains with Misleading Content Formats we discovered during the last week in June (available here). To arrive at this sample, we visited thousands of sites that had high representation in sponsored content boxes, and compared how they behaved depending on whether the visit was direct or paid. This comparison included a lot of manual verification while we built the capability for automated detection.

While many sites buy most of their visitors from sponsored content boxes, it is unusual for a site to completely transform its layout, and this is the activity we highlight in this study. That’s how we ultimately arrived at this sample; these sites, of all the sites buying visitors for arbitrage purposes, showed anomalous behavior when visited directly vs via a sponsored content box.

You may notice the list of publishers we provide has more than 111 entries; this is because we discovered additional sites in the same publishing groups as Misleading Content Format sites by carefully analyzing ads.txt & sellers.json records. In order to give you maximum protection, we proactively flag these sites.
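One way to look for sites in the same publishing group is to find ads.txt files that list the same DIRECT seller account on the same ad system. The sketch below is a minimal illustration of that idea, not our production method; the function names and sample records are hypothetical, and it only handles the basic comma-separated ads.txt record format (ad system domain, seller account ID, relationship, optional certification authority ID).

```python
from collections import defaultdict
from typing import Dict, Set, Tuple


def parse_ads_txt(text: str) -> Set[Tuple[str, str, str]]:
    """Return (ad_system_domain, seller_account_id, relationship) records."""
    records = set()
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()      # drop comments
        if not line:
            continue
        fields = [field.strip() for field in line.split(",")]
        if "=" in fields[0] or len(fields) < 3:  # skip variables (contact=...) and malformed rows
            continue
        records.add((fields[0].lower(), fields[1], fields[2].upper()))
    return records


def sites_sharing_direct_sellers(ads_txt_by_site: Dict[str, str]) -> Dict[Tuple[str, str], Set[str]]:
    """Map each (ad system, account ID) pair to the sites listing it as DIRECT, if more than one."""
    sellers = defaultdict(set)
    for site, text in ads_txt_by_site.items():
        for domain, account, relationship in parse_ads_txt(text):
            if relationship == "DIRECT":
                sellers[(domain, account)].add(site)
    return {key: sites for key, sites in sellers.items() if len(sites) > 1}


# Hypothetical ads.txt contents for three made-up sites.
sample = {
    "site-a.example": "adsystem.example, pub-123, DIRECT\n# comment\ncontact=ops@site-a.example",
    "site-b.example": "adsystem.example, pub-123, DIRECT, abc123\nother.example, 9, RESELLER",
    "site-c.example": "adsystem.example, pub-999, DIRECT",
}
shared = sites_sharing_direct_sellers(sample)
```

A shared DIRECT account is a lead rather than proof of common ownership, since intermediaries that manage monetization for many publishers can also produce overlaps; sellers.json entries and manual review help confirm a match.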

For the purpose of this article, we compared three things:

  1. The activity level of the page; this is approximated by the count of document, script, and XHR requests intercepted.
  2. Ad units above the fold
  3. Ad units on page

Many thanks to data scientist Edward Krueger, who assisted us with visualizations and statistical analysis for this research.

Page Activity Levels – Direct Link vs Paid

This chart shows how much busier these sites are when accessed from the sponsored content box vs loaded directly.

For a one-tailed (greater) paired t-test (N = 111), the difference between the number of events for paid visits (M = 472.90, SD = 343.86) and for direct visits (M = 228.39, SD = 134.89) was found to be statistically significant (p < 0.001).
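For readers unfamiliar with the test, the sketch below shows how a one-tailed paired t statistic is computed from per-site pairs. It is a simplified illustration on made-up numbers, not the analysis code used for this research. The p-value here uses a normal approximation from the standard library, which is reasonable for a sample of about a hundred pairs but not for the tiny example shown; an exact Student's t p-value is available from a statistics library, for example SciPy's `scipy.stats.ttest_rel` with `alternative="greater"`.

```python
import math
from statistics import NormalDist, mean, stdev
from typing import Sequence, Tuple


def paired_t_greater(paid: Sequence[float], direct: Sequence[float]) -> Tuple[float, float]:
    """One-tailed paired t-test of H1: mean(paid - direct) > 0.

    Returns (t statistic, approximate p-value from the standard normal).
    """
    if len(paid) != len(direct) or len(paid) < 2:
        raise ValueError("need two equally sized samples with at least 2 pairs")
    diffs = [p - d for p, d in zip(paid, direct)]
    spread = stdev(diffs)  # sample standard deviation of the paired differences
    if spread == 0:
        raise ValueError("all paired differences are identical")
    t_stat = mean(diffs) / (spread / math.sqrt(len(diffs)))
    p_approx = 1 - NormalDist().cdf(t_stat)
    return t_stat, p_approx


# Hypothetical event counts for five sites (paid visit vs direct visit).
paid_events = [410, 520, 300, 610, 450]
direct_events = [220, 240, 180, 300, 210]
t_stat, p_approx = paired_t_greater(paid_events, direct_events)
```

Note that the test works on the per-site differences, so it cannot be reproduced from the two group means and standard deviations alone.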

Simple but effective, the takeaway is pretty clear from this bar chart: we can see that paid visits are over twice as active on average!

Ad Units Above The Fold – Direct Link vs Paid

Ad units that are “above the fold” are visible right when you land on a page. These are coveted, because they tend to be marked viewable by the measurement tech most advertisers employ, and this gives advertisers more confidence the ads were seen.

For a one-tailed (greater) paired t-test (N = 111), the difference between the average number of ads above the fold for paid visits (M = 3.62, SD = 2.66) and for direct visits (M = 0.66, SD = 1.04) was found to be statistically significant (p < 0.001).

Right away, we can see there is a stark difference in the median number of ad placements above the fold here. Remember, we are talking about the same exact sites & pages here; the only difference is how the user arrives.

This violin plot is another way to visualize the typical number of ads above the fold for direct vs paid visitors. The fat bottom of the “Direct” plot shows that many sites in this group never display ads above the fold. It’s clear that much of the “Paid” figure’s area lies at 2 or above, with a big bulge around 4 ad units. These are completely different shapes.

Ad Units on Page – Direct Link vs Paid

Similar to the last chart, this one shows the difference in the total number of ad units encountered per-page.

For a one-tailed (greater) paired t-test (N = 111), the difference between the average number of ads per document for paid visits (M = 6.54, SD = 3.41) and for direct visits (M = 4.27, SD = 4.74) was found to be statistically significant (p < 0.001).

The maximum figure here is especially interesting, because it shows the difference in limitations between single-page and slideshow design formats. The single-page format can hypothetically go on forever, so the upper bound on ad units per page is higher. On a slide, where most of the content is above the fold, there are only so many ad units you can cram into the page before it fills up.

While the differences visible in the violin chart here are not quite as stark as the same visualization for “above the fold” ad units, there is a clear takeaway: the paid visits result in many more ad units loading on the page. The higher maximum value in the “Direct” plot shows that there are exceptions to the previous statement, but they appear as outliers.

Conclusion

Publishers with Misleading Content Formats straddle the line between valid & invalid activity. Unlike Google Ads, the tech companies providing these sponsored content boxes do not forbid sending visitors to “[d]estination content that is designed for the primary purpose of showing ads.”

In this situation, any violations that exist likely occur between the publisher and the SSPs / ad-networks they are a part of. Such organizations may have specific publisher content requirements that forbid the behaviors displayed by the hidden, hyper-active, versions of these sites.

Now, having made it to the end of the article, do you think advertisers get a good value for their ad-dollars when they buy space on these sites?

Is there something we’re missing here? Feel we are off base? Please keep the conversation going, and drop us a line on Twitter @deepsee_io, or on LinkedIn!
