Excel formula to distribute proportionally (not provided) attribution

We all know by now (not provided) is a pain in the ass. Not only because part of the picture has gone and you have to work harder to find the missed pieces of the puzzle, also because while putting SEO results in charts and figures you have to deal with that or your reports and dashboards are going to be harder to understand for your clients.

Traditionally we segment organic in two types, Branded and non-Branded traffic for obvious reasons but we have now a new type around messing it up a bit, (not provided) organic traffic. Thanks Google.

The next chart is an example of that mess. This is revenue from organic traffic segmented by the three types since the nightmare started.

branded, non-branded, (and not provided) organic traffic

In many accounts I've checked, visits from (not provided) go from 9% to 13% although the proportion could be higher in certain cases or segments of your traffic. Two explicit tweets from yesterday on that note.

 

How to represent (not provided)?

Basically we have two options:
Leave it as it comes and represent it as a third bucket for visits, conversion, revenue or any other metric you might want to include

  • Disadvantages: rebuild tons of segments in your analytics tool and having to explain why more frequently that you would like
  • Advantage: the nature of the (not provided) remains uncertain but the rest of the numbers for Branded and non-Branded are accurate

Break it down proportionally and integrate those numbers within the Branded and non-Branded ones

  • Disadvantages: the higher the proportion of (not provided) compared to total organic traffic, the higher the inaccuracy for all segments knowing in advance that splitting (not provided) proportionally is an inaccurate solution per se
  • Advantage: K.I.S.S. principle

I am inclined to use the last solution but a second opinion is always welcome so I tweeted.

 

Excel formula to distribute (not provided)

With the confidence inherited from Matthew's and some of my workmates (@jackson_lo and @adrianvender) words, next step was simple, put excel to work and create a formula* solving the issue. Here you have it:

Excel formula to distribute proportionally (not provided) attribution
Pin It

Once the formula is applied to initial numbers of the chart above this the resulting new one. Total organic traffic is the same, as expected, but we get rid of anoying (not provided) integrating it within the two typical organic traffic types.

branded and non-branded organic traffic after compensate (not provided)

Excel spreadsheet to download with formula to distribute proportionally (not provided) attribution.

I guess we can live with the disadvantages of this method right? At least me.

*This was my first attempt to write something more complicated than 'a+b=c' in the shape of a formula since I left high school so, math and physics nerds in the house tonight, don't laugh so hard at me, ok?

Update. My apreciation to the comments  from john Andrews you see below (#3 & #4) triggered a conversation in Twitter (worst place to discuss whatever but anyway) interesting enough, in my opinion, to reporduce it here.

@anilopez
I really appreciate your comment @johnandrews "trust value based on the relative size of the np set" is a great idea.

@JohnAndrews
It's nothing more than error or variance... standard for any model/stat. NEVER accept anything without variance numbers.

There def is standard in stats/data. Any measure has a variance. Any combination of measures has a variance.

"robustness" is a measure of ability to handle real world data, as opposed to test data or ideal data. Sensitivity is also stnd.

Can always review variance, sensitivity, and robustness with any proposed derivation/algo. Great way to "see" utility.

That's why import to state assumptions... that limits the application to real data; allows us to evaluate utility.

From there you can plot diagram of when it's useful, when it's not, and when it might be useful. Great stuff.

@MatthewJBrown
I had been adjusting for that based on what the split between branded and non-branded looked like prior to NP.

What sucks is new sites that come in with a high NP right off the bat. Lots of guessing and modeling at that point.

@JohnAndrews
Prob need to model the NP user. She's logged into Google. Assume no reason to be diff w/r to B/NB, or some reason.

Key is the assumptions made... that NP mirrors rest, or not, and why.

Otherwise drop the NP from analysis, and then normalize results to recover lost volume. Same outcome as Ani method.

@MatthewJBrown
Yeah, run the numbers based on a few different splits and compare the difference.

Mar 09, 2012
Written by:
Filed under: SEO






16 comments
Mar 09, 2012
Posted by:
Natzir #1

I've already said it on twitter but I'll say you twice, you're a genius Ani. Thanks a lot :)

Mar 09, 2012
Posted by:
Ani Lopez #2

Thanks Natzir for your kind comments but I'm really far far away to be something close to genius.

Most of the ideas for my posts come from conversations around SEO with collages, coworkers and such, nothing from thin air.

Mar 09, 2012
Posted by:
john andrews #3

I like to be explicit with assumptions behind algorithms like this. In this case, you've assumed that not-provided traffic is split between branded and non branded, the same way the overall traffic is split.

So that is the underlying assumption that must be true in any specific case where someone wants to use this algorithm.

It is also what you want to disprove if you can, using other available data. If you see any evidence that the np traffic may be skewed towards brand or non brand kws, you should NOT trust this approximation.

Mar 09, 2012
Posted by:
john andrews #4

the next step might be assigning a trust value based on the relative size of the np set, and the balance of b and nb sets. As easily seen, this algo is best when b is close in size to nb, and np is small compared to those.

Mar 09, 2012
Posted by:
Ani Lopez #5

Señor Andrews, thanks for coming and comment

You are 100% Right.
We have no idea how (not provided) traffic may be skewed towards branded or non-branded one so at that point I'm embracing some inaccuracy here in order to represent things easier.

When is it somehow safe to use this approximation?
- When not-provided traffic is not a high % of total organic traffic
- When there is no evidence of not-provided traffic skewing towards branded or non-branded


I tried to explain that in the post but I see now it was not clear enough so thanks a lot John for pointing that out.

Mar 10, 2012
Posted by:
Paula #6

Me parece una manera muy buena y sencilla para aplicar, aunque otra opción sería hacer promedios (coger las cuatro o cinco métricas clave y aplicar un reparto proporcional ) y luego aplicar el promedio. He escrito sobre ello , basándome en tu post http://www.webanalyticsdata.com/2012/03/estimando-las-keywords-not-provided.html pero sigue siendo una estimación muy estimación :(

Mar 10, 2012
Posted by:
Kaushalam #7

Around 33% of organic search queries are resulted into not provided list and this illustration will help to figure out how much total branded search queries tracked.

Mar 11, 2012
Posted by:
Gus #8

Thanks for sharing Ani!

It actually makes sense to assign a proportionate value of the np to each of the segments. I do agree that it's safe to use when the % of np is relatively small (and stable).

Following John Andrews comments, which would be a good indication of bias towards branded-non branded? Would it be possible by analysing historical data (previous to the np release) of branded/non branded?

Thanks again for this superb post!!

Gus

Mar 11, 2012
Posted by:
Andrea Moro #9

Although I'm a real fan of radical problem solving and Excel formulas to solve my daily pains, and despite I like your approach, this won't never solve our biggest issue.

The percentage of the NP keys are likely to increment up to 50% and more in the next 6 months. Therefore, not accounting them properly for what they are may end up to result a mess for the ROI calculation with quite a few discrepancies in the numbers and the allocated budget.

Mar 11, 2012
Posted by:
Ani Lopez #10

@Gus which would be a good indication of bias towards branded-non branded? Compare some metrics at the same time.
Let's say conversion rates are:
· nB 7%
· B 16%
· nP 14%
This indicated (somehow, just an example) that the nature of nP seems more close to Banded than non-Branded despite what the proportion of the formula says if you only take into account visits.

@Andrea, I agree, check my comment #5. When not-provided traffic is a high % of total organic traffic this formula does not make much sense.

Seems nP is increasing but I have not seen yet more than 10% - 12% in average although some few cases around 15% of total organic traffic.

Mar 23, 2012
Posted by:
Shafie #11

This is one of the most successful solutions to solve the problem of "not provided"
Ani u r the "not provided" crack :-)

Thanks.

Mar 23, 2012
Posted by:
Ani Lopez #12

Thanks Shafie for your kind words but it is just a band-aid that only works for some cases, "viewer discretion is advised" :)

Oct 03, 2012
Posted by:
Francesco #13

I totally agree with the first comment (Natzir), u r a Genius :)

Oct 03, 2012
Posted by:
Ani Lopez #14

Ciao Francesco
Although I appreciate your kind words, I'm really far from being a genius or such, I mean, seriously.

Aug 31, 2013
Posted by:
ALEJANDRO #15

Hello

I've found very practical your article. After I've read another one where it's showed how to measure NP in another diferent way and pretty much simple

I wanted to ask you if you knew this article and your opinion about it, thanks

http://www.stateofsearch.com/beyond-rankings-google-analytics/?utm_source=buffer

Mar 18, 2014

The "Not provided" its the 90% of the searches, thanks Francesco for the help.

Have your say
Submit
twitter @anilopez

Articles I write for other sites

On Paella and Semantic Markup for recipies

In plain words, it does not work fine most of the cases. It's a bad idea. I'll explain why while I teach you how to cook an authentic Paella.

Analytics Tribulations Of An SEO

The art of measure is never easy but when it comes to SEO it's even worst

Some decisions to take beforehand on multilingual SEO @ Cardinal Path blog

Hold on! Sure, you’re excited to get your content online, but stop asking for a site to be built, and think about its audience especially when it comes to international SEO.

Challenges of Spanish Language on Search Marketing @ Multilingual Search

'Standard Spanish' is something that I don’t buy into when it comes to international scenarios. I'll explain to you why and some tips to start facing correctly your Spanish strategy.

50 feeds keeping me updated on SEO

Besides the feeds listed here I follow around 30 more for topics like Analytics, UX, Link building, IA or any other SEO related discipline

Handling Multilingual Sites for Humans & Search Engines @ Bruce Clay Blog

The logic behind the scenes to show all content to bots and the right language to users

Mobile detection issues & Google Instant Previews @ Cardinal Path blog

Mobile web represents the bigger headache ever for those wanting to target the small but growing audience they represent nowadays. check your Instant Previews for possible indexation issues.