The Fallacy of Measuring Everything

We have all been in those talks. You know, the ones telling you to measure all the things? Test everything! The ones sharing the secret to achieving 300% growth? How it’s impossible to make a decision without data? How data will set you free? The talks that make you question everything you’re doing and leave you thoroughly confused at the same time?

I’ve found myself in the same situation as most, struggling to grasp the discussion. Not because I didn’t understand what they were saying, or disagreed with the logic, but because I was getting very different results from my own journey with testing. My failures in the domain were overwhelming and exhausting, especially since the rest of the industry kept insisting this was the way to go.

I had become content with my confusion until I heard Beka Rice, with Skyverge, speak at WooConf this past week, and was compelled to revisit my thoughts on the subject.

Considerations When Using Statistics

Whenever we talk about leveraging statistics, it’s important to take a few things into consideration about how we work with our data. If we don’t, we can be led to make very bad decisions.

Facts are stubborn things, but statistics are pliable. – Mark Twain

The need for a representative sample to make a good decision. In other words, calculating how much data you need to collect so that it accurately reflects your audience. If you jump the gun on your tests, you’re bound to set yourself up for failure and despair.

A representative sample is a subset of a statistical population that accurately reflects the members of the entire population. A representative sample should be an unbiased indication of what the population is like. via Investopedia
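To make that concrete, here is a minimal sketch of the standard sample-size formula for estimating a proportion (this is a generic statistics formula, not something from the talk; the default values assume 95% confidence and a ±5% margin of error):

```python
import math

def required_sample_size(z=1.96, p=0.5, margin_of_error=0.05):
    """Minimum sample size to estimate a proportion.

    z: z-score for the desired confidence level (1.96 ~ 95%)
    p: expected proportion (0.5 is the most conservative choice)
    margin_of_error: acceptable error, e.g. 0.05 = +/- 5 points
    """
    return math.ceil((z ** 2) * p * (1 - p) / margin_of_error ** 2)

# With 95% confidence and a +/-5% margin of error, you need at
# least 385 data points, no matter how large the audience is.
print(required_sample_size())  # 385
```

Notice that the audience size doesn’t even appear in the formula; what kills small sites is that collecting even a few hundred qualifying visitors can take weeks.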

The need to account for your confidence level. The higher you set your confidence level, the larger the sample you’re going to require, and the larger your sample requirement, the longer your test might run.

A confidence interval does not quantify variability. A 95% confidence interval is a range of values that you can be 95% certain contains the true mean of the population. This is not the same as a range that contains 95% of the values. via GraphPad
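As a rough sketch of what that definition means in practice, here is the normal-approximation confidence interval for an observed conversion rate (the 200-in-10,000 figures are hypothetical numbers for illustration):

```python
import math

def conversion_ci(conversions, visitors, z=1.96):
    """Confidence interval for an observed conversion rate
    (normal approximation; z=1.96 gives ~95% confidence)."""
    p = conversions / visitors
    half_width = z * math.sqrt(p * (1 - p) / visitors)
    return p - half_width, p + half_width

# 200 conversions out of 10,000 visitors: the observed rate is 2%,
# but you can only be 95% certain the true rate sits in this range.
low, high = conversion_ci(200, 10_000)
print(f"{low:.4f} .. {high:.4f}")
```

Raising `z` (e.g. 2.576 for 99% confidence) widens the interval, which is exactly why a higher confidence level demands a bigger sample.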

The need for statistical significance. This is the point at which your data becomes meaningful; any decision you make before reaching statistical significance is a fool’s errand.

When a statistic is significant, it simply means that you are very sure that the statistic is reliable. Via StatPac
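To show how that check usually works for an A/B test, here is a hedged sketch of a two-proportion z-test (one common approach, not the only one; the conversion counts below are made up):

```python
import math

def ab_test_p_value(conv_a, n_a, conv_b, n_b):
    """Two-sided p-value for a two-proportion z-test.
    A result is conventionally 'significant' when p < 0.05."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    return math.erfc(abs(z) / math.sqrt(2))

# A small test (20 vs. 28 conversions on 1,000 visitors each) is
# nowhere near significant, despite what looks like a "40% lift".
print(ab_test_p_value(20, 1_000, 28, 1_000))
```

The same conversion rates measured over 10,000 visitors per variant *would* clear the 0.05 bar, which is the whole point: the lift isn’t meaningful until the sample is.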

It’s Not About Measuring Everything

The harsh reality is that while it’s fun to spend time reading case studies on the wacky testing schemes employed by organizations we all aspire to be, they rarely offer practical insights for you. It’s not because testing is impractical, but because most of us don’t have enough data to actually test against in a timeframe that makes meaningful decisions possible. The key word there is timeframe.

Can you achieve similar success? Probably. It’s not, however, an overnight thing; it takes time (weeks, possibly months) for most companies to gather enough data points to achieve any form of statistical significance.

When was the last time you actually saw a marketing article or presentation reference the considerations above? Yet every scientific paper you read starts by describing the details of the test: the methodology, the confidence level, and the sample size used. It helps validate the analysis and conclusions.

Unfortunately, in business, and especially in marketing, this is rarely the discussion we have. We are so consumed by the need for quick wins, always eager to turn the knobs and spin the dials. We have this need to show we’re making progress, when in reality we may be going in circles with no real, meaningful output.

Paralysis by Analysis

Every time I hear someone say “Measure everything…” I want to cry, because it’s unrealistic, naive, and sets people up for failure. It feels lazy, grossly inaccurate, and disconnected from reality, especially if you’re starting a business and aren’t a data scientist.

When you’re starting your company, it’s impossible to measure everything. For one, there are literally not enough hours in the day, and second, you’re already spread thin. Depending on your company size, you’re likely also limited in the resources to make this happen.

I would argue that better advice is to identify two to three metrics (key performance indicators) that you know you can track, and start there. Start small, get your feet wet with the idea of measuring and making decisions from data, then slowly work your way into consuming more data.

Measure The Revenue Per Unit

While testing is fun, a more practical approach might be to place more emphasis on measuring your Average Revenue Per Unit (ARPU). Beka referred to this as Average Order Value (AOV); I personally call it Average Revenue Per Site (ARPS) and Average Revenue Per Transaction (ARPT). Be mindful of the Law of Averages, but it’s definitely a good place to start.

Note: If you are on WordPress and use WooCommerce for your online store, the WooCommerce Google Analytics Pro extension might be worth looking at to help you with this calculation. 

This measurement tells you how much revenue you’re generating per whatever unit your business focuses on, whether that’s a Software as a Service (SaaS) offering or some form of good. Every time someone makes a payment, how much are they giving you? The beautiful thing about this metric is that it’s independent of the audience / sample size.
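As a minimal sketch (the order amounts below are made up for illustration), ARPU is just total revenue divided by the number of payments:

```python
def average_revenue_per_unit(payments):
    """ARPU / AOV: total revenue divided by the number of payments."""
    return sum(payments) / len(payments)

# Five hypothetical orders from a month of sales.
orders = [29.00, 79.00, 29.00, 129.00, 49.00]
print(average_revenue_per_unit(orders))  # 63.0
```

No minimum sample size, no confidence level: every payment you’ve ever taken counts, which is why this metric works even when your traffic can’t support a proper A/B test yet.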

For example, assume you had 10,000 unique visitors to your website and you convert 2% of them, meaning you’re picking up 200 customers. Now say the average revenue of those signups is $50, so you’re making something in the neighborhood of $10,000 a month. With this knowledge you can make a couple of different decisions:

  • If your conversion is consistent, do you go about acquiring a bigger audience? In other words, do you fill the top of the funnel with more, whether through content, social, or paid acquisition?
  • If your ARPU is consistent, do you find ways to increase it per customer? Perhaps you give discounts when certain thresholds are hit (great for those selling goods, nice recommendation, Beka!), or, for you SaaS providers, perhaps you introduce new plans that entice the user past your desired goal.
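The arithmetic behind that example fits in a few lines (using the hypothetical numbers from the example above):

```python
visitors = 10_000
conversion_rate = 0.02   # 2% of visitors become customers
arpu = 50.00             # average revenue per customer

customers = visitors * conversion_rate   # 200 customers
monthly_revenue = customers * arpu       # $10,000 per month

# The two levers from the list above: grow the audience
# (visitors), or grow the revenue per customer (arpu).
print(monthly_revenue)  # 10000.0
```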

Simple, yet effective, especially when you’re starting off. It helps you gain a deeper appreciation for working with data, making decisions from data, and avoiding some of the pitfalls of measuring data that isn’t yet significant.

Capturing and Measuring Data is Important

Contrary to what you might take from this article, I have no issue with capturing data and measuring it. On the contrary, I believe it’s very important. I just believe most people tell a very skewed story when highlighting their testing and measurement wins.

I believe that, like most things, everything is about context, and when we tell our stories we have to provide enough of it that people can come to their own conclusions. Personally, I struggled with these thoughts a couple of years ago. Everything I read was an incredible example of a test that dramatically changed a business, yet I was never able to replicate the results.

I think a good message is that other people’s tests/results are not necessarily results that make sense for your company.  Do your own testing on revenue etc, but in the correct scale of your size so you’re not making the statistical errors. – Renu Hermon, Demand Generation Manager at Sucuri

I would read how changing the color of a button from yellow to green produced a 200% improvement in conversions, or how adding an arrow to an email enticed 350% more users to click, yet never about the data supporting those tests. Granted, a lot of my failures were most likely attributable to incorrect testing methodology, but the fact remains that the considerations I described were rarely discussed. To achieve the level of success often described, applying the same techniques with a true representative sample of an organization’s audience and an acceptable confidence level, it would take a very long time (weeks, not days or hours) to reach statistical significance.