Today’s topic is a little more advanced, for people who are working in the digital marketing arena already and trying to use Google Analytics. Perhaps you’re trying to report some numbers to a client.
In Google Analytics, if you are familiar with the format you tend to have words on the left and numbers on the right.
The words on the left, those are called ‘Dimensions’ and it will include things like page names, events, country, product name, gender… basically anything that is text.
Then you have metrics which are the numbers and they come on the right-hand side of the tables. So you’ll see sessions, users, session duration, conversion rate, bounce rate… i.e. all the numbers and percentages.
What you’ll discover in Google Analytics (in fact, the whole point of Google Analytics) is that it automatically aggregates all the numbers on the right for a particular piece of text on the left, for any date range you choose.
A big problem happens when you have two or more pieces of text that have been separated for some reason, but they are supposed to be reported as one. When you go to report on what happened, you have two or more separate numbers and it gets messy. Today’s video will explain to you how to fix this issue using Google Data Studio!
Transcription of Video
Hi! How are you going?
Now today’s topic is not for the faint of heart – it’s a little bit more advanced topic. Feel free to watch though even if you’re a beginner but it is more suited to someone who is more advanced with Google Analytics and understands some of the technical concepts.
So we’re going to talk about, basically, we’re going to answer the question, “What do we do if data isn’t aggregating correctly in Google Analytics?”
Before we go down this path, let me just explain what that even means in the first place.
In Google Analytics, you might be familiar with the format: you tend to have words on the left and numbers on the right.
The words on the left are called ‘Dimensions’, and they include things like page names. You might have events, which say the user did a particular thing. You might have information like which country someone came from, what product name they purchased or what their gender is. These are all text-based fields and they are all on the left-hand side of any table.
Then you have metrics, which are the numbers, and they come on the right-hand side of the table. So you’ll see sessions, you’ll see users, you’ll see time-based things such as session duration. You’ll see percentages like conversion rate and bounce rate. And then you can also see things like timestamps as well, so you can have dates in there too.
Dates are a bit of an interesting one depending on how that’s set up. But generally, just imagine it like this: the dimensions are text, which are on the left, and the metrics are numbers, which are on the right.
What you’ll discover in Google Analytics — well the whole main point of Google Analytics is that Google Analytics automatically aggregates all the numbers together for a particular piece of text on the left.
By aggregates, I mean that whichever date range you pick it will automatically recalculate the numbers, so the numbers all add up. You don’t have to see line by line every single hit that happened on your website and add them up yourself.
Let’s think, for example, about page name. Now if you had no aggregation, you would see a hit for every page view and it would say, “one user looked at this page, one user looked at this page, one user looked at this page,” and it would go on and on. The same page name would be repeated for as many users as looked at that page, and then you would have to manually add it all up.
You don’t have to do that with Google Analytics. Google Analytics has that wrapped up: for this page and this date range, this is how many people looked at it. It does it all behind the scenes on the Google Analytics servers.
That is really, really helpful, and it is one of the main reasons why we want to apply filters to clean up the data, so that you get good aggregation.
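To make the idea of aggregation concrete, here is a minimal Python sketch of what happens behind the scenes in spirit. It is not how Google Analytics is actually implemented; the hit list, the page names and the `aggregate_by_page` helper are all made up for illustration.

```python
from collections import Counter

# Hypothetical raw hits, one row per page view: (page name, user id).
# The page names and user ids are invented for illustration.
raw_hits = [
    ("Shoes", "user-1"),
    ("Shoes", "user-2"),
    ("Home-Page", "user-1"),
    ("Shoes", "user-3"),
    ("Home-Page", "user-4"),
]


def aggregate_by_page(hits):
    """Roll the hit-by-hit list up into one count per page name."""
    return Counter(page for page, _user in hits)


page_views = aggregate_by_page(raw_hits)
for page, views in page_views.most_common():
    print(f"{page}: {views}")
```

The dimension (page name) becomes the key and the metric (a count of views) becomes the value, so each page name appears once with its total next to it.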
The problem happens when, let’s say, you didn’t have good filters applied, or you changed the way data was being collected, either by changing the website or by changing the analytics setup. Either one can do it. There are probably plenty of other situations where it can happen, but what you end up with is two pieces of text which really were supposed to be the same thing but have now been separated.
I’ll give you an example. Let’s say you have a page name which is... I don’t know what to call it... home page. What a lousy name! My imagination is so lazy today! But let’s say it’s literally called Home-Page, and then (this is a terrible example) let’s just say you’re renaming your home page. I swear you would never ever do this, but let’s say you renamed it to Home-Page Two. Now if you opened your Analytics and looked at a date range in which that change happened, you’d have half of your data on Home-Page and half of your data on Home-Page Two, and you might get the feeling that fewer people had seen your home page than was actually the case.
That’s not the most concrete example because, first of all, you wouldn’t name your home page Home-Page and, second, you wouldn’t rename it. So let’s think of something else.
Have you ever seen, on an eCommerce store, what happens when you order by price or by relevance? Something like that. It puts some text inside the URL to tell the eCommerce software what to do. So one minute you’re shopping for shoes, and the next minute the URL says shoes and then it says order by price. Have you ever seen that?
If you have seen that before, you know what I’m talking about. This is an instance where aggregation can go badly, badly wrong if you haven’t got any filters applied. In my agency, I always have a filter that strips out all of these order-by parameters in some views for reporting, because all these little parameters completely mess up your aggregation.
Let’s say you didn’t have those filters applied yet and you had a page called Shoes. Some people going to the page were just looking at the shoes page, and other people going to the page were ordering by price. And it could get even worse: it could be ordering by price ascending, or price descending, or ordering by some other means. You’ve got all of these different things on the back of the URL, and each one of those will be aggregated separately. So you’re going to have shoes, you’re going to have shoes with order by price, you’re going to have shoes with price ascending, each with a different number of users. And if you want to find out how many people looked at the shoes page, now you’re screwed.
I say screwed because now there are like ten different permutations of the shoes page, each with a different number of people next to it, and it makes a big mess.
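Here is a small Python sketch of how that mess looks in the data. The paths and the `orderby` parameter name are assumptions for illustration only; a real store will use its own parameter names.

```python
from collections import Counter

# Hypothetical page paths as they might be recorded with no filters applied.
# The "orderby" parameter name is an assumption, not a real store's setting.
recorded_pages = [
    "/shoes",
    "/shoes?orderby=price",
    "/shoes?orderby=price-asc",
    "/shoes",
    "/shoes?orderby=price-desc",
    "/shoes?orderby=price",
]

# Grouping on the exact text gives one row per variation of the URL.
rows = Counter(recorded_pages)


def total_for(base_path, counts):
    """Add up every row whose path, ignoring anything after '?', is base_path."""
    return sum(n for page, n in counts.items() if page.split("?", 1)[0] == base_path)


shoes_total = total_for("/shoes", rows)
for page, views in rows.items():
    print(f"{page}: {views}")
print("All shoes rows together:", shoes_total)
```

One real page shows up as four separate rows, and the number you actually want only appears once the variations are treated as the same thing.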
So, rule number one, apply your filters. But if you have not got any filters and someone asks you, “How many people looked at shoes?”, you can still do it, and you can do it really easily. You do not have to add it up on a calculator!
What you want to do in this situation is, if you have scenarios where the numbers aren’t aggregating because your dimensions are different, you can fix up the dimensions to make them all the same. Yay! And then it will automatically aggregate. But you can’t do it in Google Analytics itself, because in Google Analytics once data is in there, it’s in there. But you can do it somewhere else. You can do it in a tool called Google Data Studio.
What you want to do is go to Data Studio. This tool is part of the Google Marketing Platform, so it works the same kind of way: to go to Google Analytics you type in analytics.google.com, and to go to Data Studio you type in datastudio.google.com. Go there. Create a new report (you might already have some, but for this purpose just create a new one). You want to import your Google Analytics as a data source. So you’ll click on data sources, pick Google Analytics and press connect. Then add whatever field you are trying to aggregate as the dimension, and add whatever numbers you are trying to see as the metric.
Now, this will not fix it yet. With that example we’ve got page and then users, so we’re still going to have all the pages listed with all these random parameters, and the number of users for each one.
But now you can fix it in Data Studio, even though you can’t fix it in Google Analytics. There are a couple of commands in Google Data Studio which, if you are a bit more advanced with Google Analytics, are your friends. The commands are called ‘replace’, ‘regexp_replace’ (which is R-E-G-E-X-P underscore replace) and ‘case’. What these do is let you change your data so that the numbers make sense.
Let’s talk about ‘replace’ first. What ‘replace’ does is let you pick out a particular piece of text and replace it with something else. Let’s go back to that really crappy example with Home-Page and Home-Page Two. Alright, for this it’s actually a perfect example.
With ‘replace’ you can say, replace the page name and if you see Home-Page Two replace that with Home-Page. Then when it goes to aggregate it, it will remove Home-Page Two and replace with Home-Page and it will correctly aggregate as though that Home-Page Two never existed.
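Here is the same idea as a minimal Python sketch, not Data Studio itself. In Data Studio the formula would be something along the lines of `REPLACE(Page, 'Home-Page Two', 'Home-Page')`, where the field name `Page` is an assumption about your data source. The rows, the numbers and the `merge_renamed` helper below are made up for illustration.

```python
from collections import Counter

# Hypothetical report rows: (page name, users). The numbers are invented.
report_rows = [
    ("Home-Page", 120),
    ("Home-Page Two", 80),
    ("Shoes", 50),
]


def merge_renamed(rows, old_text, new_text):
    """Replace old_text with new_text in each name, then re-aggregate."""
    totals = Counter()
    for name, users in rows:
        totals[name.replace(old_text, new_text)] += users
    return dict(totals)


merged = merge_renamed(report_rows, "Home-Page Two", "Home-Page")
for name, users in merged.items():
    print(f"{name}: {users}")
```

Once the two names are identical, their numbers land on one row and Home-Page Two disappears from the report.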
Now the ‘regexp_replace’ one, that’s a little bit trickier. What you can do with that is specify what’s called a ‘regular expression’, which is a rule-based way of determining which characters it’s going to search for in a string. Let’s go back to the silly example. Let’s say you had actually created a Home-Page Two, a Home-Page Three, a Home-Page Four and a Home-Page Five, and people are going to all these different home pages. And you just want to know how many people went to these home pages. You could do a regular expression replace, so ‘regexp_replace’. You say the dimension is the page name, then what you are looking for is the number, and you can write a regular expression to say: any digit between zero and nine, replace that with blank.
If you want to get rid of something, you do so by replacing it with blank.
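A rough Python sketch of that regular expression replace follows. It assumes the page names end in numeric digits (Home-Page 2, Home-Page 3 and so on) rather than spelled-out words, and the `strip_digits` helper is hypothetical. In Data Studio the formula would look something like `REGEXP_REPLACE(Page, '[0-9]', '')`, again assuming a field called `Page`.

```python
import re

# Hypothetical page names; numeric suffixes are an assumption for this sketch.
page_names = ["Home-Page", "Home-Page 2", "Home-Page 3", "Home-Page 45"]


def strip_digits(name):
    """Replace any run of digits (and the space before it) with blank."""
    return re.sub(r"\s*[0-9]+", "", name)


cleaned = [strip_digits(name) for name in page_names]
print(cleaned)
```

The pattern also swallows the space in front of the number; a pattern of just `[0-9]` would leave a trailing space behind, and those rows still would not match the plain Home-Page row.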
I do this all the time, but it’s hard to explain when you’re just talking it out. Basically, you can clear out things you don’t want by replacing them with blank. And you can also stack replaces on top of each other. Let’s say you want to completely change the name of something, but first you want to get rid of some weird numbers or weird letters that are in there. You can replace all the weird stuff with blank, and that will aggregate, but then you can wrap that in another replace, and wrap that in another replace, and wrap that in another replace.
Sometimes I have strings that look like replace bracket, replace bracket, replace bracket, and then a whole lot of things that I am replacing. Hey, I am a computer programmer so it all makes sense to me, but it probably doesn’t make sense to you if you’re not a computer programmer, especially if you’re a beginner. I totally understand that, but I’m just letting you know you can do it.
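To show what stacking looks like, here is a hedged Python sketch with the replaces nested inside each other. The `replace` and `regexp_replace` helpers are small hypothetical wrappers written to mirror the shape of the Data Studio commands, and the paths and the `/shoes-old` rename are invented for illustration.

```python
import re


def replace(text, old, new):
    """Plain text swap, mirroring the shape of a 'replace' command."""
    return text.replace(old, new)


def regexp_replace(text, pattern, new):
    """Pattern-based swap, mirroring the shape of 'regexp_replace'."""
    return re.sub(pattern, new, text)


def clean_page(path):
    """Nested replaces: the innermost call runs first, the outermost last."""
    return replace(
        regexp_replace(
            regexp_replace(path, r"\?.*$", ""),  # 1. drop '?' and everything after it
            r"(?<=.)/+$",                        # 2. drop trailing slashes, but keep a lone "/"
            "",
        ),
        "/shoes-old",                            # 3. rename a hypothetical old path
        "/shoes",
    )


for raw in ["/shoes?orderby=price", "/shoes/", "/shoes-old/?orderby=price-desc", "/"]:
    print(raw, "->", clean_page(raw))
```

Reading from the inside out is the trick: each replace cleans up one thing and hands its result to the replace wrapped around it.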
Hey, if you’re into SEO or if you’re working with Google Analytics all the time, you don’t have to be a data scientist or a programmer for you to do it. You just need to understand syntax and once you have one you can just plug it in pretty much the same every time.
Alright, so there’s one more that I will quickly mention and that is ‘case’. A case statement (in programming it’s also known as a switch case) is, in the more common way of calling it, like an ‘If Statement’. What you’re doing with ‘case’ is saying: if you have this kind of data then make it look like this, or if you have this other data then make it look like this other thing.
So what you can do is you can — how do I describe this? Basically, actually, a good example is countries. Let’s say you want to — let’s say you have a particular country that has to meet some requirement that you have in terms of…
I think in Google Analytics some of the Middle Eastern countries they have listed in there might be different geographically to the way that — you might just want to call them all Middle East — Yeah, I’ve done this exact example.
So a client said to me, “Well I want to know, I want to know what happened for Australia, I want to know what happened with UK, I want to know what happened with USA and I want to know what happened with Middle East.”
So then, Australia is easy because Australia is just one country in Google Analytics, but in the Middle East there’s like a million different countries. So we ended up aggregating them a different way. We found out which geographical region each country was in, and if it was in that geographical region, we changed the data to just say Middle East instead of naming all the different Middle Eastern countries. Otherwise we would have ended up with users against all the individual countries, when actually we just wanted to replace them all with Middle East.
So that’s one example of using the case statement, and in that situation it’s pretty much the only real way to do it well, because if you use replace, you have to replace each country individually, which becomes a mess.
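As a rough sketch of that ‘case’ logic in Python: the rows and numbers below are invented, and using a sub-continent value of "Western Asia" to stand in for the Middle East is an assumption for illustration, so check which region values your own data actually contains. In Data Studio the formula would be something along the lines of `CASE WHEN Sub Continent = "Western Asia" THEN "Middle East" ELSE Country END`.

```python
from collections import Counter

# Hypothetical rows: (country, sub-continent, users). Numbers are invented.
country_rows = [
    ("Australia", "Australasia", 500),
    ("United Kingdom", "Northern Europe", 300),
    ("United States", "Northern America", 400),
    ("Saudi Arabia", "Western Asia", 60),
    ("United Arab Emirates", "Western Asia", 90),
    ("Qatar", "Western Asia", 30),
]


def report_region(country, sub_continent):
    """If the row is in the chosen region, label it Middle East; else keep the country."""
    if sub_continent == "Western Asia":  # assumption: this value marks the region
        return "Middle East"
    return country


region_totals = Counter()
for country, sub_continent, users in country_rows:
    region_totals[report_region(country, sub_continent)] += users

for region, users in region_totals.items():
    print(f"{region}: {users}")
```

One rule on the region covers every country in it, which is why this beats writing a separate replace for each country name.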
I don’t know if I’ve just completely confused everybody, or maybe you’re following along with me and going, “Wow, I never thought of doing that.” This comes in handy, and you don’t have to be analyzing data in the Middle East to make use of it. Say you were looking at something that you did last quarter and you suddenly realized, “Oh no! My Analytics was broken. I’m trying to work out how many users did something, and it’s split into three different camps when it was supposed to be just one.” Then instead of just using Google Analytics, you can quickly pull the data into Data Studio, fix it up using replace statements and export it out again, and no one would ever be the wiser. I do this all the time. Once you’ve practised it, it will only take a couple of minutes.
Yeah, I think that’s pretty much all that I’m going to cover in that particular topic. Over time, I might add some of these things into the blog post. So at this point in time, you probably have a very short blog post and I won’t go into all the detail but over time just check on the blog post and I might have exact descriptions on how to do this with actual examples that you can copy.
Have a wonderful day, and if you want tips from me about things that you can do with Google Analytics, Google Tag Manager, Google Ads, etcetera, anything that’s Google, then you should subscribe to me either on YouTube or on my email newsletter. That’s probably the best way.
So you can go to www.petramanos.com and just subscribe to the email newsletter and you should get tips like this about three times a week.
Okay, I hope you have a wonderful, wonderful day and catch you soon. Bye!
Okie dokie! Well, hopefully, that will come in handy for you too if you’ve ever been scratching your head about how to clean up data that’s already happened.
Next time you can use filters in Google Analytics to make the data aggregate well and save yourself all the time and effort, but that’s another post for another day.
Take care and have a good one!