This is the ninth lesson video from my course The Exact Process For Setting Up Google Analytics – https://www.thequantifiedweb.com/exactga.
I decided to post the videos from the lessons publicly, but I encourage you to log into the course because from within the course you can access the written notes and the checklists, plus access my special offer. The course is completely free to register 🙂
In this lesson, you’ll learn how to add an Include Valid Hostname filter to your Filtered Views. This filter prevents spammers and malware from sending fake data to your account by using your Google Analytics ID.
It’s actually surprisingly easy to receive bogus data in Google Analytics – if your Google Analytics ID is added to someone else’s website due to either human error or malice, your Google Analytics data could be ruined, especially if invalid data is received in high volume.
Hi! Welcome to Lesson Nine in the Exact Process for Setting Up Google Analytics, The Quantified Web. In this lesson we are going to go through how to set up a filter that specifically includes your website and therefore excludes any other website.
The lesson is actually called Create the Include Valid Hostname Filter. Alright, so this is how we get in. We need to go to the website www.thequantifiedweb.com/exactga. Now, you want to put your email address in here and click Get Started Today. Alrighty, just scroll down to Lesson Nine, here it is, Include Valid Hostname Filter.
Alrighty, so the reason for this procedure is to prevent spammers and malware from sending fake data to your account by using your Google Analytics ID. Now, actually, a little bit of a funny story with this. When I first started this business, I think it was my first or second client. I was setting up some filters for this client and I was also setting up my own website at the same time. And I installed a plugin on my website that integrated with Google Analytics.
Because I had the client in my Google Analytics account, the plugin just automatically picked the account based on alphabetical order. My business at the time was called Web Data Analytics, so that’s obviously late in the alphabet, and my client’s name was earlier in the alphabet, so it automatically picked my client’s account instead of mine. While I was testing my website, the software was doing pop-ups on my website, so I was testing pop-ups. And then my client invited me in to teach their entire marketing department how to read Google Analytics data. We were going through the events and I was so embarrassed, because the events from my website were coming through to their Google Analytics and I’m like, “That shouldn’t be there.” I was just mortified. All I could do was say sorry and they said, “Oh, it’s okay, it’s okay.” But they never had me back again. Oh, I just went bright red.
But here’s the thing: there actually was a problem with their Google Analytics account, and they had asked me to go through and set up all of these filters. Basically, their account allowed anyone to use their ID and send fake data to their Google Analytics even though the hostname was not their website. So even though I made a mistake, their Google Analytics wasn’t set up right.
And if you do this procedure it would stop anybody from using your Google Analytics account ID, putting it on their website and sending things to it. It can also help with getting rid of spam, because if someone wants to ruin your Google Analytics account, they could steal your Google Analytics ID, put it on a fake website and have a whole bunch of fake data go through to your Google Analytics. There wouldn’t be anything that you could do to stop it until you added this filter. So it’s very, very important to do the Include Valid Hostname Filter.
Now, just to make it clear, none of these filters are included by default in Google Analytics. You do need to actually set them up manually. There’s no out of the box option where you can just choose to have it included or not. No, you actually have to set it up.
Alright, so what we’re going to do: if you have done the earlier lessons, you’ll know that I’ve taught you to create a view called Filtered Data No Params. If you’ve come in at this lesson, go back to the earlier lessons and you’ll see where this was set up. I do recommend following these procedures in order.
Alright, so let’s do it. Now, the first thing I want to do is go to Admin and we want to make sure we’re in the Filtered Data No Params view. Then we want to go into Acquisition > All Traffic > Channels Report. Set the Date range to the last 30 days. Okay, now this filter can be a little bit risky because if you only include certain IP addresses or certain domain names but there are other domain names that are included in your website then all of a sudden you’re going to filter out other domain names that should have been included. So this is why we need to go through and check.
So we want to set a Secondary dimension of hostname. So we’re going to Secondary dimension here, type in hostname. Alrighty, so basically we want to see if there are any hostnames in here that are different from the normal hostname for the website. So I can see that they don’t have, for this particular client, they don’t have any other hostnames. But if they did have other hostnames in there then you would need to investigate if some of them are actually legitimate hostnames or not.
So we want to look for hostnames that have had more than 20 sessions in the past month. If a hostname only has one or two sessions, then it could either be spam or it could be some third-party tool that you’re using that isn’t really part of your website but has Google Analytics set up on it.
Alright, so for this particular example that I’ve got in my procedure here, we actually have two main hostnames. We have one without www and we have one with www. So in this particular instance, we actually need to include both of those in our Include Filter or we would inadvertently filter out one of these whole chunks of traffic. But in the example that I’m doing over here, we’ve actually just got the same hostname for all.
Alright, so we do want to be very careful not to include any hostnames that might be spam, and we don’t really care about including all of the Google cache and things. So always ignore the (not set) and googleweblight entries.
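The hostname check described above can be sketched in a few lines of Python. This is only an illustration: the hostnames and session counts are made up, and `hostnames_to_review` is a hypothetical helper, not something Google Analytics provides.

```python
# Hypothetical export of the Channels report with Hostname as the secondary
# dimension: hostname -> sessions in the last 30 days (made-up numbers).
sessions_by_hostname = {
    "www.example.com": 4210,
    "example.com": 380,
    "(not set)": 55,
    "example-com.googleweblight.com": 31,
    "bookings.examplebooking.com": 27,
    "free-traffic.spam.example": 2,
}

IGNORED_MARKERS = ("(not set)", "googleweblight")  # always ignored, per the lesson
MIN_SESSIONS = 20  # only look at hostnames with more than 20 sessions


def hostnames_to_review(sessions):
    """Return hostnames worth checking before writing the include filter."""
    review = []
    for hostname, count in sessions.items():
        # Skip the entries the lesson says to ignore.
        if any(marker in hostname for marker in IGNORED_MARKERS):
            continue
        # Keep only hostnames above the session threshold.
        if count > MIN_SESSIONS:
            review.append(hostname)
    return sorted(review)


print(hostnames_to_review(sessions_by_hostname))
```

Anything on the resulting list that you do not recognise is a hostname to ask the client about before it goes into the filter.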
Alright, so if there is a hostname that’s unexpected in the list of hostnames then we want to look into that. Our procedure says that we email our client and ask about it, because there might be a reason for it. Usually when we see this, they’ll have a third-party software, and that third-party software has Google Analytics on it and it’s part of their business. But I do like to check because we definitely don’t want to include any spam in the allowable hostnames. So we just wait to get their response back about that.
So once we’ve got the list then we need to apply a filter to the smallest version of the hostname. So what do I mean by smallest? Let me just annotate the screen actually. Okay, so we’ve got www.example.com and we’ve got example.com. Which of these is the smallest? It’s example.com, not www.example.com. If we create a filter that says only www.example.com is going to be included, then example.com won’t be included, but the way this filter works is that it encompasses everything that is a subdomain if we go for the smallest one.
So if we type in example.com then www.example.com would be included. But if we type in www.example.com then example.com without the www would not be included. Hopefully, that makes some sense. Let’s just clear it out. Oh, I have repeated the same one. So basically, if example.com and subdomain.example.com were both valid hostnames, we would want to include everything containing example.com.
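To see the “smallest hostname” idea in action, here is a minimal Python sketch. It assumes the filter pattern behaves like a partial-match regular expression, which is modelled here with `re.search`; the `matches` helper and the hostnames are just for illustration.

```python
import re


def matches(pattern, hostname):
    """True if the pattern is found anywhere in the hostname (partial match)."""
    return re.search(pattern, hostname) is not None


hostnames = ["example.com", "www.example.com", "subdomain.example.com"]

for host in hostnames:
    smallest = matches(r"example\.com", host)       # the smallest hostname
    www_only = matches(r"www\.example\.com", host)  # the longer hostname
    print(f"{host}: smallest={smallest}, www-only={www_only}")
```

The smallest pattern finds a match in all three hostnames, while the www pattern only finds one in www.example.com.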
Now, of course, there’s an exception. If we have subdomains that we want to exclude from a production view then we want to apply a smart filter. So one example that we have here: let’s say you’re running e-commerce or you’ve got a corporate website that has a more in-depth process for deploying that website. Then you might have some non-production versions of your website floating around. You might have a staging or pre-production version, or you might have both, and this is basically a testing version that you would be testing on before it goes live. We don’t want to have that testing data going into your live production view.
So if you have staging and pre-production, then we actually want to exclude those subdomains. We don’t want to be including everything at your domain. Now, if we do have multiple subdomains then we do want to confirm with somebody. In my business, we confirm with the person who designs the analytics, which would be me, as to which subdomains will be included. If we have a lot of subdomains then I would normally like the subdomains to have their own views. And then we do a multi-site setup and I’ve got a different procedure for that.
Alright, so we’re going to just assume that we’ve got a pretty standard situation here, and we’re going to create a new filter for that. So let’s go to Admin and we’re going to open the Filtered Data No Params view which we created in an earlier lesson. We’re going to go to Filters and click Add Filter. Alright, Filtered Data No Params, go to Filters, click Add Filter, alright.
Alright, so let’s call this one Include Valid Hostname. Now we want to select Custom as the Filter Type and we want to tick the Include option. Alrighty, now from the Filter Field drop-down menu, we want to select Hostname, or just type it in.
Alright, so if there is just one hostname or if there are multiple variations of the same domain then I have a step here called Write a Filter Pattern for one hostname. But if we’ve got multiple hostnames, including if we have cross-domain tracking, which I haven’t got into yet in this series, then we want to follow a step called Write a Filter Pattern for multiple hostnames.
Then I’ve got some examples here. So for the multiple hostnames then some examples might be if you have an e-commerce shopping cart that’s hosted on a separate domain. So this would be — alright, so let’s say next week for example. Is that the one that’s called? Next week. I think that’s what it’s called. They will have a shopping cart which is actually on its own subdomain and it’s a hosted shopping cart. So that’s one example. There’s plenty of others as well. Just thinking of, it’s in my head. I think it could be Shopify has this in some accounts.
Alright, so clients using a booking engine of some sort. You see this all the time in the hotel industry or in the health industry. So you have Cliniko, that’s a health one. Bookonthenet is an accommodation one. There’s plenty of others, and you’ll see that they’ve got a completely different domain name for their booking engine. And then you’ve got clients with more than one website. So those are examples of where you have multiple hostnames.
Alright, so it’s like multiple choice here, you can pick which one is your situation. In this case, we just have one hostname so I’m going to go to that one. Thirteen. Oops. Actually, if you click on it, it opens a bigger version so feel free to do that anytime. I’ll go back to it. Cool. Where did we get up to? Here we go.
Now, so I’m just doing one hostname. If we have different variations of the same domain name then like I told you before we want to choose the smallest domain name that can be matched by all of them. But if we only have one hostname then we can just enter it into the Filter Pattern text box.
Now, we want to have a backslash (\) before any dots. So this is just going to be really weird if you’re not a programmer so let me just type again for you. Alright, so if — so let’s say your website was www.example.com then we want to put in www and then we want to do backslash dot example backslash dot com (www\.example\.com) and the reason for that is this dot has a special meaning if you’re a programmer. And that dot actually represents any character at all. So it’s actually called a wildcard, so it could represent any single character.
Now, I have seen spammers take advantage of that to get themselves included in an Include Filter by changing their hostname to look like yours but with some other character in place of the dot. And look, this is not going to happen very often but it can happen.
So if you put in a backslash before the dot, the backslash is what’s called an escape character, and what this says is: we are only going to include it if it is literally a dot, not if it’s any other character. This kind of pattern is called a regular expression or regex, and in a regex a dot without the backslash matches any character. So let’s just clear that. Hopefully, that makes sense. That’s a bit nerdy on you but hopefully, I’ve explained that well enough that we can continue.
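Here is a small Python sketch of why the backslash matters. It assumes the filter pattern works like a regular expression searched against the hostname (modelled with `re.search`), and `wwwxexample.com` is a made-up lookalike hostname.

```python
import re

unescaped = r"www.example.com"    # each dot matches ANY single character
escaped = r"www\.example\.com"    # each dot must be a literal dot

real_host = "www.example.com"
lookalike = "wwwxexample.com"     # hypothetical spam hostname

for name, pattern in [("unescaped", unescaped), ("escaped", escaped)]:
    real_ok = re.search(pattern, real_host) is not None
    spam_ok = re.search(pattern, lookalike) is not None
    print(f"{name}: real site included={real_ok}, lookalike included={spam_ok}")

# re.escape adds the backslashes for you.
print(re.escape(real_host))
```

Both patterns include the real site, but only the unescaped one lets the lookalike through.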
Alright, so we’ve got an example with the backslash in there, and if we have subdomains then we still want to make the pattern the smallest one that matches all of them. But if we don’t want to include all the subdomains, then we need to pick the longest, most specific domain. And then, in fact, we end up having to do the multiple hostnames step.
So if we do not want to include all of the subdomains, if we only want to include one subdomain then we do in fact want to specify the longest one. We want to specify the subdomain. Now, in this particular case, we just have the one, the one domain so it ended up being really easy and oh I didn’t write it down but I have it from up here so I’m just going to paste it in. Alright, so this one was really easy – we just added one.
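As a rough illustration of picking the longest, most specific hostname, the sketch below compares a broad pattern with a specific one. It assumes partial regex matching (modelled with `re.search`), and the staging and pre-production hostnames are hypothetical.

```python
import re


def is_included(pattern, hostname):
    """True if the include pattern is found anywhere in the hostname."""
    return re.search(pattern, hostname) is not None


hosts = ["www.example.com", "staging.example.com", "preprod.example.com"]

broad = r"example\.com"          # smallest hostname: lets every subdomain in
specific = r"www\.example\.com"  # longest hostname: only the production site

print([h for h in hosts if is_included(broad, h)])
print([h for h in hosts if is_included(specific, h)])
```

With the specific pattern, the staging and pre-production hostnames no longer get into the view.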
Alright, so if you have multiple hostnames then what you want to do is separate them with this little pipe symbol here (|). This vertical line means OR. I’ve been putting the whole thing in brackets. I can’t remember whether this is actually mandatory or not but I’ve been doing it anyway. So I’d do a bracket, then I put in the first hostname, then this little symbol that means OR, then the second hostname, et cetera, and then the closing bracket. What that will say is: any of these hostnames will be included in this view.
So I’ve got here the pipe character means OR.
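If you have a list of hostnames, a pattern in this bracket-and-pipe style can be put together with a short Python sketch like the one below. `build_include_pattern` is a hypothetical helper and the hostnames are made up; the result is checked with `re.search` on the assumption that the filter does a partial regex match.

```python
import re


def build_include_pattern(hostnames):
    """Escape each hostname and join them with the pipe (OR) inside brackets."""
    return "(" + "|".join(re.escape(h) for h in hostnames) + ")"


valid_hostnames = ["example.com", "shop.examplecart.com"]  # made-up hostnames
pattern = build_include_pattern(valid_hostnames)
print(pattern)

for host in ["www.example.com", "shop.examplecart.com", "spam.invalid"]:
    print(host, re.search(pattern, host) is not None)
```

The printed pattern is what you would paste into the Filter Pattern text box.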
Alright, so I haven’t gone into cross-domain tracking so far in this procedure, but if we do have cross-domain tracking then we always have two or more domains. So if you know that you have cross-domain tracking then you should definitely have at least two domains.
And sometimes the second domain is embedded as an iFrame, so you wouldn’t necessarily realize that you have more than one domain, but you would definitely start to see it in the earlier step of checking for more than one hostname. You do that using Google Analytics. We should have already done it in the earlier step anyway.
I’ve just got an example using Cliniko because we do actually have a set up specifically for Cliniko. So if you are a health clinic using Cliniko, feel free to reach out to me because we’ve actually got a standardized procedure just for Cliniko which makes it really budget-friendly.
Alright, next up we want to Verify the filter. There’s like a little link here down the bottom that says: Verify this Filter. Alright, so there are a few different situations you might see here. You might see, “This filter would not have changed your data.” If you see that, then you’re good to go, you can just click Save. So in this case, it’s what I’ve got so I’m just going to click Save.
Now, in another situation you might see a table like this, and you might have all the hostnames on the left and then a sub-set of those hostnames on the right. So you want to check through this and make sure that the hostnames that you do want to include are present on the right, because what you see on the right is what will be coming through in the future. And if there’s anything blank on the right that’s present on the left, it means that those will no longer be included in this view. Hopefully, that makes sense. This can be a little bit confusing for some people, so basically, whatever shows up on the right of the table is what will be coming through in the future after the filter is applied.
Alright, so if a valid hostname is showing on the left but it is blank on the right then there’s an error in the filter, and the filter will need to be fixed and verified again. Now, this is one area where errors can cause loss of data, so do check your work and do make sure that the filter has been applied correctly.
If you do have multiple hostnames and you filter them out then it will cause a big drop in traffic on your account. It’s one of the most common filtering issues and it does have some pretty nasty results if you don’t realize for a while. So please be careful before saving the filter that you’re not filtering out any hostnames that should be included, especially if you see a table like this.
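You can also rehearse this left and right comparison yourself before saving. The Python sketch below is not the Verify this Filter feature itself, just a rough imitation under the assumption that the pattern is a partial-match regex; `preview_filter` and the hostnames are hypothetical.

```python
import re


def preview_filter(pattern, hostnames):
    """Pair each hostname (left) with what would still come through (right)."""
    regex = re.compile(pattern)
    return [(host, host if regex.search(host) else "") for host in hostnames]


seen_hostnames = ["www.example.com", "example.com", "bookings.examplebooking.com"]

for before, after in preview_filter(r"www\.example\.com", seen_hostnames):
    print(f"{before:32} | {after or '(blank: filtered out)'}")
```

Here two valid hostnames come up blank on the right, which is the sign that the pattern needs fixing before the filter is saved.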
Hopefully, that all makes sense. This particular one turned out to be a really easy case but your case might not be easy. Look, if you have a complicated case just feel free to reach out to me because my team can help you out. Alrighty, the next lesson that we go through is one I’ve called Create the Exclude The Quantified Web filter, but it will basically be about excluding your own team. Alright, stay tuned for that one.
If you liked this video and want to find the next one, look out for How to Create an “Exclude Own Traffic” Filter in Google Analytics. In that next video, we’ll be showing you how to filter out your own IP address(es) from your Google Analytics Filtered Views. This prevents your own activity and the activity of your staff from adding extra data that might make your results inaccurate. This is especially the case if you are testing activities on your website that might trigger a goal or a conversion of some kind, or if you have your staff computers set up to log into your website.