Get Rid of the URL Pollution

You want to copy the URL of a nice article/video/picture you’ve just opened and send it to friends in skype chats, whatsapp, other messengers or social networks. And you realize the URL looks like this:

http://somesite.com/artices/title-of-the-article?utm_campagin=fsafser454fasfdsaffffas&utm_bullshit=543fasdfafd534254543&somethingelse=uselessstuffffsafafafad&utm_source=foobar

What are these parameters that pollute the URL? The above example uses some of the Google Analytics parameters (utm*), but other analytics tools use the same approach. And probably other tools as well. How are these parameters useful? They tell Google Analytics (which is run with javascript) details about the current campaign, probably where the user is coming from, and other stuff I and especially users don’t really care about.

And that’s ugly. I myself always delete the meaningless parts of the URL, so that in the end people see only “http://somesite.com/artices/title-of-the-article”. But that’s me – a software engineer, who can distinguish the useless parts of the URL. Not many people can, and even fewer are bothered to cut parts of the URL, which results in looong and ugly URLs being pasted around. Why is that bad?

  • website owners have put effort in making their URLs pretty/ With “url pollution” that efforts goes to waste.
  • defeating the purpose of the parameters – when you copy-paste such a url, all the people that open it may be counted as, for example, coming from a specific AdWords campagin. Or from a source that’s actually wrong (because they got the URL in skype, for example, but utm_source is ‘facebook’)
  • lower likelihood of clicking on a hairy url with meaningless stuff in it (at least I find myself more hesitant)

If you have a website, what can you do about this URL pollution, without breaking your analytics tool? You can get rid of them with javascript:

    window.history.replaceState(null, null, 
        window.location.href.replace("utm_source=....", ""));

This won’t trigger fake analytics results (for GA, at least, as it requires manual work to trigger it after pushState). Now there are three questions: how to get the exact parameters, when to run the above code, and is it worth it?

You can get all parameters (as shown here) and then either remove some blacklisted ones (utm_source, utm_campagin, etc.), or remove all, unless your whitelisted parameters. If your application isn’t using GET parameters at all, that’s easy. If it is, then keeping the whitelist in sync would be tedious, so probably go for the blacklist.

When should you do that? A little after the page loads, and the analytics tool does its job. When exactly is that – I don’t know. Maybe on window.load, maybe you have to wait for a second and then remove the parameters. You’d have to experiment.

And is it worth it? I think yes. Less useless parameters, less noise, nicer, friendlier URLs (that’s why you spent time prettifying them, right?), and less incorrect analytics results due to copy-pasted long URLs.

And I have a request to Google and all other providers of similar tools – please cleanup your “mess” after you read it, so that we don’t have to do it ourselves.

You want to copy the URL of a nice article/video/picture you’ve just opened and send it to friends in skype chats, whatsapp, other messengers or social networks. And you realize the URL looks like this:

http://somesite.com/artices/title-of-the-article?utm_campagin=fsafser454fasfdsaffffas&utm_bullshit=543fasdfafd534254543&somethingelse=uselessstuffffsafafafad&utm_source=foobar

What are these parameters that pollute the URL? The above example uses some of the Google Analytics parameters (utm*), but other analytics tools use the same approach. And probably other tools as well. How are these parameters useful? They tell Google Analytics (which is run with javascript) details about the current campaign, probably where the user is coming from, and other stuff I and especially users don’t really care about.

And that’s ugly. I myself always delete the meaningless parts of the URL, so that in the end people see only “http://somesite.com/artices/title-of-the-article”. But that’s me – a software engineer, who can distinguish the useless parts of the URL. Not many people can, and even fewer are bothered to cut parts of the URL, which results in looong and ugly URLs being pasted around. Why is that bad?

  • website owners have put effort in making their URLs pretty/ With “url pollution” that efforts goes to waste.
  • defeating the purpose of the parameters – when you copy-paste such a url, all the people that open it may be counted as, for example, coming from a specific AdWords campagin. Or from a source that’s actually wrong (because they got the URL in skype, for example, but utm_source is ‘facebook’)
  • lower likelihood of clicking on a hairy url with meaningless stuff in it (at least I find myself more hesitant)

If you have a website, what can you do about this URL pollution, without breaking your analytics tool? You can get rid of them with javascript:

    window.history.replaceState(null, null, 
        window.location.href.replace("utm_source=....", ""));

This won’t trigger fake analytics results (for GA, at least, as it requires manual work to trigger it after pushState). Now there are three questions: how to get the exact parameters, when to run the above code, and is it worth it?

You can get all parameters (as shown here) and then either remove some blacklisted ones (utm_source, utm_campagin, etc.), or remove all, unless your whitelisted parameters. If your application isn’t using GET parameters at all, that’s easy. If it is, then keeping the whitelist in sync would be tedious, so probably go for the blacklist.

When should you do that? A little after the page loads, and the analytics tool does its job. When exactly is that – I don’t know. Maybe on window.load, maybe you have to wait for a second and then remove the parameters. You’d have to experiment.

And is it worth it? I think yes. Less useless parameters, less noise, nicer, friendlier URLs (that’s why you spent time prettifying them, right?), and less incorrect analytics results due to copy-pasted long URLs.

And I have a request to Google and all other providers of similar tools – please cleanup your “mess” after you read it, so that we don’t have to do it ourselves.