A Canonical URL JavaScript


Warning: Don't use JavaScript for this. This is documented here because I once did this and wrote about it. But it was a terrible, awful idea. You have been warned!


Written 2008, Revised 2021.

A website should be known through a single hostname and a page by a single URL. These are known as the 'canonical' hostname and URL, respectively.

There may be many URLs for one page. Consider HTTPS versus HTTP and WWW versus no-WWW. AlaskaLife.net, home.GCI.net, and MatNet.com were once aliases because of buy-outs. There may even be multiple document names, like index.php, default.htm, index.html, and index.shtml.
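
To make the variants concrete, here is a minimal sketch that collapses them into one form. The function name toCanonical and the choices it makes (HTTPS, no "www.", no index document name) are assumptions for illustration; your own canonical form may differ.

```javascript
// Hypothetical helper: collapse common variants of a page URL into one form.
// The canonical choices (https, no "www.", no index document) are assumptions.
const INDEX_DOCUMENTS = ['index.php', 'default.htm', 'index.html', 'index.shtml'];

function toCanonical(rawUrl) {
  const url = new URL(rawUrl);            // the parser lowercases the hostname
  url.protocol = 'https:';                // HTTP and HTTPS become one scheme
  url.hostname = url.hostname.replace(/^www\./, '');
  const segments = url.pathname.split('/');
  const last = segments[segments.length - 1].toLowerCase();
  if (INDEX_DOCUMENTS.includes(last)) {
    segments[segments.length - 1] = '';   // drop the document name, keep the slash
    url.pathname = segments.join('/');
  }
  return url.href;
}

console.log(toCanonical('http://www.example.com/index.html')); // https://example.com/
console.log(toCanonical('https://example.com/'));              // https://example.com/
```

Both calls print the same string, which is the point: many spellings, one page.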

To robots (search engines), presenting the same content at multiple URLs looks like an attempt to get multiple listings (spamming the index). This can get you penalized (Google.com. 2021. Avoid Duplicate Content. Developers.Google.com, archived at archive.org).

To avoid such adverse effects, you should tell robots which one is the single, official canonical form. You need only include <link rel='canonical' href='xx' /> where xx is the desired official URL, in the HTML head.
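
If your pages are generated by a build script, the tag can be emitted into the static HTML at build time rather than typed by hand. The sketch below assumes such a script exists; canonicalLinkTag is a hypothetical name, and nothing here runs in the visitor's browser.

```javascript
// Hypothetical build-time helper: returns the <link> element text for a
// page's canonical URL, ready to be written into the HTML head.
function canonicalLinkTag(canonicalUrl) {
  // Escape characters that could break out of a single-quoted attribute.
  const escaped = canonicalUrl
    .replace(/&/g, '&amp;')
    .replace(/'/g, '&#39;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;');
  return "<link rel='canonical' href='" + escaped + "' />";
}

console.log(canonicalLinkTag('https://example.com/about'));
// <link rel='canonical' href='https://example.com/about' />
```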

Canonical Names for Humans

To get humans in line, you need to redirect their visit so the URL their browser displays is the right one. Otherwise, six different people who link to the same page on your site are sure to use six different URLs for it.

Had they all used the same URL, robots would have counted six votes for your page. Because they used six different URLs, none of your URLs got more than one vote. With only one vote each, your URLs are not likely to be on the first page of search engine results. It is important to enforce the use of the canonical URL (Cutts, Matt. 2006. URL Canonicalization. mattcutts.com).
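
The arithmetic can be sketched in a few lines. The six URLs below are made-up examples, and counting one "vote" per link is only the simplified model used above, not a description of how any search engine actually scores pages.

```javascript
// Six inbound links to the same page, each using a different URL variant.
const inboundLinks = [
  'http://example.com/',
  'http://www.example.com/',
  'https://example.com/',
  'https://www.example.com/',
  'https://example.com/index.html',
  'https://www.example.com/index.html',
];

// Tally how many links point at each distinct URL string.
function countVotes(urls) {
  const votes = new Map();
  for (const url of urls) {
    votes.set(url, (votes.get(url) || 0) + 1);
  }
  return votes;
}

const split = countVotes(inboundLinks);
// Had every visitor seen the canonical URL, all six links would match it.
const unified = countVotes(inboundLinks.map(() => 'https://example.com/'));

console.log(Math.max(...split.values()));         // 1
console.log(unified.get('https://example.com/')); // 6
```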

Server-Side Redirects

The correct solution to multiple URLs is to use server-side redirects to the canonical URL. The Linux Apache server's mod_rewrite (RewriteEngine) in the .htaccess file can do this (Apache.org. 2017. Module mod_rewrite. Apache.org; old Apache 1.3 version at archive.org). Alas, many customers have difficulty with it.

Rather than incur customer support issues, some web hosting companies simply disable it (Morgan, JD. 2007. Reply to bcrbcr re: Host Supports .htaccess but not Mod Rewrite. WebmasterWorld.com, Apr. 16). If your host does this, you should flee to a better one.
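
For hosts that do allow it, a hostname redirect in .htaccess can be as short as the sketch below. It assumes the canonical hostname is example.com served over HTTPS and that mod_rewrite is enabled; it handles only the hostname, so adjust the scheme and add further conditions to suit your own site.

```apache
# Assumption: example.com over HTTPS is the canonical form.
RewriteEngine On
# Match any request whose Host header is not the canonical hostname.
RewriteCond %{HTTP_HOST} !^example\.com$ [NC]
# Send a permanent (301) redirect to the same path on the canonical host.
RewriteRule ^(.*)$ https://example.com/$1 [R=301,L]
```

A 301 tells both browsers and robots that the move is permanent, which is what lets the old URLs' links count toward the canonical one.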

Client-Side JavaScript Redirects

Theoretically, JavaScript is a workable alternative. Practically, avoid it like the plague. You're much better off fleeing to a web host that lets you do server-side redirects.

If you have to fall back to JavaScript, as many as 20% of your visitors may still see and use the wrong URL. The various ad blocking, no script, no tracking, and privacy extensions conspire against you. There are also a lot of traps. I'm sure I haven't thought of all of them, but here are a few:

  1. The Hotel California problem. It infuriates visitors when they use the back button to leave your site and it reloads instead. My example script uses 'location.replace' to avoid this.
  2. The broken page link/query problem. It disappoints visitors when your same-page links and search query features don't work. My example script appends the search (query) and hash strings to avoid this result.
  3. The inaccessible saved page problem. After you or your page is dead, people may try to view a local, search engine cache, or Internet archive copy. Your JavaScript must not redirect them to the now-dead original. For this reason, my example script only kicks in if someone is visiting a known alias of the canonical URL.
  4. The blacklisting problem. The robot (search engine) indexers blacklist pages and sites suspected of using JavaScript redirection for nefarious purposes (Chellapilla, Kumar, and Maykov, Alexey. 2007. A Taxonomy of JavaScript Redirection Spam (8 page PDF). AIRWeb-'07: Proceedings of the 3rd international workshop on Adversarial information retrieval on the web, May. Also archive.org airweb.cse.lehigh.edu/2007/papers/paper_115.pdf).

    Theoretically, because we're presenting the same content to humans and robots, there should be no penalties. Practically, search engine algorithms are not publicly known, so we can't be sure of this.

My example JavaScript is presented here, but there can be no assurance that this will not greatly harm your rankings. You have been warned.

Code: Example JavaScript


// walt.gregg.juneau.ak.us/6/javascript-make-canonical-url
// As a last resort only, a method to redirect visitors.
// Copr. walt.gregg.juneau.ak.us 2016; License: attribution
// sharealike; Details: apache.org/licenses/LICENSE-2.0.
// Absolutely no warranties, express or implied.
//
// Insert in the HTML head, replacing with the page's canonical url:
//   <base href="http://example.co.uk/extensionless_page">
//   <script type="text/javascript" src="mkcanonical.js">//</script>
// Below:
//   Set alias[0] to desired canonical hostname.
//   Set alias[1] to 1st alias and continue adding all other known aliases.
var alias = [
  'example.com',
  'www.example.com'
];
var base;
var basehref;
var baseget;
var hostname;
var i;
if (document.getElementsByTagName) {
  base = document.getElementsByTagName('base')[0];
  // Do nothing if the page has no <base> element naming the canonical URL.
  if (base && base.href) {
    basehref = base.href;
    baseget = location.protocol + '//' + location.hostname + location.pathname;
    hostname = location.hostname.toLowerCase();
    // Redirect only when visiting a known hostname at a non-canonical URL,
    // so saved, cached, and archived copies are left alone.
    for (i = 0; i < alias.length; i++) {
      if ((hostname === alias[i]) && (basehref !== baseget)) {
        // The query string must precede the fragment: path?search#hash.
        location.replace(basehref + location.search + location.hash);
        break;
      }
    }
  }
}
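
To check the redirect decision without a browser, the same logic can be restated as a pure function. This is only a sketch: canonicalTarget and its parameters are hypothetical names, and the loc object merely mimics the window.location fields the script reads. Note that it places the query string before the fragment, the order URLs require.

```javascript
// Hypothetical restatement of the redirect decision as a pure function.
// Returns the URL to redirect to, or null when no redirect should happen.
function canonicalTarget(loc, baseHref, aliases) {
  const current = loc.protocol + '//' + loc.hostname + loc.pathname;
  const onKnownHost = aliases.includes(loc.hostname.toLowerCase());
  if (!onKnownHost || current === baseHref) {
    return null; // saved copy, archive copy, or already canonical
  }
  return baseHref + loc.search + loc.hash;
}

const aliases = ['example.com', 'www.example.com'];
const base = 'http://example.com/page';

// Visitor arrives via the www alias with a query and a same-page anchor.
const viaAlias = canonicalTarget(
  { protocol: 'http:', hostname: 'www.example.com', pathname: '/page',
    search: '?q=1', hash: '#top' },
  base, aliases);
console.log(viaAlias); // http://example.com/page?q=1#top

// Archived copy served from another host: left alone.
const archived = canonicalTarget(
  { protocol: 'https:', hostname: 'web.archive.org', pathname: '/web/2021/page',
    search: '', hash: '' },
  base, aliases);
console.log(archived); // null
```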

Extensions Are Best Omitted

There are advantages to a canonical URL that lacks a file type extension. You can then freely change the document type (say, from index.html to index.php) without breaking incoming links.

Is it worth the trouble? Experts think that it's best for canonical URLs to exclude all unnecessary detail (Berners-Lee, Tim. 1998. Cool URIs Don't Change. w3.org). Who am I to argue with the inventor of the Web?

