Handling Multilingual Sites for Humans and Search Engines
Every international SEO project must be evaluated and solved carefully according to its particularities, but there are some basics to cover if you want to ensure a smooth-as-silk experience. This is what we’re going to cover today. Given the worldwide reach of Bruce Clay, Inc., and being an international and multilingual SEO myself, this topic came naturally and is something I’m passionate about.
Combining the Best User Experience With Perfect Indexation
Let’s look at an example of a site with different languages. We could organize it in directories like:
- http://mysite.com for English, the default language
- http://mysite.com/fr/ for French
- http://mysite.com/es/ for Spanish
- http://mysite.com/ru/ for Russian
My few followers know how often I argue against putting languages in subdomains when country-level domains (ccTLDs) are not an option, but that is another story.
What I propose here will also work for languages organized in subdomains or on ccTLDs. There is a double objective to achieve: two types of visitors (users and bots) come to this site, and we want to make all of them happy.
Ensure Effortless Experience for Humans
Ensuring an effortless and memorable experience by serving content in the visitor’s language is a key factor for obvious reasons. We are all more likely to engage, interact and achieve a site’s goals if we can read Web pages in our own language, capisci?
As a user this means:
- I don’t want the site making me select my language if it already knows that detail (yes, it does) the first time I land there.
- I want the Web to remember my language preference when I come back.
Let me show you a couple of examples of what you should never do unless you hate your audience.
Bidz.com welcomes me in Spanish (¡Bienvenidos!) because it detects my browser language, but then asks me to confirm Spanish as my preferred language and whether I want to continue browsing in it. Would you mind just showing the content directly in Spanish and stop annoying me, please?
If, for any reason, I decide I want to navigate in English, let me change that. Period. You could save tons of clicks and create a much better user experience by avoiding obvious questions, don’t you think?
The next example takes language-handling idiocy one step beyond. UGG boots, besides being a crime against good taste in shoes (unless you want to look like Mazinger Z), has a site that works this way:
- You land on the United States (English) version of the site, no attempt to detect/offer other languages automatically.
- You click on the country selector hoping to jump to another language quickly, but you are sent to a country selector page (a very typical error), where you can (finally) select Spain, assuming it will be in Spanish.
- Ta-da! The site shows the Spanish flag but is completely in English. Good job.
Make It Easy for Search Engine Bots
Describing all those troubles in detail would take a whole other post, but you know the consequences: all the money invested in taking the business international is wasted, because search engines don’t rank anything they cannot index.
Language detection can be done at the server side or the browser side. I advise against the latter for several reasons:
- The probability of messing up your Web statistics with browser-side redirects is much higher.
- Search engine bots may not execute your JavaScript at all, so a browser-side solution can leave them stranded on the wrong version, or on none.
In any case, we need to come up with some Web programming logic capable of achieving both goals and apply it in the server-side scripting language your site is built upon: PHP, .NET, Ruby, Java, Python or any other. That is the takeaway.
The user experience side of the coin is something you can test yourself. A nice way to test the bot side is the “Fetch as Googlebot” feature in Google Webmaster Tools.
If your logic is not correct and it redirects not only users but also bots after detecting the language, you will see something like Googlebot being redirected to a non-existent “mysite.com//”, which leaves the bot unable to index the site. In other words, a complete disaster.
Show All Content to Bots and the Right Language to Users
All Web programming languages have the required functions and variables to solve this issue, but let me use the (more familiar to me) PHP language for the examples.
What can we get from the HTTP request somebody sends when asking for a page on your site, before we return it to his/her browser? A couple of interesting things: the User-Agent and the language of the browser (the Accept-Language header):
- Mozilla/5.0 (Windows NT 6.1; WOW64; rv:2.0.1) Gecko/20100101 Firefox/4.0.1
- Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/534.30 (KHTML, like Gecko) Chrome/12.0.742.91 Safari/534.30
- Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; WOW64; Trident/5.0)
- Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)
- en-ES, es-US (Accept-Language values)
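That last line is the browser-language header, and it is the key to everything that follows. The post’s examples are meant for PHP, where the raw value arrives in $_SERVER['HTTP_ACCEPT_LANGUAGE'], but the parsing logic is the same everywhere; here is a minimal sketch in Python, assuming a supported-language list that matches the directory layout shown earlier.

```python
def preferred_language(accept_language, supported=("en", "fr", "es", "ru"), default="en"):
    """Pick the best supported language from an Accept-Language header.

    `supported` and `default` are illustrative values matching the
    example site's directory layout, not anything standard.
    """
    if not accept_language:
        return None  # header missing entirely: most likely a bot
    for entry in accept_language.split(","):
        tag = entry.split(";")[0].strip().lower()  # drop any ";q=0.8" weight
        primary = tag.split("-")[0]                # "es-ES" -> "es"
        if primary in supported:
            return primary
    return default  # languages we don't serve fall back to the default
```

A value like `es-ES, es-US` resolves to `es`, while an empty header yields `None`, which is the bot signal we will rely on below.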
A first approach could be to check the User-Agent to verify whether the visitor is human or a search engine bot, but this is a horrible idea:
- The list of bot user agents is longer than eternity (see User-Agents.org), and you know what that means from a coding perspective, right?
- Technically, it is cloaking, and you could get a fantastic penalty and disappear overnight from the search engines. Yee-Haw!
So, reverse the logic; what do users/browsers have that bots don’t? Language. In an HTTP request from a bot, the language variable will return nothing. Voilà!
Yes, pretty straightforward: if the request has a language, it is not a bot; it’s a human-like visitor, and you know the browser language. Double win.
OK, OK, for accuracy, a couple of considerations:
- Detecting a language does not mean the visitor is always human, but it is a very high percentage of the time.
- The browser language is not 100 percent accurate regarding the user’s language preference, but again, there’s a good chance you’ll guess right.
Nothing is perfect, amigos, but this procedure can get you very close to 100 percent success on both your goals.
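Putting the two observations together, the serving decision is tiny. A hedged sketch in Python (the function and directory names are mine; in PHP the same check reads $_SERVER['HTTP_ACCEPT_LANGUAGE'] and issues a header('Location: ...') redirect):

```python
def redirect_target(headers, supported=("fr", "es", "ru")):
    """Return the language directory to redirect to, or None to serve as-is.

    Bots (no Accept-Language header) and default-language visitors get the
    requested URL untouched, so every language version stays crawlable.
    Directory names follow the example layout from the start of the post.
    """
    accept = headers.get("Accept-Language", "")
    if not accept:
        return None  # no language header: treat as a bot, never redirect
    primary = accept.split(",")[0].split(";")[0].strip().lower().split("-")[0]
    if primary in supported:
        return "/%s/" % primary  # e.g. a Spanish browser goes to /es/
    return None  # English or an unsupported language: serve the default version
```

Note that the bot case and the default-language case look identical from the outside: no redirect happens, which is exactly what keeps indexation safe.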
The Logic Behind the Scenes
After several tests, the following programming logic is what works for me and improves conversions on multilanguage sites: a combination of browser-language detection on the server side and a language preference stored in a cookie.
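Since the original code is not reproduced here, this is my own sketch of that combined logic in Python; the names (the `lang` cookie, the tuple return) are illustrative, and in PHP the $_COOKIE / setcookie() pair plays the same role.

```python
def choose_language(headers, cookies, supported=("en", "fr", "es", "ru"), default="en"):
    """Combined server-side logic: cookie first, browser detection second.

    Returns (language, should_set_cookie). All names are illustrative.
    """
    # 1. A returning visitor's stored choice always wins.
    saved = cookies.get("lang")
    if saved in supported:
        return saved, False
    # 2. No cookie yet: fall back to browser-language detection.
    accept = headers.get("Accept-Language", "")
    if not accept:
        # 3. No cookie and no language header: a bot. Serve the requested
        #    URL untouched so every language version stays indexable.
        return None, False
    primary = accept.split(",")[0].split(";")[0].strip().lower().split("-")[0]
    lang = primary if primary in supported else default
    return lang, True  # human first visit: remember the choice in a cookie
```

On the next visit the cookie short-circuits detection, which is what lets the user override the guess (switch to English once, stay in English).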
This ensures the setup is transparent to search engine bots: they will find no barriers to crawling all available languages, and users will enjoy a very comfortable experience. I hope you find these tips useful. Thoughts and experiences you would like to share are very welcome.
I want to thank Jessica Lee for inviting me to write for the Bruce Clay blog.
About Ani Lopez
Ani Lopez is an SEO manager and Web analytics consultant at Cardinal Path. With a professional background that includes agencies in Europe and North America, Ani has led SEO campaigns targeting several countries and languages for brands like Nokia, Mundia.com (an Ancestry.com company), Carrefour, Amadeus Travel and Solostocks, to name a few. You can follow Ani on Twitter @anilopez.