This article is currently only available in English.
The Apache web server has a really nice feature for multi-language sites: Multiviews.
This is a content negotiation system where the web server tries to serve the
user the best matching language version of a page, under the
same URL. The user just specifies a list of preferred languages in the browser,
or uses the preconfigured default - which is usually the browser language. And
all that works with otherwise static content, no need for PHP or a heavy CMS.
For the web developer, multiviews is a good thing too, because it allows you
to have multiple language versions of a document sitting next to each other,
distinguished only by their names: article.en.html, article.fr.html, article.de.html
and so on. This simplifies the directory tree somewhat.
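To make the negotiation step a bit more concrete, here is a rough JavaScript sketch of the idea. It is a simplified model and not Apache's actual algorithm: the function name `pickLanguage` and its parameters are made up for illustration, regional tags are simply reduced to their primary language, and wildcards are ignored.

```javascript
// Simplified model of language negotiation (NOT mod_negotiation's real algorithm).
// acceptLanguage: value of the Accept-Language request header
// available:      language codes for which a variant file exists
// fallback:       language to serve when nothing matches
function pickLanguage(acceptLanguage, available, fallback) {
  const wanted = acceptLanguage
    .split(",")
    .map((item) => {
      const [tag, ...params] = item.trim().split(";");
      const qParam = params.map((p) => p.trim()).find((p) => p.startsWith("q="));
      const q = qParam ? parseFloat(qParam.slice(2)) : 1; // a missing q counts as 1.0
      // Assumption: only the primary subtag matters ("en-US" -> "en")
      return { lang: tag.trim().toLowerCase().split("-")[0], q };
    })
    .filter((entry) => entry.lang && entry.q > 0)
    .sort((a, b) => b.q - a.q); // stable sort keeps the header order for ties

  const match = wanted.find((entry) => available.includes(entry.lang));
  return match ? match.lang : fallback;
}

console.log(pickLanguage("fr-CH, fr;q=0.9, en;q=0.8, de;q=0.7", ["en", "de", "lb"], "lb")); // "en"
console.log(pickLanguage("ja", ["en", "de", "lb"], "lb")); // "lb"
```

In the first call no French variant exists, so the next best language from the browser's list wins. In the second call nothing matches and the fallback language is served.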
But... if you search the internet for multiviews, you'll often find negative comments, the bottom line being "Multiviews is evil, don't use it". But why? Well, here are the usual arguments against multiviews:
On the other hand, there are some important advantages of multiviews:
While clean URLs and ease of maintenance are nice-to-haves, the fact that no PHP or similar is required can have a dramatic impact on your web site speed, especially when PHP is running as a CGI module. This combined with extensive caching gives you a really fast site with low CPU and bandwidth requirements.
Think of multiviews as a really lightweight built-in multilanguage CMS.
So can multiviews be enhanced such that it becomes user and robot friendly? For me, this is a clear YES. But it needs some tweaking to work around some limitations first.
Let's look at some scenarios. By default, a user who hits the site sees the language version that the browser asks for. A savvy user may have the list of acceptable languages set up properly, and will thus see the preferred version. No problems here. The occasional user with a default installation will normally see the language of the browser or the default site language. This user absolutely needs a language selection bar, best presented in a clearly visible location with country flags to click on. So we are looking at a manual language selection feature, and this manual selection must override the selection imposed by the server.
A search robot however does not support content negotiation, and would only
see the default language version. Normally, you'd want all language versions
of your site indexed, and that can only be achieved by presenting all versions
as separate URLs. Either - for Googlebot at least - as a query string in the
form ?lang=XY, or, much better and generally recommended, in the
form mydomain.com/en/some/article/today versus mydomain.com/de/some/article/today.
And here comes the magic, which requires Apache 2.0.40 or higher, with the modules mod_rewrite, mod_env, mod_setenvif, mod_headers, mod_mime and mod_negotiation enabled.
The site is organized in the form mydomain.com/some/article/today, where "some"
and "article" are subdirectories, and "today" would be an HTML (or PHP) file
that is actually named "today.html" for the default version, or "today.en.html"
for the English version, and so on. With multiviews enabled, the server will find the correct
file. In your .htaccess, do the following:
# Enable Multiviews option for multiple language support
Options +MultiViews
# DirectoryIndex: MultiViews also looks for .png and .gif, this enables
# a language dependent display of images. E.g. flag => flag.lb.png
DirectoryIndex index index.php index.html index.htm index.png index.gif
# Add standard LB language (Apache uses the now obsolete ltz identifier)
AddLanguage lb .lb
# also add all other languages, don't trust the apache base config!
AddLanguage en .en
AddLanguage de .de
AddLanguage fr .fr
AddCharset latin1 .lb
AddCharset latin1 .en
AddCharset latin1 .fr
AddCharset latin1 .de
# Default .html is in this language:
DefaultLanguage lb
# some defaults....
LanguagePriority en lb de fr
ForceLanguagePriority Fallback
The manual language selection override for the human user is done by presenting
a link to the same page with a simple and sweet query string in the form ?lb.
When that link is clicked, the following rewrite directive takes care of it:
# 1. if there is only a lang code on the query string, make this the new
# virtual base language directory and redirect accordingly.
# Search robot safe!
RewriteCond %{QUERY_STRING} ^(lb|de|en|fr)$
RewriteRule ^((lb|de|en|fr)/)?(.*?)(\.(html?|php|lb|de|en|fr))*$ \
http://localhost/fellerich/%1/$3? [E=LANG:%1,R=302,L]
What happens here is that a query string with only a language code set triggers
a redirect to a language specific subdirectory. The snippet uses http://localhost/fellerich/
as the site base, so replace that with your own. With www.mydomain.com as the base,
www.mydomain.com/some/article/today?en
would be redirected using a temporary redirect to www.mydomain.com/en/some/article/today
- and also note how the query string got removed.
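If the regular expression looks cryptic, it may help to replay it outside of Apache. The following JavaScript sketch reuses the two patterns from snippet 1 as they are. The function name `languageRedirect` is made up, and it assumes that the site lives in the server root and that the path is given without a leading slash, as mod_rewrite sees it in .htaccess.

```javascript
// Model of rewrite snippet 1 (illustration only; Apache does the real work).
// path:  request path relative to the site root, without leading slash
// query: raw query string without the "?"
// Returns the redirect target, or null when the rule does not apply.
function languageRedirect(path, query) {
  // RewriteCond: the query string must consist of a language code only
  const cond = /^(lb|de|en|fr)$/.exec(query);
  if (!cond) return null;

  // RewriteRule pattern: optional language directory, page name, extensions
  const rule = /^((lb|de|en|fr)\/)?(.*?)(\.(html?|php|lb|de|en|fr))*$/.exec(path);
  if (!rule) return null;

  // %1 is the language from the condition, $3 the bare page name
  return "/" + cond[1] + "/" + rule[3];
}

console.log(languageRedirect("some/article/today", "en"));            // "/en/some/article/today"
console.log(languageRedirect("de/some/article/today.de.html", "en")); // "/en/some/article/today"
console.log(languageRedirect("some/article/today", "page=2"));        // null
```

The second call shows why the pattern is that long: an existing language directory and any file extensions are stripped before the new language directory is put in front.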
Since the /en subdirectory with the English language version does
not exist physically, let's take care of it:
# 3. make virtual language subdirectories and set a temp ENV var
RewriteRule ^(lb|de|en|fr)/(.*) $2 [E=LANG:$1]
# mod_rewrite prepends REDIRECT_ to each set Env Var
SetEnvIf REDIRECT_LANG de prefer-language=de
SetEnvIf REDIRECT_LANG en prefer-language=en
SetEnvIf REDIRECT_LANG fr prefer-language=fr
SetEnvIf REDIRECT_LANG lb prefer-language=lb
# the response depends on the language cookie, tell caches about it
Header append Vary cookie
This rule removes the virtual language identification subdirectory from the
request and stores the language selection in an environment variable. The content
of this variable is then copied to the magic environment variable prefer-language.
Notice how the rewrite rule stores the data into the LANG variable,
which is later accessible as REDIRECT_LANG.
The magic env var prefer-language
overrides the language selection in multiviews: whatever the browser requested
in its Accept-Language header is ignored, and the variable value
is taken instead. If this language version is not available, the server will
still serve the default file instead of a 404 not found.
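Here is a small JavaScript sketch of these two steps: stripping the virtual directory and letting the forced language win over the negotiated one. The function names and the `variants` map are hypothetical, and the fallback order (forced language, then negotiated language, then default file) is my simplified reading of how the server behaves, not its actual implementation.

```javascript
// Model of snippet 3 plus the prefer-language override (illustration only).

// Split "en/some/article/today" into the real path and the forced language.
function stripLanguageDir(path) {
  const m = /^(lb|de|en|fr)\/(.*)$/.exec(path);
  return m ? { path: m[2], preferLanguage: m[1] } : { path: path, preferLanguage: null };
}

// Pick the file to serve. `variants` maps language code -> file name,
// `defaultFile` is the untagged file (e.g. today.html).
function chooseFile(variants, defaultFile, preferLanguage, negotiatedLanguage) {
  // The forced language wins; if it has no variant, fall back step by step
  return variants[preferLanguage] || variants[negotiatedLanguage] || defaultFile;
}

const request = stripLanguageDir("en/some/article/today");
const variants = { en: "today.en.html", de: "today.de.html" };

console.log(request.path);                                                     // "some/article/today"
console.log(chooseFile(variants, "today.html", request.preferLanguage, "de")); // "today.en.html"
console.log(chooseFile(variants, "today.html", "fr", null));                   // "today.html"
```

The last call is the "no 404" case: there is no French variant, so the default file is served.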
So what we have achieved up to now is a set-up that has n+1 views, where n
is the number of supported languages. There is the base view, which has the
automatic language selection feature for the human user with a well configured
browser, and there are the n language specific views with the two-letter virtual
language code directory in the root directory. A search bot will see the mandatory
language selection links in at least the main index document and hopefully follow
them, thus indexing all language versions.
A human guest can browse all views separately and switch freely between them.
In all this, there is still the problem of duplicated content. The default
view is identical to one of the language views, depending on which language
is set as default. For bots, a simple solution is to avoid indexing the default
view, i.e. explicitly allow only /index (note the absence of the
html extension!) and the /en, /de, /lb and /fr subdirectories.
For the human user, there is no such blocking feature, so here's another proposal: use cookies and JavaScript to select a language version. Setting a cookie following a click on a link is easy - assuming jQuery is used:
$('#language-select a').click(function(){
    // Language Changing Link: location + ?lang [ + optional anchor ]
    var q = this.href.match(/\?(\w{2})(#.*)?$/);
    // must have found a 2 letter language code and cookies
    if (!q || !navigator.cookieEnabled) return true;
    // set the cookie to the desired language
    document.cookie = "lang="+q[1]+";path=/";
    // anchor in language link found? then update the anchor
    if (q[2]) window.location.hash = q[2];
    // and then simply refresh the page!
    window.location.reload();
    // prevent default action
    return false;
});
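The interesting part of the click handler above is the regular expression. To see what it accepts without opening a browser, the matching can be pulled out into a pure function. This is only a sketch for experimenting: `parseLanguageLink` and `languageCookie` are names I made up, and example.com is a placeholder.

```javascript
// Pure helper mirroring the regular expression in the click handler.
// Returns { lang, hash } for a language link, or null for any other URL.
function parseLanguageLink(href) {
  const q = href.match(/\?(\w{2})(#.*)?$/);
  return q ? { lang: q[1], hash: q[2] || "" } : null;
}

// Cookie string as the handler assigns it to document.cookie.
// No expiry date is given, so the browser treats it as a session cookie.
function languageCookie(lang) {
  return "lang=" + lang + ";path=/";
}

console.log(parseLanguageLink("http://example.com/lb/index?lb"));     // { lang: 'lb', hash: '' }
console.log(parseLanguageLink("http://example.com/index?de#top"));    // { lang: 'de', hash: '#top' }
console.log(parseLanguageLink("http://example.com/index?lang=de"));   // null
console.log(languageCookie("lb"));                                    // "lang=lb;path=/"
```

Note that the pattern accepts any two word characters, so a link ending in ?42 would also set a cookie. Restricting it to (lb|de|en|fr) would be a stricter alternative.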
In .htaccess, a rather simple rule reads that language cookie and overrides whatever language was negotiated:
# Language selection override via cookie (cookie set by JS or similar)
# only accept known language codes, and stop at the end of the cookie value
SetEnvIf Cookie "(^|;[ ]*)lang=(lb|de|en|fr)(;|$)" prefer-language=$2
Header append Vary cookie
But that leads to a nasty side effect: if the user hits a page in the /en
directory and manually selects the Luxembourgish language, the Luxembourgish
version is shown inside the /en subdirectory, which is misleading. So we
need to fix that as well - this code goes between the rewrite snippets 1 and 3:
# 2. If a language cookie is set, don't bother with the virtual subdir
# and redirect to the main representation, and use the cookie setting
RewriteCond %{HTTP_COOKIE} "lang=(.+)"
RewriteRule ^(lb|de|en|fr)/(.*) http://localhost/fellerich/$2 [R=302,L]
This rewrite detects the presence of the lang cookie and, if the request is inside a virtual language directory, redirects back to the main view.
Why that? Well, a visitor without JS enabled can still browse the site and access all pages, but the language selection is visible in the URL all the time. A link sent to a friend then already contains the language selection. A visitor with JS and session cookies enabled, however, only sees the language agnostic URLs, and anyone who receives a link from them will see their own preferred language at once.
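The cookie check of snippet 2 can be replayed in JavaScript as well. This is a sketch under assumptions: the function names are hypothetical, the site is assumed to live in the server root, and the cookie parsing is deliberately stricter than the `lang=(.+)` condition in the snippet, because it only accepts the four known language codes.

```javascript
// Extract the value of the "lang" cookie from a Cookie request header.
function languageFromCookie(cookieHeader) {
  for (const part of (cookieHeader || "").split(";")) {
    const [name, value] = part.trim().split("=");
    if (name === "lang" && /^(lb|de|en|fr)$/.test(value)) return value;
  }
  return null;
}

// Model of snippet 2: with a language cookie set, leave the virtual directory.
// Returns the redirect target, or null when the rule does not apply.
function cookieRedirect(path, cookieHeader) {
  if (!languageFromCookie(cookieHeader)) return null;
  const m = /^(lb|de|en|fr)\/(.*)$/.exec(path);
  return m ? "/" + m[2] : null;
}

console.log(cookieRedirect("en/some/article/today", "session=abc; lang=lb")); // "/some/article/today"
console.log(cookieRedirect("en/some/article/today", ""));                     // null
console.log(cookieRedirect("some/article/today", "lang=lb"));                 // null
```

Only the first request is redirected: it carries the cookie and sits inside a virtual language directory. The other two are served as they are.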
Last instalment: Caching. Since static files become somewhat dynamic due to multiviews, you must make sure that language specific static content is at least revalidated on every access, otherwise it just won't work properly. Put this code at the top of your .htaccess file:
# Detect IE first
BrowserMatch MSIE MSIE=1
# Cache Control: 60 minutes for static content
<FilesMatch "\.(html|css|js|jpe?g|gif|png)$">
Header set Cache-Control "max-age=3600, public"
</FilesMatch>
# add must-revalidate for dynamic content (even html becomes dynamic due to MultiViews!)
<FilesMatch "\.(((lb|de|en|fr)\.\w+)|(s?html?|php))$">
Header set Cache-Control "max-age=0, private, must-revalidate"
# conditional headers for IE. Guess why.
Header set Cache-Control "no-cache, no-store" env=MSIE
Header set Pragma "no-cache" env=MSIE
Header set Expires "-1" env=MSIE
</FilesMatch>
The first bit sets all static content to 1 hour cache lifetime, whereas the second overrides this for everything which looks dynamic.
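To check which of the two blocks wins for a given file, here is the same logic as a JavaScript sketch. The two regular expressions are copied from the FilesMatch sections. The function name `cacheHeaders` and the `isMSIE` flag are made up, and the sketch assumes that the name being tested is the name of the file on disk, not the requested URL.

```javascript
// Model of the two FilesMatch blocks (illustration only).
// fileName: name of the file on disk, e.g. "today.en.html"
// isMSIE:   true when BrowserMatch would have set the MSIE variable
function cacheHeaders(fileName, isMSIE) {
  const headers = {};

  // First block: one hour for everything that looks static
  if (/\.(html|css|js|jpe?g|gif|png)$/.test(fileName)) {
    headers["Cache-Control"] = "max-age=3600, public";
  }

  // Second block: language variants and (s)html/php must be revalidated
  if (/\.(((lb|de|en|fr)\.\w+)|(s?html?|php))$/.test(fileName)) {
    headers["Cache-Control"] = "max-age=0, private, must-revalidate";
    if (isMSIE) {
      headers["Cache-Control"] = "no-cache, no-store";
      headers["Pragma"] = "no-cache";
      headers["Expires"] = "-1";
    }
  }
  return headers;
}

console.log(cacheHeaders("style.css", false));    // { 'Cache-Control': 'max-age=3600, public' }
console.log(cacheHeaders("flag.lb.png", false));  // { 'Cache-Control': 'max-age=0, private, must-revalidate' }
console.log(cacheHeaders("today.en.html", true)); // no-cache, no-store plus Pragma and Expires
```

Note that flag.lb.png is treated as dynamic although it is an image, because its name carries a language code.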
A site which is fast, accessible, SEO friendly, and which is easy on CPU and
bandwidth.
A dynamic site which can be built entirely on static files.
A site which doesn't need PHP, although you can put PHP files in as well
- one per language. It might take some more rewrite tricks to pass the selected
or negotiated language on to PHP.
This site itself.
When you're looking at the main page, multiviews has already chosen a language for
you. If you don't agree with that choice, select another language from the top
right language selection menu. What happens then depends on your browser settings:
JavaScript | Cookies | Action |
---|---|---|
On | On | The click on the link, which actually reads /lb/index?lb, is caught by the JavaScript code, which sets the language session cookie and then reloads the page without the query string. The server will redirect back to /index and not to /lb/index, because the language information is kept in the cookie now. We have cleaner, i.e. language agnostic URLs again. |
On | Off | After the click, the JS code detects that cookies are unavailable and falls back to the default action: the browser requests the page /lb/index?lb, and the server responds with a redirect to /lb/index - same page, but with the language query string stripped. |
Off | On | No JS, so the cookie is never set. Same as in the previous case. |
Off | Off | Again, same case - the selection logic is entirely on the server side. This is what happens when a search engine spider reads those pages - it sees several separate sites in different subdirectories. |