Apache Virtualhosts for SEO
If you manage more than one domain on an IP address, you are probably using Apache name-based VirtualHost sections to send incoming requests to the correct application handlers. Name-based vhosts let several domains or subdomains share the same IP address and port while each one is handled by its own VirtualHost section.
What you may not realize is that an improperly configured VirtualHost setup can cause serious Search Engine Optimization (SEO) problems, even if everything appears to be working perfectly. The following code walk-through shows common issues to look for, and simple recipes to get you on the right track. The techniques shown here apply to any kind of application you are serving. I have not included any proxy/balancer/rewrite rules you might use to support your application. Applying the techniques below to your existing virtual host sections should be straightforward.
Handling unused domains and subdomains
The first problem to guard against occurs when a domain or subdomain is not handled by any of your VirtualHost sections. In this case, Apache falls back to the first VirtualHost section for that address and port, which acts as a catch-all for any host name that no other section matches.
Say, for example, you have the following DNS A records:
    example.com  - A - 1.2.3.4
    example2.com - A - 1.2.3.4
    unused.com   - A - 1.2.3.4
And your virtual hosts are set up as follows:
virtualhosts.conf:
    # enable name-based vhosts
    NameVirtualHost *:80

    # handle example.com requests
    # this is also the default handler
    <VirtualHost *:80>
        DocumentRoot /var/www/example
        ServerName example.com
    </VirtualHost>

    # handle example2.com requests
    <VirtualHost *:80>
        DocumentRoot /var/www/example2
        ServerName example2.com
    </VirtualHost>
When Google indexes your sites, you risk getting search results that look like this:
Example.com
example.com, example.net, and example.org are second-level domain names
reserved by the Internet Engineering Task Force through RFC 2606
unused.com
The problem is that while indexing unused.com, Google found the content for example.com. Until the virtual host setup is fixed, searches for example.com content may lead people to unused.com, which serves the actual content of example.com. This problem is hard to detect as long as the default virtual host section points to a valid site, because the domain name that causes it is, by definition, one that is not in use. The test results for example.com will look fine, and you probably have no tests for unused.com, so it is likely that none of your tests will catch it.
The solution is to use the default virtual host to point to an error page that returns an HTTP Response 503: Service Unavailable. This response code lets the crawler know 1) not to continue crawling, and 2) to try to come back later. Both are important for SEO, since you do not want the current content from the error page to get indexed, yet you probably don’t want the crawler to avoid it forever.
The revised virtual hosts look like this:
virtualhosts.conf:
    # enable name-based vhosts
    NameVirtualHost *:80

    # default handler points to error doc
    <VirtualHost *:80>
        DocumentRoot /var/www/unknown
        ServerName example.unknown
    </VirtualHost>

    # handle example.com requests
    <VirtualHost *:80>
        DocumentRoot /var/www/example
        ServerName example.com
    </VirtualHost>

    # handle example2.com requests
    <VirtualHost *:80>
        DocumentRoot /var/www/example2
        ServerName example2.com
    </VirtualHost>
The error page uses a few lines of PHP to set the 503 HTTP Response:
index.php:
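    <?php
    // Status line of the HTTP response
    header("HTTP/1.1 503 Service Temporarily Unavailable");
    // Some CGI/FastCGI setups take the status from this header instead
    header("Status: 503 Service Temporarily Unavailable");
    // Ask clients to retry in 7200 seconds (2 hours)
    header("Retry-After: 7200");
    ?>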
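With the configuration shown, only requests that resolve to index.php reach the PHP code; a request for any other path on an unused domain would get an ordinary 404 from Apache. One way to answer every path with a 503 is to let Apache generate the response itself. The following is a sketch under assumptions, not a definitive setup: it assumes mod_alias and mod_headers are loaded, and the response text is only a placeholder.

```apache
# Catch-all vhost: keep this as the first <VirtualHost *:80> section Apache reads.
<VirtualHost *:80>
    ServerName example.unknown
    DocumentRoot /var/www/unknown

    # mod_alias: answer every path with 503. For a non-3xx status,
    # Redirect takes no target URL.
    Redirect 503 /

    # Short plain-text body sent with the 503 (placeholder wording).
    ErrorDocument 503 "Service temporarily unavailable."

    # mod_headers: "always" is needed so the header is also added
    # to error responses.
    Header always set Retry-After "7200"
</VirtualHost>
```

This keeps the 503 behavior of the PHP page but does not depend on PHP being enabled for the default virtual host.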
Handling URLs with and without “www”
The next common issue is including both example.com and www.example.com in VirtualHost sections without having one of them redirect to the other using an HTTP 301 (Moved Permanently) response. Without a 301 redirect, search engines may penalize you for apparently having identical content on two domains. (Search engines treat example.com and www.example.com as two independent sites.) Also, your PageRank may suffer, since each URL receives credit for only a portion of your total traffic and inbound links.
Here is our example with proper redirects included:
virtualhosts.conf:
    # enable name-based vhosts
    NameVirtualHost *:80

    # default server points to error doc
    <VirtualHost *:80>
        DocumentRoot /var/www/unknown
        ServerName example.unknown
    </VirtualHost>

    # handle example.com requests
    <VirtualHost *:80>
        DocumentRoot /var/www/example
        ServerName example.com
    </VirtualHost>

    # redirect www.example.com requests
    <VirtualHost *:80>
        ServerName www.example.com
        RedirectMatch 301 (.*) http://example.com$1
    </VirtualHost>

    # handle example2.com requests
    <VirtualHost *:80>
        DocumentRoot /var/www/example2
        ServerName example2.com
    </VirtualHost>

    # redirect www.example2.com requests
    <VirtualHost *:80>
        ServerName www.example2.com
        RedirectMatch 301 (.*) http://example2.com$1
    </VirtualHost>
In the above example, RedirectMatch matches every request path on the www host and redirects it to the same path on the bare domain with a 301 (Moved Permanently) response. This lets the search engines know to combine their data for the two URLs and to update their indexed links to point to the preferred URL.
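If you do not need a regular expression, the plain Redirect directive from mod_alias can express the same whole-site redirect. This is a minimal sketch of an alternative for the www.example.com section, not a replacement you must adopt:

```apache
# redirect www.example.com requests
<VirtualHost *:80>
    ServerName www.example.com

    # Prefix match on "/": whatever follows the prefix in the requested
    # path is appended to the target, so
    #   http://www.example.com/about/  ->  http://example.com/about/
    Redirect permanent / http://example.com/
</VirtualHost>
```

Here "permanent" is the keyword form of status 301, so crawlers see the same response code as with the RedirectMatch version.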
Site maintenance page
Lastly, if you use a special page while your site is offline or undergoing maintenance, this page should also include the above PHP code to return HTTP Response 503: Service Unavailable. Otherwise, an untimely visit from the Google robot could cause this:
Example.com
Sorry we’re offline for some maintenance. We’ll be back online soon…
example.com
The 503 response code is preferable to using a robots.txt file to block the spider. The 503 response tells the spider that the site is temporarily offline, that the currently cached search content is still valid, and that it should check back after the retry duration. Blocking search robots with robots.txt, or other techniques, could cause your current index content to be purged, and your search engine placement to suffer.
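When the maintenance page is static HTML rather than PHP, Apache can produce the 503 itself. The following is a sketch, not a definitive setup. It assumes mod_rewrite and mod_headers are loaded, that this section temporarily replaces the normal example.com section while the site is offline, and that maintenance.html is a hypothetical file placed in the document root.

```apache
# Maintenance variant of the example.com vhost.
# Swap it in only while the site is offline.
<VirtualHost *:80>
    ServerName example.com
    DocumentRoot /var/www/example

    # Page body shown to visitors (hypothetical file name).
    ErrorDocument 503 /maintenance.html

    # Tell crawlers when to come back; "always" covers error responses.
    Header always set Retry-After "7200"

    RewriteEngine On
    # Skip the internal request Apache makes to fetch the error document,
    # otherwise the rule below would match it again.
    RewriteCond %{ENV:REDIRECT_STATUS} !=503
    # Every other request gets status 503 with no rewriting of the URL.
    RewriteRule ^ - [R=503,L]
</VirtualHost>
```

Because the rule answers every path, a crawler that requests any URL on the site during the outage sees the 503 and the Retry-After header, not just requests for the home page.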

