Roll Your Own Sitemaps in Rails
Sitemaps are an important SEO requirement for sites with content on lots of pages. This is especially true if the pages are frequently updated or hard for a search spider to navigate. Yet a surprising number of sites with dynamic content do not provide a sitemap.xml for search engines. (You can test this in a browser by simply adding “sitemap.xml” to the root URL.)
There are some really interesting options out there for managing sitemaps in Rails. Sitemap (http://github.com/queso/sitemap/tree/master) is a full application in itself, with five controllers, three models, and tons of features. Others spider your site from the outside, as a third-party tool would. There are also many variations on rake-driven tasks and other processes outside Rails that must be automated in some way or run manually.
Instead of having to run a separate task, here we are going to simply generate the sitemap only when it is requested. If it hasn’t changed, it will be served from the page cache. You may outgrow this technique if you have a very large sitemap, although using a sitemap index (allowing multiple smaller sitemaps) would help quite a bit in this case.
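If a single sitemap does grow too large, a sitemap index is the usual way out. As a rough sketch in plain Ruby (outside Rails; the helper names `sitemap_chunks` and `sitemap_index_xml`, the `sitemapN.xml` naming, and the example.com host are all assumptions for illustration), the URLs can be split into groups of at most 50,000, the per-file limit in the sitemaps.org protocol, with the index listing one child sitemap per group:

```ruby
# Per-file URL limit from the sitemaps.org protocol.
MAX_URLS_PER_SITEMAP = 50_000

XML_ESCAPES = {
  "&" => "&amp;", "<" => "&lt;", ">" => "&gt;", '"' => "&quot;", "'" => "&apos;"
}.freeze

# Escape the five characters that are special in XML text.
def xml_escape(text)
  text.gsub(/[&<>"']/, XML_ESCAPES)
end

# Hypothetical helper: split a flat list of URLs into sitemap-sized groups.
def sitemap_chunks(urls, limit = MAX_URLS_PER_SITEMAP)
  urls.each_slice(limit).to_a
end

# Hypothetical helper: build the index document that points at each child
# sitemap. The "sitemap1.xml", "sitemap2.xml", ... naming is an assumption.
def sitemap_index_xml(base_url, chunk_count)
  lines = [
    '<?xml version="1.0" encoding="UTF-8"?>',
    '<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">'
  ]
  (1..chunk_count).each do |n|
    lines << "  <sitemap>"
    lines << "    <loc>#{xml_escape("#{base_url}/sitemap#{n}.xml")}</loc>"
    lines << "  </sitemap>"
  end
  lines << "</sitemapindex>"
  lines.join("\n") + "\n"
end
```

Each child sitemap would then be generated and cached the same way as the single sitemap described in this article, just scoped to its own group of pages.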
This solution is for people who want to:
- Customize what gets included and/or updated
- Stay up to date with frequently changing content
- Keep the code lean
- Set it and forget it (no rake tasks)
This article is based on how I implemented sitemaps for subreala.com. While researching it, however, I (finally) stumbled on a 2006 article by Ilya G that describes essentially the same technique. I decided to go ahead and publish this as an update, since some things have changed in three years, though not as much as you’d think.
The first thing you’ll do is add a route to handle sitemap.xml:
map.connect 'sitemap.xml', :controller => 'portal', :action => 'sitemap'
Then in your controller:
caches_page :sitemap

def sitemap
  @pages = Page.find :all
  respond_to do |format|
    format.xml
  end
end
Then use builder to generate the XML. Since the route points at the portal controller, the view goes in views/portal/sitemap.xml.builder:
xml.instruct! :xml, :version=>"1.0"
xml.urlset('xmlns:xsi'.to_sym => "http://www.w3.org/2001/XMLSchema-instance",
           'xsi:schemaLocation'.to_sym => "http://www.sitemaps.org/schemas/sitemap/0.9 http://www.sitemaps.org/schemas/sitemap/0.9/sitemap.xsd",
           'xmlns'.to_sym => "http://www.sitemaps.org/schemas/sitemap/0.9") do
  @pages.each do |page|
    xml.url do
      xml.loc page_url(:only_path => false, :name => page.link)
      xml.lastmod page.updated_at.strftime("%Y-%m-%d")
      xml.changefreq 'monthly'
      xml.priority '0.5'
    end
  end
end
Check sitemaps.org for the full spec on the XML elements. The ones used above are reasonable defaults to start with.
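For reference, the protocol restricts two of those elements: changefreq must be one of seven fixed values, and priority must fall between 0.0 and 1.0. A small plain-Ruby check along these lines can catch a bad value before it reaches the view. The helper names here are hypothetical and not part of Rails or builder:

```ruby
# Values permitted for <changefreq> by the sitemaps.org protocol.
CHANGEFREQ_VALUES = %w[always hourly daily weekly monthly yearly never].freeze

# Hypothetical helper: true if the value is an allowed changefreq.
def valid_changefreq?(value)
  CHANGEFREQ_VALUES.include?(value.to_s)
end

# Hypothetical helper: true if the value parses as a number in 0.0..1.0.
def valid_priority?(value)
  number = Float(value)
  number >= 0.0 && number <= 1.0
rescue ArgumentError, TypeError
  # Float() raises for non-numeric strings (ArgumentError) and nil (TypeError).
  false
end
```

If you later store changefreq or priority per page instead of hard-coding them, checks like these could run before the record is saved.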
The only thing left is to expire the cache when the sitemap should be regenerated.
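With page caching, Rails writes the rendered sitemap to a static file under the public directory, so expiring the cache comes down to removing that file. Inside a Rails controller or sweeper, the built-in expire_page call does this for you, for example after a page is created or updated. The plain-Ruby sketch below only illustrates the effect. The helper name `expire_sitemap_cache` and the default `public` path are assumptions for illustration:

```ruby
require "fileutils"

# In a Rails controller or sweeper, the equivalent would be something like:
#   expire_page :controller => 'portal', :action => 'sitemap'
#
# Hypothetical stand-alone helper: delete the cached sitemap file so the next
# request regenerates it. Returns true if a cached file was actually removed.
def expire_sitemap_cache(public_dir = "public")
  cached_file = File.join(public_dir, "sitemap.xml")
  existed = File.exist?(cached_file)
  FileUtils.rm_f(cached_file) # rm_f does not raise if the file is missing
  existed
end
```

Once the file is gone, the next request for sitemap.xml falls through to the controller action, which rebuilds the sitemap and caches it again.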
This is the quickest, easiest way to keep your sitemap current if you’re serving a moderate amount of content. The cost of generating a new sitemap is only incurred when the sitemap.xml is requested by a spider and the cached version has expired. If your site grows to the point where you want to use a background task, the above method can be converted to be called from a rake task.