Duck Map Guide

Overview

In simple terms, an XML Sitemap is a list of the pages on your website. Sitemaps generated by this gem are served over HTTP at a URL defined in config/routes.rb. The default sitemap is available at /sitemap.xml. Sitemap content is built from standard Rails named routes, and generating it is straightforward:

  • access a sitemap via a URL such as /sitemap.xml
  • get the list of named routes defined for that sitemap
  • generate a URL node for each of those named routes

That's it!!
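The three steps above can be sketched in plain Ruby. This is not DuckMap's code: the `NamedRoute` struct, the `url_node` and `sitemap_xml` helpers, and the default values (copied from the sample output in this guide) are assumptions made for illustration.

```ruby
# Hypothetical stand-in for a Rails named route; DuckMap's internals differ.
NamedRoute = Struct.new(:name, :path)

# Default node values, copied from the sample output in this guide.
DEFAULTS = { changefreq: "monthly", priority: 0.5 }.freeze

# Minimal escaping for XML text content.
def xml_escape(text)
  text.to_s.gsub("&", "&amp;").gsub("<", "&lt;").gsub(">", "&gt;")
end

# Step 3: build one <url> node for a named route.
def url_node(host, route, lastmod)
  <<~XML
    <url>
      <loc>#{xml_escape(host + route.path)}</loc>
      <lastmod>#{lastmod.getutc.strftime("%Y-%m-%dT%H:%M:%S%:z")}</lastmod>
      <changefreq>#{DEFAULTS[:changefreq]}</changefreq>
      <priority>#{DEFAULTS[:priority]}</priority>
    </url>
  XML
end

# Steps 1 and 2 are assumed done: `routes` is the list for the requested sitemap.
def sitemap_xml(host, routes, lastmod)
  nodes = routes.map { |route| url_node(host, route, lastmod) }.join
  %(<?xml version="1.0" encoding="UTF-8"?>\n) +
    %(<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n#{nodes}</urlset>\n)
end

routes = [NamedRoute.new(:trucks, "/trucks.html"), NamedRoute.new(:root, "/")]
xml = sitemap_xml("http://localhost:3000", routes, Time.utc(2011, 11, 3, 6, 44, 25))
puts xml
```

Every node gets the same defaults here; in the gem, the values come from handlers, controllers, and models as described later in this guide.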

Sitemap Definitions

Default Sitemap

The Duck Map gem is designed as a Rails Engine. As part of the gem loading process, a default sitemap is configured via a config/routes.rb file located within the gem. The default sitemap is named :sitemap, has the path /sitemap.xml, and contains all of the routes defined by your app. If no routes are defined, the sitemap will produce empty XML content.

# the following example will produce a default sitemap and requires zero configuration.
MyApp::Application.routes.draw do

  resources :trucks
  root :to => "home#index"

end

The contents of /sitemap.xml will look similar to the following:

<url>
  <loc>http://localhost:3000/trucks.html</loc>
  <lastmod>2011-11-03T06:44:25+00:00</lastmod>
  <changefreq>monthly</changefreq>
  <priority>0.5</priority>
</url>
<url>
  <loc>http://localhost:3000/</loc>
  <lastmod>2011-11-03T06:44:25+00:00</lastmod>
  <changefreq>monthly</changefreq>
  <priority>0.5</priority>
</url>

Using Sitemap Block Statements

You can control the contents of your sitemap using block statements and route filters. Sitemaps are defined in config/routes.rb by enclosing named routes within a sitemap block. The default sitemap is the equivalent of enclosing the entire contents of config/routes.rb in a sitemap block. The following example redefines the default sitemap by wrapping a few routes in a sitemap block. Notice that bikes is excluded from the default sitemap.

MyApp::Application.routes.draw do

  # notice that bikes is not included in the default sitemap.
  resources :bikes

  # here we are wrapping a couple of routes for inclusion in the default sitemap.
  sitemap do
    resources :cars
    resources :trucks
    root :to => 'home#index'
  end

end

The contents of /sitemap.xml will look similar to the following:

<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>http://localhost:3000/cars.html</loc>
    <lastmod>2011-11-03T07:35:00+00:00</lastmod>
    <changefreq>monthly</changefreq>
    <priority>0.5</priority>
  </url>
  <url>
    <loc>http://localhost:3000/trucks.html</loc>
    <lastmod>2011-11-03T07:35:00+00:00</lastmod>
    <changefreq>monthly</changefreq>
    <priority>0.5</priority>
  </url>
  <url>
    <loc>http://localhost:3000/</loc>
    <lastmod>2011-11-03T07:35:00+00:00</lastmod>
    <changefreq>monthly</changefreq>
    <priority>0.5</priority>
  </url>
</urlset>

Simple Sitemap Block Rule

You can include as many sitemap blocks as you like. Sitemap blocks can all point to the same sitemap name, to different names, or any combination. The following two blocks are equivalent: a sitemap block without a name targets the default sitemap, :sitemap.

MyApp::Application.routes.draw do

  sitemap do
    resources :cars
    root :to => 'home#index'
  end

  sitemap :sitemap do
    resources :trucks
  end

end

In fact, the block definitions above would produce a single sitemap containing cars, trucks, and the root URL, because all of the definitions are merged into one. It doesn't matter how many times you define a sitemap block: routes for the same sitemap name are ALWAYS merged into one sitemap definition.

The simple rule for using sitemap blocks is:

  • If a sitemap block has been defined, include everything within the block.
  • If no sitemap blocks have been defined, include everything in config/routes.rb.

The default sitemap itself is defined without a block and includes all routes in config/routes.rb. However, if you redefine the default sitemap using a block, only the routes within that block will be included.
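A minimal sketch of the simple rule and of name-based merging, assuming each route has already been tagged with the name of the sitemap block it was drawn in (nil when outside any block, :sitemap for an unnamed block). `RouteEntry` and `sitemaps_for` are hypothetical and are not part of DuckMap.

```ruby
# Hypothetical record: a route name plus the sitemap block (if any) it was drawn in.
RouteEntry = Struct.new(:route, :sitemap)

# Group routes by sitemap name; blocks sharing a name are merged into one list.
def sitemaps_for(entries)
  blocks = entries.select(&:sitemap)
  # Rule 2: no sitemap blocks at all, so the default sitemap gets every route.
  return { sitemap: entries.map(&:route) } if blocks.empty?

  # Rule 1: blocks exist, so only routes drawn inside a block are included.
  blocks.group_by(&:sitemap).transform_values { |list| list.map(&:route) }
end

# bikes sits outside any block; cars, root and trucks come from two blocks
# that both target :sitemap; cameras comes from a block named :electronics.
entries = [
  RouteEntry.new(:bikes, nil),
  RouteEntry.new(:cars, :sitemap),
  RouteEntry.new(:root, :sitemap),
  RouteEntry.new(:trucks, :sitemap),
  RouteEntry.new(:cameras, :electronics)
]
result = sitemaps_for(entries)
p result
```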

Defining Multiple Sitemaps

You can group routes into multiple sitemaps. Pass a name to the sitemap block and a new sitemap route is created whose content includes all of the routes within the block. The following will produce three separate sitemaps.

MyApp::Application.routes.draw do

  sitemap do
    root :to => 'home#index'
    resources :faqs
  end

  sitemap :electronics do
    resources :cameras
    resources :laptops
    resources :desktops
  end

  sitemap :tools do
    resources :hand_tools
    resources :power_tools
  end

end

Namespaces

Namespaces are supported for sitemaps and can be a good way to logically group sitemaps on large sites. Sitemaps behave the same way regardless of namespace; the main difference is the path pointing to the actual sitemap content. The following example defines five sitemaps, including the default.

MyApp::Application.routes.draw do

  root to: 'home#index'                   # default sitemap   /sitemap.xml
  resources :faqs

  namespace :products do

    sitemap do                            # /products/sitemap.xml

      resources :papers
      resources :pencils

      namespace :audio do

        sitemap do                        # /products/audio/sitemap.xml
          resources :accessories
          resources :head_phones
          resources :speakers
        end

      end

      namespace :video do

        sitemap do                        # /products/video/sitemap.xml

          resources :accessories
          resources :dvd_players

          sitemap :bluray do              # /products/video/bluray.xml
            resources :bluray_players
          end

        end

      end

    end

  end

end

  • The default sitemap includes the root URL and FAQs.
  • The products sitemap includes papers and pencils.
    • products/audio includes accessories, head_phones, and speakers.
    • products/video includes accessories and dvd_players.
    • products/video/bluray includes bluray_players. Notice that this sitemap block is given the name :bluray. Had the name been omitted, the routes within the block would simply have been added to /products/video/sitemap.xml, and /products/video/bluray.xml would never be defined.
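The path rule implied by the comments in the example can be summarized in a few lines of Ruby: the enclosing namespaces become directories and the block name (sitemap when omitted) becomes the file name. The `sitemap_path` helper is a hypothetical illustration, not part of the gem.

```ruby
# Hypothetical helper: namespaces become directories, the block name the file name.
def sitemap_path(namespaces, name = :sitemap)
  segments = namespaces.map(&:to_s) + ["#{name}.xml"]
  "/" + segments.join("/")
end

sitemap_path([])                            # => "/sitemap.xml"
sitemap_path([:products])                   # => "/products/sitemap.xml"
sitemap_path([:products, :audio])           # => "/products/audio/sitemap.xml"
sitemap_path([:products, :video], :bluray)  # => "/products/video/bluray.xml"
```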

You can find more information and demo apps at: (http://www.jeffduckett.com/blog/13/sitemaps-with-namespaces.html)

Nested Resources

I think the easiest way to explain sitemaps with nested resources is to explain the process from a high level.

When a sitemap is built:

  • The sitemap controller kicks in, builds the list of routes the sitemap should contain, and loops through them.
  • For each route:
    • automagically determine the controller and model used for the route; if the developer has specified them, use those values instead.
    • ask the controller for all of the attributes for the URL node.
    • the controller asks the model for attributes when building the actual canonical URL for the node.
    • the model responds by populating a Hash with key/value pairs representing the segments of the route.

Given the following configuration:

MyApp::Application.routes.draw do

  resources :books do
    resources :comments
  end

end

The configuration above would produce a named route like:

book_comment GET    /books/:book_id/comments/:id(.:format)      comments#show

The mechanism used to acquire these values is sitemap_capture_segments. You can see that the "show" action for the route has the segment keys :book_id and :id. When the URL is built, the Comment model is "asked" for the segment key values by a call to sitemap_capture_segments, which is passed an Array containing all of the keys required to build the URL for the named route. In this case, the default behavior is to ask the Comment model for :book_id and :id.
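To illustrate what happens to the Hash of segment values, here is a hypothetical `build_path` helper that substitutes them into a route pattern like the one above. It handles only named segments and simple optional groups such as `(.:format)`; Rails' own URL helpers do this for real, and DuckMap may do it differently.

```ruby
# Hypothetical helper: fill a Rails-style route pattern with segment values.
def build_path(pattern, segments)
  # Keep an optional group like "(.:format)" only if all of its keys have values.
  path = pattern.gsub(/\(([^()]*)\)/) do
    group = Regexp.last_match(1)
    keys = group.scan(/:(\w+)/).flatten.map(&:to_sym)
    keys.all? { |key| segments.key?(key) } ? group : ""
  end
  # Replace each remaining :key with its value; a missing required key is an error.
  path.gsub(/:(\w+)/) do
    key = Regexp.last_match(1).to_sym
    segments.fetch(key) { raise KeyError, "missing segment :#{key}" }.to_s
  end
end

pattern = "/books/:book_id/comments/:id(.:format)"
build_path(pattern, { book_id: 1, id: 1, format: "html" })  # => "/books/1/comments/1.html"
build_path(pattern, { book_id: 1, id: 1 })                  # => "/books/1/comments/1"
```

Prepending the host (for example http://localhost:3000) yields a loc like the one in the sample output.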

The following is an example of the output that could be produced:

<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>http://localhost:3000/books.html</loc>
    <lastmod>2013-02-20T15:46:13+00:00</lastmod>
    <changefreq>monthly</changefreq>
    <priority>0.5</priority>
  </url>
  <url>
    <loc>http://localhost:3000/books/1.html</loc>
    <lastmod>2013-02-20T15:46:13+00:00</lastmod>
    <changefreq>monthly</changefreq>
    <priority>0.5</priority>
  </url>
  <url>
    <loc>http://localhost:3000/books/1/comments/1.html</loc>
    <lastmod>2013-02-20T15:46:35+00:00</lastmod>
    <changefreq>monthly</changefreq>
    <priority>0.5</priority>
  </url>
</urlset>

If you are following Rails conventions, DuckMap should be able to pick up most nested routes. However, we are not always afforded the luxury of adhering to every convention, so you have the option of defining segment key mappings.

class Comment < ActiveRecord::Base

  sitemap_segments book_id: :my_book_id

end

When the URL is built, the model object will be asked for values for :my_book_id and :id, and those values are mapped back to the real :book_id and :id segment keys.
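A minimal sketch of such a mapping, assuming the model simply responds to the mapped attribute names. `capture_segments` and the `Comment` struct are hypothetical stand-ins in plain Ruby; they are not DuckMap's sitemap_capture_segments implementation.

```ruby
# Hypothetical stand-in for a model whose column does not follow the convention.
Comment = Struct.new(:my_book_id, :id)

# Ask the model for each required segment key, honouring a key => attribute
# mapping similar in spirit to `sitemap_segments book_id: :my_book_id`.
def capture_segments(model, keys, mappings = {})
  keys.each_with_object({}) do |key, segments|
    attribute = mappings.fetch(key, key)     # unmapped keys are read as-is
    segments[key] = model.public_send(attribute)
  end
end

comment = Comment.new(7, 42)
segments = capture_segments(comment, [:book_id, :id], { book_id: :my_book_id })
# segments now holds book_id: 7 and id: 42, keyed by the real route segment names
```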

You can find more information and demo apps at: (http://www.jeffduckett.com/blog/14/sitemaps-and-nested-resources-using-duckmap.html)

Route Filters

Filters give you the power to exclude named routes from a sitemap based on verb, controller name, action name, and route name (e.g. root_url). For the most part, named routes represent controllers and models, and it is doubtful that you would want to include in a sitemap any URL that performs an HTTP POST, PUT, or DELETE operation. Therefore, the default named route filters exclude all routes that have a verb of POST, PUT, or DELETE. Also excluded are all named routes with an action name of :create, :destroy, :edit, :new, or :update, which basically leaves the :index and :show actions. These default settings should cover most general needs for generating a sitemap. However, you can reset all of the filters to empty and include everything, or exclude only what you need.

Named route filters are also local to a sitemap block. You can define default filters at the top of config/routes.rb and define additional filters within as many sitemap blocks as you wish. When you define exclude filters within a block, the current state of the exclude filters outside of the block is copied and used within the block.

Note: Filters ARE NOT applied DURING the execution of a block. Filters are applied to the list of routes that are generated AFTER the block executes.

  • filters are applied to routes contained within a block AFTER the block executes.
  • location of filter statements is irrelevant.
    • top or bottom of the entire file config/routes.rb (applied to the entire file).
    • top or bottom of the block (applied to the entire block).
  • filters can be placed directly on a route or resources statement
  • you can include / exclude the following things: actions, controllers, names, verbs
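The filter behavior described above can be modelled in plain Ruby: default exclusions by verb and action, a block that copies the outer filter state before adding its own, and filtering applied to the finished route list. `Route`, `DEFAULT_FILTERS`, `block_filters` and `apply_filters` are hypothetical names; only exclusion by verb and action is shown, although the text notes that controllers and names can be filtered too.

```ruby
require "set"

# Hypothetical route record; DuckMap works with real Rails route objects.
Route = Struct.new(:name, :verb, :controller, :action)

# The default exclusions described above.
DEFAULT_FILTERS = {
  verbs:   Set[:post, :put, :delete],
  actions: Set[:create, :destroy, :edit, :new, :update]
}.freeze

# A block starts from a copy of the outer filters and adds its own exclusions;
# the outer filters are left untouched.
def block_filters(outer, extra_actions: [])
  outer.merge(actions: outer[:actions] | extra_actions)
end

# Filters run over the finished list of routes, i.e. after the block has executed.
def apply_filters(routes, filters)
  routes.reject do |route|
    filters[:verbs].include?(route.verb) || filters[:actions].include?(route.action)
  end
end

routes = [
  Route.new(:dvd_players, :get,  "dvd_players", :index),
  Route.new(:dvd_player,  :get,  "dvd_players", :show),
  Route.new(:dvd_players, :post, "dvd_players", :create)
]
outside = apply_filters(routes, DEFAULT_FILTERS).map(&:action)
inside  = apply_filters(routes, block_filters(DEFAULT_FILTERS, extra_actions: [:show])).map(&:action)
```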

The following example applies filters to the entire config/routes.rb file.

MyApp::Application.routes.draw do

  exclude_actions :show     # exclude all "show" actions from all routes.
  root to: 'home#index'
  resources :contact
  resources :faqs

  # you could put the exclude_actions method here and get the same result.
end

The following example applies filters within a sitemap block only. Because the block has no name, its contents are included in the default sitemap.

MyApp::Application.routes.draw do

  # per the simple sitemap block rule, these routes are NOT included, because a sitemap block is defined below
  root to: 'home#index'
  resources :contact
  resources :faqs

  # using a block to encapsulate, however, contents are included in the default sitemap.
  sitemap do

    exclude_actions :show   # excludes the "show" action from every route within the block
    resources :dvd_players
    resources :accessories
    # you could put the exclude_actions method here and get the same result.
  end

end

Values And Attributes

Values are generated for two main areas:

  1. Url nodes of a sitemap
  2. Meta tags of a page.

How those values are generated depends on the handler generating them; however, all of the handlers follow the same general procedure: start with default values and keep merging in new values until all of the options and logic for the given handler have been exhausted.

For example, when generating sitemap or meta tag values for the index action of a controller, the index handler will:

  • grab the global default values and put them into a Hash.
  • ask the controller for any values it needs to overwrite and merge them into the Hash.
  • if the first_model option is true, find the first model on the controller, ask it for values to overwrite, and merge them into the Hash.
  • and so on.

The index handler actually does a little more work than described; the overall point is that every handler starts with the global defaults and overwrites values according to its behavior and configuration. The intent is to give the developer the power to fine tune the content generated for any particular sitemap URL node or meta tag, down to the lowest possible level.
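The general procedure amounts to a chain of Hash merges in which later layers win. The sketch below uses made-up values and is not the index handler itself, which, as noted, does more work.

```ruby
# Hypothetical layers; a real handler decides which layers to consult and when.
global_defaults   = { title: "My Rails App", changefreq: "monthly", priority: 0.5 }
controller_values = { title: "Books Listing" }
model_values      = { priority: 0.8 }   # e.g. only consulted when first_model is true

# Later layers overwrite earlier ones; keys a layer does not mention are kept.
values = [controller_values, model_values].reduce(global_defaults) { |acc, layer| acc.merge(layer) }
```

Because `merge` returns a new Hash, the global defaults stay intact for the next node or page.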

Setting Global Default Values And Attributes

Global default values and attributes are set in config/routes.rb. See DuckMap::ConfigHelpers for a list of all of the available helper methods to set values and attributes within the config/routes.rb file.

MyApp::Application.routes.draw do
  title "My Rails App"              # default title for meta tags
  lastmod "02/04/2013"              # default last-modified date for sitemap and meta tags
  canonical_host "www.example.com"  # default canonical host for sitemap and meta tags.
end

There are three methods used to configure values and attributes for global configuration, controllers and models.

  1. acts_as_sitemap - sets all values and attributes including handlers and segments.
  2. sitemap_handler - wrapper method for acts_as_sitemap.
  3. sitemap_segments - wrapper method for acts_as_sitemap.

Two of the methods are really just convenience methods to help reduce code clutter. All of the handlers can also be configured globally. Let's say you have a standard column on all of your tables named "tags" that stores information suitable for the keywords meta tag. You can tell every handler within your app to use the "tags" attribute of a model with the following configuration.

MyApp::Application.routes.draw do
  acts_as_sitemap keywords: :tags
end

Setting Values And Attributes Directly On The Controller And Model

You can fine tune the values that are generated by the handlers by using acts_as_sitemap, sitemap_handler or sitemap_segments directly on a controller, model or both. The syntax is the same regardless of where you use it. The following example configures a global title that is overwritten with a static title String for all index actions and grabs the "my_title" attribute from the first model found on the controller during a show action.

MyApp::Application.routes.draw do

  # default title for meta tags
  title "My Rails App"

  # this would overwrite the global title "My Rails App"
  # if the current object has the "common_title" attribute, otherwise, it would use "My Rails App"
  acts_as_sitemap title: :common_title

end

class BooksController < ApplicationController

  # this would overwrite the global title "My Rails App"
  acts_as_sitemap :index, title: "Books Listing"

  # this would overwrite the global title "My Rails App"
  # if the current object has the "my_title" attribute, otherwise, it would use "My Rails App"
  acts_as_sitemap :show, title: :my_title

end
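How a String differs from a Symbol in these examples can be sketched as follows, assuming (from the comments above) that a Symbol names an attribute on the current object and that a missing attribute falls back to the global value. `resolve_value` and the `Book` struct are hypothetical, not DuckMap handler code.

```ruby
# Hypothetical resolver: a String is used as-is, a Symbol names an attribute on
# the current object, and a missing attribute falls back to the global default.
def resolve_value(spec, object, fallback)
  case spec
  when String then spec
  when Symbol then object.respond_to?(spec) ? object.public_send(spec) : fallback
  else fallback
  end
end

Book = Struct.new(:my_title)

resolve_value("Books Listing", nil, "My Rails App")         # => "Books Listing"
resolve_value(:my_title, Book.new("Dune"), "My Rails App")  # => "Dune"
resolve_value(:my_title, Object.new, "My Rails App")        # => "My Rails App"
```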

Meta Tags

A feature of DuckMap is synchronization of data between sitemaps and meta tags: data contained in a sitemap should match the meta tag data within the HEAD section of a page. Handlers are the single mechanism used to provide data to both sitemaps and meta tags. sitemap_meta_tag is a helper method that provides the title, keywords, description, last-modified, and canonical URL. Existing Rails apps may already have code in place that provides some or all of the data items generated by sitemap_meta_tag, which could force you to write a workaround to accommodate both. So, you have the option of using sitemap_meta_tag to generate all meta tag items, or you can call individual methods to generate only the data you want.

# app/views/layouts/application.html.erb
<!DOCTYPE html>
<html>
<head>
  <%= sitemap_meta_tags %>
</head>

# would generate something similar to the following:
<head>
  <title>My First Book</title>
  <meta content="Rails, Ruby on Rails, Ruby, Programming" name="keywords" />
  <meta content="This is a short description of my application." name="description" />
  <meta content="Wed, 01 Jan 2014 07:07:00 UTC" name="Last-Modified" />
</head>

# here we are using a couple individual methods
<!DOCTYPE html>
<html>
<head>
  <%= sitemap_meta_title %>
  <%= sitemap_meta_lastmod %>
  <%= sitemap_meta_canonical %>
</head>
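The synchronization idea can be illustrated with one Hash of handler values rendered both as HEAD tags and as a sitemap lastmod. This is a hypothetical sketch in plain Ruby: `meta_tags` and `sitemap_lastmod` are not DuckMap helpers, and the tag layout is copied from the sample output above rather than from the gem's source (the canonical tag is omitted).

```ruby
require "erb"

# Escape text for use in HTML content and attribute values.
def esc(text)
  ERB::Util.html_escape(text)
end

# Hypothetical: render HEAD tags from a Hash of handler values.
def meta_tags(values)
  tags = []
  tags << "<title>#{esc(values[:title])}</title>" if values[:title]
  tags << %(<meta content="#{esc(values[:keywords])}" name="keywords" />) if values[:keywords]
  tags << %(<meta content="#{esc(values[:description])}" name="description" />) if values[:description]
  if values[:lastmod]
    stamp = values[:lastmod].getutc.strftime("%a, %d %b %Y %H:%M:%S UTC")
    tags << %(<meta content="#{stamp}" name="Last-Modified" />)
  end
  tags.join("\n")
end

# Hypothetical: the same lastmod value, formatted for a sitemap <lastmod> node.
def sitemap_lastmod(values)
  values[:lastmod].getutc.strftime("%Y-%m-%dT%H:%M:%S%:z")
end

values = {
  title: "My First Book",
  keywords: "Rails, Ruby on Rails, Ruby, Programming",
  description: "This is a short description of my application.",
  lastmod: Time.utc(2014, 1, 1, 7, 7, 0)
}
head = meta_tags(values)
puts head
sitemap_lastmod(values)  # => "2014-01-01T07:07:00+00:00"
```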

Logging

Duck Map keeps a detailed log during the generation of a sitemap. Use this log to determine why a route is or is not being included in a sitemap. The default log level is :info; switch to :debug to see every step in the process. The log file is written to the standard Rails log directory and is named duck_map.log.

Generators

duckmap:sitemaps

The following Rails generator will display all of the sitemap routes defined in config/routes.rb:

rails g duckmap:sitemaps

Sitemap routes
route name                               controller_name#action_name
                                         path
---------------------------------------------------------------------------------------------------------------
sitemap_sitemap                          sitemap#sitemap
                                        /sitemap.:format

duckmap:static

duckmap:sync

Synchronizing with .git and local files on disk. You have the power to include any route within a sitemap; there is no requirement to have a model associated with any given route. In fact, you may have a route with several actions that render some type of content, where it makes sense for the last-modified time of an action's view file to represent the last modified date of that action. There are two ways to synchronize last modified dates for an application:

# via rails generator
rails g duck_map:sync

# via rails task
rake duck_map:sync

The actual timestamp values are stored in a locale file at config/locales/sitemap.yml. Below is sample content.

--- 
:sitemap: 
  home: 
    index: 10/24/2011 02:03:51

When a Rails app needs a lastmod for a controller, it looks to this file as the source for static values. Running the generator or rake task will populate this file for you; however, you also have the option of editing it manually. The synchronization process uses the values in this file when deciding which timestamp to use as the last modified date for an action's view file. The decision is very simple: the process grabs the timestamp of the action's view file on local disk, the timestamp of the same view file from the .git repository, and the timestamp stored in config/locales/sitemap.yml, then compares all three. The latest date/time wins the fight, and that value is stored for the action in config/locales/sitemap.yml.
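The comparison can be sketched in plain Ruby. Collecting the disk timestamp (for example with `File.mtime`) and the .git timestamp is left out; `latest_timestamp` and `record` are hypothetical helpers, and only the date format and the :sitemap / controller / action nesting come from the sample file above.

```ruby
require "time"

# Date format used in the sample config/locales/sitemap.yml above.
STAMP_FORMAT = "%m/%d/%Y %H:%M:%S"

# The latest of the three timestamps wins; any source may be missing (nil),
# e.g. a view file that has not been committed to .git yet.
def latest_timestamp(disk_time, git_time, stored_string)
  stored_time = stored_string && Time.strptime(stored_string, STAMP_FORMAT)
  [disk_time, git_time, stored_time].compact.max
end

# Store the winner under :sitemap => controller => action, as in sitemap.yml.
def record(store, controller, action, time)
  controllers = store[:sitemap] ||= {}
  actions = controllers[controller] ||= {}
  actions[action] = time.strftime(STAMP_FORMAT)
  store
end

disk   = Time.local(2011, 10, 20, 9, 0, 0)   # e.g. File.mtime of the view file
git    = Time.local(2011, 10, 22, 9, 0, 0)   # e.g. last commit time of the view file
winner = latest_timestamp(disk, git, "10/24/2011 02:03:51")
store  = record({}, "home", "index", winner)
```

Dumping `store` with Ruby's YAML library would give a file shaped like the sample above.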

It is up to you to decide when to run the synchronization process. If you are using a .git repository, you would have to commit all of your files, run the synchronization process, then commit again. Another option is to run the synchronization process as part of a central deployment procedure, or even on the production server.

Mongoid Support

Recently, I added support for mongoid. All that should be needed is to require "duck_map_mongoid" when including the duck_map gem in your app.

# add duck_map with mongoid support
gem "duck_map", require: "duck_map_mongoid"

Demo applications

You can find articles and demo apps at: http://jeffduckett.com/blogs.html

Copyright (c) 2011 Jeff Duckett. See license.txt for details.