If you’re cleaning up your user’s input in your views you’re doing it wrong

Have you ever found yourself using the “h” view helper all over the views in your applications? Have you ever thought that cleaning up user input in views is a tedious, error-prone and cumbersome job?

You’re not alone.

Think with me: the user provides information to your application once. That information could be badly formatted or could even be an XSS attack, but you store it in your database exactly as the user provided it. Then, when you’re about to show that information, something that could happen once or hundreds of times (you probably would like to have thousands of page views, wouldn’t you?), you finally clean it up, instead of cleaning it up just once when the user provided it.

Insane, heh?

What about stopping with this waste of CPU cycles and cleaning the data once and for all?

Don’t worry, you don’t have to do anything, it’s already done and sorted out for you with this dead simple plugin. The params_sanitizer plugin uses Rails’ own sanitizers to clean the user input when it’s first provided on form POSTs and PUTs (what? do you change your application/database state with GET calls? OMFG!). You can protect all calls to all controllers, protect all actions in a single controller or even protect specific actions in a single controller, it’s your call!

First step is to install the plugin:

ruby script/plugin install git://github.com/mauricio/params_sanitizer.git

With the plugin installed, you can clean up user input in your application once and forget about it forever. Here are the examples.
Stripping tags from all params in all actions (remember, only POST or PUT actions will really be changed):

class ApplicationController < ActionController::Base
  strip_tags_from_params # strip_tags uses Rails' full_sanitizer
end

Stripping tags from all params for all actions in a single controller:

class NewsStoriesController < ApplicationController
  strip_tags_from_params
end

Stripping tags from all params for specific actions in a single controller:

class CommentsController < ApplicationController
  strip_tags_from_params :only => [ :create, :update ]
end

If, instead of stripping all tags, you’d just like to use the simple sanitizer (it removes unsafe tags but leaves the others intact, using Rails’ white_list_sanitizer):

class ApplicationController < ActionController::Base
  sanitize_params
end

class NewsStoriesController < ApplicationController
  sanitize_params
end

class CommentsController < ApplicationController
  sanitize_params :only => [ :create, :update ]
end

This plugin depends only on Rails default sanitizers, so you don’t need to install anything else to have it working.

Now, as the data is cleanly stored in your database, you don’t have to waste CPU cycles cleaning up data in your view layer (and you can even say that you’re more adherent to MVC, as cleaning up user input was never one of the view’s jobs).
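To make the "clean once, on the way in" idea concrete, here is a minimal plain-Ruby sketch of walking a params-like hash and stripping tags from every string. It is not the plugin's code: strip_tags_deep is a hypothetical helper, and the regular expression is a naive stand-in for Rails' full sanitizer (notice that it removes the tags but keeps the text that was between them).

```ruby
# Hypothetical helper, not taken from params_sanitizer: recursively cleans
# every string found inside a params-like structure and returns a new copy.
def strip_tags_deep(value)
  case value
  when Hash
    value.each_with_object({}) { |(key, inner), clean| clean[key] = strip_tags_deep(inner) }
  when Array
    value.map { |inner| strip_tags_deep(inner) }
  when String
    # Naive stand-in for a real sanitizer: drops anything shaped like <...>.
    value.gsub(/<[^>]*>/, '')
  else
    value
  end
end

params = {
  'comment' => {
    'body' => '<b>Hi</b> <script>alert(1)</script>',
    'tags' => ['<i>ruby</i>']
  }
}

clean = strip_tags_deep(params)
puts clean['comment']['body']
```

In a controller this kind of cleanup would run in a before filter for POST and PUT requests, which is the behaviour the plugin's macros describe.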

Building your own ActiveRecord validation macros with validates_each

A common task when writing your own Rails applications using ActiveRecord is creating your own validations for your models. While it’s perfectly correct to add the validation directly to the model that needs it, sometimes you’d like to reuse the same validation logic in other models, and we’re not really going to do a cut-and-paste here, are we?

The simplest solution when you’re validating fields (and not the whole model) is to use the validates_each method, as it supports some nice features seen in other validations that might interest you, such as the :if, :unless, :allow_blank and :allow_nil options.

Our custom example validation checks that one or more fields are different from one specific field. Imagine that you’re building an invoices application: having the seller also be the buyer isn’t really what you’re looking for, so that’s why we’re building this validation. Let’s take a look at the validation code:

ActiveRecord::Base.class_eval do

  def self.validates_different( *args )

    options = args.extract_options!
    raise "You must define a :field option to compare to" if options[:field].blank?

    validates_each(*(args << options)) do |record, attribute, value|
      if record.send( options[:field] ) == value
        record.errors.add(
          attribute,
          record.errors.generate_message(
            attribute,
            'different',
            :field => record.class.human_attribute_name( options[:field].to_s ) ) )
      end
      true
    end

  end

end

We have added a class method to ActiveRecord::Base to be our validation macro. It takes a list of attributes and an options hash at the end. Here’s a sample of how it would be used:

class Invoice < ActiveRecord::Base
    validates_different :seller_id, :field => :buyer_id, :allow_blank => true
end

The validation looks just like any other ActiveRecord validation and even uses well-known options like :allow_blank, in keeping with the principle of least surprise. It’s also important to notice the use of I18N in the validation message: “activerecord.errors.messages” is the ActiveRecord error messages namespace and that’s where you should add your custom validation messages. Do not place the messages directly inside your validation or model code. Here’s what the YAML file would look like:

en:
  activerecord:
    errors:
      messages:
        different: "must be different than {{field}}"

And there you go, you have built your own validation macro for your ActiveRecord models and even used the I18N helpers to keep the messages away from your model code.
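If you want to play with the macro pattern outside of Rails, the following is a rough plain-Ruby sketch of the same idea. TinyValidations and everything inside it are made-up names for illustration, not ActiveRecord APIs, and the error message is hard-coded here instead of coming from the I18N files.

```ruby
# Plain-Ruby stand-in for the pattern; none of these names come from ActiveRecord.
module TinyValidations
  def self.included(base)
    base.extend(ClassMethods)
  end

  module ClassMethods
    # Checks registered for this class, one lambda per validated attribute.
    def checks
      @checks ||= []
    end

    # Macro: each listed attribute must differ from the one named by options[:field].
    def validates_different(*attributes)
      options = attributes.last.is_a?(Hash) ? attributes.pop : {}
      raise ArgumentError, 'You must define a :field option to compare to' if options[:field].nil?

      attributes.each do |attribute|
        checks << lambda do |record, errors|
          value = record.send(attribute)
          next if options[:allow_blank] && (value.nil? || value.to_s.empty?)
          if value == record.send(options[:field])
            errors << "#{attribute} must be different than #{options[:field]}"
          end
        end
      end
    end
  end

  # Runs every registered check and returns the collected messages.
  def errors
    found = []
    self.class.checks.each { |check| check.call(self, found) }
    found
  end
end

Invoice = Struct.new(:seller_id, :buyer_id) do
  include TinyValidations
  validates_different :seller_id, :field => :buyer_id, :allow_blank => true
end

puts Invoice.new(7, 7).errors.inspect
```

The shape is the same as in the real macro: the class method only registers a block, and the block runs later against each record.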

Accessing the current request object on your mailer templates to generate links

A common issue with mailer templates is that, as they’re not being rendered from a controller, you can’t get your hands on the request object and access properties like host_with_port. While you’re usually calling the mailers from inside controllers and could hand the request to them as a parameter, it isn’t really nice to do this every time you need to send an email.

So, if you’re looking for a quick and easy solution to this issue, the current_request plugin is your friend, you can install it by calling:

ruby script/plugin install git://github.com/mauricio/current_request.git

The plugin works by setting the current request in a thread local variable that will be available until the end of the request, which means that you can use it safely in your templates. Two new methods are added to all views: current_request, which returns, obviously, the current request being answered, and current_host, which builds the current host with port and protocol for you. Examples:

<%= link_to 'Home', "#{current_request.protocol}#{current_request.host_with_port}/home" %>

Or you can just use a shorthand to the current host:

<%= link_to 'Home', "#{current_host}/home" %>

You can also use it wherever you want to access the current request (and not only on templates) by calling:

CurrentRequest::Holder.current_request
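For the curious, the thread-local trick can be sketched in a few lines of plain Ruby. This is only an illustration of the technique, not the plugin's source: RequestHolder, with_request and FakeRequest are hypothetical names, and the current_host method here just mimics what the view helper is described as returning.

```ruby
# Hypothetical holder: keeps the request on the current thread for the
# duration of a block. (Strictly speaking, Thread#[] is fiber-local.)
module RequestHolder
  KEY = :sketch_current_request

  def self.current_request
    Thread.current[KEY]
  end

  def self.with_request(request)
    previous = Thread.current[KEY]
    Thread.current[KEY] = request
    yield
  ensure
    # Always restore the previous value so nothing leaks into the next request.
    Thread.current[KEY] = previous
  end
end

# Stand-in for the real request object, exposing only what the helper needs.
FakeRequest = Struct.new(:protocol, :host_with_port)

def current_host
  request = RequestHolder.current_request
  "#{request.protocol}#{request.host_with_port}"
end

request = FakeRequest.new('http://', 'example.com:3000')
host_inside = RequestHolder.with_request(request) { current_host }
puts host_inside
```

In a Rails application the with_request call would wrap the controller action (for example from an around filter), so mailers rendered during that action can see the request.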

Why I am not using Masochism for my master-slave setups and why monkey-patching isn’t the only solution

I got a message this morning from Gregg at Ruby5 asking why I wrote the master_slave_adapter plugin instead of using Technoweenie’s Masochism and I think the answer to this question deserves a little blog post (and the blog really needs some new content :P).

When building the Talkies project we had to set up a master-slave environment using MySQL on the production servers. To get things up and running I configured the replication on MySQL and set out to find a solution on the Rails/ActiveRecord side to handle this special need: all SELECT statements should be sent to the slave db while all other commands should be sent to the master. The only solution available at the time was Masochism (at least it was the only one I could find).

With Rails 2.1, everything looked like we would live happily ever after, but Rails 2.2 brought a lot of changes, many of them in ActiveRecord, the main one being connection pooling, and we upgraded. The production server, which wasn’t really live yet, broke badly: the new connection pooling code made the application go crazy and the slave was receiving UPDATE and INSERT calls (this was the code at the moment).

With this new issue showing up I set out to find a solution, and the first thing was to hack the plugin itself (as github had no “issues” thing at the moment). Trying it out, I couldn’t really find a simple fix and wasn’t really happy with the way the plugin worked. It looked a lot like a hack when a hack wasn’t really needed, so I started to write my own solution.

The first requirement was that it should perform no black magic at all. We were burned more than once during the project by plugins that were too clever and relied heavily on monkey-patching, so my solution had to be really straightforward and do as few clever things as possible.

But hey, active_record needs a database adapter, so why not just build a fake database adapter that forwards the work to a master or slave connection depending on the method called? This way I would never need to hack ActiveRecord, as the thing would just be a common database adapter like all the others, and the plugin would survive Rails upgrades with little or no changes. And that’s exactly what I did: an ActiveRecord database adapter whose job is to route method calls to a real master or slave connection.

Why was it an improvement?

By relying on the ActiveRecord database adapter contract I had no need to monkey-patch Rails itself, it would just work, even if Rails or ActiveRecord got upgraded, the only thing that would make me change the plugin was if the database adapter contract got changed and this isn’t really something that changes a lot.
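As a rough illustration of the routing idea (and only that, this is not the plugin's code), here is a plain-Ruby sketch of a proxy that sends read methods to one connection and everything else to another. RoutingAdapter and RecordingConnection are made-up names, and the list of read methods is an assumption chosen for the example.

```ruby
# Fake connection that only records the SQL it receives.
class RecordingConnection
  attr_reader :calls

  def initialize
    @calls = []
  end

  def select_all(sql)
    @calls << sql
    []
  end

  def execute(sql)
    @calls << sql
    nil
  end
end

# Hypothetical proxy: looks like one connection from the outside, but
# forwards each call to the master or the slave depending on the method name.
class RoutingAdapter
  # Assumed set of read-only methods for this sketch.
  READ_METHODS = [:select_all, :select_one, :select_value]

  def initialize(master, slave)
    @master = master
    @slave = slave
  end

  def method_missing(name, *args, &block)
    target = READ_METHODS.include?(name) ? @slave : @master
    target.send(name, *args, &block)
  end

  def respond_to_missing?(name, include_private = false)
    @master.respond_to?(name, include_private) || super
  end
end

master = RecordingConnection.new
slave = RecordingConnection.new
adapter = RoutingAdapter.new(master, slave)

adapter.select_all('SELECT * FROM users')
adapter.execute("UPDATE users SET login = 'x'")
puts "master: #{master.calls.inspect}, slave: #{slave.calls.inspect}"
```

Because the proxy only depends on the methods a connection exposes, nothing in the calling code has to be patched to use it.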

And if there’s one thing that’s burning a lot of people using plugins and Rails itself, it’s clever code and too much monkey-patching. When you’re building a solution that’s going to be “inserted” into someone else’s codebase, one you have never even seen, you had better avoid changing too many things or breaking well-known contracts. Otherwise you might end up with bugs that are hard to discover and kill, and they’ll surely make you waste a lot of your time.

Monkey-patching and class-redefinition are some of the coolest features of Ruby as a language, but they should be used with care and are better avoided if possible.

Setting up your Ruby on Rails application in an Ubuntu Jaunty Jackalope (9.04) server with Nginx, MySQL, Ruby Enterprise Edition and Phusion Passenger

There are many ways to deploy and run Ruby applications with the Ruby on Rails framework but it’s unlikely that you’re going to find a simpler and faster solution than using Ruby Enterprise Edition (REE from now on) with Nginx and Phusion Passenger. Nginx is a fast, scalable and lightweight HTTP server, that is able to serve a lot of content without using up all your memory and Passenger is a module that can be tied into Apache or Nginx to handle your Ruby (and RoR) applications automatically.

When using Passenger you don’t need to worry about managing a pack of Mongrels or use a proxy HTTP server, Passenger lives inside your web server and just takes care of everything for you. Here you’ll learn how to use Passenger in conjunction with Nginx to deploy your applications in the wild.

This tutorial assumes that you’re building a brand new Ubuntu server with few or no custom packages installed. Does this mean you can’t use this with an already customized server? No, but it’s easier if you can follow it step by step to avoid problems, as this has already been tried and tested to be sure that it works. We’re using MySQL here because it’s what I’m using right now, but you can easily change the apt-get calls to use whatever database you’re using yourself.

Setting up users

If you’re really starting up from a brand new install with no users created beyond the default ones you might want to create a user for yourself so that you don’t need to be logged in as a “root” forever. To create a new user in a Linux box the command is “useradd”:

useradd -m -g staff -s /bin/bash mauricio

This will create a user called “mauricio” with a “/home/mauricio” home directory (as defined by the “-m” param), with “staff” as its default group and using the “/bin/bash” shell. After creating a user for yourself you might also want to create a user for the application you’re deploying, or a “deployment” user. This is the user that’s going to be used to deploy the application and run all application related processes. Just use the same command above, changing the username to your deploy user. This can be the name of the application you’re deploying or just “deploy” (keep all your users in the same “staff” group to avoid file permission issues when you edit or create files).

After doing this you can also make all users that belong to the staff group be able to use the “sudo” command. To do this just open the “/etc/sudoers” file with a text editor (I usually use “nano”) and add this line:

%staff ALL=(ALL) ALL

Setting up your ssh keys

If you’re running on a Linux/Unix box and haven’t generated your SSH keys, it’s time to do it. If you have never heard of them, SSH keys let you log in to servers where you have a user account without being asked for the password, which is really cool if you have to handle a lot of servers at the same time (and if you don’t want to type passwords every time you do a “git pull|git push”). To generate them, do this as your local user on your local machine:

ssh-keygen -t dsa

This command will create a folder called “.ssh” in your home directory (as in “/home/your_user/.ssh”) with a bunch of files. It will ask you for a password to protect these files. The password isn’t required, but it’s nice to be cautious here: if you don’t set a password, anyone with physical access to your machine (or who can log in as you) could log in to all the machines your SSH keys were copied to.

Now that the keys are generated, you can copy them to the servers you usually log in to. To do this, first log in to the server using your account and create a .ssh folder in your user’s home folder:

mkdir ~/.ssh

Log off and, from your local pc, copy the ~/.ssh/id_dsa.pub file to the remote machine using scp:

scp ~/.ssh/id_dsa.pub your_remote_user@host:.ssh/authorized_keys2

This will copy your public key to the remote server and you’ll be able to log into that server from your current local machine and local user to the user you copied the key to in your remote server. You can obviously copy this to as many servers and user accounts as you like and none of them will ask you for a password again.

Getting Ubuntu up-to-date

First thing we need to do is to be sure that our server is up-to-date with the currently installed software:

sudo apt-get update
sudo apt-get upgrade

Then we need to install some basic libraries and MySQL:

sudo apt-get install build-essential mysql-server libmysqlclient15-dev libmagickcore-dev imagemagick libpcre3 libfcgi-dev libfcgi0ldbl libxml2-dev libxslt1-dev -y

This is going to install the MySQL server, the ImageMagick library to handle image processing and the XML and XSLT libraries needed for some common gems like Nokogiri.

Installing Ruby

We’re not going to use the default Ruby interpreter that comes with Ubuntu but Phusion’s Ruby Enterprise Edition. REE is a reliable and memory friendly fork of the main Ruby interpreter from the Phusion guys. Go to the REE download page and grab the “.deb” file for your architecture (look for the “Ubuntu Linux” tab). Install the “.deb” with:

sudo dpkg -i path-to-the-deb-file

This will get REE installed to “/opt/ruby-enterprise” but the binaries will not be available at your PATH, we’ll need to add the “bin” dir to our PATH variable manually. Open up your “/etc/environment” file with your preferred command line text editor (mine is “nano”):

sudo nano /etc/environment

And add the “/opt/ruby-enterprise/bin” dir to your PATH variable like this:

PATH="/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/opt/ruby-enterprise/bin"

This will get the scripts at the “bin” folder available to your user but not when you use “sudo” calls (Ubuntu just overrides the PATH when you call “sudo” for security reasons) so we’ll need to symlink some of the files to “/usr/bin” to be sure that they’re visible when you’re sudoing:

sudo ln -s /opt/ruby-enterprise/bin/ruby /usr/bin/ruby
sudo ln -s /opt/ruby-enterprise/bin/gem /usr/bin/gem
sudo ln -s /opt/ruby-enterprise/bin/ri /usr/bin/ri
sudo ln -s /opt/ruby-enterprise/bin/rdoc /usr/bin/rdoc
sudo ln -s /opt/ruby-enterprise/bin/irb /usr/bin/irb

Now let’s install some gems to be sure that everything is ok:

sudo gem install rails mysql nokogiri rmagick mislav-will_paginate --no-ri --no-rdoc

The “--no-ri --no-rdoc” options are there to avoid creating docs that we’re not really going to use and that would take a long time to generate (also, if you’re on a VPS and don’t have a lot of memory, generating them is surely going to throw out of memory errors). If you got no errors here, we’re good to go and install Nginx and Passenger.

Installing Nginx and Passenger

Installing Nginx with Passenger and Ruby EE is as easy as calling this command:

sudo /opt/ruby-enterprise/bin/passenger-install-nginx-module --auto --auto-download

Those “--auto” options are there to tell the installer that we’re saying yes to all defaults and we want it to download a brand new Nginx copy and build it with the Passenger module. The installer is going to ask you where to install Nginx with a default of “/opt/nginx”, just hit enter to get it installed at the default path.

As you can see from the messages printed, Passenger has already generated a sample configuration file with the basic config needed to run the application, here’s an example of what it would look like.

It’s VERY important to set the Nginx user to be the same user that’s going to deploy and create the application files as this will avoid permission issues that are one of the most common problems you’re going to have. With Nginx configured to load your application, start it to be sure that everything is OK again:

sudo /opt/nginx/sbin/nginx

Open up your browser pointing to the server where Nginx is running and you should see your application running correctly. If it isn’t, check the application logs and also Nginx error logs at “/opt/nginx/logs/error.log”. You can kill Nginx with a simple:

sudo pkill nginx

Getting Nginx to run as a Daemon

Now that Nginx is running correctly and serving your application you need to set it up to run as a daemon. To do this we need to create a script that’s going to handle the Nginx daemon and install it using the update-rc.d utility. You can get the script here.

You should save this script at “/etc/init.d/nginx” (sudo to do it), mark it as executable and install it as a daemon:

sudo chmod +x /etc/init.d/nginx
sudo update-rc.d nginx defaults

Now when the machine reboots Nginx will be started automatically. As a last touch, start the Nginx daemon and your server is ready to roll:

sudo /etc/init.d/nginx start

Quick Tip – Using to_s as a label and simplified link_to calls to your ActiveRecord models

One of the things you’ll find in every rails application is links like this one:

<%= link_to user.login, user %>

Or maybe like this one:

<%= link_to user.login, user_path( user ) %>

Or maybe something ugly like this one:

<%= link_to user.to_label, user_path( user ) %>

How about throwing all of them away and just doing it like this:

<%= link_to user %>

Cool, isn’t it?

Now “how can I do that” you ask, it’s dead simple. First, remember that every object responds to a method called “to_s” and this “to_s” method is defined as “a method that returns a string representation of your object” in most programming languages, including Ruby.

A string representation of your object is something human readable that would tell someone else what this object represents. “to_s” isn’t meant to be a debug like method, we already have “inspect” to do that, so why not put it to work and simplify our links?

In every ActiveRecord model in your application you’ll define a to_s method that returns one (or maybe more, if needed) of the attributes of your object as a string (if they’re not strings, turn them into strings before returning). Let’s see what your user would look like:

class User < ActiveRecord::Base
    validates_presence_of :login
    validates_uniqueness_of :login
    def to_s
        self.login
    end
end

I decided that the “login” method is the one that best represents the User object and it’s also the one I want to use when someone else sees users on the website. They won’t see the users’ real names by default, but their logins.

Why is it better to do this with the “to_s” method instead of adding a “to_label” method to all objects? ‘Cos many helpers will already call the to_s method by default on your object (as we’ll see with the link_to helper), so you just get full compatibility for free. Check out our new link_to implementation:

module ApplicationHelper
  #sample link_to override that will generate urls for active_record objects by default
  #if the first parameter is an active_record object and you just want a link to it,
  # you can call it just like this:
  # <%= link_to user %>
  # the helper will take care of generating the correct url
  def link_to( *args )
    options = args.extract_options!
    if args.size == 1 && args.first.is_a?( ActiveRecord::Base )
      super( *([ args.first, args.first ] + [ options ]) )
    else
      super( *( args + [ options ] ) )
    end
  end
end

If there’s only one object (besides the options hash) and it is an ActiveRecord::Base instance, just use the object itself as the url parameter (the real link_to helper will call polymorphic_url on it automatically) and also use the object as the link_to label (the first parameter), this will call the to_s method on our user and the link will use it as the label.

What do you get with this?

A seamless and clear way of defining labels for your models (the “to_s” method is part of the core of the language anyway) and also a simpler way of generating links for your objects. Common methods for object labels are very important because you can never be sure if the property you’re using today as a label will be used forever. Using the “to_s” method as the default “label method” will allow you to change the label property at any time with almost no change to other parts of the code.
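A quick plain-Ruby sketch shows why “to_s” is such a convenient hook: string interpolation and conversions already call it, so nothing else needs to know which attribute is the label. The User struct below is a stand-in for the ActiveRecord model, and its attribute names are assumptions made up for the example.

```ruby
# Stand-in for the ActiveRecord model; only to_s matters for this sketch.
User = Struct.new(:login, :real_name) do
  def to_s
    login.to_s
  end
end

user = User.new('mauricio', 'Mauricio Linhares')

# Interpolation and Kernel#String both fall back to to_s, so the label
# is decided in exactly one place.
label = "Profile of #{user}"
converted = String(user)
puts label
```

If tomorrow the label should be the real name instead, only the body of to_s changes.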

Building a I18N aware form builder for your Rails applications

Form builders are one of the coolest features when building Rails applications. They streamline the task of writing complex forms and you can usually write your own form builder to keep your forms consistent across your application. One of the most common customized form builders is the one that uses the field name to show a label:

<% form_for @user, :builder => SimpleFormBuilder do |f| %>
    <%= f.text_field :date_of_birth %>
<% end %>

We would probably use the “date_of_birth” symbol, turn it into a String and then call “humanize” on it ( you can see a custom form builder example that does exactly this here ):

:date_of_birth.to_s.humanize
=> "Date of birth"

That’s awesome if you’re building a website for a language that uses only ASCII characters, but if you’re building a form in Portuguese you’re doomed. Imagine that my “usuario” (user) has a “profissão” (profession) and I try to use it using a common form builder:

<% form_for @usuario, :builder => SimpleFormBuilder do |f| %>
    <%= f.text_field :profissao %>
<% end %>

What do I get? “profissao”, no tilde. And if I try to create a method called “profissão” at my object it’s just going to break.

So, are customized form builders for Rails using funny characters impossible? Never! With the new I18N support this trouble has been completely removed. Let’s see how we can write a customized form builder that uses the attribute names to generate labels and respects funny Portuguese or Russian characters, or characters from any other language.

Here’s the builder code:

class SimpleFormBuilder < ActionView::Helpers::FormBuilder

  attr_accessor :object_class

  helpers = field_helpers +
            %w{date_select datetime_select time_select} +
            %w{collection_select select country_select time_zone_select} -
            %w{hidden_field label fields_for submit select} # Don't decorate these

  helpers.each do |name|
    class_eval %Q!
    def #{name}(field, *args)
      options = args.extract_options\!
      args << options
      return super if options.delete(:disable_builder)
      @template.content_tag(:p, field_label(field, options) << '<br/>' << super)
    end
    !
  end

  def select(field, choices, options = {}, html_options = {})
    return super if options.delete(:disable_builder) || html_options.delete(:disable_builder)
    @template.content_tag(:p, [field_label(field, options), '<br/>', super].join("\n"))
  end

  def submit(value = nil, options = {})
    if self.object && value.nil?
      value = self.object.new_record? ? I18n.t( 'txt.shared.create' ) : I18n.t( 'txt.shared.update' )
    end
    @template.content_tag( :p, super( value, options ) )
  end

  def field_label( field, options )
    self.label( field, options.delete( :label ) || self.object_class.human_attribute_name( field.to_s ), :class => options[:label_class])
  end

  def initialize(object_name, object, template, options, proc)
    super
    self.object_class = self.object.nil? ? self.object_name.to_s.camelize.constantize : self.object.class
  end

end

We’ve created our own form builder that inherits from the ActionView::Helpers::FormBuilder and it redefines all field helpers using a class_eval call (don’t know what class_eval does? Learn here).
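If the class_eval trick looks opaque, this small plain-Ruby sketch shows the same move in isolation: a string of Ruby code is built for each name in a list and evaluated inside the class, which defines one method per name. TinyBuilder and the markup it returns are invented for the example and have nothing to do with the real FormBuilder.

```ruby
class TinyBuilder
  # One method is generated for each helper name in this list.
  %w{text_field password_field}.each do |name|
    class_eval %Q!
      def #{name}(field)
        "<p>" + field.to_s + " via #{name}</p>"
      end
    !
  end
end

builder = TinyBuilder.new
puts builder.text_field(:login)
puts builder.password_field(:password)
```

The interpolation happens when the string is built, so each generated method has its own name baked in, just like the decorated helpers in the builder.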

Our version isn’t really that different from the usual solution, it looks for a :disable_builder option, if there’s one and it’s true, the builder will just call the original method, without the custom decoration. If there’s no :disable_builder option the builder will set out to do its work. Also, we need the object class to find out the correct attribute names, so your form builder also holds the class of the object that’s being used in the form.

If there’s a :label option available, it’s the one that’s going to be used, if there’s no :label option the builder will access the class of the object that’s being used in the form and call the “human_attribute_name” with the field name as a parameter on it. By default, “human_attribute_name” will just call “humanize” on your field name (same as our code above) but, when we’re using the Rails I18N support things change a bit.

The first thing that changes is that we can use the I18N support to define labels for those fields (and also for the class names). Let’s take a look at “config/locales/en.yml” to check out what it looks like:

en:
  activerecord:
    models:
      user:
        one: User
        other: Users
    attributes:
      user:
        name: Name
        date_of_birth: Date of birth
        login: Login
        password: Password
        password_confirmation: Confirm Password
        profession: Profession

Under the “activerecord” namespace we have the “models” and “attributes” namespaces. As you might have guessed, the “models” namespace is used to internationalize your model names and the “attributes” to do the same to your object’s attribute names. While we’re using English as the language things aren’t really that interesting, so we’re going to add Portuguese support:

pt-BR:
  activerecord:
    models:
      user:
        one: Usuário
        other: Usuários
    attributes:
      user:
        name: Nome
        date_of_birth: Data de Nascimento
        login: Login
        password: Senha
        password_confirmation: Confirmação da Senha
        profession: Profissão

And now your form builder will use the Portuguese field names on its labels whenever the current locale is set to “pt-BR” (this isn’t the full file, you can check it out at the project repo). The real catch here is to use “human_attribute_name” instead of just “humanizing” the field name.

When human_attribute_name is called it will first try to get the attribute name from your I18N files using the current locale. You don’t really need to be writing a multilanguage application to use the I18N support: whenever you’re using a language that isn’t pure ASCII you can use it and have nice default labels for your form fields. Translating your models using the default “models” and “attributes” namespaces will also internationalize the Rails default error messages, as they’re going to use the names from the current locale.
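The lookup-then-fallback behaviour can be pictured with a tiny plain-Ruby sketch. This is a simplified model of the idea and not how Rails implements it: the TRANSLATIONS hash stands in for the YAML files, and this human_attribute_name takes the model name and locale as explicit arguments, which the real method does not.

```ruby
# Stand-in for the locale files: locale => model => attribute => label.
TRANSLATIONS = {
  'en'    => { 'user' => { 'profession' => 'Profession' } },
  'pt-BR' => { 'user' => { 'profession' => 'Profissão', 'password' => 'Senha' } }
}

# Simplified model of the lookup: use the translation when there is one,
# otherwise fall back to a humanized version of the attribute name.
def human_attribute_name(model, attribute, locale)
  translated = TRANSLATIONS.fetch(locale, {}).fetch(model, {})[attribute]
  translated || attribute.tr('_', ' ').capitalize
end

puts human_attribute_name('user', 'profession', 'pt-BR')
puts human_attribute_name('user', 'date_of_birth', 'pt-BR')
```

The second call has no translation available, so it falls back to the humanized name, which is what you would get everywhere without the locale files.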

You can see this form builder in action at the sample_social_network project.

with_scope and named_scopes ignoring stacked :order clauses

If you’ve been using with_scope and named_scopes a lot with ActiveRecord you have probably noticed that the :order clauses defined at the scopes are lost and only the first :order clause is used. If you defined an :order clause you’d like to have it merged with the other ones already provided. Here’s a simple example:

class User
  named_scope :by_first_name, :order => "#{quoted_table_name}.first_name ASC"
  named_scope :by_last_name, :order => "#{quoted_table_name}.last_name ASC"
end

Our user has two named scopes defined and both of them define an :order clause, if we try to run a finder like this:

User.by_first_name.by_last_name.all

This is the generated query:

SELECT * FROM `users` ORDER BY `users`.first_name ASC

As you’ve noticed, only the first :order clause was used, the last one was lost. Our ideal SQL query would have to look like this, with both :order clauses being used:

SELECT * FROM `users` ORDER BY `users`.last_name ASC , `users`.first_name ASC

That’s why we’re going to hack the with_scope method a little bit to reach our goal. This issue was already reported to the Rails issue tracker but there’s no fix yet, so our only hope is to monkeypatch Rails to behave as we expect it to. Here’s a really simple fix for the problem:

ActiveRecord::Base.class_eval do

  class << self

    def merge_orders( *orders )
      orders.map! do |o|
        if o.blank?
          nil
        else
          o.strip!
          o
        end
      end
      orders.compact!
      orders.join( ' , ' )
    end

    def with_scope_with_hack(method_scoping = {}, action = :merge, &block)
      method_scoping = method_scoping.method_scoping if method_scoping.respond_to?(:method_scoping)

      # Dup first and second level of hash (method and params).
      method_scoping = method_scoping.inject({}) do |hash, (method, params)|
        hash[method] = (params == true) ? params : params.dup
        hash
      end

      method_scoping.assert_valid_keys([ :find, :create ])

      if f = method_scoping[:find]
        f.assert_valid_keys(VALID_FIND_OPTIONS)
        set_readonly_option! f
      end

      # Merge scopings
      if [:merge, :reverse_merge].include?(action) && current_scoped_methods
        method_scoping = current_scoped_methods.inject(method_scoping) do |hash, (method, params)|
          case hash[method]
          when Hash
            if method == :find
              (hash[method].keys + params.keys).uniq.each do |key|
                merge = hash[method][key] && params[key] # merge if both scopes have the same key
                if key == :conditions && merge
                  if params[key].is_a?(Hash) && hash[method][key].is_a?(Hash)
                    hash[method][key] = merge_conditions(hash[method][key].deep_merge(params[key]))
                  else
                    hash[method][key] = merge_conditions(params[key], hash[method][key])
                  end
                elsif key == :include && merge
                  hash[method][key] = merge_includes(hash[method][key], params[key]).uniq
                elsif key == :joins && merge
                  hash[method][key] = merge_joins(params[key], hash[method][key])
                elsif key == :order && merge
                  hash[method][key] = merge_orders(params[key], hash[method][key])
                else
                  hash[method][key] = hash[method][key] || params[key]
                end
              end
            else
              if action == :reverse_merge
                hash[method] = hash[method].merge(params)
              else
                hash[method] = params.merge(hash[method])
              end
            end
          else
            hash[method] = params
          end
          hash
        end
      end

      self.scoped_methods << method_scoping
      begin
        yield
      ensure
        self.scoped_methods.pop
      end
    end

    alias_method_chain :with_scope, :hack

  end

end

You can place this code in an initializer (maybe called with_scope_fix.rb) or in your lib folder and require it from an initializer. From then on, all the :order clauses defined by named_scope or with_scope calls will be merged correctly instead of being lost.
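If you want to see what the order merging does without booting Rails, here is a minimal standalone sketch. It assumes nothing from ActiveRecord: blank_fragment? is a hypothetical stand-in for ActiveSupport’s blank?, and the order fragments are made up for illustration.

```ruby
# Standalone sketch of the order-merging step, runnable without Rails.
# blank_fragment? is a hypothetical stand-in for ActiveSupport's blank?.
def blank_fragment?(fragment)
  fragment.nil? || fragment.to_s.strip.empty?
end

# Joins every non-blank ORDER BY fragment, keeping the order of the arguments.
def merge_orders(*orders)
  orders.reject { |o| blank_fragment?(o) }.map { |o| o.to_s.strip }.join(', ')
end

# In the patch the call is merge_orders(params[key], hash[method][key]),
# so the order that was already in scope goes first.
puts merge_orders('created_at DESC', ' name ASC ')
puts merge_orders(nil, 'name ASC')
```

Blank fragments are simply dropped, so a scope without an :order never leaves a stray comma in the generated SQL.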

Handling database indexes for Rails polymorphic associations

One thing that is usually overlooked when defining tables and their associations in a Rails application is indexing. Usually, this comes from the idea that “my ORM tool does the job”, and in fact that might be true sometimes. One of the most successful ORM tools in Java land, Hibernate, generates a database with indexes for all the foreign keys you have, so Java programmers who use it don’t really worry about these issues (at least not until their database slows to a crawl).

ActiveRecord migrations, on the other hand, don’t take care of this for you (unless you’re using the cool Foreign Key Migrations plugin), so you must define the indexes you need yourself. Usually this is done with a simple call like this:

add_index :comments, :user_id

This will create an index on the :user_id column of the :comments table. For simple associations this is straightforward, but ActiveRecord offers goodies that are not so common in other tools, and one of them is the “polymorphic association”. With a polymorphic association you can define an association without fixing the class of the object on the other side; you just say that it’s polymorphic and you’re done. The code would look like this:

class Comment < ActiveRecord::Base
    belongs_to :commentable, :polymorphic => true
    belongs_to :user
end

To make this work, at the database level you need two columns in the :comments table: one called :commentable_id, which holds the id of the object that owns the comment, and another called :commentable_type, which holds the full class name of that object. So, if you’re commenting on a Post object with an ID of 1, the commentable_id would be 1 and the commentable_type would be “Post”. In the Post model the association would look like this:

class Post < ActiveRecord::Base
    has_many :comments, :as => :commentable
    belongs_to :user # the post's author; the foreign key lives on the posts table
end

When you’re using polymorphic associations, your queries will usually contain both the commentable_id and the commentable_type in the WHERE clause, as you would be looking for comments with a commentable_id of 1 and a commentable_type of “Post”, so it makes sense to create indexes for these columns. As you’ve already seen, you could do this with the following code:

add_index :comments, :commentable_id
add_index :comments, :commentable_type

And now you have two indexes, one for each column. Your database searches should fly with this, shouldn’t they?

Well, they will not. You’re defining two different indexes, one for each column, but you almost never search on these columns separately. You’re always searching on :commentable_id and :commentable_type together, so the database typically has to pick one of the indexes and filter the remaining rows, or do extra work to combine both. You should create a single index covering both columns instead, with a call like this:

add_index :comments, [ :commentable_type, :commentable_id]

This generates a single index covering both columns, and the queries for your polymorphic models will now really be faster than before.
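To make the difference concrete, here is a toy model in plain Ruby. It is only an illustration: hashes stand in for database indexes, the Row struct and the helper methods are hypothetical, and a real database engine is far more sophisticated than this.

```ruby
# Toy model only: plain Ruby hashes stand in for database indexes.
Row = Struct.new(:id, :commentable_type, :commentable_id)

ROWS = [
  Row.new(1, 'Post',  1),
  Row.new(2, 'Post',  2),
  Row.new(3, 'Photo', 1),
  Row.new(4, 'Post',  1)
]

# A single-column "index" on commentable_type.
BY_TYPE = ROWS.group_by(&:commentable_type)

# A composite "index" keyed by the [type, id] pair.
BY_TYPE_AND_ID = ROWS.group_by { |r| [r.commentable_type, r.commentable_id] }

# Returns [rows examined, matching rows] using only the type index:
# every row of that type still has to be checked against the id.
def find_with_type_index(type, id)
  candidates = BY_TYPE.fetch(type, [])
  [candidates.size, candidates.select { |r| r.commentable_id == id }]
end

# Returns [rows examined, matching rows] using the composite index:
# the lookup lands directly on the matching rows.
def find_with_composite_index(type, id)
  matches = BY_TYPE_AND_ID.fetch([type, id], [])
  [matches.size, matches]
end

examined, found = find_with_type_index('Post', 1)
puts "single-column index: examined #{examined} rows, kept #{found.size}"

examined, found = find_with_composite_index('Post', 1)
puts "composite index: examined #{examined} rows, kept #{found.size}"
```

With only four rows the gap is tiny, but it grows with the number of comments attached to each type.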

Obviously, you can also create separate indexes for the :commentable_type and :commentable_id columns if you do search on them separately, but having a lot of indexes on a table slows down inserts and updates and takes up more space on disk. So, when defining polymorphic associations, remember to create one index for both columns and not just one for each of them. Also, if you know that a column will always have a value, make it not null, as searching on indexes of nullable columns in some databases (like MySQL) is slower than searching on indexes of not-null columns.

And before you go: when ActiveRecord creates a string column at the database level, you can pass a :limit option that defines the size of the VARCHAR column. If you don’t give a limit, the column is created as VARCHAR(255), and I really believe you will not have a model class with a 255-character name. So, instead of creating a column with an unreasonable size (which is going to slow down queries and generate bigger indexes), give it a realistic limit. Our final table definition would look like this one:

create_table :comments do |t|
  t.integer :user_id, :null => false #indexing nullable columns is slower, try to make all columns that are going to be in indexes not-null
  t.integer :commentable_id, :null => false
  t.string :commentable_type, :null => false, :limit => 20 #could be even less
  t.text :comment
end

add_index :comments, :user_id
add_index :comments, [:commentable_type, :commentable_id]

This post was originally published at the CodeVader weblog.

Unit tests don’t guarantee that your system works

Last week there was an interesting message on the RSpec users list. The most interesting part of it is the following:

“I also had to go into specs on a project I’m not working on, and found an unholy hive of database-accessing specs. It’s disheartening. Basically, it’s cargo cult development practices – using the “bestpractice” without actually understanding it.”

You might have read this before, “/specs|tests/ that access the database are evil”, but have you ever asked yourself why?

Behavior Driven Development is the next step after Test Driven Development, and it borrows many best practices from the latter. The two principles that interest us most in this conversation are test-first development and unit testing.

The idea behind test-first development is that before writing your code, you should write a test stating what you want your “future” code to do. By writing the test before the code you get to work on the public interface provided by your object. The test is the first client of your code, so if your public interface is cumbersome or difficult to use, the test will catch the bad idea before it materializes in your code.

And where is unit testing in all this? You should be doing test-first using unit tests, as unit tests will guarantee that the code you wrote for that single unit (a method, probably) works alone. If other objects are needed to exercise this specific behavior, you should put mock objects (fake objects) in their places, so you won’t be testing them in your unit test. Remember, unit tests should only test a unit of code, no more than that. We do it this way so we don’t get distracted by the implementation of the other objects; we focus on testing our target, not its dependencies.
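Here is a minimal sketch of that idea in plain Ruby, with no RSpec required. Everything in it is made up for the example: WelcomeNotifier is a hypothetical unit under test, and FakeMailer is a hand-rolled fake standing in for whatever real mailer it would depend on.

```ruby
# Hypothetical classes, invented for this example.
class WelcomeNotifier
  def initialize(mailer)
    @mailer = mailer
  end

  # The unit under test: builds the message and hands it to the mailer.
  def welcome(name)
    @mailer.deliver("Welcome, #{name}!")
  end
end

# Hand-rolled fake: records deliveries instead of sending anything.
class FakeMailer
  attr_reader :delivered

  def initialize
    @delivered = []
  end

  def deliver(message)
    @delivered << message
    true
  end
end

# The "spec": only WelcomeNotifier's behavior is asserted, never the mailer's.
mailer = FakeMailer.new
WelcomeNotifier.new(mailer).welcome('Ana')
raise 'unexpected message' unless mailer.delivered == ['Welcome, Ana!']
puts 'WelcomeNotifier spec passed'
```

The fake tells you that WelcomeNotifier asks for the right message to be delivered. It tells you nothing about whether a real mailer would actually send it.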

When we’re writing specs for our objects they should usually work as unit tests, they should only assert the behaviors of a single unit of code, everything else should be done using mocks and stubs. But I said usually.

As I said before, unit tests, and your common specs, should only assert the behaviors of a unit of code without considering its relationships with the other objects in the system, but this only guarantees that the units work as units. It will never guarantee that they really work when in real contact with the other objects in the system. Unit testing doesn’t guarantee that your system works. It surely helps you reach that goal, but it isn’t enough.

And what does this have to do with that message, anyway?

A spec that accesses the database is just like an integration test: it asserts that the code being tested works fine when integrated with the database. So, the integration tests are the ones that really show you that your code works as a system, not only as a group of lonely objects.

I’m not saying that you should leave the unit tests behind, because they are very important in helping you design your code and be sure that it works as a unit, but you shouldn’t rely only on them to test your system. A good suite of integration tests will give you confidence that everything works fine together.

And sometimes you can’t unit test a feature at all, because it’s all about integration. Let’s take the “validates_uniqueness_of” validation in ActiveRecord as an example. If you’re writing a spec for your ActiveRecord model, you should add one ‘it’ statement showing that this validation is needed (you’re specifying how your model behaves, remember?), so here’s how it could look:

it 'Should not be valid if there is another one with the same name' do
  @common_name = 'testuser'
  @user = User.create( :name => @common_name )
  @another_user = User.new( :name => @common_name )
  @another_user.should have(1).error_on(:name)
end

How could you perform this spec without touching the database?

First, you could look at the “validates_uniqueness_of” source code, figure out how it works and stub it to return what you want, but this is bad because your specs would break if the framework code changed. The other way would be to swap the database adapter for a mocked one that returns exactly the result you want, but this is basically overkill. So why don’t you just leave the “purism” behind, test it against your database and be happy that your code works fine?

One important thing to notice is that integration tests are also slower to run, so you wouldn’t want to wait for the full suite to run before every commit. Usually you would run the unit and integration tests that are most likely to break if you did something wrong, the ones related to what you’re working on now, and be done with it.

So, if you’re in a project that has database-accessing specs or specs that use many real objects (and not mocks), don’t feel bad, but be sure that whoever wrote them knows what they are doing and that everything that can be unit tested is being unit tested. Integration tests should be written after your functionality is implemented and tested with unit tests. The two are not interchangeable, and neither replaces the other.

And be sure to never commit your code before running your tests :)

PS: Originally published at the CodeVader blog