Setting up your Ruby on Rails application in an Ubuntu Jaunty Jackalope (9.04) server with Nginx, MySQL, Ruby Enterprise Edition and Phusion Passenger

There are many ways to deploy and run Ruby applications with the Ruby on Rails framework, but it’s unlikely that you’re going to find a simpler and faster solution than using Ruby Enterprise Edition (REE from now on) with Nginx and Phusion Passenger. Nginx is a fast, scalable and lightweight HTTP server that is able to serve a lot of content without using up all your memory, and Passenger is a module that can be tied into Apache or Nginx to handle your Ruby (and RoR) applications automatically.

When using Passenger you don’t need to worry about managing a pack of Mongrels or using a proxy HTTP server: Passenger lives inside your web server and just takes care of everything for you. Here you’ll learn how to use Passenger in conjunction with Nginx to deploy your applications in the wild.

This tutorial assumes that you’re building a brand new Ubuntu server with few or no custom packages installed. Does this mean you can’t use this with an already customized server? No, but it’s easier if you can follow it step by step to avoid problems, as this has already been tried and tested to be sure that it works. We’re using MySQL here because it’s what I’m using right now, but you can easily change the apt-get calls to use whatever database you’re using yourself.

Setting up users

If you’re really starting from a brand new install with no users created beyond the default ones, you might want to create a user for yourself so that you don’t need to be logged in as “root” forever. To create a new user on a Linux box the command is “useradd”:

useradd -m -g staff -s /bin/bash mauricio

This will create a user called “mauricio” with a “/home/mauricio” home directory (as defined by the “-m” param), with “staff” as its default group and using the “/bin/bash” shell. After creating a user for yourself you might also want to create a user for the application you’re deploying or a “deployment” user. This is the user that’s going to be used to deploy the application and run all application related processes. Just use the same command above, changing the username to your deploy user; this can be the name of the application you’re deploying or just “deploy” (keep all your users belonging to the same “staff” group to avoid file permission issues when you edit or create files).

After doing this you can also allow all users that belong to the staff group to use the “sudo” command. To do this, open the “/etc/sudoers” file with a text editor (I usually use “nano”; running “sudo visudo” is the safer route, as it checks the syntax before saving) and add this line:

%staff ALL=(ALL) ALL

Setting up your ssh keys

If you’re running a Linux/Unix box and haven’t generated your SSH keys, it’s time to do it. If you have never heard of them, SSH keys let you log in to servers where you have a user account without being asked for the password, which is really cool if you have to handle a lot of servers at the same time (and if you don’t want to type passwords every time you do a “git pull” or “git push”). To generate them, do this as your local user on your local machine:

ssh-keygen -t dsa

This command will create a folder called “.ssh” in your home directory (as in “/home/your_user/.ssh”) with a bunch of files. It will ask you for a password to protect these files. The password isn’t required, but it’s nice to be cautious here: if you don’t set one, anyone with physical access to your machine (or who can log in as you) could log in to all the machines your SSH keys were copied to.

Now that the keys are generated, you can copy them to the servers you usually log into. To do this, first log in to the server using your account and, in your user’s home folder, create a .ssh folder:

mkdir ~/.ssh

Log off and, from your local pc, copy the ~/.ssh/id_dsa.pub file to the remote machine using scp:

scp ~/.ssh/id_dsa.pub your_remote_user@host:.ssh/authorized_keys2

This will copy your public key to the remote server and you’ll be able to log into that server from your current local machine and local user to the user you copied the key to in your remote server. You can obviously copy this to as many servers and user accounts as you like and none of them will ask you for a password again.

Getting Ubuntu up-to-date

First thing we need to do is to be sure that our server is up-to-date with the currently installed software:

sudo apt-get update
sudo apt-get upgrade

Then we need to install some basic libraries and MySQL:

sudo apt-get install build-essential mysql-server libmysqlclient15-dev libmagickcore-dev imagemagick libpcre3 libfcgi-dev libfcgi0ldbl libxml2-dev libxslt1-dev -y

This is going to install the MySQL server, the ImageMagick library to handle image processing and the XML and XSLT libraries needed for some common gems like Nokogiri.

Installing Ruby

We’re not going to use the default Ruby interpreter that comes with Ubuntu but Phusion’s Ruby Enterprise Edition. REE is a reliable and memory friendly fork of the main Ruby interpreter from the Phusion guys. Go to the REE download page and grab the “.deb” file for your architecture (look for the “Ubuntu Linux” tab). Install the “.deb” with:

sudo dpkg -i path-to-the-deb-file

This will get REE installed to “/opt/ruby-enterprise” but the binaries will not be available in your PATH, so we’ll need to add the “bin” dir to our PATH variable manually. Open up your “/etc/environment” file with your preferred command line text editor (mine is “nano”):

sudo nano /etc/environment

And add the “/opt/ruby-enterprise/bin” dir to your PATH variable like this:

PATH="/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/opt/ruby-enterprise/bin"

This will get the scripts at the “bin” folder available to your user but not when you use “sudo” calls (Ubuntu just overrides the PATH when you call “sudo” for security reasons) so we’ll need to symlink some of the files to “/usr/bin” to be sure that they’re visible when you’re sudoing:

sudo ln -s /opt/ruby-enterprise/bin/ruby /usr/bin/ruby
sudo ln -s /opt/ruby-enterprise/bin/gem /usr/bin/gem
sudo ln -s /opt/ruby-enterprise/bin/ri /usr/bin/ri
sudo ln -s /opt/ruby-enterprise/bin/rdoc /usr/bin/rdoc
sudo ln -s /opt/ruby-enterprise/bin/irb /usr/bin/irb

Now let’s install some gems to be sure that everything is ok:

sudo gem install rails mysql nokogiri rmagick mislav-will_paginate --no-ri --no-rdoc

The “--no-ri --no-rdoc” options are there to avoid creating docs that we’re not really going to use and that take a long time to generate (also, if you’re on a VPS and don’t have a lot of memory, generating them is likely to cause out of memory errors). If you got no errors here, we’re good to go and install Nginx and Passenger.
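
If you want a sanity check beyond a clean install, a small script can try to load each library and report what is missing. This is only a sketch: check_libraries is a hypothetical helper, and the names you pass to it are assumptions, since the name you require is not always the gem name.

```ruby
# Hypothetical helper: tries to require each library and records whether it loaded.
def check_libraries(names)
  names.each_with_object({}) do |name, report|
    begin
      require name
      report[name] = true
    rescue LoadError
      report[name] = false
    end
  end
end

# 'date' ships with Ruby; the second name is deliberately bogus to show a failure.
report = check_libraries(%w[date no_such_library_for_this_demo])
report.each { |name, ok| puts "#{name}: #{ok ? 'ok' : 'MISSING'}" }
```

On the server you would pass the libraries you have just installed instead of the two demo names.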

Installing Nginx and Passenger

Installing Nginx with Passenger and Ruby EE is as easy as calling this command:

sudo /opt/ruby-enterprise/bin/passenger-install-nginx-module --auto --auto-download

Those “--auto” options are there to tell the installer that we’re saying yes to all defaults and we want it to download a brand new Nginx copy and build it with the Passenger module. The installer is going to ask you where to install Nginx with a default of “/opt/nginx”; just hit enter to get it installed at the default path.

As you can see from the messages printed, Passenger has already generated a sample configuration file with the basic config needed to run the application; here’s an example of what it would look like.

It’s VERY important to set the Nginx user to be the same user that’s going to deploy and create the application files as this will avoid permission issues that are one of the most common problems you’re going to have. With Nginx configured to load your application, start it to be sure that everything is OK again:

sudo /opt/nginx/sbin/nginx

Open up your browser pointing to the server where Nginx is running and you should see your application running correctly. If it isn’t, check the application logs and also Nginx error logs at “/opt/nginx/logs/error.log”. You can kill Nginx with a simple:

sudo pkill nginx

Getting Nginx to run as a Daemon

Now that Nginx is running correctly and serving your application you need to set it up to run as a daemon. To do this we need to create a script that’s going to handle the Nginx daemon and install it using the update-rc.d utility. You can get the script here.

You should save this script at “/etc/init.d/nginx” (sudo to do it), mark it as executable and install it as a daemon:

sudo chmod +x /etc/init.d/nginx
sudo update-rc.d nginx defaults

Now when the machine reboots Nginx will be started automatically. As a last touch, start the Nginx daemon and your server is ready to roll:

sudo /etc/init.d/nginx start

Quick Tip – Using to_s as a label and simplified link_to calls to your ActiveRecord models

One of the things you’ll find in every rails application is links like this one:

<%= link_to user.login, user %>

Or maybe like this one:

<%= link_to user.login, user_path( user ) %>

Or maybe something ugly like this one:

<%= link_to user.to_label, user_path( user ) %>

How about throwing all of them away and just doing it like this:

<%= link_to user %>

Cool, isn’t it?

Now “how can I do that” you ask, it’s dead simple. First, remember that every object responds to a method called “to_s” and this “to_s” method is defined as “a method that returns a string representation of your object” in most programming languages, including Ruby.

A string representation of your object is something human readable that would tell someone else what this object represents. “to_s” isn’t meant to be a debug like method, we already have “inspect” to do that, so why not put it to work and simplify our links?

In every ActiveRecord model in your application you’ll define a to_s method that returns one (or maybe more, if needed) of the attributes of your object as a string (if they’re not strings, turn them into strings before returning). Let’s see what your user would look like:

class User < ActiveRecord::Base
    validates_presence_of :login
    validates_uniqueness_of :login
    def to_s
        self.login
    end
end

I decided that the “login” attribute is the one that best represents the User object and it’s also the one I want to use when someone else sees users on the website. They won’t see their real names by default, but their logins.
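
To see the idea outside Rails, here is a minimal sketch with a plain Ruby class standing in for the model (an assumption made only so it runs without ActiveRecord). It shows that string interpolation and puts already go through to_s, while inspect remains the debugging view.

```ruby
# Plain Ruby stand-in for the User model above (no ActiveRecord needed here).
class User
  attr_reader :login

  def initialize(login)
    @login = login
  end

  # The human-readable label for this object.
  def to_s
    login
  end
end

user = User.new('mauricio')
label = "Profile of #{user}"   # interpolation calls to_s
puts label
puts user                      # puts calls to_s as well
puts user.inspect              # inspect stays a debugging aid
```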

Why is it better to do this with the “to_s” method instead of adding a “to_label” method to all objects? ‘Cos many helpers will already call the to_s method by default on your object (as we’ll see with the link_to helper), so you just get full compatibility for free. Check out our new link_to implementation:

module ApplicationHelper
  #sample link_to override that will generate urls for active_record objects by default
  #if the first parameter is an active_record object and you just want a link to it,
  # you can call it just like this:
  # <%= link_to user %>
  # the helper will take care of generating the correct url
  def link_to( *args )
    options = args.extract_options!
    if args.size == 1 && args.first.is_a?( ActiveRecord::Base )
      super( *([ args.first, args.first ] + [ options ]) )
    else
      super( *( args + [ options ] ) )
    end
  end
end

If there’s only one object (besides the options hash) and it is an ActiveRecord::Base instance, just use the object itself as the url parameter (the real link_to helper will call polymorphic_url on it automatically) and also use the object as the link_to label (the first parameter), this will call the to_s method on our user and the link will use it as the label.
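
The branching in that override can be tried without Rails. The sketch below is not the helper itself: Record is a stand-in for ActiveRecord::Base and normalized_link_args is a hypothetical function that only returns the argument list the original link_to would receive.

```ruby
# Stand-in for ActiveRecord::Base, only so the sketch runs without Rails.
class Record
  def initialize(label)
    @label = label
  end

  def to_s
    @label
  end
end

# Hypothetical helper mirroring the branching in the override above:
# returns the argument list that would be handed to the original link_to.
def normalized_link_args(*args)
  options = args.last.is_a?(Hash) ? args.pop : {}   # what extract_options! does
  if args.size == 1 && args.first.is_a?(Record)
    [args.first, args.first, options]               # object is both label and URL target
  else
    args + [options]
  end
end

user = Record.new('mauricio')
short = normalized_link_args(user, :class => 'profile')
long  = normalized_link_args('Home', '/home')
puts short.inspect
puts long.inspect
```

A call with a single record is expanded to label, target and options, while any other call is passed through with its options re-appended.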

What do you get with this?

A seamless and clear way of defining labels for your models (the “to_s” method is part of the core of the language anyway) and also a simpler way of generating links for your objects. Common methods for object labels are very important because you can never be sure if the property you’re using today as a label will be used forever. Using the “to_s” method as the default “label method” will allow you to change the label property at any time with almost no change to other parts of the code.

Building a I18N aware form builder for your Rails applications

Form builders are one of the coolest features when building Rails applications: they streamline the task of writing complex forms and you can usually write your own form builder to keep forms consistent across your application. One of the most common customized form builders is the one that uses the field name to show a label:

<% form_for @user, :builder => SimpleFormBuilder do |f| %>
    <%= f.text_field :date_of_birth %>
<% end %>

We would probably use the “date_of_birth” symbol, turn it into a String and then call “humanize” on it ( you can see a custom form builder example that does exactly this here ):

:date_of_birth.to_s.humanize
=> "Date of birth"

That’s awesome if you’re building a website for a language that uses only ASCII characters, but if you’re building a form in Portuguese you’re doomed. Imagine that my “usuario” (user) has a “profissão” (profession) and I try to use it using a common form builder:

<% form_for @usuario, :builder => SimpleFormBuilder do |f| %>
    <%= f.text_field :profissao %>
<% end %>

What do I get? “profissao”, no tilde. And if I try to create a method called “profissão” at my object it’s just going to break.

So, customized form builders for Rails using funny characters are impossible? Never! With the new I18N support this trouble has been completely removed. Let’s see how we can write a customized form builder that uses the attribute names to generate labels and will respect funny Portuguese, Russian or characters from any other language.

Here’s the builder code:

class SampleFormBuilder < ActionView::Helpers::FormBuilder

  attr_accessor :object_class

  helpers = field_helpers +
            %w{date_select datetime_select time_select} +
            %w{collection_select select country_select time_zone_select} -
            %w{hidden_field label fields_for submit} # Don't decorate these

  helpers.each do |name|
    class_eval %Q!
    def #{name}(field, *args)
      options = args.extract_options\!
      return super if options.delete(:disable_builder)
      @template.content_tag(:p, field_label(field, options) << '<br/>' << super)
    end
    !
  end

  def submit(value = nil, options = {})
    if self.object && value.nil?
      value = self.object.new_record? ? I18n.t( 'txt.shared.create' ) : I18n.t( 'txt.shared.update' )
    end
    @template.content_tag( :p, super( value, options ) )
  end

  def field_label( field, options )
    self.label( field, options.delete( :label ) || self.object_class.human_attribute_name( field.to_s ), :class => options[:label_class])
  end

  def initialize(object_name, object, template, options, proc)
    super
    self.object_class = self.object.nil? ? self.object_name.to_s.camelize.constantize : self.object.class
  end

end

We’ve created our own form builder that inherits from the ActionView::Helpers::FormBuilder and it redefines all field helpers using a class_eval call (don’t know what class_eval does? Learn here).

Our version isn’t really that different from the usual solution. It looks for a :disable_builder option; if there’s one and it’s true, the builder will just call the original method, without the custom decoration. If there’s no :disable_builder option the builder will set out to do its work. Also, we need the object class to find out the correct attribute names, so our form builder also holds the class of the object that’s being used in the form.

If there’s a :label option available, it’s the one that’s going to be used, if there’s no :label option the builder will access the class of the object that’s being used in the form and call the “human_attribute_name” with the field name as a parameter on it. By default, “human_attribute_name” will just call “humanize” on your field name (same as our code above) but, when we’re using the Rails I18N support things change a bit.

The first thing that changes is that we can use the I18N support to define labels for those fields (and also for the class names). Let’s take a look at “config/locales/en.yml” to check out what it looks like:

en:
  activerecord:
    models:
      user:
        one: User
        other: Users
    attributes:
      user:
        name: Name
        date_of_birth: Date of birth
        login: Login
        password: Password
        password_confirmation: Confirm Password
        profession: Profession

Under the “activerecord” namespace we have the “models” and “attributes” namespaces. As you might have guessed, the “models” namespace is used to internationalize your model names and the “attributes” to do the same to your object’s attribute names. While we’re using English as the language things aren’t really that interesting, so we’re going to add Portuguese support:

pt-BR:
  activerecord:
    models:
      user:
        one: Usuário
        other: Usuários
    attributes:
      user:
        name: Nome
        date_of_birth: Data de Nascimento
        login: Login
        password: Senha
        password_confirmation: Confirmação da Senha
        profession: Profissão

And now your form builder will use the Portuguese field names on its labels whenever the current locale is set to “pt-BR” (this isn’t the full file, you can check it out at the project repo). The real catch here is to use the “human_attribute_name” instead of just “humanizing“ the field name.

When human_attribute_name is called it will first try to get the attribute name from your I18N files using the current locale. You don’t really need to be writing a multi-language application to use the I18N support: whenever you’re using a language that isn’t pure ASCII you can use it and have nice default labels for your form fields. Translating your models using the default “models” and “attributes” namespaces will also internationalize the default Rails error messages, as they’re going to use the names from the current locale.
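
The lookup-then-fallback behaviour can be pictured with a few lines of plain Ruby. This is a sketch and not the Rails implementation: TRANSLATIONS and human_name are hypothetical names, and the fallback is a simplified stand-in for “humanize”.

```ruby
# Hypothetical in-memory translations, shaped like the YAML files above.
TRANSLATIONS = {
  'en'    => { 'user' => { 'profession' => 'Profession' } },
  'pt-BR' => { 'user' => { 'profession' => 'Profissão' } }
}

# Hypothetical helper: look the attribute up for the locale and fall back
# to a humanized field name when there is no translation.
def human_name(locale, model, attribute)
  translated = TRANSLATIONS.fetch(locale, {}).fetch(model, {})[attribute.to_s]
  translated || attribute.to_s.tr('_', ' ').capitalize
end

puts human_name('pt-BR', 'user', :profession)
puts human_name('en', 'user', :date_of_birth)
```

The second call has no translation entry, so it falls back to the humanized field name, which is the same situation the form builder is in when a locale file is incomplete.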

You can see this form builder in action at the sample_social_network project.

Understanding class_eval, module_eval and instance_eval

Most of Ruby’s fame is due to its dynamic capabilities. In Ruby you can define and redefine methods at runtime, create classes from nowhere and objects from pure dust. Most of these dynamic features are built on the methods in the title: class_eval, module_eval and instance_eval. They’re usually the ones responsible for the show, and now we’re going to learn a little bit about how they work and how we could use them in our objects.

class_eval and module_eval

These two methods are responsible for granting you access to a class or module definition, as if you were writing its code yourself. When you do something like this:

Dog.class_eval do
    def bark
        puts "Huf! Huf! Huf!"
    end
end

It’s almost the same as doing this:

class Dog
    def bark
        puts "Huf! Huf! Huf!"
    end
end

What’s the difference?

With class_eval you’re adding a method to a pre-existing class. If a class called Dog is not defined before our class_eval runs, you’d see a “NameError: uninitialized constant Dog”. A class_eval call opens up an existing class for you; it won’t create or open a class that doesn’t exist yet.
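
A short sketch of that difference, using a Cat class (a name chosen for this example only) that does not exist when class_eval is called:

```ruby
# Cat is deliberately left undefined so the constant lookup fails.
begin
  Cat.class_eval do
    def meow
      'Meow!'
    end
  end
rescue NameError => e
  error_message = e.message
  puts "class_eval failed: #{error_message}"
end

# The class keyword, by contrast, creates the class when it is missing.
class Cat
  def meow
    'Meow!'
  end
end

puts Cat.new.meow
```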

And you don’t need to always write real code inside your class_eval calls, you can also send a string object containing the code you want to have ‘evaled inside your class. Let’s see how we could define a macro just like the attr_accessor using class_eval’ed strings:

Object.class_eval do

  class << self

    def attribute_accessor( *attribute_names )

      attribute_names.each do |attribute_name|
        class_eval %Q?
          def #{attribute_name}
              @#{attribute_name}
          end

          def #{attribute_name}=( new_value )
              @#{attribute_name} = new_value
          end
        ?
      end

    end

  end

end

class Dog
  attribute_accessor :name
end

dog = Dog.new
dog.name = 'Fido'

other_dog = Dog.new
other_dog.name = 'Dido'

puts dog.name
puts other_dog.name

As you can see, we used both kinds of class_eval. First we opened up the Object class and added a new class method called attribute_accessor with direct code. But inside attribute_accessor we had no way to know the method names while writing the code, so instead of writing the code directly inside the class_eval call we created a string object containing the code that we wanted to have ‘evaled by the class_eval method. For the :name attribute, the string is turned into something like this:

def name
    @name
end

def name=( new_value )
    @name = new_value
end

And this is the parameter passed on to the class_eval call. Wrapping up, you can use class_eval to open classes (and modules) and add real code to them, or you can pass a string containing valid Ruby code and it’s going to be ‘evaled as if it were written in the class definition body.

The module_eval method is just an alias to class_eval so you can use them both for classes and modules.
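
As a small illustration, the sketch below uses module_eval with a string on a module and class_eval with a block on the same module. Greeter and Person are names made up for this example.

```ruby
module Greeter
end

# module_eval with a string, just as class_eval was used above.
method_name = 'greet'
Greeter.module_eval %Q?
  def #{method_name}
    'Hello!'
  end
?

class Person
  include Greeter
end

# class_eval works on a module too, since both names give the same behaviour.
Greeter.class_eval do
  def farewell
    'Bye!'
  end
end

person = Person.new
puts person.greet
puts person.farewell
```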

instance_eval

The instance_eval method works just like class_eval but it will add the behavior you’re trying to define to the object instance where it was called.

But hey, isn’t this exactly what we were doing with class_eval?

No, it isn’t. With class_eval we opened up a class definition and added code to its body. Any kind of code valid inside a class definition was also valid in there. When we’re using instance_eval the rules change a bit ‘cos we’re not opening up a class, but a single object instance.

How’s that? Let’s see an example:

class Dog
  attribute_accessor :name
end

dog = Dog.new
dog.name = 'Fido'

dog.instance_eval do
  # here I am defining a bark method only for this "dog" instance and not for the Dog class
  def bark
    puts 'Huf! Huf! Huf!'
  end
end

other_dog = Dog.new
other_dog.name = 'Dido'

puts dog.name
puts other_dog.name

dog.bark
other_dog.bark # this line will raise a NoMethodError as there's no "bark" method
               # on this other_dog object

Not really that interesting, is it? We can also use instance_eval to define methods on Class objects (which then behave as class methods of that class) and we can do just that to our attribute_accessor:

Object.instance_eval do

  def attribute_accessor( *attribute_names )

    attribute_names.each do |attribute_name|
      class_eval %Q?
          def #{attribute_name}
              @#{attribute_name}
          end

          def #{attribute_name}=( new_value )
              @#{attribute_name} = new_value
          end
      ?
    end

  end

end

By using instance_eval instead of class_eval we don’t need the “class << self”, as the method is defined directly on the Object class object and will then be available as a class method on Object and on all of its subclasses.
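
Here is the same idea as a sketch on a smaller scale. It assumes a base class called Animal instead of Object, purely so the experiment does not touch every class in the process.

```ruby
class Animal
end

# Defines attribute_accessor as a singleton (class-level) method of Animal.
Animal.instance_eval do
  def attribute_accessor(*attribute_names)
    attribute_names.each do |attribute_name|
      class_eval %Q?
        def #{attribute_name}
          @#{attribute_name}
        end

        def #{attribute_name}=(new_value)
          @#{attribute_name} = new_value
        end
      ?
    end
  end
end

class Dog < Animal
  attribute_accessor :name   # inherited class-level method
end

dog = Dog.new
dog.name = 'Fido'
puts dog.name
puts Animal.singleton_methods.inspect
puts dog.respond_to?(:attribute_accessor)   # instances do not get the macro
```

The macro is reachable from Animal and Dog as classes, while Dog instances only receive the accessors it generated.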

As you might have noticed, these methods are also related to the difference between including and extending modules in Ruby.
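
A rough way to picture that relation: include behaves like class_eval, adding behavior for every instance, while extend behaves like instance_eval, adding it to one object only (which may be a class). A sketch with made-up names:

```ruby
module Barker
  def bark
    'Huf! Huf! Huf!'
  end
end

class Dog
  include Barker    # bark becomes an instance method of every Dog
end

class Wolf
  extend Barker     # bark becomes a method of the Wolf class object itself
end

lone_dog = Object.new
lone_dog.extend(Barker)   # only this one object gains bark

puts Dog.new.bark
puts Wolf.bark
puts lone_dog.bark
puts Wolf.new.respond_to?(:bark)
```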

with_scope and named_scopes ignoring stacked :order clauses

If you’ve been using with_scope and named_scopes a lot with ActiveRecord you have probably noticed that the :order clauses defined at the scopes are lost and only the first :order clause is used. If you defined an :order clause you’d like to have it merged with the other ones already provided. Here’s a simple example:

class User < ActiveRecord::Base
  named_scope :by_first_name, :order => "#{quoted_table_name}.first_name ASC"
  named_scope :by_last_name, :order => "#{quoted_table_name}.last_name ASC"
end

Our user has two named scopes defined and both of them define an :order clause, if we try to run a finder like this:

User.by_first_name.by_last_name.all

This is the generated query:

SELECT * FROM `users` ORDER BY `users`.first_name ASC

As you’ve noticed, only the first :order clause was used, the last one was lost. Our ideal SQL query would have to look like this, with both :order clauses being used:

SELECT * FROM `users` ORDER BY `users`.last_name ASC , `users`.first_name ASC

That’s why we’re going to hack the with_scope method a little bit to reach our goal. This issue was already reported to the Rails issue tracker but there’s no fix yet, so our only hope is to monkeypatch Rails to behave as we expect it to. Here’s a really simple fix for the problem:

ActiveRecord::Base.class_eval do

  class << self

    def merge_orders( *orders )
      orders.map! do |o|
        if o.blank?
          nil
        else
          o.strip!
          o
        end
      end
      orders.compact!
      orders.join( ' , ' )
    end

    def with_scope_with_hack(method_scoping = {}, action = :merge, &block)
      method_scoping = method_scoping.method_scoping if method_scoping.respond_to?(:method_scoping)

      # Dup first and second level of hash (method and params).
      method_scoping = method_scoping.inject({}) do |hash, (method, params)|
        hash[method] = (params == true) ? params : params.dup
        hash
      end

      method_scoping.assert_valid_keys([ :find, :create ])

      if f = method_scoping[:find]
        f.assert_valid_keys(VALID_FIND_OPTIONS)
        set_readonly_option! f
      end

      # Merge scopings
      if [:merge, :reverse_merge].include?(action) && current_scoped_methods
        method_scoping = current_scoped_methods.inject(method_scoping) do |hash, (method, params)|
          case hash[method]
          when Hash
            if method == :find
              (hash[method].keys + params.keys).uniq.each do |key|
                merge = hash[method][key] && params[key] # merge if both scopes have the same key
                if key == :conditions && merge
                  if params[key].is_a?(Hash) && hash[method][key].is_a?(Hash)
                    hash[method][key] = merge_conditions(hash[method][key].deep_merge(params[key]))
                  else
                    hash[method][key] = merge_conditions(params[key], hash[method][key])
                  end
                elsif key == :include && merge
                  hash[method][key] = merge_includes(hash[method][key], params[key]).uniq
                elsif key == :joins && merge
                  hash[method][key] = merge_joins(params[key], hash[method][key])
                elsif key == :order && merge
                  hash[method][key] = merge_orders(params[key], hash[method][key])
                else
                  hash[method][key] = hash[method][key] || params[key]
                end
              end
            else
              if action == :reverse_merge
                hash[method] = hash[method].merge(params)
              else
                hash[method] = params.merge(hash[method])
              end
            end
          else
            hash[method] = params
          end
          hash
        end
      end

      self.scoped_methods << method_scoping
      begin
        yield
      ensure
        self.scoped_methods.pop
      end
    end

    alias_method_chain :with_scope, :hack

  end

end
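
If you want to try the merging logic on its own, here is a plain Ruby sketch of merge_orders. It is a variation, not the code above: it avoids ActiveSupport’s blank? so it runs outside Rails, and it does not modify the strings it receives.

```ruby
# Plain Ruby take on merge_orders: drops nil/blank fragments, trims the rest
# and joins them. It avoids ActiveSupport's blank? so it runs outside Rails.
def merge_orders(*orders)
  orders
    .reject { |order| order.nil? || order.strip.empty? }
    .map(&:strip)
    .join(' , ')
end

puts merge_orders('`users`.last_name ASC', nil, '  `users`.first_name ASC ')
puts merge_orders('', '`users`.first_name ASC')
```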

You can place this code at an initializer (maybe called with_scope_fix.rb) or at your lib folder and require it in your initializers. And now all your :order clauses defined by named_scope or with_scope calls will be correctly merged and will not be lost in your code.
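
In case alias_method_chain is new to you, this sketch shows the two alias_method calls it boils down to, on a made-up Greeter class so it runs without Rails:

```ruby
class Greeter
  def greet
    'hello'
  end

  def greet_with_shout
    greet_without_shout.upcase + '!'   # calls the original implementation
  end

  # What alias_method_chain :greet, :shout boils down to:
  alias_method :greet_without_shout, :greet
  alias_method :greet, :greet_with_shout
end

greeter = Greeter.new
puts greeter.greet
puts greeter.greet_without_shout
```

In the patch above the same mechanism keeps the original method reachable as with_scope_without_hack while with_scope points to the new version.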

SQL functions in WHERE clauses are evil

Once we get up and running with the basic SQL syntax, doing inserts, updates, deletes and simple selects, we start to learn about the SQL functions: the default ones like LOWER, COUNT and AVG, and then the functions that are specific to the database we’re using. We learn them and start to feel our fingers itching to try them, to use them in the real world. Why would you learn them if you can’t use them anyway?

Well, I can’t tell you that you should never use them, but listen to my advice, do not ever use these SQL functions in WHERE clauses to filter data. They’re evil and they’ll try to kill your database and prevent you from working by having to discover crazy performance bottlenecks and slow queries that don’t look slow at all. At least until you run an “EXPLAIN” on them.

Let’s start with a simple example: we want to know, from our USERS table, which users were born today so we can send them a beautiful and impersonal e-mail to remind them that they are getting older today. In the USERS table there is a column called “date_of_birth”, which is a DATE and obviously stores the date the user said he was born on, so we know the day, month and year. We already have all the information that we need; now it’s time to write the SQL code to find them.

In our first attempt, we write the following simple SQL:

SELECT * FROM users u where DAYOFMONTH(u.date_of_birth) = 12 AND MONTH(u.date_of_birth) = 1

Pretty simple and does the job perfectly: running it in your test database returns the correct users and it’s definitely fast. Now we get to try it in our production database so we can figure out which users are going to receive the e-mail. We’re running a big website, with a gazillion users, so there should be some of them that are getting older today. We type in the query at the production database console and…

Wait.

Waiting.

Still waiting and looking at that blank console screen.

Well, you should find yourself something else to read, it’s going to take a while. After a long time waiting you get a list with 10 users with birthdays today. WTF? Why so long just to find 10 users?

A light shines in our heads and we remember that there is no index at the “date_of_birth” column, we never thought about using it in queries so we, as good database guys, did not create an index when it wasn’t needed. But now it is and you just type in the command to create the index for the “date_of_birth” column.

After waiting a little bit to have the index created, we type our beautiful query again and we wait again. This time it seems that it’s taking even longer to finish. This is clearly wrong: we have created an index on that database field and queries against that field should use that index. Now we have to bring out the most important tool of anyone that has ever used a database, the “explain query” feature, which explains how a query is going to be executed by the database. At the database console, we type (this is for MySQL):

explain select * from users u where DAYOFMONTH(u.date_of_birth) = 12 and MONTH(u.date_of_birth) = 1\G

And here’s our output:

*************************** 1. row ***************************

	id: 1
	select_type: SIMPLE
	table: users
	type: ALL
	possible_keys: NULL
	key: NULL
	key_len: NULL
	ref: NULL
	rows: 4558
	Extra: Using where
1 row in set (0.00 sec)

The most important lines are “possible_keys” and “key”: both of them are NULL, which means that our beautiful query isn’t using any indexes. But another piece of information is even more alarming: the database is looking at 4558 rows to retrieve my results (and this is exactly the count of rows available in the users table). The database is scanning the WHOLE users table to fetch a handful of rows. Can you feel its pain?

We’ve created the index and we’re trying to filter just on that column, why is it not being used?

Because we’re using SQL functions, that’s the reason. The DAYOFMONTH function is a transforming function: it takes an argument and generates a value based on that argument, and our query is performing its filtering based on this generated value. But here’s the problem: the database optimizer doesn’t know the generated value. It has no index for it nor a way to infer which value could be generated, because it doesn’t know what this SQL function does. The optimizer can’t perform any optimizations at all.

When faced with such a complex query (from its point of view) the optimizer has no option but to let the query run against the whole table, reading every row, applying the function to the column value in each one and only then filtering the result. Every time you wrap a column in a SQL function inside a WHERE clause, like DAYOFMONTH, LEFT, RIGHT or MONTH, you might be leading yourself to such a bad query and future bottleneck. When you’re at your development database with a handful of records, it might not yield any perceivable performance problems, but once you reach the production environment with far more rows, your problems will start to arise.
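
To make the difference tangible, here is a sketch in Ruby. It assumes a sorted array as a stand-in for the index (a real index is a tree, but the ordering argument is the same) and generated dates instead of real user data.

```ruby
require 'date'

# A sorted array stands in for the index on date_of_birth (an assumption;
# a real index is a B-tree, but the ordering argument is the same).
start = Date.new(1970, 1, 1)
index = (0...4558).map { |offset| start + offset }   # already sorted

# Range filter on the raw column: binary search finds the boundaries.
from = Date.new(1975, 1, 1)
to   = Date.new(1975, 1, 31)
first = index.bsearch_index { |date| date >= from }
last  = index.bsearch_index { |date| date > to }
in_range = index[first...last]

# Filter on a function of the column: the ordering is useless, so every
# entry has to be visited and transformed.
visited = 0
birthdays = index.select do |date|
  visited += 1
  date.day == 12 && date.month == 1
end

puts "range lookup returned #{in_range.size} rows"
puts "function filter visited #{visited} rows to return #{birthdays.size}"
```

The range lookup only needs two binary searches over the ordered data, while the function filter has to touch every entry, which is what “type: ALL” in the EXPLAIN output is telling you.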

You should avoid filtering based on calculated or transformed data in your queries, as your database optimizer will not be able to give you the best “query plan”. If you’re faced with such a problem, you should create a separate column in your table and generate the value beforehand. In our case, we would need to create two new columns, “day_of_birth” and “month_of_birth”, create an index for both of them and, every time a row has its “date_of_birth” updated, automatically update the “day_of_birth” and “month_of_birth” columns as well.
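
One way to keep those two columns in sync is to derive them whenever the date is written. The sketch below is plain Ruby and uses a Struct as a stand-in for a users row, so it runs without Rails; in a real model the same method could be hooked to a before_save callback. The names UserRow and sync_birth_columns are made up for this illustration.

```ruby
require 'date'

# Stand-in for a row of the users table. In a real application this would be
# an ActiveRecord model (assumption: the two extra columns are plain integers).
UserRow = Struct.new(:date_of_birth, :day_of_birth, :month_of_birth) do
  # Copies the day and month out of date_of_birth so they can be indexed
  # and filtered without calling DAYOFMONTH() or MONTH() in the query.
  def sync_birth_columns
    self.day_of_birth   = date_of_birth && date_of_birth.day
    self.month_of_birth = date_of_birth && date_of_birth.month
    self
  end
end

user = UserRow.new(Date.new(1984, 1, 12)).sync_birth_columns
puts "day=#{user.day_of_birth} month=#{user.month_of_birth}"
```

With the columns filled in, a query such as “select * from users where day_of_birth = 12 and month_of_birth = 1” filters on plain column values, which is the kind of condition an index on those columns can serve.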

From now on, learn the mantra, SQL functions in where clauses are evil :)

Why learning HTTP does matter

It’s interesting to notice that there are so many people working with web applications who don’t understand the basics of the Internet and the HTTP protocol. You can find applications that exhibit bizarre behaviors everywhere; people just forget to read the specs or slept through the HTTP protocol classes at college.

One of the most harmful exhibitions of this lack of knowledge is the “POST fever”. Every form in the application performs a POST, no matter what it’s doing or the side effects involved; it just works that way and people don’t see a reason to do otherwise. If you ask them, they’ll probably say “oh, someone told me that the GET method has a size limit on its parameters”.

But, what’s so bad about it?

If you take a look at the HTTP RFC, you will find that the GET method is described as a “safe” method. Safe, in the HTTP context, means that you should be able to perform GETs to a web application and this should have no side effects on it, it should not change the resource you are requesting, because the whole idea of the GET method is that you should just GET a copy of the resource at that specific URL, you’re not doing anything funny with it, you should just receive it anywhere and anytime you want to.

But if you look at the POST method description, it’s defined as an “unsafe” method. If you send a POST to a URL you might be definitely changing something and generating an evil side effect that might render the whole application useless and bring Skynet and the Terminators to lay Armageddon on Earth. Or you might just be creating a new resource, such as a blog post like this one.

The obvious difference is that POSTs can (and usually should) change the state of something at the server side, while a GET should never do something like that. If you’re familiar with SQL databases, GETs are just like “select” commands and POSTs like “insert” commands. Have you ever seen an “insert” returning a result set or a “select” inserting data? Me neither :)

But bear with me, it GETs even worse. Imagine that you’re the owner of that evil website I mentioned that only uses POSTs in its forms, and one of those forms is a search form. Users will use it to search for your products and add them to their shopping carts. A user wants to buy the new AC/DC record but he’s not sure about its name, so he just types AC/DC and hits enter.

Voila!

There, at the top of the list, is “Black Ice”, their new record (Have you already bought yours?). He clicks on the link and while he’s viewing the CD page he remembers that he hasn’t bought the “Stiff Upper Lip” album. “Let me hit the back button and look for it too”, thinks the poor user, and when he hits the button, the browser shows an interesting message:

“The browser will need to send data to the server to perform this action. Are you sure you want to do this?”

The user stares at the message, terrified. “What have I done? Will they bill me for this? Are they going to send me the new Britney Spears album ‘cos I’m trying to hit the back button?”

As the HTTP protocol mandates, POSTs are not safe and the tools (usually, our browsers) should tell the user that something bad might happen if they try to POST by accident, and that’s exactly what happens if you try to hit the back button after a POST. In this example, the user wouldn’t be doing anything wrong, but instead of coming back to a search page, he could be at an “add client” page and a “back” would make him re-create the last client he sent to the database, which isn’t really desirable.

Worse, if you’re using POST in a search form, they aren’t going to be able to use the back button (and the usability gurus say that it’s the most used feature in browsers) and they aren’t going to be able to bookmark the search results! Can you imagine something worse than that? You are keeping people from expressing their love for your website by posting it in their del.icio.us favorites!

Now, the reasoning is simple, if you’re not changing anything at the server side, you should always perform GETs. They don’t break the back button, they let the users bookmark their pages and they aren’t going to make the browser show the user any funny messages. If you’re changing state at the server side you should definitely use POST (and the other HTTP methods that are designed to change state, like PUT and DELETE), GET requests should NEVER change any state at the server side.

And before I forget, after every successful POST you should REDIRECT the user to a new page and not just render the page for him in response to the POST. Redirecting the user to the “response” page keeps a press of the “back” or “reload” button from re-sending the data they have already sent during the last POST.
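
To make the GET/POST split and the redirect concrete, here is a minimal sketch in plain Ruby. It is not tied to any web server or framework: the handle method, the CLIENTS list and the /clients path are all invented for this example, and the handler just returns a status, headers and body triple in the style Rack applications use. The 303 (“See Other”) status is one reasonable choice for the redirect; in a Rails controller the same step would be a redirect_to call.

```ruby
# In-memory "database" for the example.
CLIENTS = []

# Receives an HTTP verb, a path and the request parameters and returns
# [status, headers, body].
def handle(verb, path, params = {})
  case [verb, path]
  when ['GET', '/clients']
    # Safe: only reads state, so it can be reloaded, bookmarked and revisited.
    [200, { 'Content-Type' => 'text/plain' }, CLIENTS.join("\n")]
  when ['POST', '/clients']
    # Unsafe: changes state, then redirects instead of rendering a page.
    CLIENTS << params.fetch('name')
    [303, { 'Location' => '/clients' }, '']
  else
    [404, { 'Content-Type' => 'text/plain' }, 'not found']
  end
end

status, headers, _body = handle('POST', '/clients', { 'name' => 'mauricio' })
puts "#{status} -> #{headers['Location']}"
```

After the redirect the browser’s history holds the GET for the list page, so going back or reloading repeats a harmless read rather than the POST.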

Handling database indexes for Rails polymorphic associations

One thing that is usually overlooked when defining tables and their associations in a Rails application is the indexes. Usually, this comes from the idea that “my ORM tool does the job” and in fact it might be true sometimes. One of the most successful ORM tools in the Java land, Hibernate, generates a database with indexes for all foreign keys that you have, so Java programmers that use it don’t really worry about these issues (at least not until their database is slowing down to death).

ActiveRecord migrations, on the other hand, don’t really worry about these things (unless you’re using the cool Foreign Key Migrations plugin), so you must define the indexes that you need by yourself. Usually this is done by a simple call like this:

add_index :comments, :user_id

This will create an index for the column :user_id at the :comments table. For simple associations this is straightforward, but ActiveRecord offers goodies that are not so common in other tools and one of them is the “polymorphic associations”. With polymorphic associations you can define an association without defining the kind of the object you will be associated with, you just say that it’s a polymorphic association and you’re done. The code would look like this:

class Comment < ActiveRecord::Base
    belongs_to :commentable, :polymorphic => true
    belongs_to :user
end

To make this work, at the database level you would need two columns at the :comments table, one called :commentable_id, that will hold the id of the object that owns the comment, and another called :commentable_type, that will hold the full class name of the object that owns the comment. So, if you’re commenting in a Post object with an ID of 1, the commentable_id would be 1 and the commentable_type would be “Post”. At the Post model the association would look like this:

class Post < ActiveRecord::Base
    has_many :comments, :as => :commentable
    has_one :user
end

When you’re using polymorphic associations, your queries for the object will usually contain the commentable_id and the commentable_type in the where clause, as you would be looking for comments for the commentable_id of 1 and commentable_type of “Post”, so it makes sense to create indexes for these columns. As you’ve already seen, you could do this with the following code:

add_index :comments, :commentable_id
add_index :comments, :commentable_type

And now you have two indexes, one for each column, your database searches should fly with this, shouldn’t they?

Well, they will not. You’re defining two different indexes, one for each column, but you almost never search for them separately: you’re always searching for the :commentable_id and also for the :commentable_type, so you should create a single index covering both columns and not one for each of them. The call should be something like this:

add_index :comments, [ :commentable_type, :commentable_id]

This is going to generate an index with both columns and your queries for your polymorphic models will now really be faster than before.

Obviously, you can also create indexes for the :commentable_type and :commentable_id columns if you search for them separately, but having a lot of indexes in your table slows down update calls and might also create big tables in your filesystem. So, when defining polymorphic associations, remember to create an index for both columns and not just one for each of them. Also, if you know that the column will always have a value, make it not null, as searching on indexes of nullable columns in some databases (like MySQL) is slower than searching on not-null column indexes.

And before you go, when ActiveRecord creates a string column at the database level, you can define a :limit option that defines the size of the VARCHAR column at the database. If you don’t give a limit, it’s going to be set as a VARCHAR(255) and I really believe you will not have a model class with a name that has 255 characters, so, instead of creating a column with an unreasonable size (that is going to slow down queries and generate bigger indexes), give it a limit that’s real. Our final table definition would look like this one:

create_table :comments do |t|
  t.integer :user_id, :null => false #indexing nullable columns is slower, try to make all columns that are going to be in indexes not-null
  t.integer :commentable_id, :null => false
  t.string :commentable_type, :null => false, :limit => 20 #could be even less
  t.text :comment
end

add_index :comments, :user_id
add_index :comments, [:commentable_id, :commentable_type]

This post was originally published at the CodeVader weblog.

Including and extending modules in Ruby

One of the coolest features in Ruby is the existence of modules and the possibility of including their implementation in any object. This simple behavior is the source of things like the Enumerable module, which gives you a bunch of methods to work with a collection of objects and just expects the class that includes it to define an “each” method. You write a class, define an “each” method, include Enumerable and you’re done, all Enumerable methods are available for you.
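
Here is a small sketch of that contract. Playlist is a made-up class for this example; the only thing it implements by hand is “each”, and everything else called on it comes from Enumerable.

```ruby
# Playlist is a hypothetical class; it only has to define each.
class Playlist
  include Enumerable

  def initialize(*titles)
    @titles = titles
  end

  # Enumerable builds map, select, include?, sort and friends on top of this.
  def each
    @titles.each { |title| yield title }
  end
end

playlist = Playlist.new('Black Ice', 'Stiff Upper Lip')
puts playlist.map { |title| title.upcase }.inspect
puts playlist.include?('Black Ice')
puts playlist.select { |title| title.start_with?('S') }.inspect
```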

Another example is the Comparable module. When you include the Comparable module in your class, you must define the <=> operator (the UFO operator), and the Comparable module will give you the implementation of the following operators/methods:

< , <= , > , >= , ==, between?

This is usually why we call them mixins, because they are “mixing in” their behaviors (their methods/messages) into our objects. The idea of mixins serves a purpose similar to that of multiple inheritance, which is to inherit an implementation from “something” without having to be a direct child of that “something”; with multiple inheritance you would be able to inherit from as many classes as you wanted to. In Ruby we don’t have multiple inheritance, but we can include as many modules as we want, so they give us the same feature, without all the hassle that multiple inheritance usually brings to a language.
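
To see the Comparable mixin at work, here is a short sketch. The Version class is invented for this example; it defines only the <=> operator and gets the comparison methods from the module.

```ruby
# Version is a hypothetical class that compares by major and minor numbers.
class Version
  include Comparable

  attr_reader :major, :minor

  def initialize(major, minor)
    @major = major
    @minor = minor
  end

  # The only method Comparable needs; arrays already compare element by element.
  def <=>(other)
    [major, minor] <=> [other.major, other.minor]
  end
end

older = Version.new(1, 8)
newer = Version.new(1, 9)

puts older < newer
puts newer.between?(older, Version.new(2, 0))
puts older == Version.new(1, 8)
```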

The method resolution mechanism is pretty simple, first, if a method in a module that is being included is already defined in the class that is including it, the method of the class has precedence (which means that the method on the module will be ignored). If two modules define a method with the same name, the method on the last module included will be the one available at the class that has included both modules (remember that in Ruby there is no method overloading mechanism). Here’s an example of how it works:

module SimpleModule

  def a_method
    puts 'a_method at module'
  end

  def another_method( parameter )
    puts "Calling another method with parameter -> #{parameter}"
  end

end

module AnotherModule

  def another_method
    puts 'Calling another method without a parameter'
  end

end

class SimpleClass

  include SimpleModule
  include AnotherModule

  def a_method( param )
    puts "a_method at class -> #{param}"
  end

end

instance = SimpleClass.new

#calling the method defined on the class
instance.a_method 'parameter' 

#calling method on the AnotherModule
instance.another_method 

#this line will throw a 'wrong number of arguments' error
instance.a_method

An ugly example for an ugly practice, don’t rely on these things when you’re writing your own modules, strive to create unique modules that aren’t going to have method names clashing when they are included in other classes. If you have to rely on these rules to write and use your modules, maybe there is a problem in your code or in what you’re trying to do.
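
If you ever need to check which method wins, the class itself can tell you. The sketch below uses made-up names (First, Second, Greeter, LoudGreeter) and the standard “ancestors” method, which lists the class and its included modules in the order Ruby searches them.

```ruby
module First
  def greet
    'from First'
  end
end

module Second
  def greet
    'from Second'
  end
end

# No greet of its own: the last module included is searched first.
class Greeter
  include First
  include Second
end

# Defines greet itself: the class is searched before any module.
class LoudGreeter
  include First
  include Second

  def greet
    'from the class'
  end
end

puts Greeter.ancestors.first(3).inspect
puts Greeter.new.greet
puts LoudGreeter.new.greet
```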

Extending methods

As the title of this post says, you can include and also extend modules, but what does it mean to extend a module?

When you extend a module, you are adding the methods of that specific module to the object instance on which you call “extend”. So, the methods of that module will only be available on that specific instance (and not on all objects of that class); other objects of the same class will not have the methods of the module available. With this, you can add specific behaviors to just one object of your system, without changing the other ones. Here’s an example:

module InstanceMethods

  def simple_method
    puts "im a method that belongs to an instance"
  end

end

class SimpleObject
end

object = SimpleObject.new
object.extend InstanceMethods
object.simple_method

another_object = SimpleObject.new

#the following line will throw an error, as this instance doesn't extend the module
another_object.simple_method

This might look like a weird feature, how many times have you wanted to introduce a method into a single object?

Not that many, probably, unless this instance is in fact an instance of the Class class (that contains the class methods of your object). This is where extending modules gets interesting and this is how many of the Rails plugins are written, so let’s see how we can use this to write our own acts_as_votable plugin.
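
Before the Rails version, here is the bare mechanism in plain Ruby. The Finders module and the Album class are invented for this sketch; the point is only that calling “extend” on a class turns the module’s methods into class methods.

```ruby
module Finders
  # Becomes a class-level method on whatever class extends this module.
  def find_by_name(name)
    "#{self.name} looking for #{name}"
  end
end

class Album
  extend Finders
end

# Available on the class itself...
puts Album.find_by_name('Black Ice')

# ...but not on its instances.
puts Album.new.respond_to?(:find_by_name)
```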

Rails, extending and including modules

First thing to do is create your Rails project:

rails --database=mysql include_extend_modules

With the project created, we have to create our plugin (enter in your Rails project folder):

script/generate plugin acts_as_votable

This will create a folder called acts_as_votable at the vendor/plugins folder and the plugin skeleton code. The first thing to do is to create our Vote model. It’s a dead simple model, with a polymorphic relationship with a “votable” and a boolean column called “up”, representing if this vote is “up” or “down”. The vote.rb file should live at the vendor/plugins/acts_as_votable/lib folder. Here’s the model code:

#vendor/plugins/acts_as_votable/lib/vote.rb

class Vote < ActiveRecord::Base
  belongs_to :votable, :polymorphic => true
  validates_presence_of :votable
end

Now we have to create a migration to create the votes table at the database:

script/generate migration create_votes

And there is the migration code:

class CreateVotes < ActiveRecord::Migration
  def self.up

    create_table :votes do |t|
      t.integer :votable_id, :null => false
      t.string :votable_type, :limit => 15, :null => false
      t.boolean :up, :default => false, :null => false
      t.timestamps
    end

    add_index :votes, [ :votable_id, :votable_type ]

  end

  def self.down

    drop_table :votes

  end
end

After creating the Vote model and its migration, we’ll head to the acts_as_votable.rb file in our plugin folder. It’s where the code that ties the Vote model to the application will live. Here’s the code that will be in there:

#vendor/plugins/acts_as_votable/lib/acts_as_votable.rb

module ActsAsVotable

  module ClassMethods

    def acts_as_votable
      has_many :votes, :as => :votable, :dependent => :delete_all
      include InstanceMethods
    end

  end

  module InstanceMethods

    def cast_vote( vote )
      Vote.create( :votable => self, :up => vote == :up )
    end

  end

end

We have created a module called ActsAsVotable to serve as our namespace and in it we have two modules ClassMethods and InstanceMethods. The ClassMethods module defines the methods that we want to introduce at the ActiveRecord::Base class, so that we can just call “acts_as_votable” in any model that inherits from ActiveRecord::Base (just like any other ActiveRecord plugin) and the InstanceMethods module contains the methods that we want an instance that is “votable” to have.

So, if I say that a NewsArticle class is votable, its instances will have the cast_vote method, as the module InstanceMethods was included when the class called acts_as_votable. But before creating the NewsArticle model, we have to make some changes in our init.rb file for the acts_as_votable plugin. Here’s how it should look:

#vendor/plugins/acts_as_votable/init.rb

require 'vote'
require 'acts_as_votable'

ActiveRecord::Base.extend ActsAsVotable::ClassMethods

This is where we make the acts_as_votable method available to all classes that inherit from ActiveRecord::Base and this is one of the most common uses of “extending” modules you will see.
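
The whole pattern can be tried outside Rails too. In the sketch below FakeBase stands in for ActiveRecord::Base, and ActsAsRatable, Song and Note are names made up for the example; nothing here touches a database, it only shows how the extend and include calls fit together.

```ruby
# FakeBase stands in for ActiveRecord::Base so this runs without Rails.
class FakeBase
end

module ActsAsRatable
  module ClassMethods
    # Class-level macro: calling it mixes the instance methods into the caller.
    def acts_as_ratable
      include ActsAsRatable::InstanceMethods
    end
  end

  module InstanceMethods
    def ratings
      @ratings ||= []
    end

    def rate(stars)
      ratings << stars
      self
    end
  end
end

# The same move the init.rb file makes against ActiveRecord::Base.
FakeBase.extend ActsAsRatable::ClassMethods

class Song < FakeBase
  acts_as_ratable
end

# Inherits the macro but never calls it, so it gets no instance methods.
class Note < FakeBase
end

song = Song.new.rate(5).rate(3)
puts song.ratings.inspect
puts Note.new.respond_to?(:rate)
```

Every subclass of the base class can call the macro, but only the ones that actually do get the instance methods.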

Now that we have the code hooked to ActiveRecord, let’s create a simple model to try some tests, create our NewsArticle model:

script/generate model NewsArticle title:string article:text

Now, at the news_article.rb file:

#app/models/news_article.rb

class NewsArticle < ActiveRecord::Base

  acts_as_votable
  validates_presence_of :title, :article

end

We just call the acts_as_votable class method, which is available because we “extended” the ActiveRecord::Base class, the superclass of our NewsArticle class, with the ActsAsVotable::ClassMethods module. Here’s an example of what you could do with our models:

article = NewsArticle.create(:title => 'sample', :article => 'sample')

#calling the cast_vote method from the ActsAsVotable::InstanceMethods module
article.cast_vote :up
article.cast_vote :down

#accessing the votes association defined when you called the acts_as_votable method
article.votes

And that’s it, you now know how and when to include or extend modules and even how to build a simple acts_as plugin for your models.

PS: You can get the full code for this example here. This post was originally published at the CodeVader blog.

Unit tests don’t guarantee that your system works

Last week we had an interesting message at the RSpec users list, the most interesting part of it is the following:

“I also had to go into specs on a project I’m not working on, and found an unholy hive of database-accessing specs. It’s disheartening. Basically, it’s cargo cult development practices – using the “bestpractice” without actually understanding it.”

You might have read this before, “/specs|tests/ that access the database are evil”, but have you ever asked yourself why?

Behavior Driven Development is the next step after Test Driven Development and it borrows many best practices found in the latter. The two principles that interest us most in this conversation are test-first development and unit testing.

The idea behind test-first development is that before writing your code, you should write a test stating what you want your “future” code to do. By writing the test before the code you get to work on the public interface provided by your object. The test is the first client of your code, so if your public interface is cumbersome or difficult to use, this test will be able to catch a bad idea before it’s materialized in your code.

And where is unit testing in all this? You should be doing test-first using unit tests, as unit tests will guarantee that the code you wrote for that single unit (a method, probably) works alone. If you have more objects that need to be used to test this specific behavior, you should use mock objects (fake objects) in their places, so you won’t be testing them in your unit test. Remember, unit tests should only test a unit of code, no more than that. We do it this way so we don’t get distracted by the other objects’ implementation: we focus on testing our target, not its dependencies.
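
As a rough illustration of swapping a dependency for a fake, here is a sketch in plain Ruby with no testing library involved. Signup and FakeMailer are invented for this example; with RSpec you would normally let its mocking support build the fake for you.

```ruby
# Signup is the unit under test; it depends on something that can deliver mail.
class Signup
  def initialize(mailer)
    @mailer = mailer
  end

  def register(email)
    return false unless email.include?('@')
    @mailer.deliver(email, 'Welcome!')
    true
  end
end

# Hand-written fake: records the calls instead of sending anything.
class FakeMailer
  attr_reader :deliveries

  def initialize
    @deliveries = []
  end

  def deliver(to, subject)
    @deliveries << [to, subject]
  end
end

mailer = FakeMailer.new
signup = Signup.new(mailer)

puts signup.register('user@example.com')
puts signup.register('not-an-email')
puts mailer.deliveries.inspect
```

Only the behavior of Signup is being checked here; whether a real mailer actually delivers anything is a question for a different test.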

When we’re writing specs for our objects they should usually work as unit tests, they should only assert the behaviors of a single unit of code, everything else should be done using mocks and stubs. But I said usually.

As I said before, unit tests, and your common specs, should only assert the behaviors of a unit of code without considering its relationships with the other objects in the system, but this only guarantees that the pieces work as units. It will never guarantee that they will really work when in real contact with the other objects in the system. Unit tests don’t guarantee that your system works; they surely help you reach this goal, but they aren’t enough.

And what does this have to do with that message, anyway?

A spec that accesses the database is just like an integration test: it asserts that the code being tested works fine when integrated with the database. So, the integration tests are the ones that really show you that your code works as a system, not only as a group of lonely objects.

I’m not saying that you should leave the unit tests behind, because they are very important in helping you design your code and be sure that it works as a unit, but you shouldn’t rely only on them to test your system. A good suite of integration tests will give you the trust that everything works fine together.

And sometimes you can’t unit test a functionality, it’s all about integration. Let’s take the “validates_uniqueness_of” validation in ActiveRecord as an example, if you’re writing a spec for your ActiveRecord model, you should add one ‘it’ statement showing that this is needed (you’re specifying how your model behaves, remember?), so here’s how it could look:

it 'Should not be valid if there is another one with the same name' do
  @common_name = 'testuser'
  @user = User.create( :name => @common_name )
  @another_user = User.new( :name => @common_name )
  @another_user.should have(1).error_on(:name)
end

How could you perform this spec without touching the database?

First, you could look at the “validates_uniqueness_of” source code, figure out how it works and stub it to return what you want, but this is bad because if the framework code changes your specs would break. The other way would be changing the database adapter to a mocked one and sending back exactly the result you wanted, but this is basically overkill. So why don’t you just leave the “purism” behind, test it in your database and be happy that your code works fine?

One important thing to notice is that integration tests are also slower to run, so you wouldn’t like to wait for the full suite to run before performing a commit. Usually you would run the unit and integration tests that are most likely to break if you did something wrong, the ones related to what you’re doing now, and just be done with it.

So, if you’re in a project that has database-accessing specs or specs that are using many real objects (and not mocks), don’t feel bad, but be sure that whoever wrote them knows what they are doing and that everything that can be unit tested is being unit tested. Integration tests should be written after your functionality is implemented and tested with unit tests. They are not interchangeable, nor will you replace one with the other.

And be sure to never commit your code before running your tests :)

PS: Originally published at the CodeVader blog