Refactoring our Rails app out of single-table inheritance

James Coglan, a Developer at FutureLearn, explains a common problem with STI and how we refactored our Rails app to solve it.

Single-table inheritance (STI) is the practice of storing multiple types of values in the same table, where each record includes a field indicating its type, and the table includes a column for every field of all the types it stores. In Rails, the type column is used to determine which type of model to instantiate for each row; a row with type = 'Article' will make Rails call Article.new when turning that row into an object.
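To make the mechanism concrete, here is a minimal plain-Ruby sketch of type-driven instantiation. It does not use Rails; the Record base class, the instantiate helper and the sample rows are illustrative assumptions, not ActiveRecord's actual implementation.

```ruby
# Plain-Ruby sketch of STI-style instantiation; no Rails involved.
# Record, instantiate and the sample rows are illustrative names only.
class Record
  attr_reader :attributes

  def initialize(attributes)
    @attributes = attributes
  end
end

class Article < Record; end
class Video < Record; end

# Look up the class named in the row's type column and build an instance of it.
def instantiate(row)
  Object.const_get(row.fetch('type')).new(row)
end

rows = [
  { 'id' => 1, 'type' => 'Article', 'body' => 'text', 'asset_id' => nil },
  { 'id' => 2, 'type' => 'Video',   'body' => nil,    'asset_id' => 7 }
]

# Every row carries every column, but each class only cares about some of them.
rows.each do |row|
  record = instantiate(row)
  puts "row #{row['id']} became a #{record.class.name}"
end
```

Note how both rows carry both the body and asset_id columns even though each type only uses one of them; that is the sparseness discussed in the next section.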

A common problem with STI

A common problem with STI is that, over time, more types get added to that table, and it grows more and more columns, and the records in the table have less and less in common with one another. Each type of record uses some subset of the table’s columns, and none uses all of them, so you end up with a very sparsely populated table. Those types create costs for one another: when you query for articles, you must remember to filter out all the other types of values and to only select columns relevant to articles, or else pay a huge performance cost. You reproduce a lot of work the database would do for you, if you only had each data type in its own table.

By the time this storage model becomes a problem, it can be really hard to see a way out of it. You want to break this giant table into smaller focussed ones, but that will involve moving a lot of data and updating references. And, given the number of different types the table stores, it’s quite likely that a lot of your model code is coupled to it.

Rails adds to the problem by tightly coupling your model API to the schema, and encouraging you to use this API directly from controllers and views. Since the entire application is then coupled to the schema, any change to the database ripples through the entire codebase unless you find a way to contain the change. Fortunately, there are ways to work around this and perform the migration in incremental steps.

One of the first projects I took on after joining FutureLearn was a refactoring of our content model. All the types of content that make up our courses – articles, videos, discussions, quizzes and so on, ten types in all – were being stored in a single table called steps. In this article, I’ll explain how I broke the table up and how I tricked Rails into letting me do it.

How did we get here?

You might be wondering why we went with this design in the first place. After all, articles, videos, discussions and quizzes don’t seem like subtypes of one another. Well, it all comes down to courses. All these types of content have to be put into a sequence to make a course. So, each content item belongs to a course, and we use acts_as_list to keep the items, or ‘steps’, in order. This is most easily realised if all the steps live in the same table.

So, we had our Course and Step models:

class Course < ActiveRecord::Base
  has_many :steps, -> { order('position ASC') }
end

class Step < ActiveRecord::Base
  belongs_to :course
  acts_as_list scope: :course
end

and their underlying schema:

create_table 'courses' do |t|
  t.string 'title'
end

create_table 'steps' do |t|
  t.string  'type',       null: false
  t.integer 'course_id'
  t.integer 'position',   null: false
  t.string  'title'
  t.text    'body'
  t.integer 'asset_id'
  t.string  'url'
  t.string  'copyright'
end

(The null settings aren’t too important right now, but they create an interesting problem later on.)

This design means that we can display all the steps for a course by calling @course.steps; if the steps live in ten different tables this would be more complicated and it would be harder to maintain their order.

The columns in the steps table correspond to the various content types. All content types must have a title and belong to a Course, so the Step class validates those. Some content types (but not all) can have a body, which if present must contain only links to images served via https:. Rather than implement that on every content type, we implement it conditionally in the Step class:

class Step
  validates :course_id, presence: true
  validates :title,     presence: true, length: { maximum: 255 }

  validate :secure_asset_urls_in_body

  INSECURE_MARKDOWN_IMAGE = /!\[[^\]]*\]\(http:/

  def secure_asset_urls_in_body
    if body =~ INSECURE_MARKDOWN_IMAGE
      errors.add(:body, 'must not contain insecure URLs')
    end
  end
end

Then we have the content types themselves. For example, Article has body and copyright attributes, Video has asset_id and copyright, and Exercise has body and url.

class Article < Step
  validates :body,      presence: true
  validates :copyright, length: { maximum: 255 }
end

class Video < Step
  validates :asset_id,  presence: true
  validates :copyright, length: { maximum: 255 }
end

class Exercise < Step
  validates :body, :url, presence: true
end

Although the intention is that each model uses a subset of the table’s columns, that constraint is not enforced anywhere. It relies on the forms and controllers we use to create the content not exposing certain fields to the user. The model validations ensure that we must supply at least the required properties for a content type, so these are all legal:

course = Course.create(title: 'Moons')

course.steps << Article.new(
  title: 'Pluto’s eccentric orbit',
  body: <<-STR,
    ![](https://www.example.com/moons/ou_moon_art_2028_plutos_eccentric_orbit.jpg)

    Pluto’s orbit is so elliptical that it strays inside Neptune’s orbit for 20
    years out of its 248-year circuit round the Sun. It last crossed inside
    Neptune’s orbit in 1979. Pluto orbits the Sun in an inclined orbital plane,
    and goes around the Sun twice for every three orbits that Neptune makes.
  STR
  copyright: '© The Open University'
)

course.steps << Video.new(
  title: 'Tidal effects on Io and Europa',
  asset_id: 1,
  copyright: '© The Open University'
)

course.steps << Exercise.new(
  title: 'Ordering Saturn’s moons',
  body: <<-STR,
    Put the names of some of Saturn’s moons in order starting with the closest
    to the planet by dragging the names up and down the list. The computer will
    tell you when you’ve got it right, and will offer you help after you’ve made
    a certain number of moves.
  STR
  url: 'https://www.example.com/moons/ordering_saturns_moons.html'
)

However, there is nothing to stop someone (via the console, or via a programming error exposed to users) giving an Article a url. The more types an STI table accepts, the more it resembles a schemaless document store.

Planning the new design

We knew that in order to make it easier to add new content types, we wanted each type in its own table. A Step would then be reduced to a (content_type, content_id) reference to a content item, and a course_id and position to place the content in order within a course. It would essentially be a join model:

class Step < ActiveRecord::Base
  belongs_to :course
  belongs_to :content, polymorphic: true

  acts_as_list scope: :course
end

Since the STI system does not enforce any rules about which types use which columns, we have to determine this new schema by inspection. A SQL query will show us all the types that are using, say, the body column:

> SELECT DISTINCT(type) FROM steps WHERE body IS NOT NULL;

+----------+
| type     |
+----------+
| Article  |
| Exercise |
+----------+

Running this query for all the types in the database leads to the new schema:

create_table 'steps' do |t|
  t.integer 'course_id'
  t.integer 'position',     null: false
  t.string  'content_type'
  t.integer 'content_id'
end

create_table 'articles' do |t|
  t.string  'title'
  t.text    'body'
  t.string  'copyright'
end

create_table 'videos' do |t|
  t.string  'title'
  t.integer 'asset_id'
  t.string  'copyright'
end

create_table 'exercises' do |t|
  t.string  'title'
  t.text    'body'
  t.string  'url'
end
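The inspection that leads to a schema like this can also be sketched in plain Ruby over rows held in memory. This is only an illustration of the idea behind the SQL query; the sample rows and the types_using helper are assumptions made up for the example.

```ruby
# Plain-Ruby sketch of the column-usage inspection, run over in-memory rows.
# STEP_ROWS and types_using are hypothetical names for illustration.
STEP_ROWS = [
  { 'type' => 'Article',  'body' => 'text', 'asset_id' => nil, 'url' => nil },
  { 'type' => 'Video',    'body' => nil,    'asset_id' => 1,   'url' => nil },
  { 'type' => 'Exercise', 'body' => 'text', 'asset_id' => nil, 'url' => 'https://www.example.com/' }
].freeze

# Rough equivalent of:
#   SELECT DISTINCT(type) FROM steps WHERE <column> IS NOT NULL
def types_using(rows, column)
  rows.reject { |row| row[column].nil? }.map { |row| row['type'] }.uniq.sort
end

%w[body asset_id url].each do |column|
  puts "#{column}: #{types_using(STEP_ROWS, column).join(', ')}"
end
```

Inverting the result (for each type, which columns does it use?) gives the column list for that type's new table.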

Plotting a course

At this stage, a dilemma presents itself. Rails, or specifically ActiveRecord, requires that your model classes are a reflection of the database. Their attributes are inferred from the schema and Rails generates methods to reflect those attributes. So, changing the schema will change the API of your model classes, and you must update your application appropriately.

In our case, the Step class is a central model in our system: most of FutureLearn is concerned in some way with displaying course material or working out a learner’s progress in relation to that material. So, the codebase is littered with calls to things like step.body that would be step.content.body under the new scheme, but making all those changes to the views, the controllers, the access control rules, and all the models related to course content would be expensive.

The dilemma is as follows: we could avoid changing all the frontend code by wrapping the step/content models in a layer of indirection that would preserve the ability to call step.body, or we could decide that we had in fact made a conceptual change to our data model that should be rolled out across the codebase. The layer of indirection could be a domain-spanning service layer between ActiveRecord and the controllers and views, insulating the frontend from any future schema changes via a more abstract API, or it could be as simple as adding delegation methods to Step:

class Step
  def title
    content.title if content.respond_to?(:title)
  end
end

I rejected the service layer option on the grounds that it would be an awful lot of work to pull all our domain logic out of ActiveRecord and it’s too early in our product development to develop a good domain API that we know we can keep stable. The product is still evolving too quickly, and if this layer is not stable then it won’t achieve our aim of avoiding having to update frontend code when the model changes.

I also rejected the option of having delegating methods in Step. It’s clear from the example above that Step would grow methods covering all the things that all the content types can do, and this would not be a very well-defined API and would be confusing to its callers. We decided that ‘step’ and ‘content’ are two distinct concepts: content is things like articles and videos in isolation, and a step is a content item placed in context on a course, with a comment thread, associated learner progress records, and so on.

I opted instead to roll the API change out across the codebase, but to do it in two stages. First, I would change the model APIs to reflect the planned schema without writing any migrations, using some indirection inside the model classes to map the planned API onto the existing schema. Later, I would migrate the database and remove the support code introduced in the first stage. This would allow us to deploy the large volume of code changes required without having to wait for a long-running migration, and without the risk of having to roll that migration back if the release had problems in production. This is similar to the option of putting an insulation layer between the models and the frontend, but a layer down: we’re putting a model API in place that will shield the frontend from the effect of migrations.

(I changed about a quarter of our codebase while making these changes, and we did have to roll back the first deploy since I’d missed a few places where the wrong type of value was being used.)

To summarise, we want to move from a model where course.steps returns a list of polymorphic things like Article or Video, to a model where course.steps returns a list of Step objects, and each of those has a polymorphic content attribute that refers to one of the content types.

Splitting the models

Here’s where we start lying to Rails. We have to make it think that what is currently a single row in the steps table is actually two objects: a Step and its content. Currently, if we look up a Step we get one of its subtypes:

>> Step.first
=> #<Article id: 1, type: "Article", ...>

We want Step.first to return a Step, and Step.first.content to return the Article. We can achieve this by first telling Rails not to use the type column to cast the objects it finds:

class Step
  def self.inheritance_column
    :no_such_column_because_we_dont_want_type_casting
  end
end

This causes Rails to give us a Step instead of an Article:

>> Step.first
=> #<Step id: 1, type: "Article", ...>

Then, we need to convince Rails that step.content is another object, only it’s derived from the same database record as step. We can do this by saying that Step#content is a polymorphic association where the foreign type is type and the foreign key is id, effectively turning the row into a reference to itself but creating a different kind of object.

class Step
  belongs_to :content, polymorphic: true, foreign_type: :type, foreign_key: :id
end

Now, Rails will give us the row as an Article when we ask for the step’s content:

>> Step.first.content
=> #<Article id: 1, type: "Article", ...>

We also need to stop the content types inheriting from Step; they are now distinct things and should not inherit its behaviour. However, we still need to tell Rails they live in the steps table, so

class Article < Step
end

becomes

class Article < ActiveRecord::Base
  self.table_name = 'steps'
end

But that’s not quite enough: a call to Video.first will run this SQL:

SELECT steps.* FROM steps ORDER BY steps.id ASC LIMIT 1

Because Video.first can return any row in the steps table, it might return the wrong type:

>> Video.first
=> #<Article id: 1, type: "Article", ...>

So we have to provide a default scope for each content type that limits it to only include rows of the right type:

class Article < ActiveRecord::Base
  self.table_name = 'steps'
  default_scope { where(type: 'Article') }
end

Including this boilerplate for all ten of our original content types is tedious, and there is more to come, so let’s begin extracting it into a concern. We’ll also move the content-related validations out of Step and into this module.

class Article < ActiveRecord::Base
  include Content
end

module Content
  extend ActiveSupport::Concern

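  # Defined once on the module. Inside the `included` block it would be
  # reassigned (with a warning) each time another class includes Content.
  INSECURE_MARKDOWN_IMAGE = /!\[[^\]]*\]\(http:/

  included do
    self.table_name = 'steps'
    default_scope { where(type: name) }

    validates :title, presence: true, length: { maximum: 255 }
    validate :secure_asset_urls_in_body
  end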

  def secure_asset_urls_in_body
    if body =~ INSECURE_MARKDOWN_IMAGE
      errors.add(:body, 'must not contain insecure URLs')
    end
  end
end

Now, loading videos works correctly:

>> Video.first
=> #<Video id: 2, type: "Video", ...>
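The effect of sharing one table while scoping each class by its own name can be sketched without Rails. In the following plain-Ruby illustration, STEPS_TABLE and the ScopedContent module are hypothetical stand-ins for the steps table and the Content concern; the real concern relies on ActiveRecord's default_scope rather than anything shown here.

```ruby
# Rows of a pretend steps table; the data is made up for illustration.
STEPS_TABLE = [
  { 'id' => 1, 'type' => 'Article', 'title' => 'Pluto' },
  { 'id' => 2, 'type' => 'Video',   'title' => 'Tides' }
].freeze

# Hypothetical stand-in for the Content concern.
module ScopedContent
  def self.included(base)
    base.extend(ClassMethods)
  end

  module ClassMethods
    # Mirrors the idea of default_scope { where(type: name) }: only rows
    # whose type matches the including class's name are visible.
    def all
      STEPS_TABLE.select { |row| row['type'] == name }.map { |row| new(row) }
    end

    def first
      all.first
    end
  end

  attr_reader :attributes

  def initialize(attributes)
    @attributes = attributes
  end
end

class Article
  include ScopedContent
end

class Video
  include ScopedContent
end

puts Video.first.attributes['title']
puts Article.all.length
```

Because the filter uses the including class's name, adding another content type needs no extra code beyond including the module.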

Restricting the API

We have now made course.steps and step.content return the right types, and we’ve made the content classes appear to work like independent ActiveRecord classes in their own tables. But in order to force ourselves to update all the controller and view code, we need to lock down the APIs of these models. We’ve separated their behaviour by removing the inheritance from the Step class, but it’s still possible to access all the table’s columns as attributes on a Step or any content type. We should not be able to call step.title or video.body; so I wanted accessing attributes that will not exist in future to raise an error:

>> video = Video.first
=> #<Video id: 2, type: "Video", ...>

>> video.copyright
=> "© The Open University"

>> video.body
NoMethodError: `body' cannot be called on Video objects

Rails generates a #body method on Video because that class is backed by the steps table, and the steps table has a body column. But, we only want Video objects to respond to the columns that will be in our planned videos table. Let’s declare that API:

class Video < ActiveRecord::Base
  include Content
  restrict_attrs :asset_id, :copyright
end

We need to implement the restrict_attrs macro such that it hides any methods that Rails generates on Video that we don’t want. This includes any attributes and associations from Step, except for those we explicitly allow, and except any attributes that Rails uses internally, like the id, type and timestamp attributes.

The following Content::Attributes module defines such a macro. It uses the Rails reflection APIs to get all the attributes from Step, and subtracts the attributes we’ve allowed and the CORE_ATTRIBUTES used by Rails. For each remaining attribute, it uses define_method to override both the reader and the writer – that’s the [name, "#{name}="] pair – with an implementation that raises a NoMethodError. The module also implements respond_to? so that it will return false for any attributes we’ve hidden.

module Content
  module Attributes

    extend ActiveSupport::Concern

    CORE_ATTRIBUTES = %w[id type created_at updated_at]

    def self.step_attributes
      @step_attributes ||= (Step.columns + Step.reflect_on_all_associations).map { |c| c.name.to_s }
    end

    included do
      def self.blocked_attributes
        @blocked_attributes || []
      end

      def self.restrict_attrs(*attribute_names)
        @blocked_attributes = Attributes.step_attributes -
                              attribute_names.map(&:to_s) -
                              CORE_ATTRIBUTES

        @blocked_attributes.map! { |name| [name, "#{name}="] }
        @blocked_attributes.flatten!

        @blocked_attributes.each do |name|
          hide_method(name)
        end
      end

      def self.hide_method(name)
        define_method(name) do |*args, &block|
          raise_no_method_error_on(name)
        end
      end
    end

    # Same signature as Object#respond_to?, so callers that pass the
    # include_all flag keep working.
    def respond_to?(name, include_all = false)
      return false if self.class.blocked_attributes.include?(name.to_s)
      super
    end

  private

    def raise_no_method_error_on(name)
      klass, type = self.class.name, attributes['type']
      type_name = (klass == type) ? klass : "#{klass}<#{type}>"
      message = "`#{name}' cannot be called on #{type_name} objects"
      raise NoMethodError, message
    end

  end
end

If we mix this into the Content and Step classes then we can lock down the API. Note that any associations declared after restrict_attrs is used will not be hidden.

module Content
  include Attributes
end

class Step
  include Content::Attributes
  restrict_attrs :course_id, :position

  belongs_to :course
  # etc.
end

class Article
  include Content
  restrict_attrs :title, :body, :copyright
end

class Video
  include Content
  restrict_attrs :title, :asset_id, :copyright
end

class Exercise
  include Content
  restrict_attrs :title, :body, :url
end
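To see the behaviour of such a macro in isolation, here is a minimal sketch that works on plain Ruby objects rather than ActiveRecord models. RestrictedAttributes and ALL_COLUMNS are hypothetical names; the column list is hard-coded instead of being read from Step through the Rails reflection APIs.

```ruby
# Hypothetical, Rails-free stand-in for Content::Attributes.
module RestrictedAttributes
  # Assumed column list; the real module derives this from Step.
  ALL_COLUMNS = %w[title body asset_id url copyright].freeze

  def self.included(base)
    base.extend(ClassMethods)
  end

  module ClassMethods
    def blocked_attributes
      @blocked_attributes || []
    end

    # Block the reader and writer of every column not explicitly allowed.
    def restrict_attrs(*allowed)
      blocked = ALL_COLUMNS - allowed.map(&:to_s)
      @blocked_attributes = blocked.flat_map { |name| [name, "#{name}="] }

      @blocked_attributes.each do |name|
        define_method(name) do |*_args|
          raise NoMethodError, "`#{name}' cannot be called on #{self.class.name} objects"
        end
      end
    end
  end

  # Report hidden attributes as absent, like the module above.
  def respond_to?(name, include_all = false)
    return false if self.class.blocked_attributes.include?(name.to_s)
    super
  end
end

class Video
  # Simulates the accessors Rails would generate for every table column.
  attr_accessor(*RestrictedAttributes::ALL_COLUMNS)
  include RestrictedAttributes
  restrict_attrs :title, :asset_id, :copyright
end

video = Video.new
video.title = 'Tidal effects on Io and Europa'
puts video.respond_to?(:body)

begin
  video.body
rescue NoMethodError => e
  puts e.message
end
```

Allowed attributes keep working as ordinary accessors, while anything else fails loudly at the call site, which is what makes stale calls in controllers and views easy to find.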

Creating new records

We’ve successfully locked down the API so that we cannot read or write to attributes that don’t apply to the present type. However, it creates a problem with creating new records: in our example, the steps.position and steps.type columns are not nullable, and so omitting them from a call to create raises an error:

>> Article.create(title: '...', body: '...', copyright: '...')
ActiveRecord::StatementInvalid: Mysql2::Error: Field 'position' doesn't have a default value:
    INSERT INTO `steps` (`body`, `copyright`, `title`, `type`) VALUES ('...', '...', '...', 'Article')

(The type value is filled in thanks to the default_scope we added to all the content classes.)

But, adding a position attribute to the new object raises an error in Rails:

>> Article.create(title: '...', body: '...', copyright: '...', position: 1)
ActiveRecord::UnknownAttributeError: unknown attribute: position

Because article.respond_to?(:position) is false – position is a Step attribute, not an Article one – Rails refuses to set the position attribute even though the object’s internal attributes hash will contain this key. This coupling of Rails to the public API of your models, rather than to the underlying table’s columns, causes a big problem with creating new objects.

So, we cannot create new articles; the database will not accept a null position value but Rails won’t let us set one. But there is a way out: notice that all the required (i.e. non-nullable) attributes are core Step properties, not properties of content types. So, we can create a Step with the required values for those columns, then convert it to a content type and fill in the missing values.

>> course = Course.first
=> #<Course id: 1, title: "Moons">

>> step = Step.create(type: 'Article', course: course)
=> #<Step id: 4, type: "Article", course_id: 1, position: 4, title: nil,
          body: nil, asset_id: nil, url: nil, copyright: nil>

>> article = step.becomes(Article)
>> article.update_attributes(title: '...', body: '...', copyright: '...')
=> true

Since we need to create content items in many places in the codebase, not least the separate admin controllers we have for each type, I added a class method to all the types that include Content to encapsulate this dance we must do to create items:

module Content
  included do
    # ...

    def self.create_for_course(course, attributes = {})
      content = new(attributes)
      if content.valid?
        step = Step.create(type: name, course: course)
        instance = step.becomes(self)
        instance.update_attributes(attributes)
        instance
      else
        step = content.becomes(Step)
        step.course = course
        step.type = name
        step.becomes(self)
      end
    end

    # ...
  end
end

Note that this only works because there is one type – Step – that can set all the non-nullable fields. If the non-null fields were spread across both Step and content classes, then we would not be able to hide their surplus attributes in the way we have without breaking our ability to create objects.

Trouble at t’mill

Well, so far so good. We’ve migrated the models so their API reflects the planned schema, without changing the database, and we can still create all the objects that we need. But there’s one more stumbling block in store. It’s not just Rails that uses the public model APIs to drive data management; factory_girl does the same thing and is affected by our API changes.

Here are the factory definitions from the original model that we started with. A course needs a title, a step needs a course and a title, and all the Step subclasses reuse the step factory and set some type-specific attributes.

FactoryGirl.define do
  sequence(:title) { |n| "title-#{n}" }

  factory :course do
    title
  end

  factory :step do
    course
    title
  end

  factory :article, parent: :step, class: Article do
    body      'article-body'
    copyright 'article-copyright'
  end

  factory :video, parent: :step, class: Video do
    asset_id  { rand(100) }
    copyright 'video-copyright'
  end

  factory :exercise, parent: :step, class: Exercise do
    body 'exercise-body'
    url  'http://www.example.com/exercise'
  end
end

The new model differs from this: step no longer has a title, and that attribute has been moved into the various content tables. We’re no longer using STI, so we have to explicitly set the type attribute on the step factory to prevent the database rejecting null values. And, as we saw above, we cannot create content items on their own: we must create a Step first and then convert it to another type. It turns out we can model this by making a step factory for each content type, and then a factory for that type based on this step.

For example, exercise_step inherits from step but sets type to Exercise. The exercise factory sets the step association using the exercise_step factory, then sets the exercise-specific properties.

FactoryGirl.define do
  # ...

  factory :step do
    course
    type 'Step'
  end

  factory :exercise_step, parent: :step do
    type 'Exercise'
  end

  factory :exercise do
    step  factory: :exercise_step
    title 'exercise-title'
    body  'exercise-body'
    url   'http://www.example.com/exercise'
  end

  # and so on for other content types
end

To make this work, we need to give content objects a way to set and retrieve their associated Step. Assigning a Step to a Content object means copying its attributes over; remember a Step and its content are the same database row. We’ll add some methods to Content to achieve this.

module Content
  def step
    becomes(Step)
  end

  def step=(step)
    step.attributes.each do |key, value|
      write_attribute(key, value) unless value.nil?
    end
    @new_record = false
  end
end

But this still does not quite work. If we try to create an Exercise object then this happens:

     Failure/Error: let(:exercise) { FactoryGirl.create(:exercise) }
     NoMethodError:
       `course_id' cannot be called on Exercise objects

It seems as though FactoryGirl is trying to populate all the steps table attributes on Exercise, and therefore accessing attributes that we’ve hidden via NoMethodError. This seems like a blocker on this whole strategy; if we can’t make FactoryGirl work, then we either have a lot of tests to rewrite or we need to find another way to carry out this refactoring.

Thinking back to what I was actually trying to do, I realised there was another way. The purpose of the restrict_attrs interface is to stop controllers and views calling step.title and article.url=, i.e. accessing hidden attributes via the public API. By making all these calls raise NoMethodError, I’d made it so that there was no way whatsoever of calling these methods. But, Rails and FactoryGirl are probably not calling the methods like that; they most likely use Object#__send__ to call methods dynamically after reflecting on the schema. __send__ does not require that the methods be public.
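The visibility rule this relies on is plain Ruby and easy to check in isolation. The class below is a throwaway example, unrelated to the real Exercise model:

```ruby
# Throwaway class for demonstrating method visibility; not the real model.
class Exercise
  def initialize
    @course_id = 1
  end

  def course_id
    @course_id
  end
  private :course_id
end

exercise = Exercise.new

# Dynamic dispatch through __send__ ignores visibility...
puts exercise.__send__(:course_id)

# ...while an ordinary call with an explicit receiver does not.
begin
  exercise.course_id
rescue NoMethodError => e
  puts e.class
end

# Private methods are also invisible to respond_to? unless asked for.
puts exercise.respond_to?(:course_id)
puts exercise.respond_to?(:course_id, true)
```

The direct call raises NoMethodError, just as the earlier approach did, but code that dispatches dynamically can still reach the method.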

So, if we just made the methods private, rather than non-existent, that would stop our application code misbehaving while still allowing frameworks to work with the code. Let’s change the definition of Content::Attributes.hide_method to do this, defining the method first if it does not already exist:

module Content
  module Attributes
    included do
      # ...

      def self.hide_method(name)
        unless instance_methods.map(&:to_s).include?(name)
          define_method(name) { |*a, &b| super(*a, &b) }
        end
        private name
      end

      # ...
    end
  end
end

And now, the factories work and all our application code can continue to work with the new model API without accessing anything it shouldn’t. Making the methods private rather than raising an error means I couldn’t generate an explanatory error message any more, but this turned out not to be as big a problem as I’d anticipated.

Guilt by association

Step is a central class in our domain model, and so a lot of things are attached to it. For example, some content types (but not all) can have related links. This was implemented something like this:

class Step < ActiveRecord::Base
  has_many :related_links

  def can_have_related_links?
    true
  end
end

class RelatedLink < ActiveRecord::Base
  belongs_to :step
end

The can_have_related_links? method is there so that generic controllers can check whether the content type at hand can have related links, since all things that inherit from Step have a related_links association but not all of them should use it. Classes without such associations would override this:

class Exercise < Step
  def can_have_related_links?
    false
  end
end

There were many things associated with Step; some were course content items and some related to learners’ interaction with the course – comments, progress records and so on. I decided to split content-related associations off onto the content classes, so RelatedLink would belong to an Article or Video, but not an Exercise, per the above implementation.

I wanted to make RelatedLink appear to belong to a content item, rather than a step, without changing the database. That means continuing to use step_id as the foreign key, but that’s okay because a step_id will currently uniquely identify a content item in the steps table. We just need to hide the step association and wrap it with an API that uses content items instead of steps.

module Content
  module BelongsToContent

    extend ActiveSupport::Concern

    included do
      belongs_to :step
      private :step, :step=
    end

    def content
      step.becomes(step.type.constantize)
    end

    def content=(content)
      self.step = content && content.step
    end

  end
end

We can mix that into RelatedLink and any other models we want to hang off of content items:

class RelatedLink
  include Content::BelongsToContent
end

On the has_many side of the association, we want to remove the related_links association from Step and add it to the appropriate content types, but maintain the can_have_related_links? API for the time being, without writing a lot of boilerplate. I wanted all content types to support that API, but only those with related links to actually have a related_links association.

class Article
  has_related_links
end

Since all content types include the Content module, we can add some helper methods there to support this annotation. I also added an any_related_links? method as that would simplify some generic template code.

module Content
  included do
    # ...

    def self.has_related_links
      has_many(:related_links, foreign_key: :step_id)
    end
  end

  def can_have_related_links?
    self.class.reflect_on_association(:related_links).present?
  end

  def any_related_links?
    can_have_related_links? && related_links.any?
  end
end

So now a content object has a related_links array, and a RelatedLink has a content attribute. When we migrate later, we’d replace step_id on associated tables with a (content_type, content_id) pair and migrate all the foreign key values to whatever the IDs of content records in their new tables are, and all the code will continue working.
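The opt-in pattern behind has_related_links can be sketched without ActiveRecord. In this illustration, LinkableContent is a hypothetical module, a plain array stands in for the has_many association, and a class-level flag stands in for reflect_on_association:

```ruby
# Hypothetical, Rails-free sketch of the opt-in related-links API.
module LinkableContent
  def self.included(base)
    base.extend(ClassMethods)
  end

  module ClassMethods
    # Stand-in for has_many(:related_links, foreign_key: :step_id).
    def has_related_links
      @related_links_declared = true
      define_method(:related_links) { @related_links ||= [] }
    end

    # Stand-in for reflect_on_association(:related_links).present?
    def related_links_declared?
      @related_links_declared == true
    end
  end

  def can_have_related_links?
    self.class.related_links_declared?
  end

  # Short-circuits, so related_links is never called on types without it.
  def any_related_links?
    can_have_related_links? && related_links.any?
  end
end

class Article
  include LinkableContent
  has_related_links
end

class Exercise
  include LinkableContent
end

article = Article.new
article.related_links << 'https://www.example.com/moons'

puts article.any_related_links?
puts Exercise.new.any_related_links?
```

Every content type answers can_have_related_links? and any_related_links?, but only the types that opted in gain a related_links method at all.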

And finally

This covers most of the support code I used to force myself to update all the controller and view call sites that access step- and content-related data. But it doesn’t solve all problems; there are still code paths that will work given a Step object, since it has the same ID as a content item. For example, various route helpers would previously have been given some content value; it was impossible to get a bare Step object since they would all be instances of a Step subtype. But if we have code like this:

%ul
  - course.steps.each do |step|
    %li= link_to step.title, course_step_path(step)

course_step_path would previously have been called with Article, Video or Exercise values and generate URLs containing /articles/, /videos/ or /exercises/ as appropriate. But now, it will be called only with Step values, and it will generate the wrong URLs, possibly with the wrong IDs once the tables are split apart.

To force myself to amend this code, I made that helper throw if given a Step:

def course_step_path(content)
  unless content.is_a?(Content)
    raise TypeError, 'Only Content values can be passed to course_step_path'
  end
  # ...
end

Making all the helper functions reject Step values forced me to update all the call sites and make the codebase ready for the new schema. When the migration was rolled out, it was mostly a case of updating the association definitions in the models to use new foreign keys, updating the factory definitions, and removing all the support code I detailed above, leaving Rails to faithfully reflect the true state of the database. This second stage was not without its difficulty, but isolating the controllers and views from these changes so they could be rolled out separately helped to break a big piece of work into smaller chunks, while creating minimal disruption for everyone else continuing to work on the app.

Wrapping up

Let’s recap what we did to migrate from a single multi-purpose steps table to many content-specific ones:

  • Stop Rails from using the type column to infer the model class to instantiate, allowing the base class (Step) to be instantiated.
  • Use the type and id fields to fake a belongs_to association on Step, making the row appear to be two objects.
  • Stop content types inheriting from the base class and set their table_name by hand.
  • Set a default_scope on the content types to restrict their queries based on the type column.
  • Extract the table_name and default_scope settings, and content-related Step code, into a mixin for code common to all content types.
  • Lock down the content models’ APIs to not expose unwanted steps columns, by making any properties that a type should not respond to raise NoMethodError.
  • Create an API for making new records, which deals with the required base class properties, before converting the object to another type to add the other attributes.
  • Fix problems with FactoryGirl by making deprecated methods private, and providing a step association on content.
  • Deal with records that should belong to content items by wrapping their existing step association in another named content.
  • Make view helpers, and other interfaces that use functions rather than methods, refuse to accept Step values when they want content.

It may be tricky, but there are ways to work around Rails’s coupling and isolate your app from changes to the database. In our case, splitting this project into smaller chunks was instrumental in getting it done at all.

Category Making FutureLearn

Comments (8)

  • Anton

    I did not understand. The goal was to move subclasses to their own tables. But in the end you said that “However, we still need to tell Rails they live in the step table: class Article < ActiveRecord::Base self.table_name = 'steps'" What problem did you solved?

  • L

    Could you wrap up your example in a github repo? Would be awesome.

  • Chris

    I’d recommend using something like https://github.com/hzamani/active_record-acts_as for true multiple table inheritance

  • Lenart

    Excellent article! I was looking for multiple table inheritance but insights like yours are always a pleasure to read.

  • Gunnar Thor

    Fantastic post. Thanks a ton for sharing your experience in such a detailed, clear way.

  • Bo-oz

    Great post! Just one question though… what if you would build this system from scratch, would you still choose a single table structure for the step content? Or is this only necessary because you guys already had the content in a single table?

  • Rohan Daxini

    Awesome post James, thanks for sharing your experience with STI and this giant refactoring. Indeed good learnings.

  • an

    Good read! Here’s also a nice post about toying with STI (a way to avoid creating additional columns for child classes using RoR 4 and PostgreSQL 9.3, with an example based on social media users data.) https://netguru.co/blog/renewed-life-for-sti-with-postgresql-json-type