Stubs, Mocks and Spies in RSpec

Simon Coffey, a Developer at FutureLearn, discusses testing simple Ruby objects with three types of test double: Stubs, Mocks and Spies.

Software testing is a very broad topic, and opinions differ widely both on what testing achieves, and how to achieve it. But whether we’re doing TDD, BDD or simply “checking that stuff works”, testing conceptually consists of just three activities:

  • Arrange — Set up the environment for our test. This may include instantiating the subject under test; instantiating collaborator objects for the test subject to use; loading fixture data into a database or from a file; and more besides.
  • Act — Provide a stimulus to the subject under test. This could be calling a method on an object, or automatically navigating a website in a browser test – anything that exercises the thing being tested in a predictable way.
  • Assert — Verify that the consequences of the stimulus were what we expected. We might assert that a calculation returns a particular value; that a record in a database is created with certain properties; or that an object in our system receives a certain call, with specified arguments.

In this post, I’m going to talk about some of the options we have for the Arrange phase. The examples will be presented for an RSpec unit test, but the techniques are general, and most are commonplace enough to have library support in a wide variety of languages and testing frameworks.

What test data needs instantiating?

The objects created during test setup can be considered in two groups:

  1. The subject under test
  2. The collaborators it communicates with

Here, I’m going to focus on setup for testing simple Ruby objects. In a follow-up post, I’ll look at setup for testing a more complex type of object: instances of Rails ActiveRecord models.

Subject Under Test

Let’s say we’ve written a Detective class, albeit one without much to do just yet:

class Detective
  def investigate
    "Nothing to investigate :'("
  end
end

Our test setup is about as exciting as you might expect – we just call .new:

it "doesn't find much" do
  subject = Detective.new

  result = subject.investigate

  expect(result).to eq "Nothing to investigate :'("
end

Enough about the subject under test, then. What about our subject’s…

Collaborators

Our original Detective implementation was pretty boring, not talking to any other objects. Let’s make it a bit more interesting, and give it something to investigate:

class Detective
  def initialize(thingie)
    @thingie = thingie
  end

  def investigate
    "It went '#{@thingie.prod}'"
  end
end

Detective accepts a thingie on construction, and when we call #investigate, it conducts rigorous investigations using the method #prod, before breathlessly reporting its findings – just like a real detective. The thingie can be any object that responds to the #prod method.
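To make "any object that responds to the #prod method" concrete, here is a minimal plain-Ruby sketch. Doorbell and mystery_box are hypothetical stand-ins invented for illustration; neither is part of the project.

```ruby
# Detective as defined above, repeated so the sketch runs on its own.
class Detective
  def initialize(thingie)
    @thingie = thingie
  end

  def investigate
    "It went '#{@thingie.prod}'"
  end
end

# Hypothetical collaborator: an ordinary class that happens to define #prod.
class Doorbell
  def prod
    "ding dong"
  end
end

# Hypothetical collaborator: a bare object given a singleton #prod method.
mystery_box = Object.new
def mystery_box.prod
  "rattle"
end

puts Detective.new(Doorbell.new).investigate # prints: It went 'ding dong'
puts Detective.new(mystery_box).investigate  # prints: It went 'rattle'
```

Detective never checks the class of what it is given. It only sends #prod, which is exactly what makes it possible to substitute something simpler in a test.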

What are our options for providing this thingie in our tests?

Real object

Let’s say there’s a concrete Thingie class in our project, which makes a random distressing noise when prodded:

class Thingie
  def prod
    [ "erp!", "blop!", "ping!", "ribbit!" ].sample
  end
end

One option is to provide a real instance of Thingie to our test subject:

it "says what noise the thingie makes" do
  thingie = Thingie.new
  subject = Detective.new(thingie)

  result = subject.investigate

  expect(result).to match(/It went '(erp|blop|ping|ribbit)!'/)
end

This works, but there are a couple of issues here.

First, our test expresses a lot of detail relating to a specific type of thingie – we’ve had to account for all of the possible responses Thingie makes, encoding each of them in our test:

expect(result).to match(/It went '(erp|blop|ping|ribbit)!'/)

As a result our test is more complicated than it needs to be, and it’s not immediately clear which part of the output comes from the Detective, and which from the Thingie – we need to read and understand Thingie’s implementation to understand the assertion in our test.

Second, our test is brittle. If the behaviour of Thingie#prod changes (if it starts making a new noise, say), our test will break, despite the implementation of Detective being correct. There’s little that’s more frustrating when maintaining software than to make a change in one place, and have completely unrelated tests break for no good reason.
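The brittleness can be sketched in plain Ruby. Below, a hypothetical change teaches Thingie a new noise ("honk!" is invented for illustration, and the randomness is removed so the outcome is repeatable). Detective is untouched, yet the pattern from our test no longer matches.

```ruby
# Detective, unchanged from the version above.
class Detective
  def initialize(thingie)
    @thingie = thingie
  end

  def investigate
    "It went '#{@thingie.prod}'"
  end
end

# Hypothetical revision of Thingie: it now makes a noise our test has never heard of.
class Thingie
  def prod
    "honk!"
  end
end

result = Detective.new(Thingie.new).investigate

# The pattern copied from the test above.
pattern = /It went '(erp|blop|ping|ribbit)!'/

puts result                 # prints: It went 'honk!'
puts pattern.match?(result) # prints: false
```

Detective still reports exactly what the thingie said, so the failure tells us nothing about the class we set out to test.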

Instead of using the most convenient real object, then, let’s think about the absolute minimum our Detective needs to work. All it really cares about is that the thingie object has a single method, #prod, returning something that can be interpolated into a string.

As an alternative to a real Thingie, then, we might provide a test double – a stand-in for whatever might get passed to our Detective in actual code, doing the absolute minimum we require to exhibit the Detective’s behaviour.

Let’s take a look at the kinds of test doubles we might use.

Types of test double

The term “test double” is used to refer to a range of testing techniques, with the common theme that a substitute object is provided to the subject under test, taking the place of the objects it will communicate with in actual use.

Borrowing the definitions provided by Martin Fowler, types of test double include (but are not limited to):

  • Dummy – a pure placeholder object with no behaviour (it doesn’t respond to any messages). This is used when a method requires an argument but doesn’t interact with it in a specific test.
  • Fake – a replacement object with real behaviour, but taking shortcuts that are helpful for testing purposes (a good example is using an in-memory database for faster testing of database-dependent code).
  • Stub – an object providing canned responses to specified messages.
  • Mock – an object that is given a specification of the messages that it must receive (or not receive) during the test if the test is to pass.
  • Spy – an object that records all messages it receives (assuming it is allowed to respond to them), allowing the messages it should have received to be asserted at the end of a test.

I’m going to ignore dummies and fakes (for the purposes of this post these are too boring and too specialised, respectively), and concentrate on Stubs, Mocks and Spies, providing illustrations of their differences.

Stubs

Here we provide an object that is given a set of canned responses to the messages it supports. We just want our stub to respond to the message #prod, so let’s create one that does so:

it "says what noise the thingie makes" do
  thingie = double(:thingie, prod: "oi")
  subject = Detective.new(thingie)

  result = subject.investigate

  expect(result).to eq "It went 'oi'"
end

Here, #double is an RSpec helper method that creates an object with certain properties that we specify. We can give it a label (useful for debugging, particularly when multiple test doubles are in use), and a hash of supported messages and responses.

In this case we label the double :thingie, so we know the general kind of object it’s intended to represent, and we tell it to respond to the single message #prod, always returning a canned response, "oi".

There are a couple of advantages here over using a real instance. First, we can directly control the responses our test subject receives, so our test won’t break for reasons outside of our control.

Second, the relationship between those responses and our subject’s behaviour is clearer in the test – we can clearly see how the thingie’s response (“oi”) fits into the Detective’s response, whereas previously we would have needed to inspect the definition of Thingie to fully understand what Detective is supposed to do.
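To get a feel for what a helper like #double is doing for us, here is a rough plain-Ruby approximation. CannedDouble is a hypothetical class written for this sketch, not RSpec's implementation: it answers the messages in its hash with their canned values, and raises an error quoting its label for anything else.

```ruby
# Hypothetical hand-rolled stand-in for a labelled double with canned responses.
class CannedDouble
  def initialize(label, responses = {})
    @label = label
    @responses = responses
  end

  # Answer known messages with their canned value; reject everything else.
  def method_missing(name, *args)
    return @responses.fetch(name) if @responses.key?(name)

    raise NoMethodError, "#{@label} received unexpected message :#{name}"
  end

  # Keep respond_to? consistent with the messages we actually answer.
  def respond_to_missing?(name, include_private = false)
    @responses.key?(name) || super
  end
end

thingie = CannedDouble.new(:thingie, prod: "oi")

puts thingie.prod # prints: oi

begin
  thingie.poke
rescue NoMethodError => e
  puts e.message # prints: thingie received unexpected message :poke
end
```

The label earns its keep in that error message: with several doubles in play, it says which one was sent something it didn't expect.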

What if we wanted to specify that a Detective should only prod a thingie once, though, even if we ask it to investigate multiple times? (You can’t be too careful.)

One option is to make our stub’s canned response more complicated, adding behaviour to count the number of times #prod gets called:

it "prods the thingie at most once" do
  prod_count = 0
  thingie = double(:thingie)
  allow(thingie).to receive(:prod) { prod_count += 1 }
  subject = Detective.new(thingie)

  subject.investigate
  subject.investigate

  expect(prod_count).to eq 1
end

Here, instead of using a hash of canned responses, we’ve used RSpec’s #allow method to tell our double that it can respond to #prod, and given it a block that increments a counter each time that method is called. Then at the end of the test, we assert a value for the counter.

We can then update our Detective to make the test pass:

class Detective
  def initialize(thingie)
    @thingie = thingie
  end

  def investigate
    @results ||= "It went '#{@thingie.prod}'"
  end
end

This custom counter approach works, but it seems a bit circuitous (and has made our test setup relatively complicated). An alternative would be to use…

Mocks

Here, instead of implementing our own message counter, we place a “message expectation” on our test double, asserting that we expect the #prod method to be called exactly once during the test. We then make calls to our subject, and at the end of each test, RSpec implicitly verifies the test double to make sure that it received the messages it expected:

it "prods the thingie at most once" do
  thingie = double(:thingie)
  expect(thingie).to receive(:prod).once
  subject = Detective.new(thingie)

  subject.investigate
  subject.investigate
end

This is effectively a generalised version of our counter from the previous example, with the addition that we could, if we wanted, make further assertions about the type of arguments passed (if there were any). If we’re regularly making assertions about communications between objects, it makes sense to use an assertion library that directly supports this.
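The idea behind a message expectation can be sketched without RSpec. ExpectingDouble below is a hypothetical class invented for illustration (RSpec's own doubles are considerably more capable): the expectation is declared up front, calls are counted as they arrive, and verify! plays the part of the check RSpec runs for us at the end of the test.

```ruby
# Memoising Detective from the stubs section, repeated so the sketch runs alone.
class Detective
  def initialize(thingie)
    @thingie = thingie
  end

  def investigate
    @results ||= "It went '#{@thingie.prod}'"
  end
end

# Hypothetical hand-rolled mock: expectations first, verification last.
class ExpectingDouble
  def initialize(label)
    @label = label
    @expected = {}          # message name => required call count
    @received = Hash.new(0) # message name => actual call count
  end

  # Declare how many times a message must arrive, and start answering it.
  def expect_message(name, times:)
    @expected[name] = times
    define_singleton_method(name) do |*_args|
      @received[name] += 1
      nil
    end
  end

  # Raise unless every expected message arrived the required number of times.
  def verify!
    @expected.each do |name, times|
      actual = @received[name]
      next if actual == times

      raise "#{@label} expected :#{name} #{times} time(s), received #{actual}"
    end
    true
  end
end

thingie = ExpectingDouble.new(:thingie)
thingie.expect_message(:prod, times: 1)
detective = Detective.new(thingie)

detective.investigate
detective.investigate

puts thingie.verify! # prints: true
```

The counter from the stub example has moved inside the double, which is roughly the convenience a mocking library provides.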

However, the ordering of our test is a bit messy now. I previously mentioned that tests consist of three phases: Arrange, Act, Assert. For clarity, it’s nice (where possible) to keep these phases strictly separate. Our test seems a bit out of order, though:

it "prods the thingie at most once" do
  # Arrange
  thingie = double(:thingie)
  # Assert
  expect(thingie).to receive(:prod).once
  # Arrange
  subject = Detective.new(thingie)

  # Act
  subject.investigate
  subject.investigate
end

If this bothers you (it bothers me), a solution is to use…

Spies

Spies are similar to mocks, but instead of stating which messages our test double expects to receive at the start of the test, we only specify which messages it is allowed to receive (just like we did for stubs, earlier). The difference is that at this point we’re only saying how our test double behaves; we’re not saying anything about what we expect the outcome of our test to be.

The double then records any messages it receives during the Act phase. Finally, in the Assert phase, we can make assertions about the messages it recorded.

Handily enough, RSpec’s test doubles automatically record any messages they receive (assuming they’re allowed to receive them). All we have to do is slightly change our assertions:

it "prods the thingie at most once" do
  # Arrange
  thingie = double(:thingie, prod: "")
  subject = Detective.new(thingie)

  # Act
  subject.investigate
  subject.investigate

  # Assert
  expect(thingie).to have_received(:prod).once
end

In the Arrange phase we’ve created a double that’s allowed to respond to #prod (for the purposes of this test we don’t care what it returns, so I’ve made it return the empty string). Then we Act, calling #investigate twice. Finally, in the Assert phase, we assert that #prod was received exactly once.
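The recording behaviour is simple enough to sketch by hand. RecordingDouble below is a hypothetical plain-Ruby class made up for this illustration, not how RSpec implements it: it answers the messages it is allowed to receive, notes each one down, and leaves all judgement to the end of the test.

```ruby
# Memoising Detective from earlier, repeated so the sketch runs alone.
class Detective
  def initialize(thingie)
    @thingie = thingie
  end

  def investigate
    @results ||= "It went '#{@thingie.prod}'"
  end
end

# Hypothetical hand-rolled spy: answers allowed messages and records them.
class RecordingDouble
  attr_reader :calls

  def initialize(responses = {})
    @responses = responses
    @calls = [] # one [name, args] pair per message received
  end

  def method_missing(name, *args)
    return super unless @responses.key?(name)

    @calls << [name, args]
    @responses[name]
  end

  def respond_to_missing?(name, include_private = false)
    @responses.key?(name) || super
  end
end

# Arrange
thingie = RecordingDouble.new(prod: "")
detective = Detective.new(thingie)

# Act
detective.investigate
detective.investigate

# Assert
prod_count = thingie.calls.count { |name, _args| name == :prod }
puts prod_count # prints: 1
```

Nothing in the Arrange phase says what we expect to happen, so the three phases stay in order.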

Mocks vs Stubs vs Spies

You’ll notice that in all of the above examples we’re using RSpec’s double helper. The main difference is in the type of assertions that we made, rather than the tool we used. Aren’t mocks, stubs and spies all different things?

I would argue that there’s a more helpful way of looking at it. To me, stubbing, mocking and spying are techniques rather than tools. Specifically, they’re different ways of making assertions while using test doubles. In the examples above, we’re using a single type of test double that supports multiple types of assertion.

When we wanted to verify a property of the return value of a method, we used stubbing to provide a consistent value to our test subject, so we could make a clear assertion about the value it returns.

When we wanted to verify a property of the communication between two objects, we used mocking to make assertions about the messages our test double received.

Finally, when we wanted to verify the same communication property as before, but make our assertions at the end of the test, we used spying.

These have been contrived examples, but hopefully they illustrate the point: the appropriate technique for one test may not be ideal for another. Sometimes there won’t be a clear advantage for any particular technique, and your choice may fall to personal preference, or agreed team style.

However, if the question, “what should I use: a stub, a mock, or a spy?” is proving a hard one to answer, perhaps ask: “what do I want to verify in this test?” instead.
