An in-depth look at quickcheck-state-machine
28 January 2019 46 mins read
Please note: this post originally appeared on the Well-Typed blog. Stateful APIs are everywhere: file systems, databases, widget libraries, the list goes on. Automated testing of such APIs requires generating sequences of API calls, and when we find a failing test, ideally shrinking such a sequence to a minimal test case. Neither the generation nor the shrinking of such sequences is trivial. After all, it is the very nature of stateful systems that later calls may depend on earlier calls: we can only add rows to a database table after we create it, we can only write to a file after we open it, etc. Such dependencies need to be tracked carefully. Moreover, in order to verify the responses we get back from the system, the test needs to maintain some kind of internal representation of what it thinks the internal state of the system is: when we read from a file, we need to know what was in the file in order to be able to verify if the response was correct or not.
In this blog post we will take an in-depth look at quickcheck-state-machine
, a library for testing stateful code. Our running example will be the development of a simple mock file system that should behave identically to a real file system. Although simple, the example will be large enough to give us an opportunity to discuss how we can verify that our generator is producing all test cases we need, and how we can inspect whether the shrinker is doing a good job; in both cases, test case labelling will turn out to be essential. Throughout we will also discuss design patterns for quickcheck-state-machine tests which improve separation of concerns and reduce duplication. It should probably be pointed out that this is an opinionated piece: there are other ways to set things up than we present here.
We will not show the full development in this blog post, and instead focus on explaining the underlying concepts. If you want to follow along, the code is available for download. We will assume version 0.6 of quickcheck-state-machine, which was recently released. If you are using an older version, it is recommended to upgrade, since the newer version includes some important bug fixes, especially in the shrinker.
Introducing the running example
Our running example will be the development of a simple mock file system; the intention is that its behaviour is identical to the real file system, within the confines of what it needs to support. We will represent the state of the file system as
data Mock = M {
dirs :: Set Dir
, files :: Map File String
, open :: Map MHandle File
, next :: MHandle
}
type MHandle = Int
emptyMock :: Mock
emptyMock = M (Set.singleton (Dir [])) Map.empty Map.empty 0
We record which directories (folders) exist, the contents of all files on the system, the currently open handles (where mock handles are just integers), and the next available mock handle. To avoid confusion between files and directories we do not use FilePath but instead use
data Dir = Dir [String]
data File = File {dir :: Dir, name :: String}
As one example, here is the mock equivalent of readFile:
type MockOp a = Mock -> (Either Err a, Mock)
mRead :: File -> MockOp String
mRead f m@(M _ fs hs _)
| alreadyOpen = (Left Busy , m)
| Just s <- Map.lookup f fs = (Right s , m)
| otherwise = (Left DoesNotExist , m)
where
alreadyOpen = f `List.elem` Map.elems hs
We first check if there is an open handle to the file; if so, we disallow reading this file (“resource busy”); if the file exists, we return its content; otherwise we report a “does not exist” error. The implementation of the other mock functions is similar; the full API is
mMkDir :: Dir -> MockOp ()
mOpen :: File -> MockOp MHandle
mWrite :: MHandle -> String -> MockOp ()
mClose :: MHandle -> MockOp ()
mRead :: File -> MockOp String
Finally, we should briefly talk about errors; the errors that the mock file system can report are given by
data Err = AlreadyExists | DoesNotExist | HandleClosed | Busy
and they capture a subset of the IO exceptions
fromIOError :: IOError -> Maybe Err
fromIOError e =
case ioeGetErrorType e of
GHC.AlreadyExists -> Just AlreadyExists
GHC.NoSuchThing -> Just DoesNotExist
GHC.ResourceBusy -> Just Busy
GHC.IllegalOperation -> Just HandleClosed
_otherwise -> Nothing
Testing
Typically we are developing some stateful code, and then write a pure (mock) implementation of the same thing to test it, making sure that the stateful implementation and the simpler pure model compute the same things. Here we are doing the opposite: we are adjusting the model (the mock file system) to match what the real file system does. Either way, the process is the same: we write tests, execute them both in the real thing and in the model, and compare results.
If we were writing unit tests, we might write tests such as
- Write to two different files
- Write to a file and then read it
- etc.
However, as John Hughes of QuviQ – and one of the original authors of QuickCheck – likes to point out, as systems grow, it becomes impossible to write unit tests that test the interaction between all the features of the system. So, don’t write unit tests. Instead, generate tests, and verify properties.
To generate tests for our mock file system, we have to generate sequences of calls into the API; “open this file, open that file, write to the first file we opened, …”. We then execute this sequence both against the mock file system and against the real thing, and compare results. If something goes wrong, we end up with a test case that we can inspect. Ideally, we should then try to reduce this test to try and construct a minimal test case that illustrates the bug. We have to be careful when shrinking: for example, when we remove a call to open from a test case, then any subsequent writes that used that file handle must also be removed. A library such as quickcheck-state-machine
can be used both to help with generating such sequences and, importantly, with shrinking them.
Reifying the API
It is important that we generate the test before executing it. In other words, the test generation should not depend on any values that we only get when we run the test. Such a dependency makes it impossible to re-run the same test multiple times (no reproducible test cases) or shrink tests to obtain minimal examples. In order to do this, we need to reify the API: we need to define a data type whose constructors correspond to the API calls:
data Cmd h =
MkDir Dir
| Open File
| Write h String
| Close h
| Read File
Cmd
is polymorphic in the type of handles h; this is important, because we should be able to execute commands both against the mock file system and against the real file system:
runMock :: Cmd MHandle -> Mock -> (.. Mock)
runIO :: .. -> Cmd IO.Handle -> IO ..
What should the return type of these functions be? After all, different functions return different things: Open returns a new handle, Read returns a string, the other functions return unit. To solve this problem we will simply introduce a union type
data Success h = Unit () | Handle h | String String
A response is then either a succesful response or an error:
newtype Resp h = Resp (Either Err (Success h))
It is now easy to implement runMock: we just map all the constructors in Cmd to the corresponding API calls, and wrap the result in the appropriate constructor of Success:
runMock :: Cmd MHandle -> Mock -> (Resp MHandle, Mock)
runMock (MkDir d) = first (Resp . fmap Unit) . mMkDir d
runMock (Open f) = first (Resp . fmap Handle) . mOpen f
runMock (Write h s) = first (Resp . fmap Unit) . mWrite h s
runMock (Close h) = first (Resp . fmap Unit) . mClose h
runMock (Read f) = first (Resp . fmap String) . mRead f
where first :: (a -> b) -> (a, x) -> (b, x)
comes from Data.Bifunctor
.
We can write a similar interpreter for IO; it will take a FilePath as an additional argument that it will use as a prefix for all paths; we will use this to run the IO test in some temporary directory.
runIO :: FilePath -> Cmd IO.Handle -> IO (Resp IO.Handle)
References
Our interpreter for IO takes real IO handles as argument; but will not have any real handles until we actually run the test. We need a way to generate commands that run in IO but don’t use real handles (yet). Here is where we see the first bit of infrastructure provided by quickcheck-state-machine
, references:
data Reference a r = Reference (r a)
where we will instantiate that r
parameter to either Symbolic
or Concrete
:
data Symbolic a = Symbolic Var
data Concrete a = Concrete a
In other words, a Reference a Concrete
is really just a wrapper around an a
; indeed, quickcheck-state-machine
provides
reference :: a -> Reference a Concrete
concrete :: Reference a Concrete -> a
However, a Reference a Symbolic
is a variable:
newtype Var = Var Int
An example of a program using symbolic references is
openThenWrite :: [Cmd (Reference IO.Handle Symbolic)]
openThenWrite = [
Open (File (Dir []) "a")
, Open (File (Dir []) "b")
, Write (Reference (Symbolic (Var 0))) "Hi"
]
This program corresponds precisely to our example from earlier: “open this file, open that file, then write to the first file we opened”. Commands can return as many symbolic references in their result values as they want
Open
creates a new reference, and so Var 0
returns to the handle returned by the first call to Open
.When we execute the test, those variables will be instantiated to their real values, turning symbolic references into concrete references. We will of course not write programs with symbolic references in them by hand; as we will see later, quickcheck-state-machine
provides infrastructure for doing so.
Since we will frequently need to instantiate Cmd
and Resp
with references to handles, we will introduce some special syntax for this:
newtype At f r = At (f (Reference IO.Handle r))
type f :@ r = At f r
For example, here is a wrapper around runIO
that we will need that executes a command with concrete references:
semantics :: FilePath -> Cmd :@ Concrete -> IO (Resp :@ Concrete)
semantics root (At c) = (At . fmap reference) <$>
runIO root (concrete <$> c)
This is really just a call to runIO
, with some type wrapping and unwrapping.
Relating the two implementations
When we run our tests, we will execute the same set of commands against the mock implementation and in real IO, and compare the responses we get after each command. In order to compare, say, a command “write to this MHandle
” against the mock file system to a command “write to this IOHandle
” in IO, we need to know the relation between the mock handles and the IO handles. As it turns out, the most convenient way to store this mapping is as a mapping from references to the IO handles (either concrete or symbolic) to the corresponding mock handles.
type HandleRefs r = [(Reference IO.Handle r, MHandle)]
(!) :: Eq k => [(k, a)] -> k -> a
env ! r = fromJust (lookup r env)
Then to compare the responses from the mock file system to the responses from IO we need to keep track of the state of the mock file system and this mapping; we will refer to this as the model for our test:
data Model r = Model Mock (HandleRefs r)
initModel :: Model r
initModel = Model emptyMock []
The model must be polymorphic in r
: during test generation we will instantiate r
to Symbolic
, and during test execution we will instantiate r
to Concrete
.
Stepping the model
We want to work towards a function
transition :: Eq1 r => Model r -> Cmd :@ r -> Resp :@ r -> Model r
to step the model; we will gradually build up towards this. First, we can use the model to translate from commands or responses in terms of references to the corresponding commands or responses against the mock file system:
toMock :: (Functor f, Eq1 r) => Model r -> f :@ r -> f MHandle
toMock (Model _ hs) (At fr) = (hs !) <$> fr
Specifically, this can be instantiated to
toMock :: Eq1 r => Model r -> Cmd :@ r -> Cmd MHandle
which means that if we have a command in terms of references, we can translate that command to the corresponding command for the mock file system and execute it:
step :: Eq1 r => Model r -> Cmd :@ r -> (Resp MHandle, Mock)
step m@(Model mock _) c = runMock (toMock m c) mock
In order to construct the full new model however we also need to know how to extend the handle mapping. We can compute this by comparing the response we get from the “real” semantics (Resp :@ r
) to the response we get from the mock semantics (from step
), and simply zip the handles from both responses together to obtain the new mapping. We wrap all this up into an event:
data Event r = Event {
before :: Model r
, cmd :: Cmd :@ r
, after :: Model r
, mockResp :: Resp MHandle
}
and we construct an event from a model, the command we executed, and the response we got from the real implementation:
lockstep :: Eq1 r
=> Model r
-> Cmd :@ r
-> Resp :@ r
-> Event r
lockstep m@(Model _ hs) c (At resp) = Event {
before = m
, cmd = c
, after = Model mock' (hs <> hs')
, mockResp = resp'
}
where
(resp', mock') = step m c
hs' = zip (toList resp) (toList resp')
The function we mentioned at the start of this section is now easily derived:
transition :: Eq1 r => Model r -> Cmd :@ r -> Resp :@ r -> Model r
transition m c = after . lockstep m c
as well as a function that compares the mock response and the response from the real file system and checks that they are the same:
postcondition :: Model Concrete
-> Cmd :@ Concrete
-> Resp :@ Concrete
-> Logic
postcondition m c r = toMock (after e) r .== mockResp e
where
e = lockstep m c r
We will pass this function to quickcheck-state-machine
to be run after every command it executes to make sure that the model and the real system do indeed return the same responses; it therefore does not need to be polymorphic in r
. (Logic
is a type introduced by quickcheck-state-machine
; think of it as a boolean with some additional information, somewhat similar to QuickCheck’s Property
type.)
Events will also be very useful when we label our tests; more on that later.
Constructing symbolic responses
We mentioned above that we will not write programs with symbolic references in it by hand. Instead what will happen is that we execute commands in the mock file system, and then replace any of the generated handles by new variables. Most of this happens behind the scenes by quickcheck-state-machine
, but we do need to give it this function to construct symbolic responses:
symbolicResp :: Model Symbolic
-> Cmd :@ Symbolic
-> GenSym (Resp :@ Symbolic)
symbolicResp m c = At <$> traverse (const genSym) resp
where
(resp, _mock') = step m c
This function does what we just described: we use step to execute the command in the mock model, and then traverse the response, constructing a new (fresh) variable for each handle. GenSym
is a monad defined in quickcheck-state-machine
for the sole purpose of generating variables; we won’t use it anywhere else except in this function.
Generating commands
To generate commands, quickcheck-state-machine
requires a function that produces the next command given the current model; this function will be a standard QuickCheck
generator. For our running example, the generator is easy to write:
generator :: Model Symbolic -> Maybe (Gen (Cmd :@ Symbolic))
generator (Model _ hs) = Just $ QC.oneof $ concat [
withoutHandle
, if null hs then [] else withHandle
]
where
withoutHandle :: [Gen (Cmd :@ Symbolic)]
withoutHandle = [
fmap At $ MkDir <$> genDir
, fmap At $ Open <$> genFile
, fmap At $ Read <$> genFile
]
withHandle :: [Gen (Cmd :@ Symbolic)]
withHandle = [
fmap At $ Write <$> genHandle <*> genString
, fmap At $ Close <$> genHandle
]
genDir :: Gen Dir
genDir = do
n <- QC.choose (0, 3)
Dir <$> replicateM n (QC.elements ["x", "y", "z"])
genFile :: Gen File
genFile = File <$> genDir <*> QC.elements ["a", "b", "c"]
genString :: Gen String
genString = QC.sized $ \n -> replicateM n (QC.elements "ABC")
genHandle :: Gen (Reference IO.Handle Symbolic)
genHandle = QC.elements (map fst hs)
A few comments on the generator:
- When we generate paths, we choose from a very small set of directory and file names. We are not really interested in testing that, for example, our implementation is Unicode-safe here; by picking from a small known set we generate tests that are more likely to contain multiple references to the same file, without having to add special provisions to do so.
- We cannot, of course, generate handles out of nowhere; fortunately, the model tells us which handles we have available, and so
genHandle
just picks one at random from that. We do not limit the generator to picking open handles: it is important to also test the behaviour of the system in the error case where a program attempts to write to a closed handle. - The elements function from
QuickCheck
, which picks a random element from a list, is partial: it will throw an error if the list is empty. We must be careful not to generate any commands requiring handles when we have no handles yet. - The reason for the
Maybe
in the type signature is that in some circumstances a generator might be unable to generate any commands at all given a particular model state. We don’t need to take advantage of this feature and therefore always return Just a generator.
We can also define a shrinker for commands, but we can start with a shrinker that says that individual commands cannot be shrunk:
shrinker :: Model Symbolic -> Cmd :@ Symbolic -> [Cmd :@ Symbolic]
shrinker _ _ = []
Putting it all together
In order to actually start generating and running tests, quickcheck-state-machine wants us to bundle all of this functionality up in a single record:
sm :: FilePath -> StateMachine Model (At Cmd) IO (At Resp)
sm root = StateMachine {
initModel = initModel
, transition = transition
, precondition = precondition
, postcondition = postcondition
, invariant = Nothing
, generator = generator
, distribution = Nothing
, shrinker = shrinker
, semantics = semantics root
, mock = symbolicResp
}
We have defined all of these functions above, with the exception of precondition
. When quickcheck-state-machine
finds a failing test, it will try to shrink it to produce a smaller test case. It does this by trying to remove commands from the program and then checking if the resulting program still “makes sense”: does the precondition of all commands still hold?
The precondition should be as liberal as possible; for instance, the precondition should not require that a file handle is open before writing to it, because we should also check that errors are handled correctly. Typically, the only thing the precondition should verify is that commands do not refer to non-existent handles (i.e., if we remove a call to open, then subsequent uses of the handle returned by that call to open simply cannot be executed anymore). Thus, we will define:
precondition :: Model Symbolic -> Cmd :@ Symbolic -> Logic
precondition (Model _ hs) (At c) =
forall (toList c) (`elem` map fst hs)
Aside. Although we do not need it for our running example, one other thing a precondition might do is rule out test cases that we explicitly don’t want to test. For example, if our mock file system throws an error in some cases because a particular combination of arguments to a call is simply not supported, then we don’t want to flag this as a bug. A clean way to implement this is to extend the error type with a field that marks an error as an intentional limitation; then the precondition can simply rule out any commands that (in the model) would return an error with this mark. This keeps the definition of the precondition itself simple, and the logic of what is and what isn’t an intentional limitation lives in the implementation of the model itself.
Running the tests
Apart from some additional type class instances (see the code), we are now ready to define and execute the actual test:
prop_sequential :: FilePath -> QC.Property
prop_sequential tmpDir =
forAllCommands (sm rootUnused) Nothing $ \cmds ->
QC.monadicIO $ do
tstTmpDir <- liftIO $ createTempDirectory tmpDir "QSM"
let sm' = sm tstTmpDir
(hist, _model, res) <- runCommands sm' cmds
prettyCommands sm' hist
$ checkCommandNames cmds
$ res QC.=== Ok
rootUnused :: FilePath
rootUnused = error "root not used during command generation"
All functions prefixed QC
are standard QuickCheck
functions. The others come from quickcheck-state-machine
:
forAllCommands
usesQuickCheck
’sforAllShrink
and instantiates it with the command generation and shrinking infrastructure fromquickcheck-state-machine
.runCommands
then executes the generated commands, validating the postcondition at every step.prettyCommands
renders those commands in case of a test failure.checkCommandNames
adds some statistics about the distribution of commands in the generated tests.
We can run the test in ghci
:
> quickCheck (prop_sequential "./tmp")
+++ OK, passed 100 tests:
56% coverage
3% [("MkDir",1),("Read",1)]
3% []
2% [("MkDir",1),("Open",1),("Read",1)]
2% [("MkDir",1)]
1% [("Close",1),("MkDir",1),("Open",5),("Read",4),("Write",2)]
...
It tells us that all tests passed, and gives us some statistics about the tests that were executed: in 3% of the cases, they contained a single MkDir
and a single Read
, 3% were completely empty, 2% contained one call to MkDir
, one call to Open
, one call to Read
, and so on.
Labelling
At this point you might conclude that you’re done. We have the real implementation, we have the mock implementation, they return identical results for all tests, what else could we want?
Let’s think back to those unit tests we were on the verge of writing, but stopped just in time because we remembered that we should generate unit tests, not write them:
- Write to two different files
- Write to a file and then read it
How do we know that our generated tests include these two cases (and all the other unit tests that we would have written)? We get some statistics from quickcheck-state-machine
, but it’s not terribly helpful. For example, the first line above tells us that 3% of our test cases contain one call to MkDir
and one call to Read
; but we know that that call to Read
must fail, because if these are the only two commands we execute, there aren’t any files for that Read
to read.
The solution is to label tests. For example, we might introduce labels, or tags, that correspond to the two unit tests above:
data Tag = OpenTwo | SuccessfulRead
We then need to write a function that looks at a particular test case and checks which labels apply. It is important to realize that this does not mean we are bringing back the same unit tests under a different guise: programs that will be labelled OpenTwo
must write to two different files, but may also do a whole bunch of other things.
We can use use the foldl
package to write such a labelling function in a convenient manner. The labelling function will look at the Events
and produce Tags
. To check if we open (at least) two different files, we keep track of all the successful calls to open; for SucessfulRead
we simply remember if we have seen a call to Read
with a non-error result:
tag :: [Event Symbolic] -> [Tag]
tag = Foldl.fold $ catMaybes <$> sequenceA [
openTwo
, successfulRead
]
where
openTwo :: Fold (Event Symbolic) (Maybe Tag)
openTwo = Fold update Set.empty extract
where
update :: Set File -> Event Symbolic -> Set File
update opened ev =
case (cmd ev, mockResp ev) of
(At (Open f), Resp (Right _)) ->
Set.insert f opened
_otherwise ->
opened
extract :: Set File -> Maybe Tag
extract opened = do
guard (Set.size opened >= 2)
return $ OpenTwo
successfulRead :: Fold (Event Symbolic) (Maybe Tag)
successfulRead = Fold update False extract
where
update :: Bool -> Event Symbolic -> Bool
update didRead ev = didRead ||
case (cmd ev, mockResp ev) of
(At (Read _), Resp (Right _)) ->
True
_otherwise ->
False
extract :: Bool -> Maybe Tag
extract didRead = do
guard didRead
return SuccessfulRead
(For a read to be successful, we must have created the file first – this is a property of the semantics, we don’t need to enforce this in the labeller.)
The commands we get back from the forAllCommands
function of quickcheck-state-machine
are of type Commands
. This is a simple wrapper around a list of Command
; Command
in turn bundles a command (Cmd
) along with its response (and the variables in that response). We can therefore easily turn this into a list of events:
execCmd :: Model Symbolic
-> Command (At Cmd) (At Resp)
-> Event Symbolic
execCmd model (Command cmd resp _vars) =
lockstep model cmd resp
execCmds :: Commands (At Cmd) (At Resp) -> [Event Symbolic]
execCmds = \(Commands cs) -> go initModel cs
where
go :: Model Symbolic
-> [Command (At Cmd) (At Resp)]
-> [Event Symbolic]
go _ [] = []
go m (c : cs) = e : go (after e) cs
where
e = execCmd m c
We can then define a variant on prop_sequential
above that replaces checkCommandNames
with our own labelling function (you could also use both, if you wanted to):
prop_sequential' :: FilePath -> QC.Property
prop_sequential' tmpDir =
forAllCommands (sm rootUnused) Nothing $ \cmds ->
QC.monadicIO $ do
tstTmpDir <- liftIO $ createTempDirectory tmpDir "QSM"
let sm' = sm tstTmpDir
(hist, _model, res) <- runCommands sm' cmds
prettyCommands sm' hist
$ QC.tabulate "Tags" (map show $ tag (execCmds cmds))
$ res QC.=== Ok
Here tabulate
is the standard QuickCheck
function for adding multiple labels to a test case. If we now re-run our tests, we get
> quickCheck (prop_sequential' "./tmp")
+++ OK, passed 100 tests.
Tags (57 in total):
63% OpenTwo
37% SuccessfulRead
The numbers here are a bit awkward to interpret: it says that across all tests a total of 57 labels were found, and of those 57 labels, 63% were OpenTwo
and 37% were SuccessfulRead
. In other words, 63% 57 = 36 tags were OpenTwo
(36 tests out of 100 were labelled as OpenTwo
), and 37% 57 = 21 tags were SuccessfulRead
(21 tests out of 100 were labelled as SuccessfulRead
). Note that it is perfectly possible for these two sets of tests to overlap (i.e., a single program can be tagged as both OpenTwo
and SuccessfulRead
).
Inspecting the labelling function
So we have a generator, and we have labels for the unit tests that we didn’t write; have we covered our bases now? Well, how do we know that our labelling function is correct? We might seem to descent into infinite regress here, testing the tests that tests the tests… but it would be good to at least have a look at some test case examples and how they are labelled. Fortunately, this is functionality that QuickCheck
provides out of the box through a function called labelledExamplesWith
. We can define a simple wrapper around it that gives us the possibility to specify a particular seed (which will be useful once we start working on shrinking):
showLabelledExamples :: Maybe Int -> IO ()
showLabelledExamples mReplay = do
replaySeed <- case mReplay of
Nothing -> getStdRandom $ randomR (1, 999999)
Just seed -> return seed
putStrLn $ "Using replaySeed " ++ show replaySeed
let args = QC.stdArgs {
QC.maxSuccess = 10000
, QC.replay = Just (QC.mkQCGen replaySeed, 0)
}
QC.labelledExamplesWith args $
forAllCommands (sm rootUnused) Nothing $ \cmds ->
repeatedly QC.collect (tag . execCmds $ cmds) $
QC.property True
Instead of tabulate
we must use collect
to label examples when constructing examples; collect
takes a single label at a time though, so we use my all-time favourite combinator to repeatedly call collect
for all tags:
repeatedly :: (a -> b -> b) -> ([a] -> b -> b)
repeatedly = flip . List.foldl' . flip
(I wonder if I can patent this function?)
When we run this, we see something like
> showLabelledExamples Nothing
Using replaySeed 288187
*** Found example of OpenTwo
Commands [
.. Open (File (Dir []) "b")
, .. Open (File (Dir []) "a")
]
*** Found example of SuccessfulRead
Commands [
.. Open (File (Dir []) "b")
, .. Close (Reference (Symbolic (Var 0)))
, .. Read (File (Dir []) "b")
]
(I’ve tidied up the output a bit and removed some redundant information.)
This is looking good: both of these look like the kind of examples we would have written ourselves for these tags.
Standard shrinking
In fact, if we look at those test cases at the end of the previous section, you might be thinking that those examples look surprisingly good: not only are they indeed instances of those tags, but they are very small test cases: they are pretty much the unit tests that we would have written if we were so misguided as to think we need to write unit tests. Were we simply lucky?
No, we were not. QuickCheck
’s labelledExamples
not only searches for labelled test cases, it also tries to shrink them when it finds one, and continues to shrink them until it can shrink them no further without losing the label. The shrinker itself is implemented in quickcheck-state-machine
; if we disable it altogether and re-run the search for labelled examples, we might find an example such as the following for SuccessfulRead
, where for clarity I’ve marked all failing commands with an asterisk (*):
Commands [
.. Open (File (Dir ["z", "x", "y"]) "c") (*)
, .. MkDir (Dir ["z", "x") (*)
, .. Read (File (Dir ["z", "y", "y"]) "c") (*)
, .. Read (File (Dir []) "c") (*)
, .. MkDir (Dir [ "x" , "x" ]) (*)
, .. Read (File (Dir ["x", "z", "x"]) "b") (*)
, .. MkDir (Dir ["y"])
, .. MkDir (Dir []) (*)
, .. Open (File (Dir ["z"]) "b") (*)
, .. Open (File (Dir []) "a")
, .. MkDir (Dir []) (*)
, .. Open (File (Dir ["x"]) "b") (*)
, .. Close (Reference (Symbolic (Var 0)))
, .. Close (Reference (Symbolic (Var 0)))
, .. Open (File (Dir ["x", "y"]) "b") (*)
, .. Open (File (Dir []) "b")
, .. MkDir (Dir ["y"]) (*)
, .. Read (File (Dir []) "a")
, .. MkDir (Dir ["z"])
, .. Close (Reference (Symbolic (Var 0)))
, .. Close (Reference (Symbolic (Var 1)))
, .. Open (File (Dir ["z" , "z" , "y"]) "a") (*)
, .. Open (File (Dir ["x" , "x" , "z"]) "c") (*)
, .. Close (Reference (Symbolic (Var 0)))
, .. Close (Reference (Symbolic (Var 0)))
, .. Open (File (Dir ["y", "y"]) "b") (*)
, .. Read (File (Dir ["x", "y", "x"]) "a") (*)
]
This is looking significantly less ideal! If there was a bug in read
, then this would certainly not be a very good minimal test case, and not something you would want to debug. So how does quickcheck-state-machine
shrink tests? The basic approach is quite simple: it simply removes commands from the program. If the resulting program contains commands whose precondition is not satisfied (remember, for our running example this means that those commands would use handles that are no longer created) then it discards it as a possible shrink candidate; otherwise, it reruns the test, and if it still fails, repeats the process.
The large percentage of commands that are unsuccessful can easily be removed by quickcheck-state-machine
:
Commands [
.. MkDir (Dir ["y"])
, .. Open (File (Dir []) "a")
, .. Close (Reference (Symbolic (Var 0)))
, .. Close (Reference (Symbolic (Var 0)))
, .. Open (File (Dir []) "b")
, .. Read (File (Dir []) "a")
, .. MkDir (Dir ["z"])
, .. Close (Reference (Symbolic (Var 0)))
, .. Close (Reference (Symbolic (Var 1)))
, .. Close (Reference (Symbolic (Var 0)))
, .. Close (Reference (Symbolic (Var 0)))
]
Both calls to MkDir
can easily be removed, and the resulting program would still be tagged as SuccessfulRead
; and the same is true for repeated closing of the same handle:
Commands [
.. Open (File (Dir []) "a")
, .. Close (Reference (Symbolic (Var 0)))
, .. Open (File (Dir []) "b")
, .. Read (File (Dir []) "a")
, .. Close (Reference (Symbolic (Var 1)))
]
At this point the shrinker cannot remove the second call to Open
because the second call to Close
depends on it, but it can first remove that second call to Close
and then remove that second call to Open
, and we end up with the minimal test case that we saw in the previous section:
Commands [
.. Open (File (Dir []) "a")
, .. Close (Reference (Symbolic (Var 0)))
, .. Read (File (Dir []) "a")
]
Perfect.
Improving shrinking
Unfortunately, this does not mean that we can depend on quickcheck-state-machine
to solve all our shrinking problems. Consider this run of showLabelledExamples
:
> showLabelledExamples (Just 166205)
Using replaySeed 166205
*** Found example of OpenTwo
Commands [
.. MkDir (Dir ["x"])
, .. Open (File (Dir []) "c")
, .. Open (File (Dir ["x"]) "a")
]
This is indeed an example of a program in which we open at least two files; however, why is that call to MkDir
still there? After all, if there was a bug in opening more than one file, and the “minimal” test case would include a call to MkDir
, that would be a red herring which might send the person debugging the problem down the wrong path.
The reason that quickcheck-state-machine
did not remove the call to MkDir
, of course, is that without it the second call to Open
would fail: it tries to create a file in directory x
, and if that directory does not exist, it would fail. To fix this, we need to tell quickcheck-state-machine
how to shrink individual commands; so far we have been using
shrinker :: Model Symbolic -> Cmd :@ Symbolic -> [Cmd :@ Symbolic]
shrinker _ _ = []
which says that individual commands cannot be shrunk at all. So how might we shrink an Open
call? One idea might be to shrink the directory, replacing for instance /x/y/a
by /x/a
or indeed just /a
. We can implement this using
shrinker :: Model Symbolic -> Cmd :@ Symbolic -> [Cmd :@ Symbolic]
shrinker _ (At cmd) =
case cmd of
Open (File (Dir d) f) ->
[At $ Open (File (Dir d') f) | d' <- QC.shrink d]
_otherwise ->
[]
If we use this shrinker and run showLabelledExamples
a number of times, we will find that all the examples of OpenTwo
are now indeed minimal… until it finds an example that isn’t:
> showLabelledExamples (Just 980402)
Using replaySeed 980402
*** Found example of OpenTwo
Commands [
.. MkDir (Dir ["x"]))
, .. Open (File (Dir []) "a")
, .. Open (File (Dir ["x"]) "a")
]
In this example we cannot shrink the directory to [] because the resulting program would try to open the same file twice, which is not allowed (“resource busy”). We need a better way to shrink this program.
What we want to implement is “try to replace the file path by some file in the (local) root. It is important to realize however that a shrinker such as
shrinker :: Model Symbolic -> Cmd :@ Symbolic -> [Cmd :@ Symbolic]
shrinker _ (At cmd) =
case cmd of
Open _ -> -- BAD EXAMPLE
[At $ Open (File (Dir []) f') | f' <- ["t1", "t2"]]
_otherwise ->
[]
is a bad idea. This shrinker tries to replace any file path with either /t1
or /t2
. However, this means that shrinking now never stops:
Open (File (Dir [] "t1"))
can be shrunk to
Open (File (Dir [] "t2"))
which can then be shrunk back to
Open (File (Dir [] "t1"))
and QuickCheck
will loop when trying to shrink the test case. It is important that there is a clear direction to shrinking.
An approach that does work is the following: any file path that doesn’t start with a t
can be replaced by path /t100
; moreover, any /tN
(for some number N) can be replaced by /tN'
for some number N’ < N:
shrinker :: Model Symbolic -> Cmd :@ Symbolic -> [Cmd :@ Symbolic]
shrinker _ (At cmd) =
case cmd of
Open (File (Dir []) ('t' : n)) ->
[openTemp n' | n' <- QC.shrink (read n)]
Open _ ->
[openTemp 100]
_otherwise ->
[]
where
openTemp :: Int -> Cmd :@ Symbolic
openTemp n = At $ Open (File (Dir []) ('t' : show n))
Now
Commands [
.. MkDir (Dir ["x"]))
, .. Open (File (Dir []) "a")
, .. Open (File (Dir ["x"]) "a")
]
can shrink to
Commands [
.. MkDir (Dir ["x"]))
, .. Open (File (Dir []) "a")
, .. Open (File (Dir []) "t100")
]
at which point the call to MkDir
can be removed; this will eventually shrink down to
Commands [
.. Open (File (Dir []) "t0")
, .. Open (File (Dir []) "t1")
]
Dependencies between commands
It’s been a long road, but we are almost there. The last thing we need to discuss is how to shrink programs with dependencies. The “open at least two” example above was relatively easy to shrink because we could shrink one command at the time. Sometimes however there are dependencies between commands. For example, consider this “minimal” example of “successful read”:
> showLabelledExamples (Just 617213)
Using replaySeed 617213
*** Found example of SuccessfulRead
Commands [
.. MkDir (Dir [ "z" ])
, .. Open (File (Dir ["z"]) "b")
, .. Close (Reference (Symbolic (Var 0)))
, .. Read (File (Dir ["z"]) "b")
]
As before, we have a call to MkDir
which should ideally be removed. However, if we tried to change the path in the call to Open
, the call to Read
would fail: both of these commands should refer to the same path. But shrinking can only change one command at a time, and this is important to keep the computational complexity (runtime cost) of shrinking down. What to do?
The problem is that we want that call to Read
to use the same path as the call to Open
, but we have no way to express this. The solution is to make this expressible. After all, we can already express “the handle returned by that call to Open
”; all we need to do is introduce a second kind of reference: a reference to a path.
The language of commands changes to
data Cmd fp h = Read (Expr fp) | -- rest as before
Cmd
gets an additional type argument fp
to record the types of paths, and instead of a File
, Read
now takes an Expr
as argument:
data Expr fp = Val File | Var fp
We can of course still use a concrete path, as before, but we can also use a variable: “use the same path as used in that call to open”. This means that Open
must return that reference, so Success
and Resp
get adjusted accordingly:
data Success fp h = Handle fp h | -- rest as before
newtype Resp fp h = Resp (Either Err (Success fp h))
Just like was the case for handles, when we actually run code all variables have been resolved, so the interpreter isn’t any more difficult:
runMock :: Cmd File MHandle -> Mock -> (Resp File MHandle, Mock)
runMock (Open f) = first (Resp . fmap (Handle f)) . mOpen f
runMock (Read e) = first (Resp . fmap String) . mRead (eval e)
-- rest as before
eval :: Expr File -> File
eval (Val f) = f
eval (Var f) = f
The IO interpreter is modified similarly. Most of the rest of the changes to the code are minor and mostly automatic. For example, At
must now instantiate both parameters
newtype At f r = At (f (Reference File r) (Reference IO.Handle r))
the model must record the mapping from file references now too
type FileRefs r = [(Reference File r, File)]
data Model r = Model Mock (FileRefs r) (HandleRefs r)
See Version2.hs
in the example package for details.
Crucially, we can now take advantage of this in the shrinker: when we see a call to Read
with a file path that we have seen before, we can shrink that to use a variable instead:
shrinker :: Model Symbolic -> Cmd :@ Symbolic -> [Cmd :@ Symbolic]
shrinker (Model _ fs _) (At cmd) =
case cmd of
Read (Val f) ->
[At $ Read (Var r) | r <- mapMaybe (matches f) fs]
-- other cases as before
where
matches :: File -> (r, File) -> Maybe r
matches f (r, f') | f == f' = Just r
| otherwise = Nothing
This means that the problematic example
> showLabelledExamples (Just 617213)
Using replaySeed 617213
*** Found example of SuccessfulRead
Commands [
.. MkDir (Dir [ "z" ])
, .. Open (File (Dir ["z"]) "b")
, .. Close (Reference (Symbolic (Var 1)))
, .. Read (Val (File (Dir ["z"]) "b"))
]
can now shrink to
Commands [
.. MkDir (Dir [ "z" ])
, .. Open (File (Dir ["z"]) "b")
, .. Close (Reference (Symbolic (Var 1)))
, .. Read (Var 0)
]
At this point the shrinking for Open
that we defined above can kick in, replacing z/b
with /t100
, making the call to MkDir
redundant, and the example can ultimately shrink to
Commands [
.. Open (File (Dir []) "t0")
, .. Close (Reference (Symbolic (Var 1)))
, .. Read (Var 0)
]
Conclusions
Writing unit tests by hand is problematic for at least two reasons. First, as the system grows in complexity the number of interactions between the various features will be impossible to capture in hand written unit tests. Second, unit tests can only exercise the system in ways that the author of the unit tests foresees; often, bugs arise when the system is exercised in ways that were not foreseen. It is therefore important not to write unit tests, but rather generate them. We have seen how a library such as quickcheck-state-machine
can assist in generating and shrinking meaningful sequences of API calls.
We have also seen why it is important to label test cases, how to inspect the labelling function, and how to use labelling to improve shrinking. Without writing a shrinker for individual commands, the standard quickcheck-state-machine
shrinker already does a really good job at removing redundant commands. However, if we are serious about producing minimal test cases without red herrings that might lead debugging efforts astray (and we should be serious about that), then we also need to put some thought into shrinking individual commands.
Finally, we have seen how we can improve shrinking by making dependencies between commands explicit. This also serves as an example of a language with multiple types of references; the approach we put forth in this blog post essentially scales to an arbitrary number of types of references without much difficulty.
We have not talked about running parallel tests at all in this blog post. This is an interesting topic in its own right, but affects only the very outermost level of the test: prop_sequential
would get an analogue prop_parallel
, and nothing else would be affected; the quickcheck-state-machine
documentation shows how to do this; the original paper describing the approach (in Erlang) by John Hughes and others is also well worth a read. Finally, quickcheck-state-machine
is not the only library providing this functionality in Haskell; in particular, hedgehog, an alternative to QuickCheck
, does also.
The way we set up the tests in this blog post is not the only way possible, but is one that we believe leads to a clean design. The running example in this blog post is a simplified version of a mock file system that we use in the new Ouroboros consensus layer, where we use it to simulate file system errors when testing a blockchain database. The tests for that blockchain database in turn also use the design patterns laid out in this blog post, and we have used those design patterns also in a test we added to quickcheck-state-machine
itself to test the shrinker. Of course, being Haskellers, we would prefer to turn design patterns into actual code; indeed, we believe this to be possible, but it may require some more advanced type system features. This is something we want to investigate further.
Footnotes
5. The `Close` call now refers to `Var 1` instead of `Var 0`, because `Open` now creates _two_ variables: the reference to the path and the reference to the handle.↩
Recent posts
Six reasons why EUTXO wins by Fernando Sanchez
8 January 2025
Input | Output chief scientist receives prestigious Lovelace computing award by Fergie Miller
3 December 2024
Delivering change in Ethiopia: lessons and reflections by Staff Writer
28 November 2024