Using the Typestate Pattern with Rust Traits
Changes to the state of a value can set the stage for later errors. Sometimes these errors occur during subsequent method calls. Other times, the method call itself is the source of the error. Good type systems can make it possible to move these kinds of state bugs from run time to compile time. Indeed, the typestate pattern is often cited a solution, especially for Rust. Unfortunately, most discussions of the typestate pattern assume concrete types. But what if typestate needs to extend over a family of types? Traits are an obvious choice, but this more complex variation is rarely documented. This article illustrates the use of trait-based typestates in Rust through a simple example.
The Problem: Builder
One of my projects uses a Builder
to construct values through sequential one-character instruction pushes. The starting state of Builder
and the character being pushed determine the transition. When the Done state is reached, the product is returned. Transition to Done can occur either as a result of a push or client-induced finish. The states and their relationships are captured in the following state diagram.
First Pass with Concrete Types
A simple approach would be to model Builder
on a struct with two mutable methods, push
and finish
. The former would accept a char
argument, causing mutation of the product under construction; the latter would return the completed product without mutation. The optional return type of push
allows Builder
to finish after reaching some pre-determined internal state. Similarly, clients can request the product, if available, by calling finish
.
struct Builder;
struct Foo;
impl Builder {
pub fn push(&mut self, char: char) -> Option<Foo> {
// π₯ Runtime error if this is called after error
todo!()
}
pub fn finish(&mut self) -> Option<Foo>{
// π₯ Should further pushes be allowed after an error or a previous
// finish?
todo!()
}
}
Although simple, this approach raises some hard questions:
- Should calls to
push
after an erroneous previous push be allowed, and if so which errors qualify? - Should calling
finish
after an erroneouspush
be allowed? - Should calling
finish
after a previous call tofinish
be allowed?
We could make the answers to these questions part of the documentation for the crate. Unfortunately, this would fail if clients didn't read the documentation, or changes to the code rendered the documentation incorrect.
A sturdier approach would eliminate the possibility of misusing Builder
altogether. For example, the mutable borrow in finish
could be replaced with a move. The compiler now prevents any use of Builder
, mutable or otherwise, after finish
is called. Entering the "finished" state destroys the Builder
.
struct Builder;
struct Foo;
impl Builder {
// None indicates that the
pub fn push(&mut self, char: char) -> Option<Foo> {
// π₯ this can still be called after a previous error
todo!()
}
pub fn finish(self) -> Option<Foo> {
// π Typestate: use after finish is no longer possible
todo!()
}
}
But this leaves the headache of dealing with calls to push
after an error. This problem too can be eliminated by replacing the mutable borrow with a move. But to support that, we need to replace the return type of push
. If Builder
is in fact not done, clients need a way to keep pushing. That can be accomplished with a return type of Target
. Builders that are done return the Target::Product
variant, or the Target::Builder
variant otherwise.
struct Builder;
struct Foo;
impl Builder {
pub fn push(self, char: char) -> Target {
// π Typestate: push can never be called twice.
todo!()
}
pub fn finish(self) -> Option<Foo> {
// π Typestate: use after finish is no longer possible
todo!()
}
}
enum Target {
Builder(Builder),
Product(Foo)
}
This small bit of infrastructure enables something very useful: the clean separation between the construction of a product and the management of the character stream that directs it. No matter how complex the process of managing the flow of characters becomes, Builder
remains completely decoupled from it.
enum Error {
NotDone
}
fn build(target: Target) -> Result<Foo, Error> {
// in production, char comes from an iterator
let char = Some('#');
match target {
Target::Builder(builder) => {
if let Some(char) = char {
build(builder.push(char))
} else {
Ok(builder.finish().ok_or(Error::NotDone)?)
}
}
Target::Product(product) => Ok(product)
}
}
This design is certainly viable, and this is where most discussions of the typestate pattern end. But what happens if we want build
to control more than just the production of values of type Foo
?
The Builder Trait
One way to generalize the construction of products would be to replace the Builder
struct with a Builder
trait.
trait Builder {
fn push(self, char: char) -> Target;
fn finish(self) -> Option<Foo>;
}
Unfortunately, making this change is not going to be simple. We need a way to deal with the problem that values associated with enum variants must have concrete types. Bare traits are not allowed. Attempting to associate an enum variant with a bare trait yields the compiler error: "trait objects must include the dyn
keyword rustc(E0782)".
enum Target {
// π₯ dyn keyword, but how?
Builder(Builder),
Product(Foo)
}
This error and the proposed solution, "add dyn
keyword before this trait", are nevertheless quite misleading. Just prefixing Builder
with the dyn
keyword won't work. There is one and only one way to take the compiler's advice, and that is to replace Builder
with Box<dyn Builder>
.
Boxed Trait Object
The following example shows how to associate a "boxed trait object" with an enum variant.
struct Foo;
trait Builder {
fn push(self, char: char) -> Target;
fn finish(self) -> Option<Foo>;
}
enum Target {
// π₯ This won't ultimately work.
Builder(Box<dyn Builder>),
Product(Foo)
}
But this solution is doomed to failure. Although compilation finishes without error, it's no longer possible to use Target
as before in the build
function.
struct Foo;
trait Builder {
fn push(self, char: char) -> Target;
}
enum Target {
Builder(Box<dyn Builder>),
Product(Foo)
}
enum Error {
NotDone
}
fn build(target: Target) -> Result<Foo, Error> {
// in production, this comes from an iterator
let char = Some('#');
match target {
Target::Builder(builder) => {
if let Some(char) = char {
build(builder.push(char))
} else {
// π₯ method 'done' not found on Box<dyn Builder>
Ok(builder.done().ok_or(Error::NotDone)?)
}
}
Target::Product(product) => Ok(product)
}
}
The push
method no longer works because it's not defined on Box<dyn Builder>
. We could satisfy the compiler by adding an implementation of Builder
to Box<dyn Builder>
, but actually implementing the resulting method is impossible. Although methods can be called on boxed trait objects (the self
parameter here), it's impossible to access any attributes from them.
impl Builder for Box<dyn Builder> {
fn push(self, char: char) -> Target {
// π₯ How to implement this?
todo!()
}
}
The problems don't end there. Once boxed, a value can never be unboxed. Even if we could somehow cobble together a way to give push
the data it needs, moving self
as required by the build
function is impossible.
Trait Type Parameter
Boxed trait objects are clearly a dead end. Fortunately, Rust offers another way to associate a trait with an enum variant: type parameters. The following example shows how.
trait Builder {
fn push(self, char: char) -> Target<Builder>;
fn finish(self) -> Foo
}
enum Target<B: Builder> {
Builder(B),
Product(Foo)
}
Adding the type parameter B
makes it possible to use Target
with build
to get very close to a workable solution.
struct Foo;
enum Error {
NotDone
}
fn build<B: Builder>(target: Target<B>) -> Result<Foo, Error> {
// in production, this comes from an iterator
let char = Some('#');
match target {
Target::Builder(builder) => {
if let Some(char) = char {
build(builder.push(char))
} else {
Ok(builder.done().ok_or(Error::NotDone)?)
}
}
Target::Product(product) => Ok(product)
}
}
Although the above listing compiles and will dutifully produce values of type Foo
given the right implementations, build
can't produce values of any other type. What's needed is a way to abstract the kind of product that Builder
returns.
Abstracting Product Type
Type parameters can also be used with traits. The following listing illustrates how to do this with Builder
to abstract the type of product constructed.
struct FooBuilder;
trait Builder<P>: Sized {
fn push(self, char: char) -> Target<P, Self>;
fn done(self) -> Option<P>;
}
enum Target<T, B: Builder<T>> {
Builder(B),
Product(T)
}
enum Error {
NotDone
}
struct Product;
fn build<P, B: Builder<P>>(target: Target<P, B>) -> Result<P, Error> {
// in production, this comes from an iterator
let char = Some('#');
match target {
Target::Builder(builder) => {
if let Some(char) = char {
build(builder.push(char))
} else {
Ok(builder.done().ok_or(Error::NotDone)?)
}
}
Target::Product(product) => Ok(product)
}
}
To get this solution to work, Builder
itself now needs a type parameter. And because that parameter refers to Self
, Builder
must be bound to the Sized
trait. The documentation notes that:
All type parameters have an implicit bound of Sized. The special syntax ?Sized can be used to remove this bound if itβs not appropriate.
The one exception is the implicit Self type of a trait. A trait does not have an implicit Sized bound as this is incompatible with trait objects where, by definition, the trait needs to work with all possible implementors, and thus could be any size.
One consequence of binding to Sized
is that borrowing is no longer possible. But that's of no concern here because the point was to prevent borrowing in the first place.
The new Builder<P>
trait works until we try to actually implement Builder
:
struct FooBuilder;
trait Builder<P>: Sized {
fn push(self, char: char) -> Target<P, Self>;
fn done(self) -> Option<P>;
}
enum Target<T, B: Builder<T>> {
Builder(B),
Product(T)
}
enum Error {
NotDone
}
impl<P> Builder<P> for FooBuilder {
fn push(self, char: char) -> Target<P, Self> {
todo!()
}
fn done(self) -> Option<P> {
// π₯ How to implement this method?
// We need to return a type parameter, not a value!
todo!()
}
}
fn build<P, B: Builder<P>>(target: Target<P, B>) -> Result<P, Error> {
// in production, this comes from an iterator
let char = Some('#');
match target {
Target::Builder(builder) => {
if let Some(char) = char {
build(builder.push(char))
} else {
Ok(builder.done().ok_or(Error::NotDone)?)
}
}
Target::Product(product) => Ok(product)
}
}
We solve this problem by replacing the type parameter P
, with the actual type being built:
trait Builder<P>: Sized {
fn push(self, char: char) -> Target<P, Self>;
fn done(self) -> Option<P>;
}
enum Target<T, B: Builder<T>> {
Builder(B),
Product(T)
}
enum Error {
NotDone
}
struct Foo;
struct Bar;
struct FooBuilder;
struct BarBuilder;
impl Builder<Foo> for FooBuilder {
fn push(self, char: char) -> Target<Foo, Self> {
Target::Product(Foo)
}
fn done(self) -> Option<Foo> {
Some(Foo)
}
}
impl Builder<Bar> for BarBuilder {
fn push(self, char: char) -> Target<Bar, Self> {
Target::Product(Bar)
}
fn done(self) -> Option<Bar> {
Some(Bar)
}
}
fn test() {
let foo_builder = FooBuilder;
let foo = build(Target::Builder(foo_builder));
let bar_builder = BarBuilder;
let bar = build(Target::Builder(bar_builder));
}
fn build<P, B: Builder<P>>(target: Target<P, B>) -> Result<P, Error> {
// in production, this comes from an iterator
let char = Some('#');
match target {
Target::Builder(builder) => {
if let Some(char) = char {
build(builder.push(char))
} else {
Ok(builder.done().ok_or(Error::NotDone)?)
}
}
Target::Product(product) => Ok(product)
}
}
More Typestate?
How much further should the idea of typestate be taken with this example? It's possible to call finish
when a Builder
isn't actually finished, ultimately yielding a runtime error. Maybe this can be avoided by splitting the Builder
trait further:
trait Push<T, B: Branch<T, Self>>: Sized {
fn push(self, char: char) -> Target<T, Self, B>;
}
trait Branch<T, P: Push<T, Self>>: Sized {
fn push(self, char: char) -> Target<T, P, Self>;
fn finish(self) -> T;
}
enum Target<T, P: Push<T, B>, B: Branch<T, P>> {
Push(P),
Branch(B),
Product(T)
}
This two-trait approach would make it impossible to call finish
until the previous state permits it by retuning a Target::Branch
variant. This seems safer, but at what cost? The design becomes more complex, with two traits rather than one. The type parameter signature is also busier. But the main reason to take this no further is that doing so does not solve the problem of runtime errors. A client expecting a branch state but receiving a push state will still generate a runtime error. Vanishing returns suggest we're done β for now at least.
Conclusion
The typestate pattern is a powerful tool that is especially well-suited to Rust's type system. Applying the pattern to struct-based states is relatively straightforward, but limited. Defining states as traits allows more flexibility, but at the cost of diverging from most documentation on the topic. This article showed one specific approach to using the typestate pattern with traits, overcoming some pitfalls along the way.