Graphs in Rust: Introducing Gamma

Graphs pervade science and technology. As such, many kinds of software projects rely on a graph library. This article introduces Gamma, a new graph library written in Rust.

Motivation

Graphs are used heavily within cheminformatics, a field concerned with the collection, storage, and retrieval of information about substances. Most cheminformatics systems use a library for manipulating molecular structures. Such a library is known as a "cheminformatics toolkit" or just "toolkit." At the core of every cheminformatics toolkit, whether explicitly acknowledged or not, sits a graph library.

A cheminformatics toolkit uses graph representations to solve chemical information problems. For the most part, nodes map to atoms and edges map to bonds. Graphs of this kind are undirected. Numerical edge weights sometimes represent bond orders.

A number of graph algorithms are used in cheminformatics. To name three: subgraph isomorphism; matching; and cycle perception. Isomorphism finds embeddings and atom/bond mappings between two molecules. Matching is used for kekulization, or the assignment of alternating single/double bonds in certain classes of molecule. Many important molecules in medicine and commerce contain one or more cycles, so cycle perception is essential for many molecular analyses.

Most cheminformatics applications deal with small organic molecules. Graph representations of such molecules contain fewer than 50 nodes, with the degree of any node rarely exceeding four. A graph library used in cheminformatics must therefore be capable of efficient manipulation of small, sparsely-connected graphs.

Other applications for graphs are possible. One application is feature graphs. In a feature graph, a node represent a molecular subgraph and an edge represents a bond between them. One especially useful feature graph uses relevant cycles for nodes.

Graph can also represent relationships between molecules. For example, a reaction graph maps nodes to molecules and edges to reactions. Traversing such a graph yields information about feasible multistep chemical transformations, a cornerstone of the modern world economy.

Although these parameters apply specifically to cheminformatics, they also apply to many other fields. By factoring the graph manipulation aspect into a a separate library, it's hoped that Gamma can serve not only its intended domain, but others as well.

No Rust cheminformatics toolkit exists to my knowledge as of yet.

Requirements

A graph library suitable for cheminformatics in Rust should meet several criteria, including:

  • A Graph trait defining a minimal API. Several implementations offer tradeoffs in functionality, convenience, and performance will be required in cheminformatics applications.
  • At least one Graph reference implementation. This implementation can serve as a key primitive when designing graphs (such as molecules) with custom behavior.
  • Traversal and analysis functions that accept Graph trait objects, not concrete implementations.
  • Common Rust idioms should be used whenever possible. Synthetic node references, reference counting, interior mutability, and unsafe should all be avoided. Gamma should be as unsurprising as possible.
  • A suite of up-to-date unit tests.

Graph Trait

The center of Gamma is the Graph trait, which defines the following methods:

use crate::graph::Error;

pub trait Graph<'a, N: 'a> {
    type NodeIterator: Iterator<Item=&'a N>;
    type NeighborIterator: Iterator<Item=&'a N>;
    type EdgeIterator: Iterator<Item=(&'a N, &'a N)>;

    /// Returns true if there are no nodes, or false otherwise.
    fn is_empty(&self) -> bool;

    /// Returns the number of nodes in this graph.
    fn order(&self) -> usize;

    /// Returns the number of edges in this graph.
    fn size(&self) -> usize;

    /// Iterates the nodes of this graph
    fn nodes(&'a self) -> Self::NodeIterator;

    /// Returns true if node is a member, or false otherwise. 
    fn has_node(&self, node: &N) -> bool;

    /// Iterates the neighbors of node.
    fn neighbors(&'a self, node: &N) -> Result<Self::NeighborIterator, Error>;

    /// Returns the number of neighbors connected to node.
    fn degree(&self, node: &N) -> Result<usize, Error>;

    /// Iterates the edges of this graph.
    fn edges(&'a self) -> Self::EdgeIterator;

    /// Returns true if an edge exists between source and target.
    fn has_edge(&self, source: &N, target: &N) -> Result<bool, Error>;
}

This API was chosen to satisfy two opposing goals: utility and brevity. On the one hand, Graph should provide a rich API because it alone defines the features available to all graph traversals and analyses. On the other hand, the number of methods must be kept low to avoid burdening implementors with unnecessary work. A previous post offers some thoughts on the minimal Graph API used here.

The choice of associated types for node and edge iterators deserves some explanation. I considered an alternative Graph trait that returned boxed iterators, as given at the bottom of this article. However, the requirement for boxed iterators seemed limiting. Although trait objects such as Iterator can be returned from bare functions, they must be boxed if returned from functions that are themselves defined on traits. This requirement seemed a step up in complexity with poorly-defined performance tradeoffs. For this reason, associated types were chosen for the first releases of Gamma.

Should edge weights be required, the WeightedGraph trait is available.

use crate::graph::Error;
use crate::graph::Graph;

pub trait WeightedGraph<'a, N:'a, E> : Graph<'a, N> {
    /// Returns the weight between source and target.
    fn weight(&self, source: &'a N, target: &'a N) -> Result<Option<&E>, Error>;
}

HashGraph

HashGraph provides a versatile default implementation of Graph and WeightedGraph. Unlike the traits it implements, HashGraph supplies the mutators for building a graph in a stepwise fashion. Behind the scenes, HashGraph is implemented as a IndexMap of Vec, in which nodes themselves serve as keys. The use of both data structures ensures that node and edge ordering will be stable to iteration.

The build function allows for declarative construction of a HashGraph:

use gamma::graph::HashGraph;

fn main() {
    let mut graph = HashGraph::build(vec![ 0, 1, 2 ], vec![
        (&0, &1, ()),
        (&1, &2, ())
    ]).unwrap();

    assert_eq!(graph.is_empty(), false);
}

Alternatively, mutator methods can be used to accomplish the same result:

use gamma::graph::HashGraph;

fn main() {
    let mut graph = HashGraph::<_,()>::new();

    graph.add_node(0).unwrap();
    graph.add_node(1).unwrap();
    graph.add_node(2).unwrap();
    graph.add_edge(&0, &1).unwrap();
    graph.add_edge(&1, &2).unwrap();

    assert_eq!(graph.has_edge(&1, &0).unwrap(), true);
}

Custom node types can be used directly with HashGraph. Unlike many graph libraries, Gamma lets you work directly with your own node references, rather than node references returned to you by the library. This vastly simplifies the use of custom node types. The only constraint on a node is that it must implement std::hash::Hash and std::cmp::Eq.

use gamma::graph::HashGraph;

#[derive(Eq, Hash, PartialEq, Debug)]
struct Node {
    value: u8
}

impl Node {
    fn new(value: u8) -> Self {
        Node { value }
    }
}

fn main() {
    let n0 = &Node::new(0);
    let n1 = &Node::new(1);
    let graph = HashGraph::<_, ()>::build(
        vec![ n0, n1 ], vec![ (&n0, &n1, ()) ]
    ).unwrap();

    assert_eq!(graph.has_edge(&n0, &n1).unwrap(), true);
}

HashGraph supports the WeightedGraph trait by allowing any type to be used as a weight. For example, the following example shows how to use strings as weights.

use gamma::graph::HashGraph;

fn main() {
    let mut graph = HashGraph::build(vec![ 0, 1, 2 ], vec![
        (&0, &1, "a"),
        (&1, &2, "b")
    ]).unwrap();

    assert_eq!(graph.weight(&0, &1).unwrap(), Some(&"a"));
}

Traversal with Iterators

Two types of traversal are included: breadth-first and depth-first. Both traversals are implemented as edge Iterators. Taking this approach combines the broad applicability of these traversals with the vast array of transformations supported by Rust's Iterator trait.

The edges returned by either traversal Iterator are represented as tuples of three values: two nodes, and a boolean. When true, the last value indicates that the edge produces a cut between two nodes in a cyclic graph.

use gamma::graph::HashGraph;
use gamma::traversal::depth_first;

fn main() {
    let graph = HashGraph::build(vec![ 0, 1, 2 ], vec![
        (&0, &1, ()),
        (&1, &2, ()),
        (&2, &0, ()),
    ]).unwrap();
    let traversal = depth_first(&graph, &0).unwrap();

    assert_eq!(traversal.collect::<Vec<_>>(), vec![
        (&0, &1, false),
        (&1, &2, false),
        (&2, &0, true)
    ]);
}

The above example uses the #collect transformation to create a Vec of edges, but much more is possible. For example, we can filter everything but cycle cut edges:

use gamma::graph::HashGraph;
use gamma:traversal::breadth_first;

fn main() {
    let graph = HashGraph::build(vec![ 0, 1, 2 ], vec![
        (&0, &1, ()),
        (&1, &2, ()),
        (&2, &0, ()),
    ]).unwrap();
    let traversal = depth_first(&graph, &0).unwrap();
    let cuts = traversal.filter(|edge| edge.2);

    assert_eq!(cuts.collect::<Vec<_>>(), vec![
        (&2, &0, true)
    ]);
}

Breath-first traversal works in exactly the same way:

use gamma::graph::HashGraph;
use gamma::traversal::breadth_first;

fn main() {
    let graph = HashGraph::build(vec![ 0, 1, 2 ], vec![
        (&0, &1, ()),
        (&1, &2, ()),
        (&2, &0, ()),
    ]).unwrap();
    let traversal = breadth_first(&graph, &0).unwrap();

    assert_eq!(traversal.collect::<Vec<_>>(), vec![
        (&0, &1, false),
        (&0, &2, false),
        (&1, &2, true)
    ]);
}

Other Work

I recently reviewed Rust's most widely-used graph crate, Petgraph. Aside from being much less mature, Gamma differs in other important ways:

  • Petgraph has no Graph trait.
  • Petgraph's graph types use opaque node pointers rather than the direct node manipulation offered by Gamma.
  • Petgraph has four not-quite-interchangeable graph types.
  • Some Petgraph functions will not work across all graph types.

These limitations made Petgraph less than ideal for the cheminformatics applications I'm planning.

Graphlib offers another take on the Rust graph library. Unfortunately it too organizes itself around a concrete implementation of graph, rather than a trait.

A handful of miscellaneous graph crates can also be found on crates.io.

Gamma's design parts company with things that have been previously written about graphs in Rust. For example, Idiomatic tree and graph like structures in Rust leads the discussion with a more complicated model involving reference counting and nodes that maintain references to neighbors. That model could be used with Gamma, but there wouldn't be much point. Other discussion suggests starting with a Typed Arena. Similar advice appears in this document. However, Gamma proves this approach to be unnecessary, at least within the scope of its requirements. Gamma likewise demonstrates vector indices to be unnecessary. Gamma brings the full power of the Rust compiler to bear for every use of Graph.

Future Work

Future work on Gamma will focus on two goals:

  1. Refinement of the Graph trait. It should be easy to implement graphs with a variety of different internal representations. Whether or not the current design achieves that goal remains to be seen.
  2. Implementation of graph algorithms, specifically: matching; subgraph isomorphism; and relevant cycle perception.

Conclusions

Gamma is a new graph library for Rust based around a streamlined, flexible Graph trait. The Gamma crate includes a reference Graph implementation along with two Iterator-based traversals. Future work will focus on adding graph algorithms useful for cheminformatics applications.