Statically typed red-black trees

I have met with claims that the static typing is useless as it is not mathematical proof (OK – for those knowing the discussion – I exaggerated a lot). However to large extend the static typing is prove of some sort and depending on the code it might be a prove of even quite strong properties.

As chosen field I tried to use LLRB trees – I have implemented them quite long time ago and I’ve run into problems that violated the pre and post conditions. Even when problem was discovered it took some time to find out the underlaying problems (my inexperience didn’t helped probably either)

I’ve tried to reimplement the code in GHC Haskell with following rules:

  • Using only ‘safe’ extensions
  • Making all functions total and without use of undefined or error
  • Ensuring that all properties of tree are preserved

Unfortunately this was my first time I used some of the concepts (like Zipper) so the implementation and description might be not the best (comments and corrections welcome).

I have used the code closer in structure to traditional implementations then the one described by Sedgewick. I don’t know the performance though as point was more to ensure the properties of the tree. It also includes both 2-3 and 2-4 implementation with latter one commented out.

While it took some time to implement it (for finding out how to convincing the type checker that (n + 1) + m = n + (m + 1)) it ensures the safety (and in total it have taken much less time then de debugging alone last time).

WARNING! The code was written in hurry and while I have learned and I haven’t had time to clean it. It might contain strange variable names etc. It is also not written as efficient implementation but rather a proof of concept.

{-# LANGUAGE DataKinds #-}
{-# LANGUAGE GADTs #-}
{-# LANGUAGE TypeFamilies #-}
{-# LANGUAGE RankNTypes #-}
{-# LANGUAGE ScopedTypeVariables #-}
{-# LANGUAGE FlexibleContexts #-}

Above code is simply bunch of extentions that are used

data Nat = Z | S Nat

type family Add (x :: Nat) (y :: Nat) :: Nat
type instance Add  Z     Z    = Z
type instance Add  Z     y    = y
type instance Add  x     Z    = x
type instance Add (S x) (S y) = S (S (Add x y))

DataKinds extension allows to promote a type to a kind (type of type). The above code is simply an encoding of natural numbers on type level with addition

data TaggedNat :: Nat -> * where
  TaggedNat :: TaggedNat n

class NatHelper (n :: Nat) where
  nat :: (Z ~ n => a) -> (forall n'. (S n' ~ n, NatHelper n') => TaggedNat n' -> a) -> TaggedNat n -> a

instance NatHelper Z where
  nat z _ _ = z

instance NatHelper n => NatHelper (S n) where
  nat _ s _ = s (TaggedNat :: TaggedNat n)

data ChainNat :: Nat -> * where
  ChainZero :: ChainNat Z
  ChainCons :: NatHelper n => ChainNat n -> ChainNat (S n)

toChain :: NatHelper n => TaggedNat n -> ChainNat n
toChain = nat ChainZero (ChainCons . toChain)

fromChain :: ChainNat n -> TaggedNat n
fromChain ChainZero = TaggedNat
fromChain (ChainCons _) = TaggedNat

prev :: NatHelper (S n) => TaggedNat (S n) -> (NatHelper n => TaggedNat n -> a) -> a
prev t x = case toChain t of
  ChainCons c -> x $ fromChain c

addEq :: forall a n' m'. (NatHelper n', NatHelper m') =>
         (Add n' (S m') ~ Add (S n') m' => a) ->
         TaggedNat n' -> TaggedNat m' ->
         a
addEq x n m = nat (nat x (const x) m) (\n' -> nat x (addEq x n') m) n

In general the type checker in Haskell is not a SMT solver (at least yet). The above code provides prove of 2 properties:

  • If we have S n (or n + 1) then n is a natural number.
  • Prove by induction (n + 1) + m = n + (m + 1). Unfortunately it currently is not optimized out despite my attempts to convince GHC by rules etc. that it is complex identity. If someone wanted to write the ‘real’ code it might be a good spot to fix as it changes the complexity.
data BlackNode :: Nat -> * -> * where
  Nil :: BlackNode Z a
  BBNode :: BlackNode n a -> a -> BlackNode n a -> BlackNode (S n) a
  RBNode :: RedNode n a -> a -> BlackNode n a -> BlackNode (S n) a
--  RRNode :: RedNode n a -> a -> RedNode n a -> BlackNode (S n) a

data RedNode :: Nat -> * -> * where
  RedNode :: BlackNode n a -> a -> BlackNode n a -> RedNode n a

The above code defines the nodes. The second type parameter denotes number of non-nil black nodes in all paths in subtree. It states that (for uncommented version):

  • For black trees the following rules apply:
    • Nil element is a black tree with no non-nil elements
    • If l, r are a black trees with n black non-nil nodes in all paths and v is a element (label) then BBNode l v r is black tree with n + 1 black non-nil nodes in all paths
    • If l(r) is a red tree (respectively black tree) with n black non-nil nodes in all paths and code>v is a element (label) then RBNode l v r is black tree with n + 1 black non-nil nodes in all paths
    • Nothing else is a black tree
  • For red trees the following rules apply:
    • If l, r are a black trees with n black non-nil nodes in all paths and v is a element (label) then RedNode l v r is red tree with n black non-nil nodes in all paths
    • Nothing else is a red tree

By construction it ensures the all red-black tree’s properties are kept and the tree is left-leaning. The tree, which would violate any property is not possible to construct.

data BlackNode' :: Nat -> Nat -> Nat -> * -> * where
  Top :: BlackNode' n Z n a
  BBNode'L :: (t ~ Add (S n) m) => BlackNode' (S n) m t a -> a -> BlackNode n a -> BlackNode' n (S m) t a
  BBNode'R :: (t ~ Add (S n) m) => BlackNode n a -> a -> BlackNode' (S n) m t a -> BlackNode' n (S m) t a
  RBNode'R :: (t ~ Add (S n) m) => RedNode n a -> a -> BlackNode' (S n) m t a -> BlackNode' n (S m) t a
  RedNode'L :: (t ~ Add n m) => RedNode' n m t a -> a -> BlackNode n a -> BlackNode' n m t a
  RedNode'R :: (t ~ Add n m) => BlackNode n a -> a -> RedNode' n m t a -> BlackNode' n m t a

data RedNode' :: Nat -> Nat -> Nat -> * -> * where
  RBNode'L :: (t ~ Add (S n) m) => BlackNode' (S n) m t a -> a -> BlackNode n a -> RedNode' n (S m) t a
--  RRNode'L :: (t ~ Add (S n) m) => BlackNode' (S n) m t a -> a -> RedNode n a -> RedNode' n (S m) t a
--  RRNode'R :: (t ~ Add (S n) m) => RedNode n a -> a -> BlackNode' (S n) m t a -> RedNode' n (S m) t a

Since the operation of insertion into tree needs to consider sometimes surrounding and we need to reconstruct the tree afterwards the notion of Zipper become handy. The simplest way of thinking about it is that if we view the tree from certain node we can split/cut the tree into subtree and tree with hole. Now we can represent the tree with hole as either empty suptree (the hole is the tree) or as tuple of sibling subtree, value at parent and tree with hole of 1 size bigger.

The parameters are used to keep the correct sizes of three, where first parameter (n) is number of black non-nil nodes in all paths in subtree, second (m) is number of black nodes in path to root and third (t) is number of black non-nil nodes in all paths in whole tree.

find :: (Ord a, NatHelper t) =>
        (forall n m. t ~ Add n m => BlackNode n a -> BlackNode' n m t a -> b) ->
        (forall n m. t ~ Add n m => RedNode n a -> RedNode' n m t a -> b) ->
        BlackNode t a ->
        a ->
        b
find fb fr n = find'B fb fr n Top

find'B :: forall a b n m t. (Ord a, t ~ Add n m, NatHelper n, NatHelper m) =>
         (forall n' m'. t ~ Add n' m' => BlackNode n' a -> BlackNode' n' m' t a -> b) ->
         (forall n' m'. t ~ Add n' m' => RedNode n' a -> RedNode' n' m' t a -> b) ->
         BlackNode n a ->
         BlackNode' n m t a ->
         a ->
         b 
find'B fb _      Nil           ctx _  = fb Nil ctx
find'B fb fr bn@(BBNode l v r) ctx v' = case compare v' v of
  LT -> prev (TaggedNat :: TaggedNat n) $ \n' -> addEq (find'B fb fr l (BBNode'L ctx v r) v') n' (TaggedNat :: TaggedNat m)
  EQ -> fb bn ctx
  GT -> prev (TaggedNat :: TaggedNat n) $ \n' -> addEq (find'B fb fr r (BBNode'R l v ctx) v') n' (TaggedNat :: TaggedNat m)
find'B fb fr bn@(RBNode l v r) ctx v' = case compare v' v of
  LT -> prev (TaggedNat :: TaggedNat n) $ \n' -> addEq (find'R fb fr l (RBNode'L ctx v r) v') n' (TaggedNat :: TaggedNat m)
  EQ -> fb bn ctx
  GT -> prev (TaggedNat :: TaggedNat n) $ \n' -> addEq (find'B fb fr r (RBNode'R l v ctx) v') n' (TaggedNat :: TaggedNat m)
--find'B fb fr bn@(RRNode l v r) ctx v' = case compare v' v of
--  LT -> prev (TaggedNat :: TaggedNat n) $ \n' -> addEq (find'R fb fr l (RRNode'L ctx v r) v') n' (TaggedNat :: TaggedNat m)
--  EQ -> fb bn ctx
--  GT -> prev (TaggedNat :: TaggedNat n) $ \n' -> addEq (find'R fb fr r (RRNode'R l v ctx) v') n' (TaggedNat :: TaggedNat m)

find'R :: (Ord a, t ~ Add n m, NatHelper n, NatHelper m) =>
         (forall n' m'. t ~ Add n' m' => BlackNode n' a -> BlackNode' n' m' t a -> b) ->
         (forall n' m'. t ~ Add n' m' => RedNode n' a -> RedNode' n' m' t a -> b) ->
         RedNode n a ->
         RedNode' n m t a ->
         a ->
         b 
find'R fb fr rn@(RedNode l v r) ctx v' = case compare v' v of
  LT -> find'B fb fr l (RedNode'L ctx v r) v'
  EQ -> fr rn ctx
  GT -> find'B fb fr r (RedNode'R l v ctx) v'

Find allows to get the location (node and context) of element labelled by given value. Unfortunately the black case requires additional bookkeeping as type checker is not a SMT solver.

It is generalized function which can be used for implementation of contains, insert, delete etc. The insert function finds location to add – if it is not Nil then value is already present in tree and it aborts. Otherwise it creates a simple red tree and tries to plug it into the hole. There are four cases of plugging in (2 colors of hole x 2 colors of current subtree) starting with putting the red subtree into a black hole created by Nil.

insert' :: forall a t. (Ord a, NatHelper t) => a -> BlackNode t a -> (Bool, Either (BlackNode t a) (BlackNode (S t) a))
insert' a n = find (\n' ctx -> case n' of
                       Nil -> (True, mergeRB (RedNode Nil a Nil) ctx)
                       _ -> (False, Left $! n)) (\_ _ -> (False, Left $! n)) n a

The simplest cases are plugging red subtree into red hole or black subtree into black hole. They are straightforward operations which simply reconstruct tree from location.

mergeRR :: RedNode n a -> RedNode' n m t a -> Either (BlackNode t a) (BlackNode (S t) a)
mergeRR l (RBNode'L ctx v r) = mergeBB (RBNode l v r) ctx
--mergeRR l (RRNode'L ctx v r) = mergeBB (RRNode l v r) ctx
--mergeRR r (RRNode'R l v ctx) = mergeBB (RRNode l v r) ctx

mergeBB :: BlackNode n a -> BlackNode' n m t a -> Either (BlackNode t a) (BlackNode (S t) a)
mergeBB t  Top                = Left $! t
mergeBB l (BBNode'L  ctx v r) = mergeBB (BBNode l v r) ctx
mergeBB r (BBNode'R  l v ctx) = mergeBB (BBNode l v r) ctx
mergeBB r (RBNode'R  l v ctx) = mergeBB (RBNode l v r) ctx
mergeBB l (RedNode'L ctx v r) = mergeRR (RedNode l v r) ctx
mergeBB r (RedNode'R l v ctx) = mergeRR (RedNode l v r) ctx

mergeBR :: BlackNode n a -> RedNode' n m t a -> Either (BlackNode t a) (BlackNode (S t) a)
mergeBR l (RBNode'L ctx v r)                  = mergeBB (BBNode l v r) ctx
--mergeBR l (RRNode'L ctx v (RedNode l' v' r')) = mergeBB (RBNode (RedNode l v l') v' r') ctx
--mergeBR r (RRNode'R l v ctx)                  = mergeBB (RBNode l v r) ctx

blacken :: RedNode n a -> BlackNode (S n) a
blacken (RedNode l v r) = BBNode l v r

mergeRB :: RedNode n a -> BlackNode' n m t a -> Either (BlackNode t a) (BlackNode (S t) a)
mergeRB t                   Top                                = Right $! blacken t
mergeRB l                  (BBNode'L ctx v r)                  = mergeBB (RBNode l v r) ctx
mergeRB (RedNode l' v' r') (BBNode'R l v ctx)                  = mergeBB (RBNode (RedNode l v l') v' r') ctx
mergeRB r                  (RBNode'R l v ctx)                  = mergeRB (RedNode (blacken l) v (blacken r)) ctx
mergeRB l                  (RedNode'L (RBNode'L ctx v' u) v r) = mergeRB (RedNode (blacken l) v (BBNode r v' u)) ctx
--mergeRB r                  (RBNode'R l v ctx)                  = mergeBB (RRNode l v r) ctx
--mergeRB l                  (RedNode'L (RBNode'L ctx v' u) v r) = mergeBB (RRNode l v (RedNode r v' u)) ctx
--mergeRB l                  (RedNode'L (RRNode'L ctx v' u) v r) = mergeRB (RedNode (RBNode l v r) v' (blacken u)) ctx
--mergeRB l                  (RedNode'L (RRNode'R u v' ctx) v r) = mergeRB (RedNode (blacken u) v' (RBNode l v r)) ctx
mergeRB (RedNode l' v' r') (RedNode'R l v ctx)                 = mergeRB (RedNode l v l') (RedNode'L ctx v' r')

Since it would be inconvenient for user to keep the height of tree on type level it is sufficient to provide him or her with opaque wrapper.

data RBTree a where
  RBTree :: NatHelper t => BlackNode t a -> RBTree a

empty :: RBTree a
empty = RBTree $ Nil

insert :: Ord a => a -> RBTree a -> RBTree a
insert v (RBTree t) = let (_, r) = insert' v t
                      in either RBTree RBTree r

contains :: Ord a => a -> RBTree a -> Bool
contains a (RBTree n) = find (\n' _ -> case n' of
                                 Nil -> False
                                 _ -> True) (\_ _ -> True) n a

While I can see that dynamic typing may have advantages in certain areas (say monkey patching plug-ins – like in Gnome Shell or Firefox – although it have some drawbacks too) static typing can be very useful too.

About these ads
This entry was posted in Programming and tagged , , . Bookmark the permalink.

Leave a Reply

Please log in using one of these methods to post your comment:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s