I have met with claims that the static typing is useless as it is not mathematical proof (OK – for those knowing the discussion – I exaggerated a lot). However to large extend the static typing is prove of some sort and depending on the code it might be a prove of even quite strong properties.

As chosen field I tried to use LLRB trees – I have implemented them quite long time ago and I’ve run into problems that violated the pre and post conditions. Even when problem was discovered it took some time to find out the underlaying problems (my inexperience didn’t helped probably either)

I’ve tried to reimplement the code in GHC Haskell with following rules:

- Using only ‘safe’ extensions
- Making all functions total and without use of
`undefined`

or`error`

- Ensuring that all properties of tree are preserved

Unfortunately this was my first time I used some of the concepts (like Zipper) so the implementation and description might be not the best (comments and corrections welcome).

I have used the code closer in structure to traditional implementations then the one described by Sedgewick. I don’t know the performance though as point was more to ensure the properties of the tree. It also includes both 2-3 and 2-4 implementation with latter one commented out.

While it took some time to implement it (for finding out how to convincing the type checker that (n + 1) + m = n + (m + 1)) it ensures the safety (and in total it have taken much less time then de debugging alone last time).

**WARNING!** The code was written in hurry and while I have learned and I haven’t had time to clean it. It might contain strange variable names etc. It is also not written as efficient implementation but rather a proof of concept.

{-# LANGUAGE DataKinds #-} {-# LANGUAGE GADTs #-} {-# LANGUAGE TypeFamilies #-} {-# LANGUAGE RankNTypes #-} {-# LANGUAGE ScopedTypeVariables #-} {-# LANGUAGE FlexibleContexts #-}

Above code is simply bunch of extentions that are used

data Nat = Z | S Nat type family Add (x :: Nat) (y :: Nat) :: Nat type instance Add Z Z = Z type instance Add Z y = y type instance Add x Z = x type instance Add (S x) (S y) = S (S (Add x y))

`DataKinds`

extension allows to promote a type to a kind (type of type). The above code is simply an encoding of natural numbers on type level with addition

data TaggedNat :: Nat -> * where TaggedNat :: TaggedNat n class NatHelper (n :: Nat) where nat :: (Z ~ n => a) -> (forall n'. (S n' ~ n, NatHelper n') => TaggedNat n' -> a) -> TaggedNat n -> a instance NatHelper Z where nat z _ _ = z instance NatHelper n => NatHelper (S n) where nat _ s _ = s (TaggedNat :: TaggedNat n) data ChainNat :: Nat -> * where ChainZero :: ChainNat Z ChainCons :: NatHelper n => ChainNat n -> ChainNat (S n) toChain :: NatHelper n => TaggedNat n -> ChainNat n toChain = nat ChainZero (ChainCons . toChain) fromChain :: ChainNat n -> TaggedNat n fromChain ChainZero = TaggedNat fromChain (ChainCons _) = TaggedNat prev :: NatHelper (S n) => TaggedNat (S n) -> (NatHelper n => TaggedNat n -> a) -> a prev t x = case toChain t of ChainCons c -> x $ fromChain c addEq :: forall a n' m'. (NatHelper n', NatHelper m') => (Add n' (S m') ~ Add (S n') m' => a) -> TaggedNat n' -> TaggedNat m' -> a addEq x n m = nat (nat x (const x) m) (\n' -> nat x (addEq x n') m) n

In general the type checker in Haskell is not a SMT solver (at least yet). The above code provides prove of 2 properties:

- If we have
`S n`

(or n + 1) then n is a natural number. - Prove by induction (n + 1) + m = n + (m + 1). Unfortunately it currently is not optimized out despite my attempts to convince GHC by rules etc. that it is complex identity. If someone wanted to write the ‘real’ code it might be a good spot to fix as it changes the complexity.

data BlackNode :: Nat -> * -> * where Nil :: BlackNode Z a BBNode :: BlackNode n a -> a -> BlackNode n a -> BlackNode (S n) a RBNode :: RedNode n a -> a -> BlackNode n a -> BlackNode (S n) a -- RRNode :: RedNode n a -> a -> RedNode n a -> BlackNode (S n) a data RedNode :: Nat -> * -> * where RedNode :: BlackNode n a -> a -> BlackNode n a -> RedNode n a

The above code defines the nodes. The second type parameter denotes number of non-nil black nodes in all paths in subtree. It states that (for uncommented version):

- For black trees the following rules apply:
- Nil element is a black tree with no non-nil elements
- If
`l, r`

are a black trees with`n`

black non-nil nodes in all paths and`v`

is a element (label) then`BBNode l v r`

is black tree with`n + 1`

black non-nil nodes in all paths - If
`l`

(`r`

) is a red tree (respectively black tree) with`n`

black non-nil nodes in all paths and code>v is a element (label) then`RBNode l v r`

is black tree with`n + 1`

black non-nil nodes in all paths - Nothing else is a black tree

- For red trees the following rules apply:
- If
`l, r`

are a black trees with`n`

black non-nil nodes in all paths and`v`

is a element (label) then`RedNode l v r`

is red tree with`n`

black non-nil nodes in all paths - Nothing else is a red tree

- If

By construction it ensures the all red-black tree’s properties are kept and the tree is left-leaning. The tree, which would violate any property is not possible to construct.

data BlackNode' :: Nat -> Nat -> Nat -> * -> * where Top :: BlackNode' n Z n a BBNode'L :: (t ~ Add (S n) m) => BlackNode' (S n) m t a -> a -> BlackNode n a -> BlackNode' n (S m) t a BBNode'R :: (t ~ Add (S n) m) => BlackNode n a -> a -> BlackNode' (S n) m t a -> BlackNode' n (S m) t a RBNode'R :: (t ~ Add (S n) m) => RedNode n a -> a -> BlackNode' (S n) m t a -> BlackNode' n (S m) t a RedNode'L :: (t ~ Add n m) => RedNode' n m t a -> a -> BlackNode n a -> BlackNode' n m t a RedNode'R :: (t ~ Add n m) => BlackNode n a -> a -> RedNode' n m t a -> BlackNode' n m t a data RedNode' :: Nat -> Nat -> Nat -> * -> * where RBNode'L :: (t ~ Add (S n) m) => BlackNode' (S n) m t a -> a -> BlackNode n a -> RedNode' n (S m) t a -- RRNode'L :: (t ~ Add (S n) m) => BlackNode' (S n) m t a -> a -> RedNode n a -> RedNode' n (S m) t a -- RRNode'R :: (t ~ Add (S n) m) => RedNode n a -> a -> BlackNode' (S n) m t a -> RedNode' n (S m) t a

Since the operation of insertion into tree needs to consider sometimes surrounding and we need to reconstruct the tree afterwards the notion of Zipper become handy. The simplest way of thinking about it is that if we view the tree from certain node we can split/cut the tree into subtree and tree with hole. Now we can represent the tree with hole as either empty suptree (the hole is the tree) or as tuple of sibling subtree, value at parent and tree with hole of 1 size bigger.

The parameters are used to keep the correct sizes of three, where first parameter (`n`

) is number of black non-nil nodes in all paths in subtree, second (`m`

) is number of black nodes in path to root and third (`t`

) is number of black non-nil nodes in all paths in whole tree.

find :: (Ord a, NatHelper t) => (forall n m. t ~ Add n m => BlackNode n a -> BlackNode' n m t a -> b) -> (forall n m. t ~ Add n m => RedNode n a -> RedNode' n m t a -> b) -> BlackNode t a -> a -> b find fb fr n = find'B fb fr n Top find'B :: forall a b n m t. (Ord a, t ~ Add n m, NatHelper n, NatHelper m) => (forall n' m'. t ~ Add n' m' => BlackNode n' a -> BlackNode' n' m' t a -> b) -> (forall n' m'. t ~ Add n' m' => RedNode n' a -> RedNode' n' m' t a -> b) -> BlackNode n a -> BlackNode' n m t a -> a -> b find'B fb _ Nil ctx _ = fb Nil ctx find'B fb fr bn@(BBNode l v r) ctx v' = case compare v' v of LT -> prev (TaggedNat :: TaggedNat n) $ \n' -> addEq (find'B fb fr l (BBNode'L ctx v r) v') n' (TaggedNat :: TaggedNat m) EQ -> fb bn ctx GT -> prev (TaggedNat :: TaggedNat n) $ \n' -> addEq (find'B fb fr r (BBNode'R l v ctx) v') n' (TaggedNat :: TaggedNat m) find'B fb fr bn@(RBNode l v r) ctx v' = case compare v' v of LT -> prev (TaggedNat :: TaggedNat n) $ \n' -> addEq (find'R fb fr l (RBNode'L ctx v r) v') n' (TaggedNat :: TaggedNat m) EQ -> fb bn ctx GT -> prev (TaggedNat :: TaggedNat n) $ \n' -> addEq (find'B fb fr r (RBNode'R l v ctx) v') n' (TaggedNat :: TaggedNat m) --find'B fb fr bn@(RRNode l v r) ctx v' = case compare v' v of -- LT -> prev (TaggedNat :: TaggedNat n) $ \n' -> addEq (find'R fb fr l (RRNode'L ctx v r) v') n' (TaggedNat :: TaggedNat m) -- EQ -> fb bn ctx -- GT -> prev (TaggedNat :: TaggedNat n) $ \n' -> addEq (find'R fb fr r (RRNode'R l v ctx) v') n' (TaggedNat :: TaggedNat m) find'R :: (Ord a, t ~ Add n m, NatHelper n, NatHelper m) => (forall n' m'. t ~ Add n' m' => BlackNode n' a -> BlackNode' n' m' t a -> b) -> (forall n' m'. t ~ Add n' m' => RedNode n' a -> RedNode' n' m' t a -> b) -> RedNode n a -> RedNode' n m t a -> a -> b find'R fb fr rn@(RedNode l v r) ctx v' = case compare v' v of LT -> find'B fb fr l (RedNode'L ctx v r) v' EQ -> fr rn ctx GT -> find'B fb fr r (RedNode'R l v ctx) v'

Find allows to get the location (node and context) of element labelled by given value. Unfortunately the black case requires additional bookkeeping as type checker is not a SMT solver.

It is generalized function which can be used for implementation of contains, insert, delete etc. The insert function finds location to add – if it is not `Nil`

then value is already present in tree and it aborts. Otherwise it creates a simple red tree and tries to plug it into the hole. There are four cases of plugging in (2 colors of hole x 2 colors of current subtree) starting with putting the red subtree into a black hole created by `Nil`

.

insert' :: forall a t. (Ord a, NatHelper t) => a -> BlackNode t a -> (Bool, Either (BlackNode t a) (BlackNode (S t) a)) insert' a n = find (\n' ctx -> case n' of Nil -> (True, mergeRB (RedNode Nil a Nil) ctx) _ -> (False, Left $! n)) (\_ _ -> (False, Left $! n)) n a

The simplest cases are plugging red subtree into red hole or black subtree into black hole. They are straightforward operations which simply reconstruct tree from location.

mergeRR :: RedNode n a -> RedNode' n m t a -> Either (BlackNode t a) (BlackNode (S t) a) mergeRR l (RBNode'L ctx v r) = mergeBB (RBNode l v r) ctx --mergeRR l (RRNode'L ctx v r) = mergeBB (RRNode l v r) ctx --mergeRR r (RRNode'R l v ctx) = mergeBB (RRNode l v r) ctx mergeBB :: BlackNode n a -> BlackNode' n m t a -> Either (BlackNode t a) (BlackNode (S t) a) mergeBB t Top = Left $! t mergeBB l (BBNode'L ctx v r) = mergeBB (BBNode l v r) ctx mergeBB r (BBNode'R l v ctx) = mergeBB (BBNode l v r) ctx mergeBB r (RBNode'R l v ctx) = mergeBB (RBNode l v r) ctx mergeBB l (RedNode'L ctx v r) = mergeRR (RedNode l v r) ctx mergeBB r (RedNode'R l v ctx) = mergeRR (RedNode l v r) ctx mergeBR :: BlackNode n a -> RedNode' n m t a -> Either (BlackNode t a) (BlackNode (S t) a) mergeBR l (RBNode'L ctx v r) = mergeBB (BBNode l v r) ctx --mergeBR l (RRNode'L ctx v (RedNode l' v' r')) = mergeBB (RBNode (RedNode l v l') v' r') ctx --mergeBR r (RRNode'R l v ctx) = mergeBB (RBNode l v r) ctx blacken :: RedNode n a -> BlackNode (S n) a blacken (RedNode l v r) = BBNode l v r mergeRB :: RedNode n a -> BlackNode' n m t a -> Either (BlackNode t a) (BlackNode (S t) a) mergeRB t Top = Right $! blacken t mergeRB l (BBNode'L ctx v r) = mergeBB (RBNode l v r) ctx mergeRB (RedNode l' v' r') (BBNode'R l v ctx) = mergeBB (RBNode (RedNode l v l') v' r') ctx mergeRB r (RBNode'R l v ctx) = mergeRB (RedNode (blacken l) v (blacken r)) ctx mergeRB l (RedNode'L (RBNode'L ctx v' u) v r) = mergeRB (RedNode (blacken l) v (BBNode r v' u)) ctx --mergeRB r (RBNode'R l v ctx) = mergeBB (RRNode l v r) ctx --mergeRB l (RedNode'L (RBNode'L ctx v' u) v r) = mergeBB (RRNode l v (RedNode r v' u)) ctx --mergeRB l (RedNode'L (RRNode'L ctx v' u) v r) = mergeRB (RedNode (RBNode l v r) v' (blacken u)) ctx --mergeRB l (RedNode'L (RRNode'R u v' ctx) v r) = mergeRB (RedNode (blacken u) v' (RBNode l v r)) ctx mergeRB (RedNode l' v' r') (RedNode'R l v ctx) = mergeRB (RedNode l v l') (RedNode'L ctx v' r')

Since it would be inconvenient for user to keep the height of tree on type level it is sufficient to provide him or her with opaque wrapper.

data RBTree a where RBTree :: NatHelper t => BlackNode t a -> RBTree a empty :: RBTree a empty = RBTree $ Nil insert :: Ord a => a -> RBTree a -> RBTree a insert v (RBTree t) = let (_, r) = insert' v t in either RBTree RBTree r contains :: Ord a => a -> RBTree a -> Bool contains a (RBTree n) = find (\n' _ -> case n' of Nil -> False _ -> True) (\_ _ -> True) n a

While I can see that dynamic typing may have advantages in certain areas (say monkey patching plug-ins – like in Gnome Shell or Firefox – although it have some drawbacks too) static typing can be very useful too.