…desugars to…
data Ann = Ann { ... }
data Exp a where
  Const :: Ann -> a -> Exp a
  Var   :: Ann -> Ident t -> Exp t
  Pair  :: Ann -> Exp a -> Exp b -> Exp (a, b)
  ...
Rewriting in-place:
- Bottom-to-top modification order

Deferred modification:
- Top-to-bottom modification order
- Doesn't rewrite every term
Regular free functions | GHC Call Stacks
Existing type classes  | RTS Execution Stacks
Pattern synonyms       | GHC Call Stacks plus some trickery
main :: IO ()
main = foo

foo :: IO ()
foo = bar            -- Compilation error: Unbound implicit parameter

foo' :: IO ()
foo' = sourceMap bar -- Runtime error: No HasCallStack

foo'' :: HasCallStack => IO ()
foo'' = sourceMap bar -- Works!

bar :: SourceMapped => IO ()
bar = print callStack
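The three behaviours above can be reproduced with implicit parameters. A minimal sketch, assuming `SourceMapped` pairs GHC's `HasCallStack` with a marker implicit parameter; the names `?sourceMapping` and `SourceMapping` are made up for illustration, and Accelerate's actual definitions may differ:

```haskell
{-# LANGUAGE ConstraintKinds #-}
{-# LANGUAGE ImplicitParams  #-}
{-# LANGUAGE RankNTypes      #-}
import GHC.Stack

-- Hypothetical marker token. Because '?sourceMapping' is not the magic
-- '?callStack' parameter, GHC will not default it away, so forgetting
-- to propagate 'SourceMapped' is a compile-time error.
data SourceMapping = SourceMapping

type SourceMapped = (?sourceMapping :: SourceMapping, HasCallStack)

-- Freeze the caller's call stack and discharge the marker. A caller
-- without its own 'HasCallStack' constraint produces a one-entry stack
-- (just the 'sourceMap' call site), which we reject at runtime.
sourceMap :: HasCallStack => (SourceMapped => a) -> a
sourceMap f =
  case getCallStack callStack of
    (_ : _ : _) ->
      let ?sourceMapping = SourceMapping
          ?callStack     = freezeCallStack (popCallStack callStack)
      in  f
    _ -> error "sourceMap: add a HasCallStack constraint to the caller"
```

Under these assumptions, `foo = bar` fails to compile with an unbound `?sourceMapping`, `foo' = sourceMap bar` fails at runtime because only the `sourceMap` call site is on the stack, and `foo''` works because its `HasCallStack` constraint propagates the caller's frame.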
Now to combine everything
mkAnn :: SourceMapped => Ann
mkAnn = Ann { locations = capture ?callStack, ... }
  where
    capture = ...
constant :: HasCallStack => a -> Exp a
constant = sourceMap $ Const mkAnn
Recall the subtree annotations from earlier.
factorial :: (Num a, Ord a, Typeable a) => Exp a -> Exp a
factorial n = context "factorial" $
  letfn (\factorial' n' -> cond (n' <= 1)
                                1
                                (n' * factorial' (n' - 1)))
        (\factorial' -> factorial' n)
Accelerate's LLVM backend already supports the Tracy frame profiler
Adds (many) floating-point numbers while compensating for floating-point rounding errors
Which makes it a good candidate for experimenting with loop unrolling...
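For illustration, the two ideas at play can be sketched in plain Haskell; the benchmark itself is an Accelerate program, and `kahanSum` and `sumUnrolled8` are hypothetical names:

```haskell
-- Compensated (Kahan) summation: carry a correction term that recovers
-- the low-order bits lost by each individual addition.
kahanSum :: [Double] -> Double
kahanSum = go 0 0
  where
    go total _comp []       = total
    go total  comp (x : xs) =
      let y      = x - comp               -- apply pending compensation
          total' = total + y              -- may round away low bits of y
          comp'  = (total' - total) - y   -- recover those lost bits
      in  go total' comp' xs

-- Eightfold loop unrolling: process eight elements per iteration so the
-- per-iteration loop overhead is amortised. Shown here on a plain sum
-- for clarity; unrolling a compensated sum changes its numerics.
sumUnrolled8 :: [Double] -> Double
sumUnrolled8 = go 0
  where
    go acc (a:b:c:d:e:f:g:h:xs) =
      go (acc + a + b + c + d + e + f + g + h) xs
    go acc xs = acc + sum xs
```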
...and on the CPU backend (a 12-core, 24-thread Ryzen 9 5900X) the average runtime went up from 8.65 seconds to 9.60 seconds after unrolling eight times.
According to a one-minute run under Cachegrind:
On the GPU backend with an RTX 2080 SUPER, the program became (statistically) significantly faster:
But more importantly, the annotation system serves as a foundation for new areas of exploration.