GitPedia

Backprop

Heterogeneous automatic differentiation ("backpropagation") in Haskell

From mstksg·Updated May 12, 2026·View on GitHub·

Write your functions to compute your result, and the library will automatically generate functions to compute your gradient. The project is written primarily in Haskell, distributed under the BSD 3-Clause "New" or "Revised" License license, first published in 2017. Key topics include: automatic-differentiation, backprop, backpropagation, deep-learning, gradient-descent.

backprop

backprop on Hackage
backprop on Stackage LTS 11
backprop on Stackage Nightly
Build Status

Join the chat at https://gitter.im/haskell-backprop/Lobby

Documentation and Walkthrough

Automatic heterogeneous back-propagation.

Write your functions to compute your result, and the library will automatically
generate functions to compute your gradient.

Differs from ad by offering full heterogeneity -- each intermediate step
and the resulting value can have different types (matrices, vectors, scalars,
lists, etc.).

Useful for applications in differentiable programming and deep learning
for creating and training numerical models, especially as described in this
blog post on a purely functional typed approach to trainable models.
Overall, intended for the implementation of gradient descent and other numeric
optimization techniques. Comparable to the python library autograd.

Currently up on hackage, with haddock documentation! However, a proper
library introduction and usage tutorial is available here. See also my
introductory blog post. You can also find help or support on the
gitter channel.

If you want to provide backprop for users of your library, see this guide
to equipping your library with backprop
.

MNIST Digit Classifier Example

My blog post introduces the concepts in this library in the context of
training a handwritten digit classifier. I recommend reading that first.

There are some literate haskell examples in the source, though
(rendered as pdf here), which can be built (if stack is
installed) using:

bash
$ ./Build.hs exe

There is a follow-up tutorial on using the library with more advanced types,
with extensible neural networks a la this blog post, available as
literate haskell
and also rendered as a PDF.

Brief example

(This is a really brief version of the documentation walkthrough and my
blog post)

The quick example below describes the running of a neural network with one
hidden layer to calculate its squared error with respect to target targ,
which is parameterized by two weight matrices and two bias vectors.
Vector/matrix types are from the hmatrix package.

Let's make a data type to store our parameters, with convenient accessors using
lens:

haskell
import Numeric.LinearAlgebra.Static.Backprop data Network = Net { _weight1 :: L 20 100 , _bias1 :: R 20 , _weight2 :: L 5 20 , _bias2 :: R 5 } makeLenses ''Network

(R n is an n-length vector, L m n is an m-by-n matrix, etc., #> is
matrix-vector multiplication)

"Running" a network on an input vector might look like this:

haskell
runNet net x = z where y = logistic $ (net ^^. weight1) #> x + (net ^^. bias1) z = logistic $ (net ^^. weight2) #> y + (net ^^. bias2) logistic :: Floating a => a -> a logistic x = 1 / (1 + exp (-x))

And that's it! neuralNet is now backpropagatable!

We can "run" it using evalBP:

haskell
evalBP2 runNet :: Network -> R 100 -> R 5

If we write a function to compute errors:

haskell
squaredError target output = error `dot` error where error = target - output

we can "test" our networks:

haskell
netError target input net = squaredError (auto target) (runNet net (auto input))

This can be run, again:

haskell
evalBP (netError myTarget myVector) :: Network -> Double

Now, we just wrote a normal function to compute the error of our network.
With the backprop library, we now also have a way to compute the gradient,
as well!

haskell
gradBP (netError myTarget myVector) :: Network -> Network

Now, we can perform gradient descent!

haskell
gradDescent :: R 100 -> R 5 -> Network -> Network gradDescent x targ n0 = n0 - 0.1 * gradient where gradient = gradBP (netError targ x) n0

Ta dah! We were able to compute the gradient of our error function, just by
only saying how to compute the error itself.

For a more fleshed out example, see the documentaiton, my blog
post
and the MNIST tutorial (also rendered as a
pdf
)

Benchmarks and Performance

Here are some basic benchmarks comparing the library's automatic
differentiation process to "manual" differentiation by hand. When using the
MNIST tutorial as an example:

benchmarks

Here we compare:

  1. "Manual" differentiation of a 784 x 300 x 100 x 10 fully-connected
    feed-forward ANN.
  2. Automatic differentiation using backprop and the lens-based accessor
    interface
  3. Automatic differentiation using backprop and the "higher-kinded
    data"-based pattern matching interface
  4. A hybrid approach that manually provides gradients for individual layers
    but uses automatic differentiation for chaining the layers together.

We can see that simply running the network and functions (using evalBP)
incurs virtually zero overhead. This means that library authors could actually
export only backprop-lifted functions, and users would be able to use them
without losing any performance.

As for computing gradients, there exists some associated overhead, from three
main sources. Of these, the building of the computational graph and the
Wengert Tape wind up being negligible. For more information, see a detailed
look at performance, overhead, and optimization techniques
in the
documentation.

Note that the manual and hybrid modes almost overlap in the range of their
random variances.

Comparisons

backprop can be compared and contrasted to many other similar libraries with
some overlap:

  1. The ad library (and variants like diffhask) support automatic
    differentiation, but only for homogeneous/monomorphic situations. All
    values in a computation must be of the same type --- so, your computation
    might be the manipulation of Doubles through a Double -> Double
    function.

    backprop allows you to mix matrices, vectors, doubles, integers, and even
    key-value maps as a part of your computation, and they will all be
    backpropagated properly with the help of the Backprop typeclass.

  2. The autograd library is a very close equivalent to backprop,
    implemented in Python for Python applications. The difference between
    backprop and autograd is mostly the difference between Haskell and
    Python --- static types with type inference, purity, etc.

  3. There is a link between backprop and deep learning/neural network
    libraries like tensorflow, caffe, and theano, which all
    support some form of heterogeneous automatic differentiation. Haskell
    libraries doing similar things include grenade.

    These are all frameworks for working with neural networks or other
    gradient-based optimizations --- they include things like built-in
    optimizers, methods to automate training data, built-in models to use out
    of the box. backprop could be used as a part of such a framework, like
    I described in my A Purely Functional Typed Approach to Trainable
    Models
    blog series; however, the backprop library itself does
    not provide any built in models or optimizers or automated data processing
    pipelines.

See documentation for a more detailed look.

Todo

  1. Benchmark against competing back-propagation libraries like ad, and
    auto-differentiating tensor libraries like grenade

  2. Write tests!

  3. Explore opportunities for parallelization. There are some naive ways of
    directly parallelizing right now, but potential overhead should be
    investigated.

  4. Some open questions:

    a. Is it possible to support constructors with existential types?

    b. How to support "monadic" operations that depend on results of previous
    operations? (ApBP already exists for situations that don't)

    c. What needs to be done to allow us to automatically do second,
    third-order differentiation, as well? This might be useful for certain
    ODE solvers which rely on second order gradients and hessians.

Contributors

Showing top 5 contributors by commit count.

View all contributors on GitHub →

This article is auto-generated from mstksg/backprop via the GitHub API.Last fetched: 6/14/2026