+1 for Karpathy's guide. It's a mini-book: hands-on, extensive, and guided by his personal philosophy of knowing all the nitty-gritty math details, so you know how to optimize your scripts even if you rely on the many available libraries.
This paper appears to be from 1998 [0]. No judgment on its quality; I'm just trying to provide a reference for other readers of the post.
[0]: A.C.C. Coolen, ‘A Beginner’s Guide to the Mathematics of Neural Networks’, in ‘Concepts for Neural Networks - A Survey’ (Springer 1998; eds. L.J. Landau and J.G. Taylor), 13-70
1. The goal of the paper seems to be to take the biological metaphor as far as it can go, and to make the strongest possible case that biology may work this way.
(Neuroscience research was where the action was in the 90s, so this makes sense.)
2. As far as I know (from when I last had my finger on the pulse of the field), there is no biological equivalent of back-propagation, i.e. there is no circuitry for a signal to travel in the opposite direction in a nerve. Does anyone know if this is still the case?
I would caution, though, that in neuroscience you can generally find "evidence" in the literature for literally anything you could dream up. So who knows in this case.
The key to understanding is drilling the backpropagation algorithm and being able to visualize the application of the multivariate chain rule as a computational graph.
EDIT: You won't understand until you do this yourself using pen and paper. It's a pain.
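To make the "chain rule as a computational graph" idea concrete, here is a minimal sketch of my own (not taken from the paper or from Karpathy's guide). It assumes a toy scalar network with one tanh hidden unit and a squared-error loss; the names `forward_backward` and `numeric_grad` are made up for illustration. The backward pass is written by hand, one graph node at a time, and then checked against finite differences.

```python
import math

def forward_backward(x, y, w1, w2):
    # Forward pass: each line is one node in the computational graph.
    z = w1 * x                  # pre-activation
    h = math.tanh(z)            # hidden activation
    p = w2 * h                  # prediction
    loss = 0.5 * (p - y) ** 2   # squared-error loss

    # Backward pass: apply the chain rule node by node, in reverse order.
    dp = p - y                  # dloss/dp
    dw2 = dp * h                # dloss/dw2 = dloss/dp * dp/dw2
    dh = dp * w2                # dloss/dh  = dloss/dp * dp/dh
    dz = dh * (1.0 - h ** 2)    # tanh'(z) = 1 - tanh(z)^2
    dw1 = dz * x                # dloss/dw1 = dloss/dz * dz/dw1
    return loss, dw1, dw2

def numeric_grad(f, v, eps=1e-6):
    # Central finite difference, used only to sanity-check the hand derivation.
    return (f(v + eps) - f(v - eps)) / (2 * eps)

# Arbitrary illustrative values.
x, y, w1, w2 = 0.5, 1.0, 0.8, -1.2
loss, dw1, dw2 = forward_backward(x, y, w1, w2)
num_dw1 = numeric_grad(lambda w: forward_backward(x, y, w, w2)[0], w1)
num_dw2 = numeric_grad(lambda w: forward_backward(x, y, w1, w)[0], w2)
print("analytic:", dw1, dw2)
print("numeric: ", num_dw1, num_dw2)
```

If the two printed lines agree to several decimal places, the pen-and-paper derivation is probably right. The same check scales to bigger networks, just with more indices.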
The derivation is a pain: there's a lot of notation and many indices to keep track of.
An easier first step for someone starting out might be to derive the gradient of the cost function for logistic regression, since it can be viewed as a classification neural net without the hidden layer(s).
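For anyone who wants to try the logistic regression route, here is a rough sketch of what the result should look like, assuming the usual mean cross-entropy cost with a sigmoid output. For that cost the gradient works out to the average of (prediction - label) * input. The data and the names `cost_and_grad` and `cost_with_w` are invented for illustration.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def cost_and_grad(w, b, X, y):
    # Mean cross-entropy cost and its gradient with respect to w and b.
    n = len(X)
    cost = 0.0
    grad_w = [0.0] * len(w)
    grad_b = 0.0
    for xi, yi in zip(X, y):
        p = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b)
        cost += -(yi * math.log(p) + (1 - yi) * math.log(1 - p))
        err = p - yi                  # the derivation collapses to this term
        for j, xj in enumerate(xi):
            grad_w[j] += err * xj
        grad_b += err
    return cost / n, [g / n for g in grad_w], grad_b / n

# Made-up toy data: four points, two features each.
X = [[0.5, 1.5], [1.0, -1.0], [-1.5, 0.3], [2.0, 0.7]]
y = [1, 0, 0, 1]
w, b = [0.1, -0.2], 0.05

cost, grad_w, grad_b = cost_and_grad(w, b, X, y)

def cost_with_w(j, delta):
    # Cost after nudging weight j by delta; used for the finite-difference check.
    w_shift = list(w)
    w_shift[j] += delta
    return cost_and_grad(w_shift, b, X, y)[0]

eps = 1e-6
num_grad_w = [(cost_with_w(j, eps) - cost_with_w(j, -eps)) / (2 * eps)
              for j in range(len(w))]
print("analytic:", grad_w)
print("numeric: ", num_grad_w)
```

Once this matches your hand derivation, adding a hidden layer is mostly a matter of repeating the same chain-rule step one level deeper.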
The same author has a newer book online:
Theory of Neural Information Processing Systems (2005) A.C.C. Coolen, R. Kuehn, and P. Sollich http://www-thphys.physics.ox.ac.uk/people/AlexanderSherstnev...