Vanishing Gradient


[Graph omitted]

Gradients, which drive weight updates during backpropagation, become extremely small as they travel back through the layers, effectively halting learning in the early layers. Each layer multiplies the gradient by its local derivative; with saturating activations like the sigmoid (whose derivative is at most 0.25), this repeated multiplication shrinks the gradient roughly exponentially with depth.
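A minimal sketch of this effect: a chain of scalar sigmoid "layers" where the backward pass multiplies the gradient by each layer's local derivative `w * sigmoid'(z)`. The function name, weight, and input value are illustrative choices, not from the original note.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def gradients_through_chain(depth, w=1.0, x=0.5):
    """Forward through `depth` sigmoid layers, then collect the running
    gradient product layer by layer (a stand-in for backpropagation
    through a deep chain)."""
    activations = [x]
    for _ in range(depth):
        activations.append(sigmoid(w * activations[-1]))

    grad = 1.0
    running = []
    for a in activations[1:]:
        # sigmoid'(z) = a * (1 - a), which is at most 0.25,
        # so each layer shrinks the gradient by a factor <= 0.25 (for w = 1)
        grad *= w * a * (1.0 - a)
        running.append(grad)
    return running

grads = gradients_through_chain(depth=10)
print(f"gradient after 1 layer:   {grads[0]:.4f}")
print(f"gradient after 10 layers: {grads[-1]:.2e}")
```

After only 10 layers the gradient reaching the earliest layer is already around a millionth of its starting size, which is why those layers barely learn.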