Differentiable Hebbian Consolidation for Continual Lifelong Learning
Catastrophic forgetting poses a grand challenge for continual learning systems: it prevents neural networks from retaining previously learned knowledge while learning new tasks sequentially. As a result, neural network models deployed in the real world often struggle in scenarios where the data distribution is non-stationary (concept drift), imbalanced, or not always fully available, e.g., rare or novel edge cases. In this thesis, we propose a Differentiable Hebbian Consolidation model that replaces the traditional softmax layer with a Differentiable Hebbian Plasticity (DHP) Softmax, which adds a fast-learning plastic component to the fixed (slowly changing) parameters of the softmax output layer. Analogous to the hippocampal system in Complementary Learning Systems (CLS) theory, the DHP Softmax behaves as a compressed episodic memory that reactivates existing long-term memory traces while simultaneously creating new short-term memories. We demonstrate the flexibility of our approach by combining our model with well-known task-specific synaptic consolidation methods that penalize changes to the slow weights important for each target task. We evaluate our approach on the Permuted MNIST, Split MNIST, and Vision Datasets Mixture benchmarks, and introduce an imbalanced variant of Permuted MNIST, a dataset that combines the challenges of class imbalance and concept drift. Our proposed model requires no additional hyperparameters and outperforms comparable baselines by reducing forgetting.
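The core mechanism described above, combining fixed slow weights with a fast Hebbian trace in the output layer, can be sketched as follows. This is a minimal illustration, not the thesis's exact formulation: the names `DHPSoftmax`, `alpha` (plasticity coefficient), and `eta` (Hebbian decay rate) are placeholders, and the specific update rule (a decaying outer product of pre- and post-synaptic activity) is one common form of differentiable plasticity assumed here for concreteness.

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax over the last axis."""
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

class DHPSoftmax:
    """Sketch of a softmax output layer with a fast Hebbian component.

    Effective weights: W_eff = W_slow + alpha * Hebb, where W_slow is
    trained slowly (e.g., by SGD) and Hebb is a fast-changing plastic
    trace acting as a compressed episodic memory.
    """
    def __init__(self, n_in, n_out, alpha=0.1, eta=0.5, seed=0):
        rng = np.random.default_rng(seed)
        self.W_slow = rng.normal(0.0, 0.1, (n_in, n_out))  # slow (consolidated) weights
        self.Hebb = np.zeros((n_in, n_out))                # fast plastic trace
        self.alpha = alpha  # plasticity coefficient (could be learned per weight)
        self.eta = eta      # decay / learning rate of the Hebbian trace

    def forward(self, x):
        # Combine slow weights with the scaled fast Hebbian trace.
        W_eff = self.W_slow + self.alpha * self.Hebb
        y = softmax(x @ W_eff)
        # Hebbian update: decay the trace toward the outer product of
        # pre-synaptic input x and post-synaptic output y.
        self.Hebb = (1.0 - self.eta) * self.Hebb + self.eta * np.outer(x, y)
        return y
```

Because the Hebbian trace is updated at every forward pass, recently seen input patterns are rapidly reinforced in the output layer, while the slow weights (optionally protected by a synaptic consolidation penalty) preserve knowledge across tasks.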