I started following the Deep Learning Curriculum written by Jacob Hilton, and here is what I learnt from the exercise in Topic 2 - Scaling Laws. My solution is in the Colab notebook T2-ScalingLaws-solution.ipynb.

It took me around 15 hours to finish the exercise. Throughout the process I learnt:

  1. How to vary the CNN width and the amount of training data to follow the scaling-laws experimental setup.
  2. How to use the PyTorch Lightning learning rate finder to adjust the learning rate based on model size.
    1. Use callbacks.LearningRateFinder from PyTorch Lightning and experiment to find suitable minimum and maximum learning rates to search between. Plot the loss against the learning rate to make sure the result looks right.
  3. How the compute-efficient model size varies with compute.
    1. To approximate the relationship between compute and loss, we can fit a cube-root function. We need to train for more episodes to get an accurate approximation.
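To make the first point concrete, here is a minimal sketch of how a width and data-size grid could be laid out. The two-convolution architecture, the `cnn_param_count` helper, and the specific widths and data fractions are all assumptions for illustration, not the setup from my notebook.

```python
from itertools import product


def cnn_param_count(width: int, in_channels: int = 1, num_classes: int = 10, kernel: int = 3) -> int:
    """Parameter count of a hypothetical CNN:
    conv(in -> width), conv(width -> 2*width), global pooling, linear(2*width -> classes)."""
    conv1 = in_channels * width * kernel * kernel + width        # weights + biases
    conv2 = width * (2 * width) * kernel * kernel + 2 * width    # weights + biases
    fc = (2 * width) * num_classes + num_classes                 # weights + biases
    return conv1 + conv2 + fc


# Assumed sweep: widths and data fractions both spaced by factors of 2.
widths = [2 ** k for k in range(1, 7)]                # 2, 4, ..., 64
data_fractions = [2 ** -k for k in range(5, -1, -1)]  # 1/32, 1/16, ..., 1

# One experiment per (width, data fraction) pair.
grid = [
    {"width": w, "data_fraction": f, "params": cnn_param_count(w)}
    for w, f in product(widths, data_fractions)
]

for w in widths:
    print(f"width={w:3d}  params={cnn_param_count(w)}")
```

Spacing both axes geometrically gives evenly spread points on the log-log plots that scaling-law analysis usually relies on.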
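The compute-versus-loss fit can be sketched as an ordinary least-squares problem. I am assuming here that "cube-root function" means a curve of the form loss ≈ a · C^(-1/3) + b, where C is compute; the `fit_cube_root` helper and the data points are illustrative only, and the points are synthetic rather than results from my runs.

```python
def fit_cube_root(compute, loss):
    """Least-squares fit of loss ~ a * compute**(-1/3) + b. Returns (a, b)."""
    # The model is linear in the transformed feature x = compute**(-1/3),
    # so simple linear regression on (x, loss) is enough.
    x = [c ** (-1.0 / 3.0) for c in compute]
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(loss) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, loss))
    a = sxy / sxx
    b = mean_y - a * mean_x
    return a, b


# Synthetic, noise-free points generated from known coefficients (not real results).
true_a, true_b = 5000.0, 0.05
compute = [1e12, 1e13, 1e14, 1e15]
loss = [true_a * c ** (-1.0 / 3.0) + true_b for c in compute]

a, b = fit_cube_root(compute, loss)
print(f"a = {a:.1f}, b = {b:.3f}")
```

With real measurements the points are noisy, so the fit is only as good as the number and spread of training runs behind it.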