Stanford DAWNBench - CIFAR10

Goal: Achieve 94% validation accuracy on the CIFAR10 dataset in under 100 secs on a V100 GPU.

Challenges:

  • Lack of infrastructure left Google Colab as the only option. To approximate the original constraint on Colab's hardware, the target was set to 94% validation accuracy in 600 secs on a K80 GPU.

Experiments:

Experiment 1: Trained a custom ResNet9 model using tensorflow.keras.
Results: Validation accuracy 92.2%, time 1493 secs
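The custom ResNet9 is not specified layer by layer here. As a reference point, the sketch below assumes the ResNet9 layout widely used for DAWNBench CIFAR10 (David Page's design: a 64-channel prep conv, then 128/256/512-channel stages with residual blocks after the first and third) and counts its parameters. `RESNET9_CONVS` and `count_params` are hypothetical names, and this is not the author's model. The Details section lists 8.9M parameters for the custom model, so either its layout differs from this reference or that figure needs rechecking.

```python
# Reference ResNet9 layout (assumed, not the author's exact model):
# (name, in_channels, out_channels) for each 3x3 conv; every conv is
# bias-free and followed by BatchNorm + ReLU.
RESNET9_CONVS = [
    ("prep", 3, 64),
    ("layer1", 64, 128),         # followed by max pooling
    ("layer1_res_a", 128, 128),  # residual block
    ("layer1_res_b", 128, 128),
    ("layer2", 128, 256),        # followed by max pooling
    ("layer3", 256, 512),        # followed by max pooling
    ("layer3_res_a", 512, 512),  # residual block
    ("layer3_res_b", 512, 512),
]
NUM_CLASSES = 10


def count_params(convs, num_classes: int, kernel: int = 3) -> dict:
    """Count trainable parameters for a conv stack + global pool + dense head."""
    conv = sum(kernel * kernel * cin * cout for _, cin, cout in convs)  # no conv biases
    bn = sum(2 * cout for _, _, cout in convs)  # BatchNorm gamma and beta per channel
    dense = convs[-1][2] * num_classes + num_classes  # classifier weights + bias
    return {"conv": conv, "batchnorm": bn, "dense": dense, "total": conv + bn + dense}


print(count_params(RESNET9_CONVS, NUM_CLASSES))
```

This gives about 6.57M trainable parameters for the reference layout. Keras `model.summary()` also reports BatchNorm moving means and variances as non-trainable parameters, so its "Total params" line would be slightly higher than the trainable count.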

Experiment 2: Added a slanted one-cycle learning rate schedule with a gradual drop towards the end.
Results: Validation accuracy 92.6%, time 1502 secs

Experiment 3: Added image augmentation: FlipLR, RandomPadCrop (padding of 4) and Cutout (16x16).
Results: Validation accuracy 93.8%, time 1627 secs
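A minimal NumPy sketch of the three augmentations applied to a single HxWxC image. The source gives only the padding (4) and the Cutout size (16x16); the rest is assumed: reflect padding for RandomPadCrop, a 50% flip probability, and a Cutout square placed fully inside the image and filled with zeros. `augment` is a hypothetical helper, not the author's code.

```python
import numpy as np


def augment(img: np.ndarray, rng: np.random.Generator,
            pad: int = 4, cutout: int = 16) -> np.ndarray:
    """Apply FlipLR, RandomPadCrop and Cutout to one HxWxC image (assumed settings)."""
    h, w, _ = img.shape
    # FlipLR: mirror left-right with probability 0.5 (assumed probability).
    if rng.random() < 0.5:
        img = img[:, ::-1, :]
    # RandomPadCrop: pad each side by `pad` pixels (reflect mode is an assumption),
    # then take a random h x w crop; np.pad returns a new array, so the input is untouched.
    padded = np.pad(img, ((pad, pad), (pad, pad), (0, 0)), mode="reflect")
    y = rng.integers(0, 2 * pad + 1)
    x = rng.integers(0, 2 * pad + 1)
    out = padded[y:y + h, x:x + w, :].copy()
    # Cutout: zero a cutout x cutout square lying fully inside the image (assumed placement).
    cy = rng.integers(0, h - cutout + 1)
    cx = rng.integers(0, w - cutout + 1)
    out[cy:cy + cutout, cx:cx + cutout, :] = 0
    return out
```

Passing a seeded `np.random.default_rng(seed)` makes the augmentation reproducible, which is convenient when comparing runs.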

Experiment 4: Built an input pipeline using TFRecords and enabled prefetching so that the CPU prepares upcoming batches while the GPU trains on the current one.
Results: Validation accuracy 93.8%, time 741 secs
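The speed-up in this experiment comes from overlapping input preparation with training. In tf.data this is done with `Dataset.prefetch`; the sketch below is only a framework-agnostic illustration of the same producer/consumer idea, using a background thread and a bounded queue. It is not how tf.data implements prefetching, and `prefetch` and `buffer_size` here are hypothetical names.

```python
import queue
import threading
from typing import Iterable, Iterator, TypeVar

T = TypeVar("T")


def prefetch(source: Iterable[T], buffer_size: int = 2) -> Iterator[T]:
    """Yield items from `source`, preparing up to `buffer_size` ahead on a background thread."""
    buf = queue.Queue(maxsize=buffer_size)
    sentinel = object()
    errors = []

    def producer() -> None:
        try:
            for item in source:
                buf.put(item)  # blocks while the buffer is full
        except Exception as exc:  # hand producer errors to the consumer
            errors.append(exc)
        finally:
            buf.put(sentinel)  # signal end of stream

    threading.Thread(target=producer, daemon=True).start()
    while True:
        item = buf.get()
        if item is sentinel:
            if errors:
                raise errors[0]
            return
        yield item
```

If loading a batch and training on a batch each take 0.1 s, four batches finish in roughly 0.5 s instead of 0.8 s, because loading batch n+1 overlaps with training on batch n.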

Experiment 5: Augmented the data first and then stored the augmented images in TFRecords.
Results: Validation accuracy 93.8%, time 602 secs

Details:

  • Batch size: 512
  • Total parameters: 8.9M (not yet verified)
  • Learning rate: rises to MaxLR = 0.4 at epoch 5, decays to MinLR = 0.001 at epoch 20, then drops gradually to 0.0001 at epoch 24
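The schedule above can be sketched as linear interpolation between knots. Two assumptions are not stated in the source: the rate starts at 0.0 at epoch 0, and every segment is linear. `one_cycle_lr` and `KNOTS` are hypothetical names.

```python
from bisect import bisect_right

# (epoch, learning rate) knots. The epoch-0 value of 0.0 is an assumption;
# the remaining knots come from the Details list.
KNOTS = [(0.0, 0.0), (5.0, 0.4), (20.0, 0.001), (24.0, 0.0001)]


def one_cycle_lr(epoch: float) -> float:
    """Piecewise-linear learning rate at a (possibly fractional) epoch."""
    xs = [x for x, _ in KNOTS]
    ys = [y for _, y in KNOTS]
    if epoch <= xs[0]:
        return ys[0]
    if epoch >= xs[-1]:
        return ys[-1]
    i = bisect_right(xs, epoch)  # xs[i - 1] <= epoch < xs[i]
    x0, x1 = xs[i - 1], xs[i]
    y0, y1 = ys[i - 1], ys[i]
    return y0 + (y1 - y0) * (epoch - x0) / (x1 - x0)
```

To update the rate every batch, call it with `epoch + step / steps_per_epoch`; with 50,000 training images and a batch size of 512 that is about 98 steps per epoch. For once-per-epoch updates in Keras, it could be wrapped as `tf.keras.callbacks.LearningRateScheduler(lambda epoch, lr: one_cycle_lr(epoch))`.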

Results:

  • Validation accuracy: 93.80%
  • Time: 602 secs
  • Epochs: 24