Stochastic Weight Averaging in Parallel: Large-Batch Training that Generalizes Well
Authors: Vipul Gupta, Santiago Akle Serrano, Dennis DeCoste
Summary: We propose Stochastic Weight Averaging in Parallel (SWAP), an algorithm to accelerate DNN training. Our algorithm uses large mini-batches to compute an approximate solution quickly and then refines it by averaging the weights of multiple models trained independently and in parallel. The resulting models generalize as well as those trained with small mini-batches but are produced in a significantly shorter time. We demonstrate the reduction in training time and the good generalization performance of the resulting models on the computer vision datasets CIFAR10, CIFAR100, and ImageNet.
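The second phase of SWAP combines independently trained models by averaging their weights element-wise. A minimal sketch of that averaging step, assuming each model's parameters are available as a list of NumPy arrays (the function and variable names here are illustrative, not from the paper):

```python
import numpy as np

def average_weights(model_weights):
    """Average the parameters of several independently trained models.

    model_weights: list of models, each a list of np.ndarray parameter
    tensors in the same order. Returns one list of averaged tensors.
    """
    return [np.mean(np.stack(params), axis=0)
            for params in zip(*model_weights)]

# Toy example: three "models", each with two parameter tensors.
models = [
    [np.full((2, 2), 1.0), np.full((3,), 1.0)],
    [np.full((2, 2), 2.0), np.full((3,), 2.0)],
    [np.full((2, 2), 3.0), np.full((3,), 3.0)],
]
averaged = average_weights(models)  # each entry averages to 2.0
```

In the full algorithm, each of these models would start from the same large-batch solution and be refined in parallel before the averaging step shown here.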