There are three main factors that drive the advance of AI: algorithmic innovation (sequence modelling techniques, gradient descent, and so on), data (both structured and unstructured), and the computing power available for training. Algorithmic innovation and data are difficult to track, but computing power is quantifiable, providing an opportunity to measure one input to AI progress. In fact, the use of massive compute sometimes just exposes the shortcomings of our current algorithms. But at least within many current domains, more compute seems to lead predictably to better performance, and it is often complementary to algorithmic advances (for example, the TensorFlow-based pipelines now used for advanced analytics of X-rays and other high-end medical imaging).
For this article, the relevant number is not the speed of a single GPU, nor the capacity of the biggest datacenter, but the amount of computing power used to train a single model; this is the number most likely to correlate with how powerful our best models are. Compute per model differs widely from total bulk compute, because limits on parallelism have constrained how big a model can be and how much it can usefully be trained. Let us focus now on computing capability.
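To make "compute used to train a single model" concrete, here is a minimal back-of-envelope sketch in Python. The chip count, peak throughput, utilization, and run length are all made-up assumptions rather than figures from this article, and petaflop/s-days is just one convenient unit for the result:

```python
# Back-of-envelope estimate of the compute used to train a single model.
# All numbers below are illustrative assumptions, not figures from this article.

def training_flops(num_chips, peak_flops_per_chip, utilization, training_days):
    """Total floating-point operations for one training run, estimated as
    number of chips x peak throughput x assumed utilization x wall-clock time."""
    seconds = training_days * 24 * 60 * 60
    return num_chips * peak_flops_per_chip * utilization * seconds

# Hypothetical run: 256 accelerators at 1e14 peak FLOP/s, ~30% utilization, 10 days.
total = training_flops(num_chips=256, peak_flops_per_chip=1e14,
                       utilization=0.3, training_days=10)
print(f"~{total:.2e} FLOPs  (~{total / (1e15 * 86400):.0f} petaflop/s-days)")
```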
The current trend shows an increase of roughly a factor of 10 each year. It’s been partly driven by custom hardware that allows more operations to be performed per second for a given price (GPUs and TPUs), but it’s been primarily propelled by finding ways to use more chips in parallel and being willing to pay the cost of doing so.
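To put a factor of 10 per year in perspective, a quick calculation (illustrative only) converts that growth rate into an equivalent doubling time and compounds it forward for a few years:

```python
import math

# Convert a roughly 10x-per-year growth rate into an equivalent doubling time
# and project compute forward a few years. Values are illustrative only.
growth_per_year = 10.0
doubling_time_months = 12 * math.log(2) / math.log(growth_per_year)
print(f"Doubling time: ~{doubling_time_months:.1f} months")

compute = 1.0  # training compute of a reference model, in arbitrary units
for year in range(1, 6):
    compute *= growth_per_year
    print(f"After {year} year(s): {compute:.0e}x the reference run")
```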
What next?
We see multiple reasons to believe that the trend could continue. Many hardware startups are developing AI-specific chips, some of which claim they will achieve a substantial increase in FLOPS/Watt (which is correlated with FLOPS/$) over the coming years. On the parallelism side, many of the recent algorithmic innovations could in principle be combined multiplicatively; for example, architecture search and massively parallel SGD (a simple sketch of the latter follows below).
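As a rough illustration of what "massively parallel SGD" can mean in its simplest data-parallel form, here is a toy NumPy sketch: each worker computes gradients on its own shard of the batch, and the averaged gradient drives a single update. The linear model, synthetic data, and worker count are stand-ins chosen for brevity, not a description of any production system:

```python
import numpy as np

# Minimal sketch of data-parallel SGD: each "worker" computes gradients on its
# own shard of the batch, the gradients are averaged, and one update is applied.
# The quadratic loss and synthetic data here are placeholders, not a real model.

rng = np.random.default_rng(0)
true_w = np.array([2.0, -1.0])
X = rng.normal(size=(4096, 2))
y = X @ true_w + 0.01 * rng.normal(size=4096)

def grad(w, Xb, yb):
    # Gradient of mean squared error for a linear model.
    return 2 * Xb.T @ (Xb @ w - yb) / len(yb)

w = np.zeros(2)
num_workers, lr = 8, 0.1
for step in range(100):
    batch = rng.choice(len(X), size=num_workers * 64, replace=False)
    shards = np.array_split(batch, num_workers)       # one shard per worker
    grads = [grad(w, X[s], y[s]) for s in shards]     # computed "in parallel"
    w -= lr * np.mean(grads, axis=0)                  # averaged update

print("Recovered weights:", w)  # should approach true_w
```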
We believe the largest training runs today employ hardware that costs in the single-digit millions of dollars to purchase, but the majority of neural-net compute today is still spent on deployment rather than training, which means companies can repurpose, or can afford to purchase, much larger fleets of chips for training. Therefore, if sufficient economic incentive exists, we could see even more massively parallel training runs, and thus the continuation of this trend for several more years. The world’s total hardware budget is 1 trillion dollars a year, so absolute limits remain far away. Overall, given the data above, the precedent for exponential trends in computing, work on ML-specific hardware, and the economic incentives at play, we think it’d be a mistake to be confident this trend won’t continue in the short term.
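One way to see how far away those absolute limits are is a toy headroom calculation. The $5M starting cost is an assumed point inside the "single-digit millions" range above, and continued 10x-per-year growth in per-run spending is an assumption rather than an established fact:

```python
import math

# Toy headroom calculation: how many years of 10x-per-year growth in the
# hardware cost of a single training run fit under a ~$1T/year hardware budget?
# The $5M starting point is an illustrative assumption, not a measured figure.
run_cost = 5e6            # dollars, largest training runs today (assumed)
world_budget = 1e12       # dollars per year, total hardware spend (from text)
growth_per_year = 10.0

years_of_headroom = math.log(world_budget / run_cost, growth_per_year)
print(f"~{years_of_headroom:.1f} years before a single run's hardware cost "
      f"matches the entire annual hardware budget")
```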
Comments welcome…