Scaling-pilled

January 2025

Sam Altman made waves last week with his blog post reflecting on OpenAI’s trajectory. One line stood out:

“We are now confident we know how to build AGI as we have traditionally understood it.”

That’s an enormous claim. Should we take Sam at his word? It’s in his interest to say things that make it easier for OpenAI to raise money and attract talent. Still, I think there’s compelling evidence that AGI is within reach, with superintelligence perhaps soon after.

The scaling laws and the scaling hypothesis

A key empirical result is that model performance improves predictably as we increase three factors: compute, dataset size, and model size. As these inputs scale, loss (a measure of error) decreases following a power law. When you 10x any input, loss shrinks by a calculable fraction.
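
To make that concrete, here is a minimal sketch of a power-law loss curve using the Chinchilla-style parametric fit from Hoffmann et al. (2022); the constants are that paper’s published estimates and are illustrative, not a prediction:

```python
# Chinchilla-style loss curve: L(N, D) = E + A / N**alpha + B / D**beta,
# where N is parameter count and D is training tokens (Hoffmann et al., 2022).
E, A, B = 1.69, 406.4, 410.7
ALPHA, BETA = 0.34, 0.28

def predicted_loss(n_params: float, n_tokens: float) -> float:
    """Predicted cross-entropy loss at a given model/data scale."""
    return E + A / n_params**ALPHA + B / n_tokens**BETA

# 10x model size and data together, and the reducible loss shrinks
# by a predictable fraction each time.
for scale in (1, 10, 100):
    print(f"{scale:>3}x: loss = {predicted_loss(scale * 70e9, scale * 1.4e12):.3f}")
```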

The scaling hypothesis builds on this: we can create AGI simply by scaling these inputs until models rival human intelligence (AGI) or surpass it (superintelligence).

How much better can AI get?

According to scaling laws, when all three factors are 10x’ed, loss drops by about 40%.
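
That figure is roughly what you get if you take the power-law exponents reported in Kaplan et al. (2020) and, as a simplification, treat the three 10x reductions as independent:

```python
# Kaplan et al. (2020) report loss scaling roughly as N**-0.076 (model size),
# D**-0.095 (dataset size), and C**-0.050 (compute). Naively multiplying the
# three 10x reductions (a simplification; the inputs are not independent):
remaining = 10 ** -(0.076 + 0.095 + 0.050)
print(f"fraction of loss remaining: {remaining:.2f}")  # ~0.60, i.e. a ~40% drop
```

But what does a 40% drop in loss actually mean?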

Loss measures how well a model predicts its training data. For language models, it’s the cross-entropy loss—the difference between what the model predicts and the true distribution of words. Lower loss means a model has internalized the underlying patterns and connections within its training data, enabling it to generalize effectively. It’s not just rote memorization; it’s understanding.
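
As a toy illustration (the probabilities below are made up), cross-entropy is just the model’s average surprise at the token that actually came next:

```python
import math

# Hypothetical probabilities a model assigned to the true next token
# at four positions in some text.
p_true_token = [0.9, 0.6, 0.05, 0.7]

# Cross-entropy loss: average negative log-probability of the true tokens.
loss = -sum(math.log(p) for p in p_true_token) / len(p_true_token)
print(f"loss = {loss:.3f}")  # smaller when the model is less surprised
```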

Take arithmetic: a model trained on "2 + 3 = 5" and "7 - 4 = 3" can initially appear to “memorize” individual equations. But as loss decreases, it generalizes the underlying rules of addition and subtraction. It can infer “8 + 6 = 14” without ever having seen it in training because it has learned the relationships between numbers and operations.

We have seen that when going from smaller to larger models (think GPT-3 to GPT-4), lower loss correlates with emergent capabilities: reasoning, understanding, creativity, and generalization. Loss is not just a metric; it’s a proxy for intelligence. If scaling laws hold, we can continue to scale these models until their intelligence far surpasses our own.

Will scaling laws hold?

Neural networks are universal function approximators—they can model any relationship with enough data and parameters. Scaling provides these networks with the resources they need to move closer to this theoretical limit. Scaling laws, tested across nine orders of magnitude (OoMs) in compute, have held consistently.

With GPT-5 potentially being another 10x step, Sam may be confident that scaling laws will hold for 10 OoMs. This isn’t unprecedented: Moore’s Law drove decades of exponential growth in compute, and scaling laws in AI have followed a similar trajectory.

There are concerns that scaling might break down. Data availability is finite, but synthetic data generated by other models keeps improving training; the Phi-4 technical report suggests synthetic data can even have advantages over human-generated data. Compute costs are steep, but GPUs and specialized hardware like ASICs are keeping efficiency gains alive. Even as Moore’s Law slows, the hardware supporting AI has advanced rapidly.

Should we scale?

Scaling is a bet on the nature of intelligence itself. It is the defining challenge of our era. As long as scaling laws hold, the only limit is our willingness to build.

The reason we should scale is simple: intelligence is the ultimate lever. The Industrial Revolution amplified human physical labor, driving unprecedented economic growth. Scaling AI does the same for intellectual labor, enabling breakthroughs in science, medicine, energy, and beyond. We can build machines of loving grace.

Can we scale?

Scaling by 1000x or more is entirely feasible. AI is driving unprecedented infrastructure investment. ASICs like TPUs will make compute more efficient. Expansion of nuclear power and renewables will slash the energy costs of training; tech companies are already snapping up reactors like Three Mile Island to meet energy demand.

| Size    | H100s | Cost   | Power   |
|---------|-------|--------|---------|
| ~GPT-4  | ~10k  | ~$500M | ~10 MW  |
| +1 OoM  | ~100k | $1B+   | ~100 MW |
| +2 OoMs | ~1M   | $10B+  | ~1 GW   |
| +3 OoMs | ~10M  | $100B+ | ~10 GW  |
| +4 OoMs | ~100M | $1T+   | ~100 GW |

Calculations from the essay Situational Awareness.
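
For intuition, the table is essentially a 10x-per-OoM extrapolation from the GPT-4 row; a quick sketch (baseline figures are the table’s estimates, not official numbers):

```python
# Back-of-envelope: each additional order of magnitude (OoM) means roughly
# 10x the H100s and 10x the power. Baseline is the table's GPT-4 estimate.
h100s, megawatts = 10_000, 10
for oom in range(5):
    label = "~GPT-4" if oom == 0 else f"+{oom} OoM"
    print(f"{label}: ~{h100s:,} H100s, ~{megawatts:,} MW")
    h100s *= 10
    megawatts *= 10
```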

Why does it seem like we have hit a wall?

Scaling is not a smooth path. As models grow, the cost of training them grows exponentially: the compute needed to train GPT-4 already costs hundreds of millions of dollars, and for GPT-5 it is in the billions. It has taken time to build the infrastructure to support training these models, and next-gen models will require datacenters and power plants larger than anything we have built before.

Another explanation for the apparent lack of large improvements recently is that companies are keeping their next-gen models internal and using them to help train other models. This would explain why we haven't seen GPT-5 or the next Claude Opus yet, despite these labs ostensibly having the resources to build them. While I think companies like OpenAI or Anthropic will be able to build models one more OoM in scale, I am skeptical that a single company can or should build the training cluster needed to create superintelligence.

Getting scaling-pilled

If superintelligence is to define the next chapter of humanity, we must approach its development with the ambition of a Manhattan Project. Scaling current models by 1000x or more will require power plants many times larger than the Three Gorges Dam and investments of potentially trillions of dollars in compute.

But this isn’t just a technological endeavor. Building superintelligence as a transparent, national effort would be patriotic—a signal of leadership and a commitment to progress. It would address fears of corporate or foreign control of AI while inspiring collective ambition, much like the Apollo program did.

Many people dismiss superintelligence as impossible or the stuff of science fiction. But we live in a time when rockets land themselves, machines etch trillions of transistors onto tiny wafers with light, and reactors harness the power of the sun. AGI isn’t some distant dream; it’s within reach. If we get more scaling-pilled as a country, we can build it first, and soon.