Takeoff Forecast
Daniel Kokotajlo, Eli Lifland | April 2025
ai-2027.comOverview of our takeoff forecast, assuming no increases in training compute. Our median forecast for the time from the superhuman coder milestone (achieved in Mar 2027) to artificial superintelligence is ~1 year, with wide error margins.
Summary
In our timelines forecast, we forecast the time between present day and a superhuman coder (SC): an AI system that can do any coding tasks that the best AGI company engineer does, while being much faster and cheaper. In this forecast we condition on SC being achieved in March 2027, which is the date it’s achieved in our scenario.
Now, we will forecast takeoff: the time between a superhuman coder and wildly superhuman capabilities. The superhuman coders and beyond will automate a large fraction of the AI R&D needed to traverse this gap.
Our methodology focuses on predicting the viability and speed of a software-driven intelligence explosion, in which there is vast improvement in AI capabilities on the scale of months-years primarily driven by using compute more efficiently (improved software), rather than more training or inference compute. First, we enumerate a progression of AI capability milestones, with a focus on AI R&D capabilities, though we think general capabilities will also be improving. Then, for each gap between milestones A and B, we:
Human-only time required: Forecast a distribution for how long it would take to get from A to B with only humans working on software improvements.
AI R&D speedup: Forecast how much AI R&D automation due to each of milestones A and B will speed up progress, then run a simulation in which the speedup is interpolated between these speedups over time to get a forecasted distribution for the calendar time between A and B (the AI R&D progress multiplier).
Our forecasts conditional on SC being achieved in Mar 2027 are summarized in the figures and table below. These are assuming no increases in training compute.
ai-2027.com
Milestone | Projected date conditional on SC in Mar 2027 (median + 80% CI) | Date achieved in scenario, racing ending | Human-only, software-only time until next milestone (median + 80% CI) | AI R&D progress multiplier: Algorithmic progress speedup from AI vs. humans-only |
---|---|---|---|---|
Superhuman coder (SC): An AI system that can do the job of the best human coder on tasks involved in AI research but faster, and cheaply enough to run lots of copies. | Mar 2027 | Mar 2027 | SC to SAR: 15% 0 years; Otherwise 4 years (80% CI: 1.5 to 10; lognormal) (reasoning) | 5 (reasoning) |
Superhuman AI researcher (SAR): An AI system that can do the job of the best human AI researcher but faster, and cheaply enough to run lots of copies. | Jul 2027 (Mar 2027 to Mar 2028) | Aug 2027 | SAR to SIAR: 19 years (80% CI: 2.3 to 380) (reasoning) | 25 (reasoning) |
Superintelligent AI researcher (SIAR): An AI system that is vastly better than the best human researcher at AI research. | Nov 2027 (May 2027 to 2034) | Nov 2027 | SIAR to ASI: 95 years (80% CI: 2.4 to 1,000,000) (reasoning) | 250 (reasoning) |
Artificial superintelligence (ASI): An AI system that is much better than the best human at every cognitive task. | Apr 2028 (Jun 2027 to >2100) | Dec 2027 | N/A | 2,000 (reasoning) |
Below, we:
Describe our methodology for forecasting takeoff.
More rigorously define the above milestones and the AI R&D progress multiplier.
For each milestone, estimate its corresponding progress multiplier and human-only, software-only timeline for getting from it to the next milestone (links in table above).
The code for our simulation is here.
Methodology
Overview
We focus on predicting a potential software-driven intelligence explosion, in which there is vast improvement in AI capabilities on the scale of months-years primarily driven by using compute more efficiently (improved software), rather than more training compute.
First, we enumerate a progression of AI capability milestones (more precise definitions below):
Superhuman coder (SC): An AI system that can do the job of the best human coder on tasks involved in AI research but faster, and cheaply enough to run lots of copies.
Superhuman AI researcher (SAR): An AI system that can do the job of the best human AI researcher but faster, and cheaply enough to run lots of copies.
Superintelligent AI researcher (SIAR): An AI system that is vastly better than the best human AI researchers. The gap between SAR and SIAR is 2x the gap between an automated median AGI company researcher and a SAR.
Artificial superintelligence (ASI): An AI system that is vastly better than the best human at every cognitive task.
Then, for each gap between milestones A and B, we:
Human-only time required: Forecast a distribution for how long it would take to get from A to B with only humans working on software improvements.
AI R&D speedup: Forecast how much AI R&D automation due to each of milestones A and B will speed up progress, then run a simulation in which the multiplier is interpolated between these over time to get a forecasted distribution for the calendar time between A and B (the AI R&D progress multiplier).
Focus on software improvements
We break progress in AI down into hardware: how much computing power (“compute”) is used and software: the algorithms and data used to convert compute into more capable AIs.
In this writeup we focus primarily on a possible software-driven intelligence explosion, in which there is vast improvement in AI capabilities on the scale of months-years primarily driven by using compute more efficiently (improved software), rather than more training compute. This report discusses the possibility of a software-driven intelligence explosion at length. We focus on software because the feedback loops are stronger: improved algorithms can be almost immediately applied to train better AIs, while improved hardware designs require substantial time to produce at scale (using anything like our current methods).
Milestone definitions
Below we present the AI capability milestones that we’ve chosen to predict.
Milestone name and short definition | Name on scenario side panel | Full definition |
---|---|---|
Superhuman coder (SC): An AI system that can do the job of the best human coder on tasks involved in AI research but faster, and cheaply enough to run lots of copies. | Superhuman coder | An AI system for which the company could run with 5% of their compute budget 30x as many agents as they have human researchers, each which is on average accomplishing coding tasks involved in AI research (e.g. experiment implementation but not ideation/prioritization) at 30x the speed (i.e. the tasks take them 30x less time, not necessarily that they write or “think” at 30x the speed of humans) of the company’s top coder. This includes being able to accomplish tasks that are in any human researchers’ area of expertise. Since SC is a subset of AI research, it cannot come after a fully superhuman AI researcher (SAR below). That said, it will also have some level of “research taste” and other AI research skills when it is first achieved, and may even be a full SAR if coding is the last skill needed. |
Superhuman AI researcher (SAR): An AI system that can do the job of the best human AI researcher but faster, and cheaply enough to run lots of copies. | Superhuman AI researcher | An AI system that can do the job of the best human AI researcher but faster, and cheaply enough to run lots of copies. |
Superintelligent AI researcher (SIAR): An AI system that is vastly better than the best human AI researchers. The gap between SAR and SIAR is 2x the gap between an automated median researcher and a SAR. | Superintelligent AI researcher | An AI system that is vastly better than the best human researchers: the gap between SAR and SIAR is 2x the gap between an automated median AGI company researcher and a SAR. By 2x, we mean that if we take consider the skill distribution of the AGI company’s researchers (as measured by skill on applicable cognitive tasks, not overall productivity including compute bottlenecks; so basically value of labor), the AI is 2 doublings more above the top researcher than the top researcher is above the median (i.e. the difference is 2x greater in log space). It also has the 30x task accomplishing speed and 30x copies requirements. |
Artificial superintelligence (ASI): An AI system that is vastly better than the best human at every cognitive task. | Generally superintelligent | Roughly, an SAR but for every cognitive task. An AI system that is 2x better at every cognitive task relative to the best human professional, than the best human professional is relative to the median human professional (across the whole field, not a single company as in SAR). |
Human-only software-only forecast
Consider a scenario in which we start at an SC, and humanity is tasked with creating an SAR without being able to use the SC to speed up their research. They also use a fixed hardware supply and cannot train a system with more overall training FLOP than what was used to train the SC. The population of human researchers is fixed. For more details on this scenario, see the without-AI clarifications below.
Over time, they will run into diminishing returns (discussed more below), and we aim to take this into account in our forecasts.
Our methodology involves forecasting how long it would take for the human researchers to get from SC to SAR in the scenario above, and correspondingly for further jumps between milestones. We do this via a combination of reasoning about the specifics of the capability jumps, and explicitly accounting for diminishing returns (in the SC->SAR forecast we don’t explicitly account for diminishing returns, we aim to do it implicitly via our intuitions about the required time).
Accounting for AI R&D automation: the AI R&D Progress Multiplier
Overview
In summary, the AI R&D progress multiplier is how much faster AI software improvements are advancing with AI usage than without it.
In order in to incorporate the progress multiplier into our forecast, we first forecast the AI R&D progress multiplier due to each of milestones A and B, then run a simulation in which the multiplier is interpolated between these over time to get a forecasted distribution for the calendar time between A and B.
Our progress multiplier forecasts are based on reasoning about AIs’ effects on the research process and surveys of AI researchers.
AI R&D progress multiplier definition
In summary, the AI R&D progress multiplier is how much faster AI software improvements are advancing with AI usage than without it.
More comprehensively, consider the following conditions:
With-AI: The leading AGI project progresses as normal with AI being used to speed up algorithmic progress.
Without-AI: The leading AGI project is not allowed to use AIs to speed up algorithmic progress (where algorithmic progress is getting better at translating compute into capabilities).
Both projects are not allowed to do significant compute scaling. How much faster would AI R&D capabilities be improving in (1) rather than (2)? i.e. for what value of N would your company's algorithmic progress productivity be equivalent between (a) N weeks with no post-2024 AIs speeding up algorithmic progress vs. (b) 1 week with post-2024 AIs? e.g. if N is 5, 5 weeks of no-AI progress is equivalent to 1 week of with-AIs progress, so AIs are 5xing productivity.
For clarifications on details of the progress multiplier definition, see the appendix.
From SC to SAR
Overview
We’re assuming that we’ve reached a superhuman coder in March 2027.. How long would it be until a superhuman AI researcher (SAR) is reached: AI agents as skilled as the best human researchers but accomplishing tasks 30x faster and 30x more numerous?
Our approach will be (as described above):
Forecast how long it would take to get from SC to SAR if only humans were doing the AI R&D with a fixed hardware supply (here).
Forecast how much having an SC and SAR would speed this up (here and here).
Combine the above together accounting for intermediate speedups to get a forecast for the length of time between SC and SAR.
An overview of the results:
Forecast | Reasoning | |
---|---|---|
SC to SAR, humans-only software-only time | 15% 0 years; Otherwise 4 years (80% CI: 1.5 to 10; lognormal) | Case-based analysis depending on the requirements for training an SAR (more). |
SC progress multipler | 5 | Decomposition of various speedups that an SC provides (more). |
SAR progress multiplier (in next section, needed to calculate intermediate speedups) | 25 | A combination of 3 methods, most prominently relying on surveys of AI researchers answering questions about various hypothetical scenarios (more). |
Time between SC and SAR | 0.3 years (15% 0 years, 90th percentile 0.95 years) FutureSearch aggregate (n=4): 0.55 years (0, 2.1) | Results of simulation |
Human-only timeline from SC to SAR
How long would it take to get from SC to SAR, with only humans doing AI R&D and no increases in training compute? Here’s our forecast, via a breakdown of what might be needed to go from SC to SAR:
What is needed to go from SC to SAR? | Human-only, software-only timeline | Reasoning summary |
---|---|---|
1) The first SC is already an SAR or very close to one (15%) | 0 years | Moravec’s paradox and general inability to guess which skills will turn out to be harder than others (more). |
2a) Training the missing SAR skills isn’t more compute intensive than the SC (25%) | 2 years (80% CI 1 to 4; lognormal) | Perhaps new training environments are needed but they aren’t more compute-intensive than the ones needed for SC because the skills aren’t that different (more). |
2b) Training the missing SAR skills would be more compute-intensive than the SC absent further algorithmic progress (30%) | ~5 years (80% CI: 2 to 15; lognormal) | Perhaps new training environments are needed and they are substantially more compute-intensive as well, e.g. larger teams of agents doing big ML research projects over subjective years rather than medium-sized ML coding projects over subjective months (more). |
3) Crossing the gap from SC to SAR is a scientific rather than an engineering problem (30%) | ~5 years (80% CI: 2 to 15; lognormal) | We’d guess that this is the sort of limitation that would take years to overcome — but not decades; just look at the past decade of progress e.g. from AlphaGo to EfficientZero (more). |
Overall distribution | 15% 0 years; Otherwise 4 years (80% CI: 1.5 to 10; lognormal) | Approximating a mixture of the above with a lognormal. |
Case 1: The first SC is already an SAR (15%)
Some reasons why this might be the case:
Moravec’s paradox & general inability to guess which skills will turn out to be harder than others.
Plausible story #1 for why this could happen: Maybe SAR is close enough to SC that generalization/transfer learning takes care of it.
Plausible story #2 for why this could happen: SC and SAR skillsets share the same agency skills but differ in world-knowledge/understanding. E.g. the difference between a great engineer and a great scientist is in what they know / what they understand, not in e.g. their general heuristics for noticing when they are stuck, making plans, etc.
If Case 1 obtains, then time from SC to SAR is 0 years.
Case 2: The first SC isn’t an SAR due to its training environments not teaching it necessary skills (55%)
In this case the shortest path to SAR would probably involve coding up new and improved training environments that teach those skills. For example, perhaps the new environments involve managing a research team of SCs with tasks like ‘replicate this paper’ and ‘improve on this SOTA’ and ‘predict the results of this experiment before it runs’ and ‘with your training data cutoff having been 6 months ago, predict the most promising research breakthroughs of the past 6 months’.
However, it seems really hard to get ground truth for such tasks, which could be a huge conceptual blocker for building the right training environment. However, this isn’t a barrier for all tasks, and by now we have SC. Lots of SC tasks are really hard to get GT data on too, e.g. ‘write secure clean code that faithfully captures the intent of the scientist managing the project.’ Perhaps whatever worked for SC will also work for SAR with a bit of extra tinkering. If not, then go to Case 3 below.
Sub-case A: Training the missing SAR skills isn’t more compute-intensive than the SC (25%)
Environments that train the missing SAR skills don’t by default require more compute than the SC cost to train. In this case the bottleneck is the design, construction, & testing of the new training environments.
With human engineers doing the labor, our guess is that it would take 1-4 years.
Sub-case B: Training the missing SAR skills would be more compute-intensive than the SC absent further algorithmic progress (30%)
In this case, the environments needed to train SAR would require a substantial increase in compute expenditure. Therefore the SAR milestone will be achieved after some amount of ‘effective compute’ increase, in which algorithmic progress lowers the compute requirements e.g. by more efficient architectures, more efficient RL algorithms, more sophisticated training environments, etc.
With human engineers doing the labor, our guess is that it would take about 2-15 years.
Case 3: (30%) Crossing the gap from SC to SAR is a scientific rather than an engineering problem
Finally, it might be that the first SC is not an SAR due to some other limitation (e.g. using an architecture + RL algorithm that is data-inefficient, or that requires ground-truth training signal). In Case 3 crossing the gap from SC to SAR is a scientific problem and not just an engineering problem, for example:
Research engineering doesn’t require data-efficient learning but being a research scientist does, and perhaps data-efficiency is more a property of the architecture + RL algorithm than of the model or training environment.
It's easier to evaluate coding tasks (does the code pass all the test cases?) than it is to evaluate research tasks (does this research substantially contribute to the field?) And perhaps this difference is deep and large, such that new methods will need to be invented to cross the gap.
Overall we’d guess that this is the sort of limitation that would take years to overcome—but not decades; just look at the past decade of progress and consider how many similar barriers have been overcome. E.g. in the history of game-playing RL AIs, we went from AlphaGo to EfficientZero in about a decade.
Remember, we are assuming SC is reached in Mar 2027. We think that most possible barriers that would block SAR from being feasible in 2027 would also block SC from being feasible in 2027.
So in this case we guess that with humans doing the AI R&D, it would take about 2-15 years.
SC would ~5X AI R&D
Remember that an superhuman coder (SC) is defined as:
Superhuman coder (SC): An AI system for which the company could run with 5% of their compute budget 30x as many agents as they have human researchers, each which is on average accomplishing coding tasks involved in AI research (e.g. experiment implementation but not ideation/prioritization) at 30x the speed (i.e. the tasks take them 30x less time, not necessarily that they write or “think” at 30x the speed of humans) of the company’s top coder. This includes being able to accomplish tasks that are in any human researchers’ area of expertise. |
---|
We broadly think of the AI software R&D process as broadly involving 2 types of activities:
Experiment selection: Ideating and prioritizing experiment ideas, interpreting the implications of experiment results.
Experiment implementation: Coding up, running and monitoring experiments.
We think that experiment implementation is easier to automate than experiment selection, because it’s easier to obtain lots of natural coding data, and also easier to generate synthetic data due to coding being easier to evaluate than experiment selection. This is borne out in today’s AI systems, which aid in coding more than experiment selection.
When we discuss SCs, we are discussing AIs that are as good as the best humans at (2) experiment implementation, while being faster and cheaper. In the mainline case an SC doesn’t completely automate AI R&D; instead, elite human scientists in the AGI project manage large teams of AI agents that rapidly execute on their research vision.
A more detailed model of AI software R&D is from an Epoch report:
A more realistic picture would be fractal — the big loop would be composed of smaller loops, themselves composed of smaller loops. Zooming in we’d find mini-loops such as ‘Think of a new feature the code should have; draft it up; run the code and see if it works; repeat…’ For example, a typical month might look like this:
Most of the project’s compute is sunk into the ongoing big new training run. The remainder of the compute is divided up amongst N teams. Each of these N teams is doing 1 small-scale training run per week; in between these runs individual team members do mini-scale ‘warmup runs’ to test and debug their code. So, ideas are tested at mini-scale first and debugged, and then the better versions of those ideas are tested at small-scale, and then the most promising surviving ideas are tested further and scaled up even more and eventually are integrated into the training run. In general compute doesn’t get wasted—when one team (or individual) is writing code or analyzing their latest results, the GPUs are humming away running someone else’s experiments.
If an SC existed and were integrated into this process, it could greatly shorten the time taken to complete some of the tasks in the loop, thereby speeding up the overall pace of progress. With lots of cheap, fast coding labor, an individual research project doesn’t need to alternate between writing code and running experiments. Instead, the ‘writing code’ portion of the process is vastly sped up, so that it takes negligible time compared to the other parts. It’s sped up by at least 30x definitionally because the SC is on average 30x faster than the best coder, and more due to there being 30x more copies of SC than human coders, and because the AIs are at the level of the best rather than the average human coder. Let’s say it’s sped up about 100x.
This 100x speedup in coding, however, will translate to a much-less-than-100x speedup in overall research velocity—because the overall research loop includes e.g. waiting for experiments to run, discussing and analyzing the results of experiments, discussing what the next experiment should be, etc., and our hypothesized SC is not going to speed those up much; those parts of the process will thus become bottlenecks.
Taking these bottlenecks into account, what will the overall speedup be? We don’t know. But here is our current best guess breakdown:
Flexible prioritization:
Normally, a company or research team splits its resources across multiple different parallel projects. For example its GPUs might be running several experiments at once, or switching between running an experiment for team A while team B codes, and then running an experiment for team B while team A codes.
This is NOT because all the projects are equally valuable and urgent; rather, it is because there are diminishing returns to focusing on one project. If the company gave 100% of their compute to their most valuable team working on their most valuable project, instead of just 20%, this would not make it go 5x faster. Because they’d still have to spend lots of time coding, during which time the GPUs would be idle or at least doing something relatively unimportant. Instead it would maaaybe make it go ~1.5x faster, which wouldn’t be worth the cost to the other teams.
Importantly, we expect that returns to temporarily focusing compute to turbocharge one sub-project will diminish less steeply after the SC milestone is reached. It really will be possible to throw 100% of the compute at the highest-priority team for a time, and have them go ~5x faster (again assuming the default is 20%) Because coding happens much faster, they can launch the next experiment very soon after the previous one finishes.
Our estimate is that this results in a research speedup of 1.5x to 3x.
Smaller experiments when possible: Moreover, thanks to SCs, all of your research projects will look for ways to run smaller experiments on the margin, since compute savings directly translate to research velocity.
Some kinds of research really require lots of compute and lose most of their value if you run the cheaper versions of experiments. But others don’t, so they’ll be turbocharged. The organization can invest extra in types of research that don’t depend so much on compute—such as, for example, making really complicated scaffolds.
Maybe this causes another factor of 1.2x to 2x speedup that stacks with the above?
Less waste: With SCs, research teams can use the vast engineering labor to waste less compute.
Before every experiment including small-scale experiments, SCs can red-team and bug-test it to reduce the probability that there is some bug which ruins the results.
Monitor the experiments in real time, noticing and fixing some kinds of problems almost as soon as they happen (by contrast it’s common in frontier companies today for even medium-sized experiments to be running overnight with no one watching them). They can shut down the experiment early as soon as it’s clear that it worked or didn’t work
Do the same experiments but with less compute e.g. via higher utilization
Overall I’d guess this causes another factor of 1.2x to 2x speedup.
Fancier experiments: The experiments can also just be fancier, testing more variables at once for example, or eliminating more possible confounders. (Because the fast cheap SCs are able to write a lot more code a lot faster than the engineers of 2024.)
Guess: 1.1x to 1.5x speedup
Lack of diversity: Even if the SCs are as good as the best human engineers, they will all be copies of each other, whereas human engineers are diverse. So it’s possible that e.g. they’ll still have occasional blindspots—fewer than any human perhaps, but nonzero —and that unlike humans they won’t be able to call over a buddy with fresh eyes and a different perspective to help them out. They can still call on the humans of course… as usual it’s hard for us to guess how much this would slow things down if at all. Maybe it’s a ‘speedup’ of 0.8 to 1x
These four speedup estimates combine via a Guesstimate model to get a total speedup of 5.8x (90% CI: 3.4x to 10x). (This is within-model uncertainty; our overall actual uncertainty is bigger) Some other adjustments:
There may be other helpful uses of SCs which are missing on this list, which would combine with the above to make overall research velocity even faster. For example, SCs could cheaply and quickly construct richer, more diverse, and more challenging training environments, with better shaped rewards. This isn’t improving scientific understanding, but it directly makes the trained models smarter, which still should count.
There may also be other bottlenecks we are unaware of, and the bottleneck of time spent thinking and analyzing and planning etc. might bite earlier than we expect. That said, SCs might help with that too to some extent, or partially obviate the need for it…
We’re going to forecast 5x to be slightly conservative. We reiterate that this is just a guess and that it could be substantially faster or slower in reality.
From SAR to SIAR
Overview
Now we’ve reached superhuman AI researcher, and it’s speeding up AI R&D by 25x. How long would it be until a superintelligent AI researcher (SIAR) is reached: an AI system that is better than the best human AI researchers to a 2x greater extent than the best human researchers are relative to the AGI company’s median researchers, while being 30x faster and more numerous.
An overview of the results:
Forecast | Reasoning | |
---|---|---|
SAR to SIAR, humans-only software-only time | 19 years (80% CI: 2.3 to 380) FutureSearch aggregate conditional on SAR in 2027 (n=3): 11.5 years (1.75, 27) | Based on comparing the time between automated median researcher and SAR (more). |
SAR progress multiplier | 25x | A combination of 3 methods, most prominently relying on surveys of AI researchers answering questions about various hypothetical scenarios (more). |
SIAR progress multiplier (in next section, needed to calculate intermediate speedups) | 250x | Based on survey data regarding returns to better researchers within the human range (more). |
Time between SAR and SIAR | 0.3 years (80% CI: 0.04 to 56 FutureSearch aggregate conditional on SAR in 2027 (n=3): 1 year (0.1, 2.1) | Via [simulation](https://github.com/uvafan/timelines-takeoff-ai-2027 |
Human-only timeline from SAR to SIAR
Here’s a simplified model of how human labor spent improving AI algorithms on a fixed hardware supply might translate into improving AI R&D capabilities measured with reference to the human range within OpenBrain:
The distribution of AI R&D capabilities within OpenBrain is a lognormal distribution in terms of value of labor as cashed out in overall differences in research progress.
It’s confusing to think about what the labor/progress differences would be between the worst project researchers and the median. The distribution might not be lognormal because there’s something like a cutoff that they’re trying to select above (there are probably some outliers though so hard to think about).
Based on salaries the labor multiplier between median and lowest is significantly lower than between median and highest. I think there’s something real there due to the cutoff effect. This is ignored in this forecast due to time constraints.
Each doubling of cumulative human labor spent improving AI algorithms multiplies the AIs’ value of labor by a fixed amount (this is very similar to the assumption made in the Davidson report). In particular, for each doubling of cumulative labor, there are r doublings of the value of labor.
Since the distribution in (1) is lognormal, increasing labor productivity by a fixed multiplier is equivalent to increasing by a fixed amount of SDs within the OpenBrain human range.
Since SAR->SIAR is the same in terms of labor multiples as 2*(automated median OpenBrain researcher->SAR), the amount of cumulative effort doublings to go from SAR->SIAR is twice the amount required to go from automated median OpenBrain researcher->SAR.
Working within this simplified model, we will think about the arrival of the automated median OpenBrain researcher in our scenario, how much research stock it took to get this and how much research it took to get from the median researcher to SAR.
Human-only years from automated median researcher to SC: ~0-3 assuming SC has 25th percentile research taste, more uncertainty if not. Roughly 0-3 calendar months before SC in the case where SC has the median level of research taste we’re projecting (25th percentile, see above). This is because the SC is very strong on coding such that it can make up some for its somewhat below average taste. These 0-3 calendar months translate into approximately 0-3 human-only years, given that an SC speeds up algorithmic progress by 5x in the median case. There is further uncertainty introduced by the SC having varying levels of taste, including some cases in which it’s already SAR and the median researcher must have been achieved a few months earlier.
Human-only years to get from automated median researcher to SAR: 5 years (80% CI: 1 to 25, lognormal). Above we give a median time of 4 years from SC to SAR, conditional on SC not already being an SAR, but with high uncertainty; the 90th percentile is 20 years. We’d then add on the above 0-3 years human-only years from (1) (in the median taste case), with further uncertainty if we accounted for variations in taste at SC arrival. The quantities of (1) time from automated median researcher -> SC and (2) time from SC->SAR are anti-correlated, as SC having higher taste increases (1) while decreasing (2). Therefore the lower bound is above 0 (also common sense says it can’t be 0 since SAR is defined as being much better than the median).
Total stock of 2025-research-years at the time of automated median researcher: ~10. This is a quick estimate based on the pre-2025 stock, the calendar time through approximately SC in March 2027, a bit earlier in the median case.
Thinking how diminishing returns work here under our simplified model by doing some casework with a few examples:
Human-only years to automated median researcher: 10
Human-only years from automated median researcher to SAR
Case 1: 5
This is about 0.6 doublings leading to about a 5x progress multiplier, according to the above survey. So this would be r = ~4 (each doubling in total effort means 4 doublings in progress multiplier), which is on the high end of Davidson’s calculations but not wildly off. We tend to have higher r estimations than Davidson due to putting more weight on the importance of qualitative capability jumps (more below).
Case 2: 10
This would mean that progress r=~2.3, which is similar to Davidson’s best guess (as of 2022) of r=2 (at the time of full AI R&D automation)
SAR to SIAR human-only software-only time needed (calculated via needing twice as many overall research effort doublings as the previous step)
Case 1: 19 = 15*2^(2*log(15/10)/log(2))-15
Case 2: 60 = 20*2^(2*log(20/10)/log(2))-20
Overall I will model my forecast in the following way:
Human-only years to automated median researcher: Assumed to be 10 for simplicity
Human-only years from automated median researcher to SAR: 5 years (80% CI: 1 to 25, lognormal), and with a strong correlation to the SC->SAR human-only years (0.8).
I’ll then model SAR to SIAR human-only software-only time needed via the formula above.
This gives a median of 19 years (80% CI: 2.3 to 380). I’ll adopt this as my forecast.
SAR would ~25x AI R&D
Overview
First, we discuss what having SARs will look like. Then, we forecast the SAR progress multiplier via a few methods:
Forecasted progress multiplier | Reasoning | |
---|---|---|
Method 1: Speedup decomposition | 417 (90% CI: 130 to 1900) | Combining a breakdown of expected speedups from SAR |
Method 2: Surveys on subquestions | 26 (90% CI: 8 to 139) | A progression of hypothetical scenarios informed by our surveys of AI researchers |
Method 3: More direct survey | 35 (90% CI: 6 to 183) | We take a recent survey of researchers about how much progress would speed up if every company employee had access to 30x copies thinking at 30x speed, then adjust upward for all SARs being as good as the best researchers. |
Overall forecast | 25 | We think that both methods 2 and 3 are more reliable than method 1 and surprisingly converged to a similar answer. Overall we forecast 25x, shading conservatively from our preferred methods above and avoiding false precision. |
What having SARs will look like
Now all the labor is automated. What does this look like quantitatively?
Our compute forecast projects that OpenBrain will have about 10,000,000 H100-equivalents in 2027. If they allocate 5% of their compute to inference for AI R&D, then (given our other guesses about e.g. model size, latency bottlenecks, etc.) we think they’d be able to run 200,000 automated researcher copies, each at ~400 tokens/second. As in the SAR definition, this workforce is equivalent to 50,000 agents accomplishing tasks at 30x the rate of the best humans.
We think they might have access to a small quantity of specialized chips (e.g. Cerebras) able to run a smaller population of AIs at up to ~2000 tokens/second.
The effect of SARs is simpler to analyze compared to SCs. All of the labor (including experiment selection) is sped up by 30x, plus there are 30x more parallel laborers, plus all of the AIs are like the best researcher sped up rather than one corresponding to each individual researcher.
Method 1: Speedup decomposition
To forecast SAR’s progress multiplier, we first did a breakdown similar to the previous one for SC. We thought about all the ways having SARs could accelerate things above and beyond what SCs would already accomplish. We came up with the following list:
Factor | Speedup |
---|---|
Flexible prioritization | 1.5 to 3 |
Experiment design debugging & early stopping | 1.5 to 3 |
Smaller experiments when possible part II | 1.5 to 3 |
Even fancier experiments | 1.1 to 1.5 |
Lack of diversity creates blindspots | 0.3 to 1 |
Improved research taste via quantity | 2 to 8 |
Improved research taste via quality | 1.5 to 5 |
SC AI R&D progress multiplier | 5 |
SAR AI R&D progress multiplier | 417 (90% CI: 130 to 1900) |
However, this progress multiplier feels too high.
Method 2: Surveys on subquestions
Scenario step | Progress multiplier estimate | Reasoning |
---|---|---|
All researchers at OpenBrain are sped up by 30x | 7 | A survey says that AI researchers would make progress 40% as fast with 10x less compute. Being sped up 30x means you have 1/30 of the compute per subjective-time, which gives us 22% of the 30x rate of progress. |
Add 30x parallel labor | 3 (90% CI: 1.5 to 6) | Estimated by brainstorming what parallel labor would be helpful for. |
Make everyone as competent as the best researcher | 2.5 (90% CI: 1.25 to 5) | Halving a survey result for the question of how much more productive a company with all best researchers would be than one with all median researchers. |
Less diversity | 0.25 to 0.95 | AIs might lack the diversity of human organizations. |
SAR achieved | 26 (90% CI: 8 to 139) | Guesstimate |
Let’s consider a simplified hypothetical, in which every human researcher and engineer in the AGI project is replaced with a digital copy of themselves that accomplishes tasks 30x faster. How much would the pace of AI R&D progress in the company speed up then?
We have some informal, nonscientific survey data of ML researchers, asking two related questions:
How much faster would your overall research progress be if you had access to 10x as much compute as you do now?
How much slower would your overall research progress be if you had access to 10% as much compute as you do now?
The results are:
From an informal Twitter poll re: increased speed from 2x compute (n=185): Median of 1.2x-2x, toward the high end.
From an informal Slack poll of AI safety researchers (n=6):
Increased speed from 10x compute: 1.18x median (1.01 to >1.5)
Decreased speed from 10% of compute: 0.6x median (0.2 to 1)
While these surveys are informal and have a low sample size and wide error bars, they are better than nothing. Here we are most interested in the question of decreased speed from having 10% of the compute. We got a 0.6x median from the Slack poll, but this was from people whose work seemed less compute-intensive than the Twitter poll, whose work may be further less compute-intensive than frontier AI researchers. Therefore we shade down and speculate that for frontier AI researchers, decreasing compute by 10x reduces speed to 40% of the current rate.
Extrapolating these results a bit, we estimate that reducing compute budgets by 30x would cut the pace of software progress to about 22% of the current rate.
In this hypothetical, we can think of the pace of software progress as being 30 times faster, times 22% (because the 30x-faster-workers have 1/30th as much compute per unit of subjective time), leading to a 7x progress multiplier.
We start with this 7x figure then gradually remove simplifications from the hypothetical until it has transformed into the actual SAR scenario.
The first simplification to remove is that in the SAR situation there are 30 times as many SARs as there were company researchers, roughly. Now, nine women can’t make a baby in a month, so this 30x increase in parallel labor will translate to far less than a 30x increase in actual progress; but it will still be helpful. Assuming that compute-for-experiments will be the bottleneck in this regime, there are still ways in which having 30x more scientific labor can help ameliorate that bottleneck such as:
Watching experiments as they run, debating their implications, and stopping them as soon as starting the next experiment seems more valuable than continuing to watch the learning curve grow.
Putting 30x more thought into the design each experiment, making sure that it is not only bug-free but confounder-free and testing the right hypotheses etc.
Having more and better thoughts go into analyzing each experiment and deciding what to prioritize next.
Rapidly pursuing all the parts of the tech tree that don’t involve costly experiments, such as ML theory, interpretability, evals…
Overall we estimate that this 30x parallel labor factor would speed things up by 3 (90% CI: 1.5 to 6).
Another simplification to remove is that instead of the AIs being digital copies of all of the human employees of varying ability. SARs will be, by definition, as good as the very best scientists and engineers. So the average ability level will rise to the maximum ability level.
We conducted a small survey ourselves to get information about this—specifically, in November 2024 to March 2025 we surveyed some of our friends who currently or formerly work as researchers or engineers at OpenAI and Google DeepMind. We asked, paraphrased: “how much faster would research go if everyone in your company was as good as the best, compared to if everyone in your company was as good as the median (for your company)” Prior to seeing their answers our estimate was 4x faster; the median response was 6.25x (n=8) .
Our estimate is that going from the present day distribution to having all researchers be at the level of the best captures half of the speedup from going from all median to all best researchers. So we estimate the overall speedup from this factor will be about 2.5 (90% CI: 1.25 to 5).
One factor remains: diversity loss. “Bring all your researchers up to the level of the best” is better than “make all your researchers copies of the best,” because any individual might have blind spots or weaknesses that others lack. The SARs will mostly be copies of each other, so this problem might bite more than it bites in human research organizations which come with some degree of diversity built-in simply by virtue of having multiple employees. We think this might matter a lot, with high uncertainty. We wildly guess this multiplies overall research velocity by 0.28x to 0.95x.
Combining all these numbers, we get a total multiplier of 26 (90% CI: 8 to 139) from SARs.
Method 3: More direct survey
A recent report surveyed five current or past employees of frontier AI companies in March 2024, asking them to engage in the following hypothetical scenario that is coincidentally quite similar to ours:
“Imagine that compute available for experiments and training is basically increasing at the rate it’s been increasing over the last several years, but now, there is one big difference: there are 30 AI-powered copies of each person at your company working 30 times as fast… Now let’s look at things from a very high level, what do you think would be the overall pace of frontier lab AI capabilities progress, compared to the current pace?”
The results ranged from about 1.5x to “conservative[ly]” 8-11x (n=4), with a median of ~7x. We represent this as a lognormal with median ~7, 90% CI 1.5 to 30,
There were 2 differences between the survey and the SAR progress multiplier:
The survey asks about overall AI progress, while we focus on software progress: therefore we multiply by 2x since about half of current progress is from training compute, meaning that with high progress multipliers the overall progress speedup is about ½ the software progress speedup.
The survey asks about making copies of every researcher, the SAR is as competent as the best researcher: 2.5 (90% CI: 1.25 to 5)
This gives a result of 35 (90% CI: 6 to 183).
These researchers were also asked more specific questions about e.g. the speedup that would come specifically from being able to reliably debug experiments before running them, from being able to better prioritize resources, from being able to do experiments at smaller scale when possible, from being able to run fancier experiments, etc. Perhaps unsurprisingly given the massive uncertainties involved and the small amount of time spent thinking about it, the numbers they give for the per-factor speedups typically multiply together to get results that are significantly higher than the answers that they initially gave.
From SIAR to ASI
Overview
Now we’ve reached a superintelligent AI researcher. How long would it be until an artificial superintelligence (ASI) is reached: an AI system that is 2x better at every cognitive task relative to the best human professional, than the best human professional is relative to the median human professional (across the whole field, not a single company as in SAR).
An overview of the results:
Forecast | Reasoning | |
---|---|---|
SIAR to ASI, humans-only software-only time | 95 years (80% CI: 2.4 to 1,000,000) | Based on comparing the jump to that from a SAR to SIAR (more). |
SIAR progress multiplier | 250x | Based on survey data regarding returns to better researchers within the human range (more). |
ASI progress multiplier (needed to calculate intermediate speedups) | 2,000x | Based on comparing the jump to that from a SAR to SIAR (here). |
Time between SIAR and ASI | 0.16 years (80% CI: 0.01 to >100) | Via [simulation](https://github.com/uvafan/timelines-takeoff-ai-2027 |
Views on experiment selection
Given that without improved experiment selection (i.e. research taste) we’d hit sharply diminishing returns due to hardware and latency bottlenecks, forecasting improved experiment selection above the human range is quite important.
Our further progress multiplier forecasts are informed by two key views regarding experiment selection:
Large human variability: There is a substantial difference in experiment selection ability even between even the median and top AGI company employee, as evidenced by our researcher survey which found an estimated 6.25x difference in research progress between a company of all median and all best employees, including a 3.25x difference based on experiment selection.
Reasoning: Observing and discussing with researchers at AGI companies, the above survey, and observing salary differences.
Large headroom above the human range: There is a very large amount of headroom in experiments selection skill above the top humans, for AIs at the limits of intelligence (>=1000x difference based just on experiment selection and not implementation).
Reasoning: We don’t see sharp diminishing returns at the top of the human range, and we don’t have reason to believe that humans are near the limits of general intelligence.
Human-only timeline from SIAR to ASI
I’ll think about this in relative terms to how long it takes to cross from SAR to SIAR human-only as forecasted above.
There are 2 gaps to cross between SIAR and ASI:
Achieving 2 (median->best jumps) above the best human when looking at the whole field rather than a single company: a bit less than an SAR->SIAR jump
For AI R&D, my guess is that this requires a bit less of a jump as SAR->SIAR, because the median ML professional is a bit better than the worst AGI company researcher (if the worse researcher were as much worse than the median as the median was compared to the best, which may not be true in practice due to hiring cutoffs).
Achieving ASI in all cognitive tasks rather than just AI R&D: About half of an SAR->SIAR jump.
I think once an AI is extremely good at AI R&D, lots of these skills will transfer to other domains, so it won’t have to be that much more capable to generalize to all domains, especially if trained in environments designed for teaching general skills.
Again using a Davidson-style function for diminishing returns as I did for SAR to SIAR, I then model time required to get from SIAR to ASI by doing the following:
Get the total stock of human-only years up through SAR, and the years between SAR and SIAR drawing upon the results of the SAR to SIAR forecast.
SIAR to ASI human-only software-only time is determined by:
Drawing SAR-to-SIAR-equivalent-jumps needed to get from SIAR to ASI: median 1.5 jumps (80% CI: 0.3 to 7.5; lognormal). I’ve shaded this up from 1.25 due to the expectation that r will be increasing.
Use the same formula structure as used for SAR to SIAR to compute the forecasted distribution.
This results in a forecasted median of 95 years (80% CI: 2.4 to 1,000,000). We think that the 90th percentile is probably too high, though keep in mind this is software-only improvements, so it’s at least conceivable that these would slow down dramatically before reaching ASI, with no increases in training compute.
SIAR would ~250x AI R&D
The median of our survey says that switching from all median to all top researchers would give a ~6.5x algorithmic progress speed boost. When only considering “research taste” rather than engineering the median is 3.25, and this may be more relevant because once we get to superhuman automated researchers we may be strongly bottlenecked on non-engineering tasks. Let’s say that the multiplier is roughly 3.25 once we’re above SAR.
Going from SAR to SIAR is by definition like 2 of these jumps, so 3.25^2 which is about a 10x increase in the progres multiplier. This gives a 250x progress multiplier.
ASI would ~2,000x AI R&D
Above we estimated that without shading down for being decreased, going from SAR to SIAR would involve a median of 1.25 SAR-to-SIAR-equivalent-jumps This would naively result in a 10^1.25 increase in the progress multiplier. However, we should shade down for r decreasing, let’s shade down to an 8x on SAR for a rough estimate of 2,000x.
Related work
Davidson’s “What a Compute-Centric Framework Says About Takeoff Speeds”
This 2023 report by Tom Davidson utilizes economic modeling to predict AI takeoff dynamics (description, playground). The primary model is not designed to predict what happens once AI R&D is fully automated, e.g. the transition from SAR to SIAR. The section that predicts what happens after full automation projects a 65% chance of a software-only singularity, which would in the median case result in 2-3 OOMs of effective compute increase conditional on it happening. In forthcoming work, Davidson projects upon fully automating AI R&D a 50% chance of squeezing 3 years of normal AI progress (roughly 3-4 OOMs of effective compute on Davidson’s calculations) into 1 year.
While we haven’t been thinking primarily in OOMs of normal years of progress, if we use the Davidson-like model of diminishing returns discussed in the SAR->SIAR section, we back out that our median value for “r”, the rate of returns to increased effort on software progress, is 4 (as discussed in “Case 1: 5”). Davidson’s best guess in 2022 was r=2, in more recent forthcoming work his estimate is now 1.2. An r above 1 means that each increase in the AIs’ intelligence is getting successively faster over time, and r goes down over time as we get closer to the limits of intelligence. We note that Davidson’s expectations overlap significantly with ours: e.g. r=2 is within the bounds of each of our uncertainties.
Our disagreement regarding r primarily comes down to intuitions regarding accounting for jumps to new capabilities. Davidson primarily estimates r via measuring efficiency improvements: e.g. seeing how much less training compute 2025 algorithms take to get to the performance of AIs from 2015. However, we need to also take into account capability improvements: improvements in AI capabilities at a fixed training compute threshold. These can be thought of as efficiency improvements that are measured “up” rather than down: e.g. seeing how much more training compute 2015 algorithms require to get to 2025 performance than 2025 algorithms. However, these are more difficult to measure.
One might get much higher estimates for r when measuring up: algorithms might scale poorly to qualitatively new capabilities. Consider how even if Magnus Carlsen studied chess for billions of years, he would never be as good as the top computer chess engines (and if not him, definitely more typical humans). There might even be effectively infinite qualitative jumps with large enough algorithm improvements.
Davidson multipliers r by 2 to account for improving capabilities at a fixed compute level, what we’ve called measuring up. We think that multiplying by 2 is insufficient to address the qualitatively large gains that come from unlocking new capability levels (eventually likely achieving capability levels that previous training methods could never have achieved, with infinite compute/data).
There is further discussion of this issue in an unpublished draft authored by Eli here. The draft also contains sections on other issues with Davidson’s model.
We are generally optimistic about modeling specific capability levels as we’ve done in this draft, as a complement to doing economic modeling fit to past data. We are wary of relying too strongly on mathematical models to predict qualitative performance changes, though we think they are still very informative.
See also this discussion of one conception of a software-mostly intelligence explosion and its likelihood.
GATE
GATE is a recently released Epoch model that builds upon Davidson’s model to create a version that is more focused on broad economic automation (scenario explorer, paper). Like Davidson, they don’t claim that their model makes good predictions about what happens close to or after full automation.
The primary change they make which is relevant to a potential software-only intelligence explosion is that they make investment the driver of AI software R&D (i.e. an allocation from GWP), rather than allowing improved software efficiency to directly be used for further R&D. This leads to less aggressive takeoff predictions, though it has a bit faster timelines.
We prefer Davidson’s choice. Epoch’s modeling emphases are related to the view that most AI value will come from broad automation, rather than R&D. In the post, Erdil and Barnett argue that the value of AI will be driven by broad economic deployment, rather than R&D, because:
Economic estimates of the value of R&D generally find it explains about 20% of labor productivity growth, with the rest mostly accounted for by an increase in capital per hour of labor (capital deepening).
Our response: It seems plausible that most of the AIs’ value generated will eventually end up being non-R&D. But we think the R&D will still unlock huge capabilities in a short amount of time.
AI R&D automation isn’t a special case: r is probably below or near 1 once accounting for reliance on experiment compute and data, and if it starts above 1 it will likely decrease to below one soon after the start of a software-only intelligence explosion, leading to a less than 10x efficiency improvement.
Our response: As discussed in the above section, we have higher estimates of r. We agree that these other considerations need to be taken into account but think that r will likely remain significantly above 1. Davidson discusses the need for experiment compute here.
Automating AI R&D would require a broad set of abilities, perhaps being automated after general labor.
Our response: We agree that AI R&D requires a broad set of abilities that would transfer to other domains. However, we think it will be automated early in practice because of (a) incentives to automate it first, (b) easy access to data and (c) reliance on cognitive skills. We can also observe that coding has been one of the jobs most susceptible to automation thus far.
Additionally, in Epoch’s default case they set the maximum software efficiency relative to today to be 10,000, which we consider to be far too low. In Davidson’s model, this parameter is set to 1e12 by default.
Addressing common objections
Above we’ve done a relatively mechanistic analysis of when AI milestones will be reached and what the implications are, and come away with the conclusion that it’s likely there will be a “software-only intelligence explosion” in the sense that due to AI software progress on a fixed hardware supply, we will most likely go from superhuman coders to superintelligent AIs within a year. This becomes even more likely if we allow for some hardware increases, a software-”mostly” intelligence explosion.
Below we address some common objections to (a) the likelihood of such fast progress and (b) the consequences of such fast progress.
Amdahl’s Law
We’ve heard a few people make the argument that because automation can only speed up some aspects of R&D so much (e.g. gathering human data), progress will be bottlenecked on these tasks leading to not-very-large progress multipliers (see Amdahl’s Law).
In particular, they argue something like: Task X currently takes 10% of researchers’ time during AI software R&D. Therefore if it's not sped up by automation, AI software R&D can't be sped up by more than 10x.
While there is some truth to this, the claim does not hold because the speed of accomplishing current AI software R&D sub-tasks doesn’t constrain the speed of the overall process.
To see why this is conceptually mistaken, consider a theoretical AI with very superhuman experiment selection capabilities but sub-human experiment implementation skills. Even if automation didn’t speed up implementation of AI experiments at all and implementation started as 50% of researchers’ time, if automation led to much better experiments being chosen, a >2x AI R&D progress multiplier could be achieved.
Regarding the extent to which these sorts of bottlenecks will hold in practice: regarding gathering human data, we think that synthetic data will likely ameliorate this bottleneck. Additionally, we’ve argued that it’s likely that improved experiment selection has a high ceiling above, and to the extent one can select Nx more valuable experiments this multiplies the progress multiplier Nx, regardless of the rest of the research process.
Compute is the dominant driver of AI progress
This view posits some combination of (a) training compute matters far more than algorithmic progress (b) experiment compute matters far more than researcher quality and quantity for algorithmic progress, (c) algorithmic progress is driven by adapting to larger compute scales and (d) once AI R&D is fully automated, algorithmic progress will be close to fundamental limits which will make compute the dominant driver of further progress.
There’s lots to say about this. First, we’ll note that we agree that compute is both (a) an important driver for overall AI progress and (b) the importance of compute for experimentation. We take into account these bottlenecks in our forecasts, but some disagree with us about how hard these bottlenecks bite.
We’ll note just a few arguments here regarding why we have more aggressive predictions regarding, given that this is addressed to some extent above in the sections discussing AI R&D progress multipliers:
A survey of a few frontier AI researchers yielded an estimate of approximately a 6x increase in the pace of algorithmic progress if the company had only copies of the median researchers vs. the best researcher, with a fixed compute budget.
AGI companies pay very high compensation packages to their top researchers, and have been growing quickly.
AI models have consistently gotten much better over time at a fixed compute cost (some of this is due to distillation though).
AIs have structural collective advantages relative to humans (see article).
It seems unlikely that AIs will be anywhere near the physical limits of intelligence/learning when AI R&D is fully automated, based on (a) the AIs potentially being less efficient than humans at this point (b) the distribution of skills within the human range suggesting large gains near the top and therefore likely large gains above the top (c) known inefficiencies in the human brain.
The AI R&D progress multiplier has to plateau somewhere, why not below 10,000x? Why not below 1,000x?
We agree that the progress multiplier has to plateau somewhere. In our scenario we depict it continuing to rise as the milestones SC, SAR, SIAR, and ASI are achieved, and rising further still after that. Why? Why do we depict it rising all the way to 1,000,000x by 2035?
We are in fact very uncertain about how high (or low) the plateau will be. We did not put much effort into estimating this quantity because it does not seem important to the story—the story would go basically the same if the progress multiplier plateaus at 1000x, for example.
However, our best guess is that the multiplier will eventually get much higher than 1000x.
Recall that according to our small survey of frontier AI company researchers, the research taste gap alone is about 3x between their company’s median and best researchers. Recall that research taste is basically how fast you learn from experiments (by e.g. choosing better experiments to run, and by arriving at more correct conclusions). The median researchers at frontier AI companies are pretty good! They are probably at least 3x research taste multiplier over the global median AI researcher, possibly 10x. And they are vastly better than the average human, who basically couldn’t do AI R&D at all.
The point we are making is that the distribution of humans by research taste appears to be heavy-tailed and does not appear to be asymptoting against any inherent limits.
ai-2027.com
Given a distribution of this shape, where should we expect the ceiling/asymptote? Our thinking is shaped by an analogy to Lindy’s Law, which states that the unknown part of a distribution should increase proportionately to the known part. That is, if a pile of leaves on a lawn has existed for several days, our prior for its remaining lifespan should be measured in days; if the Great Pyramid has existed for several millennia, our prior for its remaining lifespan should be denominated in millennia. By a hand-wavy version of the same principle, we speculate that maximum-achievable-for-humans research and engineering speed could be about as far from the top human geniuses as they are from their more normal peers. This is highly speculative, and we don’t want to claim anything with confidence except that the upper end of the distribution probably is not coincidentally at the exact last point we have measured.
This in turn suggests that merely by massively increasing the size of the human population (e.g. by having 1000 Earths, or a million) there would exist a human with substantially better research taste than the best humans today. Perhaps about 10x better, to be precise—requiring 10x fewer resources for AI experiments to make the same amount of algorithmic progress. (This corresponds loosely to the SIAR, which is roughly 10x better than SAR, which is like the best human researcher but faster and cheaper.)
But the space of all possible humans is a tiny region within the space of all possible minds. Indeed, the space of all possible minds accessible in design-space by the army of SIAR geniuses on the datacenters is much bigger than human mindspace—there are more degrees of freedom to experiment with, e.g. brain size, core learning algorithms, core neural net architecture, hybrids of various things…
This suggests that there should be an additional gap between the research taste of a SIAR and what could eventually be reached during an intelligence explosion. How big is this gap? We can reason about inefficiencies in the brain and how much better AI might be able to get on these dimensions. In particular, it seems likely that there are architectures much more efficient than neural networks and learning algorithms much more efficient than the brain’s.
But overall this sort of thing is hard to reason about: our situation is like that of a caveman walking down the beach and into the ocean, noticing that the water gets too deep for him to walk, and then wondering how deep it eventually gets miles away from shore.
Another consideration is that in some sense the progress multiplier has to go to infinity in some domains. Recall that the progress multiplier is relative: Speed of R&D relative to the speed unaided humans would go. A 100x multiplier means that research that would take humans a century, happens in a year. But eventually AIs will be making discoveries that human civilization simply never could have made. (Just as human civilization has made discoveries that chimpanzees simply never could have made.) This naively translates to a progress multiplier of infinity. We aren’t sure how to handle this conceptually, but we bring it up because (if you like) you can think of ridiculously large multipliers like 1,000,000x as really saying something like “In some domains, the ASIs have a progress multiplier of 1,000x. In others, it’s much higher. In others, it’s infinite. The latter are common enough compared to the former that in effect it ‘averages out’ to 1,000,000x.”
Limitations
Due to time constraints, important dynamics that we weren’t able to model include but aren’t limited to:
Uncertainty over AI R&D progress multipliers.
Training and experiment compute increases.
Future work could improve upon these limitations.
We are also excited about other methods of takeoff forecasting which complement the perspective we’ve laid out here.
Appendix
AI R&D progress multiplier clarifications
We recognize that these seem inelegant and encourage others to suggest improvements.
What uses of AI are allowed in the without-AI condition: AIs can only be used to play a “static role” in either (a) the training or design process (e.g. synthetic data generation, distillation) or (b) experiments. It can’t be used to improve the training or design process, or to improve ideation/prioritization or implementation of experiments.
We need to include some uses of AIs, for example allow them to be used as subjects in experiments, as we want to restrict AI productivity increases to activities like those human researchers currently do.
Assume that for the starting conditions (i.e. for SC in Mar 2027), a project’s progress multiplier has conditions set by OpenBrain’s (the fictional leading AGI company in our scenario) human labor supply and compute levels are based on our trend extrapolation projections ignoring research automation. We decided to set things this way in order to preserve the same progress multipliers across different situations, though it trades off against the relevance to the particular situation, so we’re unsure about what’s best to do. More reasoning in a footnote.
For the without-AI condition: The human research labor is as if current growth trends continued for OpenBrain, as is the compute supply. That is, every company has a without-AI condition that is defined based on OpenBrain researchers with historical growth rates. If new human researchers are hypothetically brought in, they are allowed time to adapt to the company and AIs.
For the with-AI condition: The same as above, but with AI assistance of the latest AI systems at the relevant company (i.e. not OpenBrain, but the company for which the progress multiplier is being calculated) allowed.
Restricting to algorithmic progress: We restrict to activities that are primarily aimed at algorithmic progress and limit compute scaling to a small amount (e.g. maybe 1% of FLOP/s, and a total max of scaling the largest training run by 1%), intending for it only be used to experiment with improved algorithms.
Due to (2) above, the AI R&D progress multiplier should not be interpreted as a multiplier on the current human labor in the relevant time period. It’s instead a multiplier on a labor supply projected from what would happen if there were no AI automation.
Keep in mind that different AI capabilities may lead to different sorts of research agendas being pursued. When discussing the progress multiplier, reduce these differences including potential qualitative differences in which agenda are pursued to a single dimension for simplicity.
Modeling “experiments getting harder to find”
In early 2024 Eli coded up a model which directly models the experiment process including modeling an actual pool of experiments, how long each takes to implement, and how valuable they are. More details about the model are here.
We think this approach has some promise, but unfortunately have not found time to explore it in detail.
1.
The interpolation is done exponentially.
2.
Some of the model parameters were chosen by Daniel and others by Eli. Due to time constraints, we weren’t able to independently make estimates for each parameter. To the extent this forecast represents a single person’s view, it’s closer to being Daniel than Eli. That being said, they both find the overall distributions plausible.
3.
We also solicited forecasts from 4 professional forecasters from FutureSearch (bios at http://futuresearch.ai/ai-2027). Their forecasts for SC->SAR and SAR->SIAR are included as footnotes in the table below, and expanded upon in later sections.
4.
These are sometimes different from the projected dates because we have been making adjustments to our timelines and takeoff forecasts beyond when we locked in the progression in the scenario. As you can see, we have enough uncertainty that we find the dates in the scenario nearly as plausible as our medians.
5.
FutureSearch aggregate: Oct 2027 (Mar 2027 to Apr 2029)
6.
FutureSearch aggregate, years from SAR->SIAR conditional on SAR in 2027: 1 year (0.1, 2.1) (more)
7.
The interpolation is done exponentially.
8.
The reason 5% is used is because it’s approximately the fraction that we project AI projects to be spending around the time they reach SC: see the “Research automation” row in our compute supplement.
9.
Note that an “agent” may on average use multiple model copies/instances running at once, e.g. in the simplest case doing best of many tries in parallel.
10.
Technically, we define this and the below milestones are based on 30x the amount of human researchers they would have based on our best projections as of 2025 ignoring research automation, to account for issues around hiring more slowly or firing people due to them being automated substantially changing the milestone. (see clarification (2) below).
11.
We aren’t including experiment bottlenecks or other non-cognitive bottlenecks. when measuring task completion speed.
12.
This definition is similar to that of this report which interviewed AI researchers about AI R&D automation. In our case, we say that all agents are 30x faster than the best researcher, while they discuss one copy of each researcher sped up 30x.
13.
It doesn’t make sense for the definition to use labor rather than log(labor), because this would fit the distribution of AI researcher abilities worse.
14.
We focus on AI R&D capabilities because it is the most relevant for takeoff forecasting, but we believe general capabilities and capabilities in other domains will also be improving rapidly, with varying levels of lag compared to AI R&D-specific capabilities.
15.
This takes into account compute bottlenecks, but not diminishing returns over time due to picking low-hanging fruit (which we prefer not to include in order to isolate the algorithmic progress variable in our takeoff forecasting, as well as the most obvious versions of adjusting being highly sensitive to choices like whether to take the log() of the chosen metric).
16.
The reason 5% is used is because it’s approximately the fraction that we project AI projects to be spending around the time they reach SC: see the “Research automation” row in our compute supplement.
17.
Note that an “agent” may on average use multiple model copies/instances running at once, e.g. in the simplest case doing best of many tries in parallel.
18.
Technically, we define this and the below milestones are based on 30x the amount of human researchers they would have based on our best projections as of 2025 ignoring research automation, to account for issues around hiring more slowly or firing people due to them being automated substantially changing the milestone. (similar rationale to (2) above).
19.
We aren’t including experiment bottlenecks or other non-cognitive bottlenecks. when measuring task completion speed.
20.
They might not be faster and cheaper at literally every task, but they will likely have massive advantages in many mportant respects.
21.
We could also look directly talk about labor, and our best guess would be that the picture stays roughly the same.
22.
If there were no humans to help improve taste their labor is probably still more valuable as the median OpenBrain researcher but less valuable than the 90th percentile, as their engineering skills make up for their taste deficit.
23.
log(15/10)/log(2)
24.
Technically R should go down over time as well, but this shouldn’t be a large effect on short timescales, so I’m assuming it stays the same.
25.
(10+AMR_to_SAR)*2^(2*log((10+AMR_to_SAR)/10)/log(2))-(10+AMR_to_SAR)
26.
Which is about 30x what we project the OpenBrain researcher population to be in 2027.
27.
0.4^(log(30)/log(10))=0.22 Note that after making this estimate we remembered that AIs would not need to sleep or eat or otherwise rest, and can instead work round-the-clock. This means that compared to top humans they will in some sense be not 30x but 60x or 100x faster, arguably. However, this is not as valuable as a literal thinking-time serial speedup for various reasons, and moreover, our survey data was about the 10x less compute hypothetical not a 30x, 60x, or 100x less compute hypothetical, and we are loathe to extrapolate the results too far. (30x feels like not too far, but 100x feels like too far). Eli thinks that overall we should increase the progress multiplier from 7 to ~10 in this section as a result of this consideration, but Daniel thinks we should keep it at 7 for the reasons mentioned above. Also, the Davidson et al survey data explicitly prompted the respondents to take this no-need-to-sleep effect into account, so we don’t need to update the results of Method 3 in light of this consideration.
28.
The results are: 25.6% <20% faster, 37.5% 20-200% faster, 36.8% more than twice as fast
29.
Data points: 1.1, 1.02-1.1, 1.3, >1.5, 1.25, 1.01-1.1
30.
Data points: “A little <0.5”, 0.2, 0.75-0.9, “1x for theory research, 0.5x for empirical research (counting these as separate data points)”, 0.5-0.9
31.
0.4^(log(30)/log(10))=0.22
32.
Square root of 6.25
33.
Estimates: 1.5x, 2-10x, “conservative[ly]” 8-11x, 10x. Meidan of 2-10x and 10x is considered to be geomean(geomean(2, 10), 10)=6.69
34.
((STOCK_THROUGH_SAR+SAR_TO_SIAR_YEARS)*2^(SAR_TO_SIAR_EQUIV_JUMPS*log(((STOCK_THROUGH_SAR+SAR_TO_SIAR_YEARS)/STOCK_THROUGH_SAR)/log(2))-((STOCK_THROUGH_SAR+SAR_TO_SIAR_YEARS)
35.
Eth and Davidson recently published follow-up work on the likelihood of a software intelligence explosion here. While presented in a more polished, way it doesn’t depart much from the core points of the relevant section in the 2023 report.
36.
Epoch estimates r in a similar manner in this paper.
37.
For an example of why making the multiplier more particular to the present situation would result in undesirable results, consider: when there are very superhuman automated researchers there are large AI R&D progress multipliers, the progress multiplier becomes very sensitive to the quantity and quality of the human researchers in the “no AI” case; that is, the progress multiplier changes without changing the AI system. If we allowed differences across companies, we might get very different results depending on the staff size, so if a company let go lots of obsoleted employees their progress multiplier would shoot up. Similarly, if we allowed a company’s progress multiplier to change over time based on their human workforce then it would go up when they fired people. Similar issues happen if we allow the progress multiplier to change based on compute amounts, because AIs are differentially advantaged at utilizing high compute amounts, so an influx in compute would lead to an increased progress multiplier. This is the least bad solution we’ve found to this problem but encourage others’ feedback.
38.
This restriction may be a bit too harsh and difficult to draw clean lines for, though later we will incorporate training compute increases. See also this report which argues that the need for retrain models is unlikely to make the difference between whether we get or don’t get a software-driven intelligence explosion. Additionally, in our scenario there is generally not retraining from scratch above SC, although sometimes there is significant distillation before continuing training with a new paradigm. The FLOPs used for experimentation, training, etc. are constant at the level that was available when the SC was trained.