Life After AI

Every general-purpose technology reaches a point where it stops being the subject and becomes part of the foundation. The web passed through it, as did virtualization, the relational database, and cloud infrastructure. Few people now describe a product as running “in the cloud” as though that were the noteworthy fact about it. The capability has been absorbed into the baseline of how systems get built. AI is approaching the same threshold. The models keep working while the announcements that a company “uses AI” grow quieter, because the claim no longer separates one organization from another.

This is life after AI: the period after the hype fades, the construction crews drive away from the data center sites, and the term “AI” is preceded in some fashion by “of course.” The technology doesn’t recede, but it loses its exceptional status and settles into the ordinary machinery of computing.

Practical use cases expand rather than contract. Models draft and review documents, translate, summarize, classify images, generate and refactor code, and sit behind search and customer support. That use is attracting capital. Menlo Ventures (2025), a venture firm with direct holdings in the model providers it ranks, put enterprise generative-AI investment near $37 billion in 2025, roughly triple the year before.

The same year also produced the opposite headline. A report from MIT’s Project NANDA (2025) found that about 95% of organizations had seen no measurable profit-and-loss impact from their generative-AI deployments. That contrast is instructive rather than contradictory. It shows that investment is broad, measurable production value remains concentrated, and the distance between those conditions describes where the technology stands today. Gartner (2025) placed generative AI in its trough of disillusionment that year, the stage where early enthusiasm gives way to the slower work of making something hold up in production.

Frontier models won’t disappear in this picture, but will remain where the task justifies the cost, such as hard reasoning, research, synthetic-data generation, distillation, and escalation from smaller systems that have reached their limit. Around that core, a different arrangement is forming. A position paper from NVIDIA researchers argues that small language models are sufficient and more economical for the narrow, repetitive calls that make up most agentic work, and that heterogeneous systems invoking several models are the natural design (Belcak et al., 2025). The infrastructure has begun to assume this. IDC (2025) describes model routing, the practice of directing each request to the model that fits it on cost, latency, and complexity, as becoming standard, and projects that by 2028 roughly 70% of the most AI-driven enterprises will route across multiple models rather than commit to one.
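As a rough illustration of what routing on cost, latency, and complexity can look like, here is a minimal sketch. Everything in it is an assumption made for the example: the model names, prices, latencies, and the 1-to-5 complexity scale are hypothetical, and real routers typically estimate complexity with a classifier rather than receiving it as an argument.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Model:
    name: str                  # hypothetical model identifier
    cost_per_1k_tokens: float  # illustrative price, not a real quote
    p95_latency_ms: int        # illustrative latency
    max_complexity: int        # hardest task tier (1-5) the model handles well


# Hypothetical catalog: a small local model, a mid-tier hosted model,
# and a frontier endpoint reserved for the hardest requests.
CATALOG = [
    Model("small-local", 0.0002, 80, 2),
    Model("mid-hosted", 0.003, 400, 3),
    Model("frontier-hosted", 0.03, 2500, 5),
]


def route(complexity: int, latency_budget_ms: int, catalog=CATALOG) -> Model:
    """Return the cheapest model that can handle the task within the latency budget."""
    candidates = [
        m for m in catalog
        if m.max_complexity >= complexity and m.p95_latency_ms <= latency_budget_ms
    ]
    if not candidates:
        # No model satisfies both constraints; the caller must relax one of them.
        raise LookupError("no model fits the complexity and latency constraints")
    return min(candidates, key=lambda m: m.cost_per_1k_tokens)


if __name__ == "__main__":
    print(route(complexity=1, latency_budget_ms=100).name)
    print(route(complexity=4, latency_budget_ms=5000).name)
```

The design choice worth noting is that the frontier model is selected only when nothing cheaper qualifies, which mirrors the escalation pattern described above.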

While this shift is real, it is also uneven. Frontier revenue is still climbing steeply: Epoch AI (2026) tracked the leading labs multiplying their annualized revenue several times over inside a single year. Enterprises, for their part, have concentrated on a small number of closed frontier providers, and Menlo Ventures (2025) found enterprise adoption of open-weight models falling rather than rising. The movement toward small, fit-for-purpose models looks today like a developer and cost-control practice more than an enterprise migration trend. Both conditions exist at once, and the balance between them is shifting.

In production, AI occupies a position among the other components, alongside the database, queue, object store, and container. The decision is no longer whether to adopt AI, but rather which model serves a given task under constraints on cost, latency, security, and audit. Where the model runs calls for similar consideration. Latency budgets, data-residency rules, and sector regulation push some workloads onto private clouds, on-premises hardware, or the edge, while others stay with hosted frontier endpoints (IDC, 2025). The decisions around AI are starting to resemble ordinary capacity planning.

Early personal-computer software treated 640 kilobytes of memory as a hard wall and was engineered around it. Large context windows in LLMs invite the reverse assumption, that enough capacity removes the need to manage what the model sees, but evidence suggests otherwise. Liu et al. (2024) showed that models attend most reliably to information at the beginning and end of their input and lose track of material in the middle, a pattern that held across model families and was not resolved by enlarging the window. Hong et al. (2025), testing eighteen current models, found that performance grows less reliable as input length increases, even on simple retrieval and repetition tasks.

The context window is best seen as a boundary to budget against. Working systems manage it deliberately by retrieving what a step needs, summarizing what has passed, holding structured state outside the prompt, calling tools, logging, and escalating when a cheaper path runs out. Context treated this way bears a resemblance to memory management, and it returns AI to the discipline of systems engineering.
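One way to picture budgeting against the window is a small sketch that packs only the highest-ranked retrieved snippets into a fixed token allowance. The function names, the relevance scores, and the whitespace-based token count are all assumptions for illustration; a production system would use the model's own tokenizer and a real retrieval step.

```python
def count_tokens(text: str) -> int:
    """Crude stand-in for a tokenizer: counts whitespace-separated words."""
    return len(text.split())


def build_context(instructions: str, snippets: list[tuple[float, str]], budget: int):
    """Pack the instructions plus the best-scoring snippets that fit the budget.

    snippets is a list of (relevance_score, text) pairs, e.g. from a retriever.
    Returns the assembled prompt and the number of tokens it uses.
    """
    used = count_tokens(instructions)
    if used > budget:
        raise ValueError("instructions alone exceed the context budget")

    chosen = []
    # Consider snippets from most to least relevant; skip any that would overflow.
    for _score, text in sorted(snippets, key=lambda s: s[0], reverse=True):
        cost = count_tokens(text)
        if used + cost <= budget:
            chosen.append(text)
            used += cost

    return "\n".join([instructions, *chosen]), used


if __name__ == "__main__":
    prompt, used = build_context(
        "Answer using the notes.",
        [(0.9, "alpha beta gamma"),
         (0.2, "one two three four five six"),
         (0.7, "delta epsilon")],
        budget=10,
    )
    print(used)
    print(prompt)
```

Anything that does not fit is left out of the prompt rather than truncated, on the assumption that it stays available in external state and can be retrieved in a later step.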

Once a model is in production, the problems around it lose their novelty. Latency and cost control reassert themselves. So do orchestration, caching, retrieval strategy, data quality, governance, failure handling, security, and maintenance. Data readiness in particular tends to set the ceiling on what can be accomplished, which is the unglamorous reason many pilots stall short of production. None of this is specific to AI. It is the standard work of putting a capability into a system that people depend on.

The longer the work continues, the less singular AI looks in the technology landscape. The web stopped being a destination and became the foundation that software is assumed to sit on. AI seems to be evolving along a similar path. It is transformative in aggregate, increasingly ordinary in practice, and judged in the end by whether its output is appropriate for a given decision. That evolution gradually shifts the discussion from what AI can do to where it belongs.

References

Belcak, P., Heinrich, G., Diao, S., Fu, Y., Dong, X., Muralidharan, S., Lin, Y. C., & Molchanov, P. (2025). Small language models are the future of agentic AI (arXiv:2506.02153). arXiv. https://arxiv.org/abs/2506.02153

Epoch AI. (2026, February 19). Anthropic could surpass OpenAI in annualized revenue by mid-2026. https://epoch.ai/data-insights/anthropic-openai-revenue

Gartner. (2025, August 5). Gartner Hype Cycle identifies top AI innovations in 2025 [Press release]. https://www.gartner.com/en/newsroom/press-releases/2025-08-05-gartner-hype-cycle-identifies-top-ai-innovations-in-2025

Hong, K., Troynikov, A., & Huber, J. (2025). Context rot: How increasing input tokens impacts LLM performance [Technical report]. Chroma. https://research.trychroma.com/context-rot

IDC. (2025, December 11). The future of AI is model routing. https://www.idc.com/resource-center/blog/the-future-of-ai-is-model-routing/

Liu, N. F., Lin, K., Hewitt, J., Paranjape, A., Bevilacqua, M., Petroni, F., & Liang, P. (2024). Lost in the middle: How language models use long contexts. Transactions of the Association for Computational Linguistics, 12, 157–173. https://doi.org/10.1162/tacl_a_00638

Menlo Ventures. (2025, December 9). 2025: The state of generative AI in the enterprise. https://menlovc.com/perspective/2025-the-state-of-generative-ai-in-the-enterprise/

Project NANDA. (2025). The GenAI divide: State of AI in business 2025. MIT Media Lab.

Header image: Kevin Savetz, CC BY 2.0 https://creativecommons.org/licenses/by/2.0, via Wikimedia Commons