AI’s next phase will not be defined by better answers alone.
It will be defined by systems that can act with context, perceive with depth, and model the world they are asked to change.
The next AI question is not only what models know
The AI conversation is starting to move beyond the chatbot interface.
For the past few years, the most visible form of AI has been linguistic. People typed questions, models produced answers, and the industry measured progress through reasoning, coding, writing, summarization, and search. That phase is not over. Language intelligence is still becoming more useful, more embedded, and more commercial.
But it is no longer the whole story.
The more important question now is what happens when AI systems do not only generate responses, but begin to use tools, manage workflows, understand space, and reason about the physical world. That is where the industry is starting to turn: from language to action, from text to interfaces, from static answers to dynamic environments.
That transformation has only become clearer since HumanX. Global AI spending is now being driven by infrastructure and agentic tools. The point is not simply that another model became available. It is that agents are becoming part of the enterprise stack.
This is why the HumanX conversations in San Francisco still matter. The event has passed, but it captured a transition that is becoming more visible now: AI is moving from systems that talk to systems that act, and from models that process language to models that need some understanding of the world.
“AI went from being able to answer questions to now being able to do things.”
– Jensen Huang
The third wave of AI
Jensen Huang framed the evolution directly. AI, he argued, is much broader than large language models. Language is one form of encoded information, but information is also encoded in genes, proteins, chemicals, physics, tools, software, and environments. Wherever there is structure, AI can learn to represent it.
That framing matters because it moves AI away from being understood as a single category. Chatbots are important, but they are only one expression of a much larger technology. The deeper shift is that AI is becoming a way to represent, predict, and act across domains.
Huang described the current moment as the beginning of a third wave. The first wave of modern AI was generative: models that could translate one form of information into another. The second wave was reasoning, where models became more grounded and useful. The third wave, in his view, is agentic.
“What’s happening now is that AI went from being able to answer questions to now being able to do things,” he said.
That is a concise way to describe the new center of gravity. The prompt is no longer only a question. Increasingly, it is a request for action: build something, analyze something, use these tools, access these files, iterate until the work is done.
Huang’s most useful phrase may have been even simpler: “AI is software that uses software.”
That idea changes the meaning of the application layer. The software industry was built around tools used by humans. Word processors, spreadsheets, design suites, enterprise systems, developer tools, CRMs, ERPs, and analytics platforms were designed for people sitting in front of screens. If AI agents become tool users, the number of users of software expands dramatically.
The result is not just more automation. It is a reinvention of how software itself is consumed.
Coding to manage agents
The OpenAI/Codex conversation at HumanX made the same transition visible from inside software engineering.
Srinivas Narayanan described coding tools as having moved from assistance to agency. Engineers are no longer only using AI to autocomplete functions or explain code. They are guiding systems that generate software, review software, and fix bugs. In his words, the job has become “primarily not writing software, but managing agents.”
That line connects directly to what is happening across knowledge work. Coding is the first domain where this agentic pattern has become highly visible because software is unusually verifiable. Tests can be written. Repositories are bounded. Bugs can be reproduced. Outputs can be checked.
But the deeper claim is that coding may be a preview of other forms of work. Narayanan described Codex and coding models as becoming an underlying harness for many kinds of knowledge work, from legal and financial workflows to business-process automation.
That is where agents become more than developer tools. They become a general work interface. If they can manipulate files, access systems, use applications, and operate within guardrails, the same primitives that make them useful for code can make them useful for other workflows.
The limitation is not imagination. It is context, safety, and access. Does the agent know the company’s systems? Does it understand the workflow? Does it have the right permissions? Can it be monitored? Can it be trusted when agents begin interacting with other agents?
Those questions are why the agentic future is not only a model race. It is an infrastructure, governance, and interface problem.
The move from words to worlds
If Huang and OpenAI showed the shift from answering to acting, Fei-Fei Li pushed the conversation toward another frontier: spatial intelligence.
Her argument was not that language intelligence is finished. In fact, she made clear that language models will remain critical. But human intelligence is not only linguistic. We understand the world through space, movement, objects, bodies, geometry, interaction, and time. For machines to become more useful in physical and virtual environments, they need some version of that spatial understanding.
Li described the absence of this awareness as “intelligence in the dark.” The moment animals became aware of their bodies and their relationship to the world, she said, intelligence evolved rapidly. For AI, the implication is that seeing and reasoning about the world is not an accessory to intelligence. It is central to it.
Her definition of a world model was precise: a system that can understand space, reason about geometry, interactivity, physics, and dynamics, and eventually generate 3D and 4D space just as today’s computers generate words.
That is a different ambition from making a better chatbot. It points toward systems that can create training environments for robots, help design experiences, support healthcare imaging, power virtual worlds, and model the next state of a physical environment.
World Labs’ Marble, which Li discussed on stage, is an early expression of that direction: a generative model for true 3D-consistent worlds. The point is not only that such worlds can be generated. It is that they can become environments in which other systems learn, are tested, run simulations, and act.
The next phase is action plus world understanding
Taken together, the HumanX conversations suggested that the next phase of AI will not be defined by one interface.
Agents need tools. Enterprises need guardrails. Software needs context. Robotics needs spatial data. Video models need temporal understanding. World models need compute, new architectures, and training environments that do not yet exist at internet scale.
The common thread is that AI is moving closer to work and closer to the world. It is no longer enough for models to produce plausible language. They need to take action, operate software, understand environments, and generate outputs that can be verified, used, and trusted.
This is also why San Francisco remains such a useful lens. HumanX was not only a gathering of AI executives and founders. It was a snapshot of the industry’s next argument: the frontier is moving from words to workflows, and from workflows to worlds.
That does not make language less important. It makes it part of a larger system.
The first mass-market AI experience was conversation. The next one may be delegation. After that, it may be simulation: agents that do work inside environments they can understand, model, and change.
That is the real significance of the moment HumanX captured. AI’s next phase is not just more intelligent answers. It is systems that can act with context, perceive with depth, and eventually reason about the world they are asked to change.