Nvidia on Tuesday launched a multimodal open model that combines vision, speech and language, aiming to help enterprises build agents that deliver faster, smarter responses by reasoning across modalities.

Nemotron 3 Nano Omni is the latest iteration of the vendor's open source family of models. It removes the need for separate perception models for video, audio, image and text by combining vision and audio encoders (neural network modules that process complex inputs and extract their most salient features) within a 30-billion-parameter mixture-of-experts architecture. That combination gives the system higher throughput than Nvidia's other Omni models, leading to lower costs and better inference efficiency, the vendor said.
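In practice, collapsing separate perception models into one omni model means a single request can carry every modality at once. The sketch below shows what that looks like under the OpenAI-compatible chat-completions schema that Nvidia's hosted endpoints expose; the model ID and endpoint details are illustrative assumptions, not Nvidia-published values.

```python
# Sketch: composing one multimodal chat request for a single omni model,
# rather than routing each modality to a separate perception model.
# The model ID below is a hypothetical placeholder.
import json

def build_omni_request(question: str, image_url: str, audio_b64: str) -> dict:
    """Bundle text, an image, and audio into one chat-completions payload."""
    return {
        "model": "nvidia/nemotron-3-nano-omni",  # hypothetical model ID
        "messages": [{
            "role": "user",
            "content": [
                {"type": "text", "text": question},
                {"type": "image_url", "image_url": {"url": image_url}},
                {"type": "input_audio",
                 "input_audio": {"data": audio_b64, "format": "wav"}},
            ],
        }],
        "max_tokens": 512,
    }

request = build_omni_request(
    "Summarize what is said and shown.",
    "https://example.com/chart.png",
    "UklGRi4AAABXQVZF",  # truncated base64 audio placeholder
)
print(json.dumps(request, indent=2))
```

Because all three content parts travel in one message, the model can reason over them in a single stream instead of stitching together outputs from separate encoders and models.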

The model is another way Nvidia is trying to extend its dominance in AI hardware into models and services. While the vendor currently leads the AI market in hardware with its ubiquitous GPUs, emphasizing its Nemotron open models may help it to remain on top, especially as its biggest customers -- including Google, Microsoft and AWS -- have their own chips and are ramping up production. Other customers, such as OpenAI, are partnering with Nvidia competitors like Cerebras and Broadcom, and some foreign customers, notably DeepSeek, are shifting to local chipmakers such as Huawei.


“This is happening against the backdrop of Nvidia's biggest customers doing everything they can to eat away at the margins that Nvidia is making in hardware right now,” said David Nicholson, an analyst at Futurum Group. “Over the long haul, they're not going to be able to maintain the hardware margins that they have right now.”

At the same time, Nvidia is trying to make enterprises more efficient by enabling agents to understand context across modalities. The vendor promises a system that integrates diverse file types and workflows, making it easier for enterprises to build agents.

“The idea that we’re going to give you this environment where when you create an agent, it will automatically understand how to communicate with all of these other pieces of the entire infrastructure stack," Nicholson said. “It's one step further in the direction of an intelligently engineered system that delivers efficiency that is hard to get when you don't have control over all the components."

Being Efficient

The model can work alongside proprietary models and other Nemotron open models to power agentic workflows such as computer-use agents, document intelligence, and audio and video understanding. For computer-use agents, Nemotron 3 Nano Omni powers the perception loop as the agent navigates the screen and reasons about its content.
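That perception loop follows a simple perceive-reason-act cycle. The sketch below stubs out the model call with a string-matching stand-in so the control flow is visible; in a real agent, that call would be an inference request to a multimodal model such as Nemotron 3 Nano Omni reasoning over actual screenshots.

```python
# Minimal sketch of the perceive-reason-act loop a computer-use agent runs.
# stub_model is a toy stand-in for the multimodal model's reasoning step.
from dataclasses import dataclass

@dataclass
class Action:
    kind: str    # e.g. "click", "type", "done"
    target: str  # element or text the action applies to

def stub_model(screenshot: str, goal: str) -> Action:
    """Stand-in for the model: maps a screen description to the next action.
    A real model would reason over pixels and audio, not strings."""
    if "search box" in screenshot and goal not in screenshot:
        return Action("type", goal)
    return Action("done", "")

def perception_loop(goal: str, screens: list[str], max_steps: int = 5) -> list[Action]:
    """Repeatedly capture the screen, ask the model for the next action,
    and stop when the model signals completion."""
    actions = []
    for screenshot in screens[:max_steps]:
        action = stub_model(screenshot, goal)
        actions.append(action)
        if action.kind == "done":
            break
    return actions

steps = perception_loop(
    "quarterly revenue",
    ["page with a search box", "results page showing quarterly revenue"],
)
print([a.kind for a in steps])  # a "type" step, then "done"
```

The `max_steps` cap is the usual safeguard in such loops: it bounds how long the agent can act before a human or supervising process reviews its progress.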



With document intelligence, the model can interpret documents, charts, tables, and screenshots, and reason over both visual and textual content. With audio and video understanding, the model maintains the context of both modalities within a single reasoning stream.

Obstacles

The challenge, though, is that it remains unclear whether Nvidia envisions the model for enterprises of a particular size, and whether its hyperscaler customers stand to benefit from using it.

Nicholson noted that some Nvidia customers have their own accelerators. “I don't know if Nvidia is thinking that this is going to be a hyperscale cloud provider strategy that they'll be able to use."

Moreover, while the model is open source and Nvidia has released its weights, training techniques and training data, it is unclear whether enterprises outside the Nvidia stack will adopt it.

“That's not very likely," Nicholson said. "Most of this will be deployed within an entire Nvidia stack environment."

Nevertheless, developers will still experiment with the model, said Chirag Shah, a professor at the University of Washington's Information School.


“When you make something like this open source, it makes all those developers quickly try it out, start integrating into their existing solutions, and when it works well, they're going to want to use Nvidia as their infrastructure partner," he said.