NeuReality launches NR-NEXUS to scale AI inference
NeuReality has launched NR-NEXUS, an inference operating system for artificial intelligence workloads aimed at organisations running AI in production.
The launch reflects a broader shift in the AI market from model training to inference, the stage at which trained models are used in live systems. As AI moves into customer and business applications, organisations are facing rising costs, operational complexity and fragmented infrastructure.
NR-NEXUS is designed as a software layer that sits across the inference stack, managing and orchestrating workloads across different types of compute resources. NeuReality says this approach allows users to work across existing infrastructure without major changes to deployed systems.
Shift to Inference
For the past several years, much of the industry's attention has focused on training large language models and other foundation models. That focus is now shifting to the economics and operational demands of serving those models continuously in real-world settings.
NeuReality describes this phase as the rise of AI "token factories", where infrastructure is tuned to generate outputs on an always-on basis rather than to build models through training runs. These environments must handle uneven demand, uptime requirements, and deployment across a range of locations and hardware types.
"AI inference is rapidly becoming one of the largest computing markets in the world, yet the infrastructure stack around it remains fragmented," said Moshe Tanach, CEO of NeuReality.
"With NR-NEXUS, we are defining the operating system for AI token factories - enabling organizations to run and scale inference workloads efficiently across GPUs, emerging XPUs, hyperscalers, and dedicated AI clusters. As open-source models and AI-native applications proliferate, operators need infrastructure that gives them flexibility rather than lock-in," Tanach said.
Mixed Hardware
A central part of the product is support for heterogeneous computing environments. NR-NEXUS can operate across CPUs, GPUs and other accelerators, reflecting a market in which organisations often use a mix of hardware rather than rely on a single architecture.
This matters because companies deploying AI services must balance cost, availability and performance while also dealing with supply constraints and existing procurement decisions. A software layer that spans multiple hardware types may help them avoid dependence on a single vendor-specific stack.
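The general pattern is easiest to see in code. The sketch below is purely illustrative: NeuReality has not published an NR-NEXUS API, so the Backend type, the route_request function and the pool names are hypothetical. It shows a thin routing layer that, for each request, picks a serving pool with spare capacity and a favourable cost, whatever the underlying silicon.

```python
# Illustrative only: hypothetical names, not the NR-NEXUS API.
from dataclasses import dataclass

@dataclass
class Backend:
    name: str                  # e.g. "gpu-pool-a", "cpu-pool", "xpu-pool"
    kind: str                  # "gpu", "cpu", "xpu"
    busy_slots: int            # requests currently in flight
    max_slots: int             # concurrency the pool can sustain
    cost_per_1k_tokens: float  # attributed serving cost for this pool

def route_request(backends: list[Backend], prefer_low_cost: bool = True) -> Backend:
    """Pick a backend with spare capacity; among those, pick the cheapest."""
    available = [b for b in backends if b.busy_slots < b.max_slots]
    if not available:
        # All pools saturated: fall back to the least-loaded so requests queue evenly.
        return min(backends, key=lambda b: b.busy_slots / b.max_slots)
    key = (lambda b: b.cost_per_1k_tokens) if prefer_low_cost else (lambda b: b.busy_slots)
    return min(available, key=key)

pools = [
    Backend("gpu-pool-a", "gpu", busy_slots=14, max_slots=16, cost_per_1k_tokens=0.40),
    Backend("cpu-pool",   "cpu", busy_slots=2,  max_slots=64, cost_per_1k_tokens=0.10),
    Backend("xpu-pool",   "xpu", busy_slots=5,  max_slots=8,  cost_per_1k_tokens=0.25),
]
print(route_request(pools).name)  # -> "cpu-pool": cheapest pool with spare capacity
```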
The platform is designed to manage workloads dynamically and optimise how tasks are executed across available resources. Its stated aim is to increase utilisation, maintain predictable service levels and lower the cost per output as inference volumes grow.
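A back-of-the-envelope calculation, using invented figures rather than anything NeuReality has published, shows why utilisation is the lever behind cost per output: the same hourly hardware bill is spread over however many tokens the hardware actually serves.

```python
# Toy arithmetic with made-up numbers; not NeuReality's pricing or performance data.
def cost_per_million_tokens(hourly_cost: float, peak_tokens_per_sec: float, utilisation: float) -> float:
    """Hardware cost attributed to each million tokens actually served."""
    tokens_per_hour = peak_tokens_per_sec * 3600 * utilisation
    return hourly_cost / tokens_per_hour * 1_000_000

# Hypothetical accelerator: $4/hour, peak throughput 2,500 tokens/sec.
for util in (0.2, 0.5, 0.8):
    print(f"utilisation {util:.0%}: ${cost_per_million_tokens(4.0, 2500, util):.2f} per 1M tokens")
# 20% -> $2.22, 50% -> $0.89, 80% -> $0.56: higher utilisation, lower cost per output.
```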
Production Demands
Inference workloads can be difficult to manage because demand often shifts sharply over time. That can leave expensive hardware idle at some moments and overstretched at others, even as customers and internal users expect low latency and consistent service.
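A toy reactive scaler makes the pattern concrete; the per-replica throughput and bounds below are invented for the example and do not describe NR-NEXUS behaviour.

```python
# Illustrative sketch of demand-driven sizing; parameters are hypothetical.
import math

def target_replicas(requests_per_sec: float, per_replica_rps: float,
                    min_replicas: int = 1, max_replicas: int = 32) -> int:
    """Size the serving pool to current load, within fixed bounds."""
    needed = math.ceil(requests_per_sec / per_replica_rps)
    return max(min_replicas, min(max_replicas, needed))

# Demand swinging over a day: quiet overnight, a midday spike, then a partial lull.
for rps in (5, 40, 400, 120):
    print(rps, "req/s ->", target_replicas(rps, per_replica_rps=25), "replicas")
# 5 -> 1, 40 -> 2, 400 -> 16, 120 -> 5
```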
As more companies move AI projects from pilot programmes into operational systems, those pressures are becoming more visible. In that environment, infrastructure tools that standardise deployment and scheduling are drawing interest from buyers seeking to reduce integration work.
NeuReality says NR-NEXUS is intended to help organisations move from fragmented AI deployments to a more standardised and scalable operating model. The software is currently in beta.
Advisory Appointment
Alongside the launch, NeuReality has appointed Shalini Agarwal, a former Google AI leader, as an adviser. She will work with the leadership team on product strategy as the company expands NR-NEXUS for organisations deploying AI at scale.
"Shalini has spent her career bringing AI and software products to millions of users," Tanach said. "Her experience at the application layer will be instrumental in building NR-NEXUS to serve real production AI applications."