Author: Paul E. Sorvik — architect of CRISM, CRISMOS, and CRISM Watch
Paul E. Sorvik — Principal Investigator · ORCID 0009-0008-5717-7110
Framing
This question matters to the Manifold Relativity Programme (MR) for two reasons. First, both CRISM and CAC are used as tools with AI to produce pre-print papers. Second, Q-day leaves GPUs (and the “GPU OS”) as the AI substrate for years afterwards, even if quantum hardware eventually replaces them.
But a “GPU OS” may fill Q-day gaps before Q-day, and for far longer after Q-day than we can cover here. Previous “CAC gated in zero trust” papers identified what a GPU OS could deliver before and after Q-day; putting “should” before “could” in CAC paper-publishing gating is how zero trust is executed. MR therefore needs to predict what a GPU OS may deliver and when MR tooling can start migrating to it, and then when to start migrating to quantum, once quantum meets or exceeds MR’s required capabilities and exceeds those of the then-current GPU OS.
That reads complicated because it is!
Note: CRISM is explained in detail in the About Page
No OS runs on the GPU as primary compute — that is research (CMU’s LithOS leads). But a “GPU OS” does exist and is consolidating fast at the orchestration / inference layer, where NVIDIA is explicitly claiming the category: Dynamo 1.0 is officially “the inference operating system for AI factories,” alongside DSX OS. The term resolves to GPU-fleet / inference OS (exists), not OS-on-GPU (doesn’t yet).
(A) Device scheduling: fast-moving research on a persistent-kernel + JIT-operator pattern, including LithOS (CMU/MSR, SOSP 2025), GPUOS, AgileOS, and Concordia. Narrow exception: NVIDIA’s GSP firmware RTOS on an embedded core, which is firmware, not a general-purpose OS. No general-purpose device-side OS ships in production GPUs.
(B) Partitioning and sharing: NVIDIA MIG (static partitioning), MPS, and vGPU ship, but none gives dynamic fine-grained scheduling + memory virtualization + isolation transparently. LithOS is the research frontier.
(C) Fleet orchestration: Kubernetes + NVIDIA GPU Operator (de facto), Run:ai / KAI, NVIDIA DSX OS (“operating AI factories at scale”), dstack, Slurm. Startups brand it literally, for example Eldric: “the operating system for the AI datacenter.” No hardware-agnostic GPU-native kernel.
(D) Inference serving: NVIDIA Dynamo 1.0 (GTC Mar 2026, production, Apache 2.0), with disaggregated prefill/decode, KV-aware routing, and a multi-tier KV cache; officially “the inference operating system for AI factories.” Plus vLLM (PagedAttention ≈ a GPU memory manager), SGLang, Ray Serve, llm-d, TensorRT-LLM.
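The claim that PagedAttention behaves like a GPU memory manager can be made concrete with a toy block allocator. The sketch below is an illustration only, not vLLM's implementation: the class name `PagedKVAllocator`, the pool size, and the block size are all hypothetical. It shows the OS-like idea of mapping each sequence's logical KV-cache blocks onto a shared pool of fixed-size physical blocks, much as a page table maps virtual pages to frames.

```python
from dataclasses import dataclass, field


@dataclass
class PagedKVAllocator:
    """Toy paged allocator (hypothetical): logical KV blocks -> physical blocks."""
    num_blocks: int                                # size of the physical block pool
    block_size: int = 16                           # tokens per block (illustrative)
    free: list = field(default_factory=list)       # free physical block ids
    tables: dict = field(default_factory=dict)     # seq_id -> list of block ids
    lengths: dict = field(default_factory=dict)    # seq_id -> tokens stored

    def __post_init__(self):
        self.free = list(range(self.num_blocks))

    def append_token(self, seq_id):
        """Reserve room for one more token; return the physical block it lands in."""
        table = self.tables.setdefault(seq_id, [])
        used = self.lengths.get(seq_id, 0)
        if used == len(table) * self.block_size:   # last block is full, or none yet
            if not self.free:
                # A real serving system would evict, swap, or preempt here.
                raise MemoryError("no free KV blocks")
            table.append(self.free.pop())
        self.lengths[seq_id] = used + 1
        return table[used // self.block_size]

    def release(self, seq_id):
        """Return all blocks of a finished sequence to the free pool."""
        self.free.extend(self.tables.pop(seq_id, []))
        self.lengths.pop(seq_id, None)


# Usage: two sequences share a pool of four 2-token blocks.
alloc = PagedKVAllocator(num_blocks=4, block_size=2)
for _ in range(3):
    alloc.append_token("a")        # three tokens need two blocks
alloc.append_token("b")            # one token needs one block
blocks_free_while_running = len(alloc.free)
alloc.release("a")                 # sequence "a" finishes; its blocks are reusable
blocks_free_after_release = len(alloc.free)
```

Because blocks are allocated on demand and returned on completion, memory is not reserved up front for each sequence's maximum length, which is the property that makes the memory-manager analogy apt.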
No unified, hardware-agnostic GPU OS spanning device scheduling (A) → fleet (C) with a coherent process model. Missing: a GPU process abstraction with fine-grained preemption (GPUs lack fast, fine-grained HW preemption), cluster-wide GPU virtual memory (each HBM is an island), transparent multi-tenancy with hard isolation, and cross-vendor portability (everything is NVIDIA/CUDA-locked; there is no “POSIX for GPUs”).
The deepest blockers are at the hardware level (preemption, isolation, shared memory) — no software GPU OS fully closes them until the silicon supports OS semantics.
NVIDIA absorbs every layer: Dynamo (D) + DSX OS / Run:ai / KAI (C) + MIG/MPS (B) + driver/GSP (A) + DRA donated to Kubernetes. This is the path most likely to deliver the first coherent “GPU OS”, but a closed and vendor-locked one. Front-runner by execution, not by openness.
An open, replaceable device-side OS (LithOS / GPUOS-style) interposing at the driver boundary becomes the standard layer — structurally like Linux replacing UNIX. Highest strategic value; research-stage; NVIDIA may absorb or enclose it.
K8s + GPU Operator + DRA + Grove + KAI become the de-facto cross-vendor fleet OS from the top down. Vendor-neutral-ish and shipping now — but treats the GPU as an opaque device; lacks device-level OS semantics.
An open standard abstraction (oneAPI/SYCL, MLIR, UXL Foundation) unifies NVIDIA / AMD / Intel / custom silicon. The only real portability route — and the slowest, governance-heavy and against the incumbent’s incentives.
Security-first route: confidential-computing / multi-tenant-isolation demand (regulation) forces the abstraction.
Hardware-enabled route: vendors add HW preemption + CXL shared memory, and the software OS follows.
Net: NVIDIA’s vertical stack will likely be the first thing called a “GPU OS” (largely already), but the open device-layer is the more consequential battle, and a cross-vendor standard is the only true-portability endgame.
Why it’s all hard: GPUs are built for throughput (latency hidden by parallelism, no fast context switch); OS primitives — interrupts, preemption, blocking sync, fairness, QoS — are about latency, responsiveness and arbitration, the opposite design point. Every primitive fights the GPU’s reason for existing. This mechanism layer sits under the dependencies above, and is what neither hardware-altitude nor software-altitude analyses engage.
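The throughput-versus-latency tension can be illustrated with a small scheduling model. This is a sketch under stated assumptions, not a measurement of any real GPU: the function names, job durations, and `switch_cost` values are hypothetical, and time is in arbitrary units. It compares run-to-completion execution, where a kernel holds the device until it finishes, with preemptive time slicing that pays a context-switch penalty.

```python
def run_to_completion(jobs):
    """Non-preemptive FIFO: each job holds the device until it finishes.

    jobs is a list of (name, duration) pairs; returns {name: finish_time}.
    """
    clock, finish = 0, {}
    for name, duration in jobs:
        clock += duration
        finish[name] = clock
    return finish


def time_sliced(jobs, quantum, switch_cost=0):
    """Round-robin with preemption every `quantum` units.

    switch_cost is charged whenever the device moves to a different job,
    standing in for the cost of saving and restoring execution state.
    """
    remaining = dict(jobs)
    queue = [name for name, _ in jobs]
    clock, finish, last = 0, {}, None
    while queue:
        name = queue.pop(0)
        if last is not None and last != name:
            clock += switch_cost
        step = min(quantum, remaining[name])
        clock += step
        remaining[name] -= step
        if remaining[name] == 0:
            finish[name] = clock
        else:
            queue.append(name)
        last = name
    return finish


# A long batch kernel arrives just ahead of a short latency-sensitive request.
jobs = [("long", 100), ("short", 2)]
fifo = run_to_completion(jobs)                      # short finishes at 102
cheap = time_sliced(jobs, quantum=1)                # short finishes at 4
costly = time_sliced(jobs, quantum=1, switch_cost=5)  # short at 19, long at 122
```

With free context switches, preemption cuts the short job's completion time from 102 to 4 without delaying the long job. With a switch cost of 5, the short job still finishes early (19), but the long job slips from 102 to 122: responsiveness is bought with throughput, which is the trade the paragraph above describes.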
Under a captive ~$295B domestic buildout (≈80% to domestic vendors), NVIDIA’s China AI-GPU share has reportedly collapsed toward zero and domestic chips passed 40% of shipments. China is pursuing “make the accelerator THE computer” hardest, freed from legacy constraints — the “sanctions paradox” accelerating indigenous innovation.
Missing in China: advanced nodes (sanctions), mature drivers, a true GPU-native OS (the gap everyone shares). Also building: EU (SiPearl / EPI), Japan (PEZY), Korea (Samsung, SK Hynix HBM). This entire dimension was absent from the first run despite two Chinese-lineage nodes — recovered only when the prompt explicitly probed geography.
Qwen (grounded): the persistent-kernel pattern at layer (A) is the real locus-of-control fight — a possible “Linux replaces UNIX” moment at the seam NVIDIA’s closed driver owns. The consensus treats Dynamo as the story; the deeper battle is the device-side execution model.
DeepSeek (training): the GPU is already a tripartite OS (GSP firmware + driver + CUDA runtime). People don’t want a GPU OS to exist — they want an open, portable, multi-vendor one. The question is openness, not existence.
Mistral (training): security / isolation is the under-weighted driver — the first true GPU OS may emerge for confidential computing and multi-tenant isolation (TEEs, GPU page tables), not performance.
Same grounding-split as the Q-day run: web-grounded nodes (Claude, Qwen) surfaced the specific 2025–26 systems (Dynamo, DSX OS, LithOS, GPUOS, Eldric); training-knowledge nodes (Llama, Mistral, DeepSeek) missed the current products — Llama offered only generic examples. But the un-grounded nodes added orthogonal conceptual angles (tripartite-openness, security) the grounded ones under-emphasized. Keep at least one live-grounded node in every fold.