CRISM and GPU OS

Author: Paul E. Sorvik — architect of CRISM, CRISMOS, and CRISM Watch

Paul E. Sorvik — Principal Investigator · ORCID 0009-0008-5717-7110


Framing

This question is relevant to the Manifold Relativity Programme (MR) for two reasons. Both CRISM and CAC are used as tools with AI to produce pre-print papers, and Q-Day leaves GPUs (and “GPU OS”) as the AI substrate for years afterwards, even though quantum hardware may eventually replace them.

But “GPU OS” may fill Q-Day gaps before Q-Day, and for far longer after Q-Day than we can cover here. Previous “CAC gated in zero trust” papers identified what GPU OS could deliver before and after Q-Day; hence putting “should” before “could” in CAC paper-publishing gating is itself an execution of zero trust. MR therefore needs to predict what GPU OS may deliver and when MR tooling can start migrating to GPU OS, and then when to start migrating to quantum, once quantum meets or exceeds MR’s required capabilities and exceeds those of the then-current GPU OS.

That reads complicated because it is!

Note: CRISM is explained in detail on the About page.

CRISM — GPU OS Landscape
5-node fold · opened re-run · 25 Jun 2026
Revocable verdict

No OS runs on the GPU as primary compute — that is research (CMU’s LithOS leads). But a “GPU OS” does exist and is consolidating fast at the orchestration / inference layer, where NVIDIA is explicitly claiming the category: Dynamo 1.0 is officially “the inference operating system for AI factories,” alongside DSX OS. The term resolves to GPU-fleet / inference OS (exists), not OS-on-GPU (doesn’t yet).

The four senses
(A) Device-side GPU OS: runs on the silicon itself [research only]

Fast-moving research on a persistent-kernel + JIT-operator pattern: LithOS (CMU/MSR, SOSP 2025), GPUOS, AgileOS, Concordia. Narrow exception: NVIDIA’s GSP firmware RTOS on an embedded core — firmware, not a general-purpose OS. Nothing ships in production GPUs.

(B) Host-side GPU resource manager [partial · coarse]

NVIDIA MIG (static partitioning), MPS, vGPU ship — but none gives dynamic fine-grained scheduling + memory virtualization + isolation transparently. LithOS is the research frontier.

(C) Datacenter GPU fleet OS [exists · fragmented]

Kubernetes + NVIDIA GPU Operator (de facto), Run:ai / KAI, NVIDIA DSX OS (“operating AI factories at scale”), dstack, Slurm. Startups brand it literally — Eldric: “the operating system for the AI datacenter.” No hardware-agnostic GPU-native kernel.

(D) AI / inference operating layer [hottest · exists]

NVIDIA Dynamo 1.0 (GTC Mar 2026, production, Apache 2.0) — disaggregated prefill/decode, KV-aware routing, multi-tier KV cache; officially “the inference operating system for AI factories.” Plus vLLM (PagedAttention ≈ a GPU memory manager), SGLang, Ray Serve, llm-d, TensorRT-LLM.
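The comparison of PagedAttention to a GPU memory manager can be made concrete with a toy allocator. The sketch below is not vLLM code and does not touch a GPU. It only illustrates the paging idea, assuming fixed-size KV blocks and a per-sequence block table. The class name `PagedKVCache`, its methods, and the block counts are all hypothetical.

```python
class PagedKVCache:
    """Toy paged allocator: fixed-size blocks plus per-sequence block tables.

    Illustrative only; names and sizes are assumptions, not a real library API.
    """

    def __init__(self, num_blocks: int, block_size: int) -> None:
        self.block_size = block_size
        self.free_blocks = list(range(num_blocks))      # physical block ids
        self.block_tables: dict[str, list[int]] = {}    # seq id -> physical blocks
        self.lengths: dict[str, int] = {}               # seq id -> tokens stored

    def append_token(self, seq_id: str) -> tuple[int, int]:
        """Reserve a slot for one more token; return (physical_block, offset)."""
        length = self.lengths.get(seq_id, 0)
        table = self.block_tables.setdefault(seq_id, [])
        if length % self.block_size == 0:               # last block full, or none yet
            if not self.free_blocks:
                raise MemoryError("no free KV blocks; caller must evict or queue")
            table.append(self.free_blocks.pop())
        self.lengths[seq_id] = length + 1
        return table[length // self.block_size], length % self.block_size

    def free_sequence(self, seq_id: str) -> None:
        """Return all blocks of a finished sequence to the free list."""
        self.free_blocks.extend(self.block_tables.pop(seq_id, []))
        self.lengths.pop(seq_id, None)


cache = PagedKVCache(num_blocks=4, block_size=2)
for _ in range(3):
    cache.append_token("a")          # sequence "a" now spans two blocks
cache.append_token("b")              # sequence "b" takes a third block
print(len(cache.free_blocks))        # 1
cache.free_sequence("a")
print(len(cache.free_blocks))        # 3
```

Because a sequence's blocks need not be contiguous, memory freed by one sequence is immediately reusable by another, which is the property that makes the memory-manager analogy apt.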

Biggest gap (consensus)

No unified, hardware-agnostic GPU OS spanning device scheduling (A) → fleet (C) with a coherent process model. Missing: a GPU process abstraction with fine-grained preemption (GPUs lack HW preemption), cluster-wide GPU virtual memory (each HBM is an island), transparent multi-tenancy with hard isolation, and cross-vendor portability (everything is NVIDIA/CUDA-locked — no “POSIX for GPUs”).

Dependencies — what gates a true unified GPU OS
Hardware fine-grained preemption: GPUs can’t interrupt a running kernel; needed for a real process model [open · HW]
GPU process abstraction + memory isolation: per-process address spaces, page tables, protection [research]
Cluster-wide GPU virtual memory: one address space / paging across HBM islands (CXL may help) [open]
Open device-side scheduling interface: break the closed-driver gatekeeper (persistent-kernel pattern) [research · contested]
Cross-vendor abstraction (“POSIX for GPUs”): portability across CUDA / ROCm / oneAPI / custom silicon [missing]
Transparent multi-tenancy + hard isolation: secure sharing; TEEs / confidential computing [partial]
Checkpoint / restore + live migration + fault tolerance: move a running GPU job; survive node loss [research]
Fleet telemetry, scheduling & fault handling: DCGM, KAI, topology-aware placement [maturing]

The deepest blockers are at the hardware level (preemption, isolation, shared memory) — no software GPU OS fully closes them until the silicon supports OS semantics.

Options / paths to get there (ranked)
NVIDIA vertical integration (incumbent) [#1 · ships first]

Absorb every layer: Dynamo (D) + DSX OS / Run:ai / KAI (C) + MIG/MPS (B) + driver/GSP (A) + DRA donated to Kubernetes. Most likely to deliver the first coherent “GPU OS” — but closed and vendor-locked. Front-runner by execution, not by openness.

Open persistent-kernel device layer: the “Linux moment” [#2 · highest leverage]

An open, replaceable device-side OS (LithOS / GPUOS-style) interposing at the driver boundary becomes the standard layer — structurally like Linux replacing UNIX. Highest strategic value; research-stage; NVIDIA may absorb or enclose it.

Kubernetes-up (orchestration-first) [#3 · pragmatic]

K8s + GPU Operator + DRA + Grove + KAI become the de-facto cross-vendor fleet OS from the top down. Vendor-neutral-ish and shipping now — but treats the GPU as an opaque device; lacks device-level OS semantics.

Cross-vendor standard (“POSIX / oneAPI for GPUs”) [#4 · true-portability endgame]

An open standard abstraction (oneAPI/SYCL, MLIR, UXL Foundation) unifies NVIDIA / AMD / Intel / custom silicon. The only real portability route — and the slowest, governance-heavy and against the incumbent’s incentives.

Wildcards & convergent read

Security-first route: confidential-computing / multi-tenant-isolation demand (regulation) forces the abstraction. Hardware-enabled route: vendors add HW preemption + CXL shared memory, and the software OS follows. Net: NVIDIA’s vertical stack will likely be the first thing called a “GPU OS” (it largely already is), but the open device layer is the more consequential battle, and a cross-vendor standard is the only true-portability endgame.

OS primitives & HBM orchestration — the missing middle
Interrupts / async events: page faults still route out to the CPU driver; no on-GPU interrupt handling [missing]
Semaphores / sync: warp barriers, atomics, HW semaphores exist, but spin/busy-wait; no cheap block-and-yield, priority inheritance or fairness [partial]
HBM allocation + paging: cudaMalloc / pools / UVM (CPU-driven); no demand paging, tiering, compaction, OOM arbitration [partial]
HBM bandwidth QoS: capacity partitions (MIG) exist; bandwidth contention is unarbitrated (no cgroups-for-HBM) [open · sharpest]
Scheduling + priorities: coarse stream priority + MIG; no fine-grained preemptive / real-time scheduler (kernel-boundary, ms-scale) [partial]
MMU / protection / isolation: page tables + MIG partitions, but no per-process protected address spaces at CPU granularity [partial]

Why it’s all hard: GPUs are built for throughput (latency hidden by parallelism, no fast context switch); OS primitives — interrupts, preemption, blocking sync, fairness, QoS — are about latency, responsiveness and arbitration, the opposite design point. Every primitive fights the GPU’s reason for existing. This mechanism layer sits under the dependencies above, and is what neither hardware-altitude nor software-altitude analyses engage.
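The “kernel-boundary, ms-scale” point in the scheduling row can be illustrated with a small timing model. This is a deliberately simplified sketch: it assumes one low-priority job whose kernels run back to back, and it assumes the device can switch to other work only when a kernel finishes. The function names and the millisecond figures are hypothetical, not measurements.

```python
from bisect import bisect_left
from itertools import accumulate


def wait_until_boundary(kernel_ms: list[float], arrival_ms: float) -> float:
    """Delay before an urgent request arriving at arrival_ms can start,
    assuming work can be switched only between kernels (no mid-kernel preemption)."""
    if not kernel_ms:
        return 0.0
    boundaries = list(accumulate(kernel_ms))     # end time of each kernel
    if arrival_ms >= boundaries[-1]:
        return 0.0                               # device is already idle
    i = bisect_left(boundaries, arrival_ms)      # first boundary at or after arrival
    return boundaries[i] - arrival_ms


def worst_case_wait(kernel_ms: list[float]) -> float:
    """Upper bound on the wait: arriving just as the longest kernel starts."""
    return max(kernel_ms, default=0.0)


kernels = [2.0, 8.0, 1.0]                        # illustrative durations in ms
print(wait_until_boundary(kernels, 3.0))         # 7.0
print(worst_case_wait(kernels))                  # 8.0
```

Under these assumptions the urgent request's latency is bounded by the longest kernel of whatever else is running, not by anything the scheduler controls. That is the sense in which boundary-only switching falls short of a real-time scheduler.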

Geography — who leads outside the US (China)

Under a captive ~$295B domestic buildout (≈80% to domestic vendors), NVIDIA’s China AI-GPU share has reportedly collapsed toward zero and domestic chips passed 40% of shipments. China is pursuing “make the accelerator THE computer” hardest, freed from legacy constraints — the “sanctions paradox” accelerating indigenous innovation.

Huawei Ascend · Da Vinci NPU · EulerOS · CloudMatrix 384 rack-supernode
Moore Threads · Huagang · H100-class 2026 · 6nm
Biren · HK IPO · 7nm
Cambricon · Hygon · Kunlun
Loongson · LoongArch 3C6000 + GPU
Alibaba T-Head · XuanTie RISC-V
HarmonyOS NEXT · non-Linux microkernel
SMIC 7nm (no EUV) · CXMT domestic HBM

Missing in China: advanced nodes (sanctions), mature drivers, a true GPU-native OS (the gap everyone shares). Also building: EU (SiPearl / EPI), Japan (PEZY), Korea (Samsung, SK Hynix HBM). This entire dimension was absent from the first run despite two Chinese-lineage nodes — recovered only when the prompt explicitly probed geography.

Non-obvious angles (decorrelated coverage)

Qwen (grounded): the persistent-kernel pattern at layer (A) is the real locus-of-control fight — a possible “Linux replaces UNIX” moment at the seam NVIDIA’s closed driver owns. The consensus treats Dynamo as the story; the deeper battle is the device-side execution model.

DeepSeek (training): the GPU is already a tripartite OS (GSP firmware + driver + CUDA runtime). People don’t want a GPU OS to exist — they want an open, portable, multi-vendor one. The question is openness, not existence.

Mistral (training): security / isolation is the under-weighted driver — the first true GPU OS may emerge for confidential computing and multi-tenant isolation (TEEs, GPU page tables), not performance.

Decorrelation finding

Same grounding-split as the Q-day run: web-grounded nodes (Claude, Qwen) surfaced the specific 2025–26 systems (Dynamo, DSX OS, LithOS, GPUOS, Eldric); training-knowledge nodes (Llama, Mistral, DeepSeek) missed the current products — Llama offered only generic examples. But the un-grounded nodes added orthogonal conceptual angles (tripartite-openness, security) the grounded ones under-emphasized. Keep at least one live-grounded node in every fold.
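The rule of keeping a live-grounded node in every fold, combined with trust weighting and quarantine, can be sketched as a small aggregation step. This is not the CRISM implementation, which is not specified here. It is a minimal sketch, assuming each node carries a scalar trust weight, a claim's score is the sum of the weights of the nodes asserting it, and a claim is quarantined if it was flagged or scores below a threshold. `Node`, `fold`, the weights and the threshold are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    name: str
    trust: float      # hypothetical weight, not a CRISM parameter
    grounded: bool    # True if the node had live web access


def fold(reports: dict[Node, set[str]], flagged: set[str],
         threshold: float) -> tuple[set[str], set[str]]:
    """Return (accepted, quarantined) claims from per-node reports."""
    if not any(node.grounded for node in reports):
        raise ValueError("fold needs at least one live-grounded node")
    scores: dict[str, float] = {}
    for node, claims in reports.items():
        for claim in claims:
            scores[claim] = scores.get(claim, 0.0) + node.trust
    accepted = {c for c, s in scores.items()
                if s >= threshold and c not in flagged}
    quarantined = set(scores) - accepted     # flagged or under-supported claims
    return accepted, quarantined


claude = Node("Claude", trust=0.9, grounded=True)
qwen = Node("Qwen", trust=0.8, grounded=True)
llama = Node("Llama", trust=0.5, grounded=False)
accepted, quarantined = fold(
    {claude: {"Dynamo"}, qwen: {"Dynamo", "Yantra"}, llama: {"generic example"}},
    flagged={"Yantra"},
    threshold=0.75,
)
print(sorted(accepted))      # ['Dynamo']
print(sorted(quarantined))   # ['Yantra', 'generic example']
```

A score threshold alone would also suppress the orthogonal angles that un-grounded nodes contribute, so a real fold would likely treat conceptual claims differently from verifiable specifics. The sketch does not model that distinction.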

Quarantined / flagged
  “Dynamo is a GPU OS” — partly aspirational branding; it’s an inference-serving OS specific to LLM inference, not general-purpose GPU compute. May not generalize if workloads shift.
  Yantra / neurOS / coconutOS — unverified conceptual/hobby projects (Qwen flagged these itself). Quarantined as specifics.
  “A general-purpose OS runs on the GPU today” — rejected; only GSP firmware and research prototypes exist.

CRISM run, 25 Jun 2026: grounder Claude (live web) + dreamers Llama 4 Scout, Qwen3.7 Max (Qwen live web) + grounders Mistral Large 3, DeepSeek V4 Pro (via OpenRouter). Folded with trust weighting and quarantine. Verdict is revocable.

Why it matters: with Q-Day not replacing GPUs for AI initially, this layer is the near-term battleground, and NVIDIA is moving to own it end-to-end.
