— Sovereign AI Platform Flagship

Run frontier AI entirely
on your own hardware.

Cube AI is Ultraviolet's flagship platform for private LLM deployment — inference, retrieval, guardrails, governance, and a production workspace, all inside your perimeter. Nothing leaves your network.

Read the docs

Apache 2.0 ·Air-gapped ready ·Model agnostic ·TEE

cube.ultraviolet.rs/dashboard

Cube AI dashboard showing model status, domains, and audit logs

3 models online · 0 bytes egressed

What is Cube AI

The complete operating environment for private, sovereign AI.

Cube AI gives regulated and sovereign organizations everything they need to build and run AI without sending a single byte to a third party. Open models serve from your own GPUs, retrieval runs over your own knowledge bases, and every prompt is policed by your guardrails and recorded in your audit trail.

It is the flagship of the Ultraviolet ecosystem — the product most teams start with, and the one the rest of the stack is built to support.

Private

Every prompt, embedding, and response stays inside your perimeter. Zero outbound by default.

Sovereign

Data residency, compliance, and control governed by your policies and jurisdiction.

Open

Apache 2.0 core. Inspect the source, fork the platform, extend it for your environment.

Portable

One platform across on-prem, air-gapped, sovereign cloud, private cloud, and edge.

How it works

One platform, top to bottom of the stack.

Cube AI bundles everything an organization needs to operate AI privately: model serving, retrieval, policy enforcement, and governance — all running on the same confidential-computing substrate as the rest of Ultraviolet.

vLLM Ollama NeMo Guardrails TEE OpenAI-compatible API RBAC + ABAC

Cube AI

Sovereign AI platform

Inference — vLLM · Ollama

Retrieval — RAG on your data

Guardrails — policy on every call

Governance — audit · RBAC · usage

Hardware TEE — AMD SEV-SNP · Intel TDX

Runs on Cocos AI — TEEs, remote attestation, hardware isolation

Capabilities

Everything you need to
operate AI in production.

Cube AI is not a model — it is the full platform around your models: serving, retrieval, safety, governance, and the interfaces your teams actually use.

Private inference

vLLM and Ollama runtimes serve open models on your own GPUs — no API keys, no egress, no rate limits but your own.

RAG on your data

Generate embeddings inside the environment and connect internal knowledge bases with retrieval that never copies a byte off-premises.

Guardrails & PII redaction

NeMo guardrails, prompt-injection defense, output sanitization, and automatic PII redaction via Microsoft Presidio on every call.

Governance & audit

Full audit trail, role-based access, and per-domain usage accounting built in.

Model management

Hardware TEE

Run the whole stack inside AMD SEV-SNP or Intel TDX enclaves — weights and tensors stay encrypted in use, even from the host.

Multi-tenancy & domains

SuperMQ-backed domain isolation gives every team or tenant a strictly separated workspace with its own identity and policies.

Secure chat

An end-to-end encrypted chat workspace backed by verifiable hardware attestation for everyday private AI.

OpenAI-compatible API

OpenAI-compatible endpoints and SDKs drop Cube AI into the apps and agents you already run.

The platform

A production workspace,
not just an endpoint.

Cube AI ships with the interfaces your teams actually use — operations, model management, safety, audit, and chat. Pick a surface to see it.

cube.ultraviolet.rs/dashboard

Models & infrastructure

The models you choose,
on the hardware you own.

Cube AI is model-agnostic and infrastructure-agnostic. Serve open-weight models through the runtime that fits your workload, on everything from a single GPU to an air-gapped confidential cluster.

Models & runtimes

Open models

LlamaMistralQwenDeepSeekPhiGemma

Embeddings

Nomic EmbedBGECustom

Runtimes

vLLMOllamaHugging Face

Formats

GGUFsafetensorsFine-tunes

Infrastructure

GPUs

NVIDIA A100H100Blackwell

Compute

GPUCPU inferenceEdge

Confidential HW

AMD SEV-SNPIntel TDX

Deploy

On-premAir-gappedSovereign cloudPrivate VPC

Integrations

Plugs into the tools
your developers already use.

Because Cube AI speaks the OpenAI API, it slots into existing developer workflows — IDEs, agents, and SDKs — while keeping every prompt inside your perimeter.

⟶ IDE assistant

Continue

AI-assisted coding inside VS Code and JetBrains, served by your own private Cube models — autocomplete and chat with zero IP leakage.

Read the docs

◳ Agentic coding

OpenCode

Point the OpenCode terminal agent at Cube for confidential, in-perimeter pair-programming on your own infrastructure.

Read the docs

/v1 SDKs & clients

OpenAI-compatible API

Drop-in OpenAI-compatible endpoints mean the OpenAI SDK, LangChain, and custom HTTP clients all work unchanged — just repoint the base URL.

Read the docs

Any OpenAI-compatible client connects — your data never leaves the enclave.

Deploy anywhere

Wherever your data has to live,
your AI can run.

On-premises

Run on your own servers and GPUs, behind your firewall, under your change control.

Same platform · same governance · same audit trail

FAQ

Questions teams ask
before they deploy.

What is Cube AI?

Cube AI is a self-hosted platform for running large language models privately. It bundles inference, retrieval-augmented generation, guardrails, governance, audit, and a production UI into one stack that runs entirely on infrastructure you control.

How is it different from a hosted API like OpenAI or Anthropic?

With a hosted API, your prompts and data leave your organization and are processed in someone else's cloud and jurisdiction. Cube AI runs inside your own perimeter — on-prem, air-gapped, or in a sovereign cloud — so data never leaves, and you own the models, the policies, and the audit trail.

Which models can I run?

Cube AI is model-agnostic. It serves open-weight models such as Llama, Mistral, Qwen, DeepSeek, Phi, and Gemma — plus custom fine-tunes in GGUF or safetensors format — through vLLM, Ollama, and Hugging Face, all from a single control plane.

Where can I deploy it?

Anywhere your data has to live: on-premises behind your firewall, fully air-gapped for classified environments, in an EU or national sovereign cloud, in your own private VPC, or at the edge — with the same governance everywhere.

What are guardrails?

Guardrails are policy rules that sit between your users and the model — inspecting every prompt and response before it is processed or returned. They can block harmful inputs, detect prompt-injection attempts, filter profanity or off-topic requests, and enforce domain-specific rules your organization defines.

How do guardrails work in Cube AI?

Cube AI uses NVIDIA NeMo Guardrails as its policy engine, augmented with Microsoft Presidio for automatic PII detection and redaction. You author guardrail configurations as YAML-based Colang rules through the Cube dashboard — no code required. Rules hot-reload with zero downtime, and every guardrail decision is recorded in the audit trail so you can see exactly what was allowed, blocked, or redacted on every call.

What is a Trusted Execution Environment (TEE)?

A TEE is a hardware-isolated region inside a CPU — such as an AMD SEV-SNP confidential VM or an Intel TDX Trust Domain — where code and data are encrypted in memory at all times. Even the hypervisor and the host OS cannot read the contents. Remote attestation lets you cryptographically verify that the correct, unmodified software is running inside the enclave before trusting it with sensitive data.

How do I enable TEE support in Cube AI?

Deploy Cube AI on a host with AMD SEV-SNP or Intel TDX hardware — available on several cloud providers and on-prem servers. Cube AI runs on top of Cocos AI, Ultraviolet's open-source confidential-computing layer, which handles enclave provisioning, remote attestation, and key management automatically. Once running inside a TEE, every model weight, prompt, and response is encrypted in use, and clients can verify the enclave before sending any data.

Is Cube AI open source?

Yes. The core is Apache 2.0. You can inspect the source, fork it, and run it indefinitely with no vendor lock-in. Commercial support and enterprise features are available from Ultraviolet.

One ecosystem

Part of the Ultraviolet
sovereign AI stack.

Three products, designed to work as one. Each runs on the same confidential-computing foundation, shares the same governance model, and deploys anywhere your data must live.

Sovereign AI Platform

Cube AI

Full-stack private LLM platform — inference, RAG, guardrails, governance, audit.

You are here

Secure AI Collaboration

Prism AI

Run joint AI workloads across organizations without exposing any party's data.

Explore Prism AI

Confidential Computing Foundation

Cocos AI

The open-source TEE abstraction layer the rest of the stack is built on.

Explore Cocos AI

— Get started

Bring frontier AI inside
your perimeter.

Talk to the team about pilots, deployment architectures, and regulated-industry rollouts — on your hardware, on your terms.

Schedule Demo Read the docs

Apache 2.0 · Deploy anywhere · No vendor lock-in

Run frontier AI entirelyon your own hardware.

The complete operating environment for private, sovereign AI.

One platform, top to bottom of the stack.

Everything you need tooperate AI in production.

Private inference

RAG on your data

Guardrails & PII redaction

Governance & audit

Model management

Hardware TEE

Multi-tenancy & domains

Secure chat

OpenAI-compatible API

A production workspace,not just an endpoint.

The models you choose,on the hardware you own.

Plugs into the toolsyour developers already use.

Continue

OpenCode

OpenAI-compatible API

Wherever your data has to live,your AI can run.

On-premises

Questions teams askbefore they deploy.

Part of the Ultravioletsovereign AI stack.

Cube AI

Prism AI

Cocos AI

Bring frontier AI insideyour perimeter.

Run frontier AI entirely
on your own hardware.

Everything you need to
operate AI in production.

A production workspace,
not just an endpoint.

The models you choose,
on the hardware you own.

Plugs into the tools
your developers already use.

Wherever your data has to live,
your AI can run.

Questions teams ask
before they deploy.

Part of the Ultraviolet
sovereign AI stack.

Bring frontier AI inside
your perimeter.