modulexai is the ModuleX-managed model provider — the default option for every surface that calls a language model. You do not supply or manage any provider key: ModuleX provisions the upstream keys for you and meters usage in credits against your plan allowance and wallet. Managed models are available out of the box on the Chat model selector, the Assistant, the LLM node, the Agent node, and the AI Composer.
The alternative is BYOK (bring your own key): you connect a first-party provider account and that vendor bills you directly, with no ModuleX credit charge. This page covers managed models end to end — the auth model, the curated catalog, exactly what each call costs in credits, and how to switch a selection to BYOK. For how managed-versus-BYOK selection works across all providers, see Connect LLM providers.
🎬 MEDIA PLACEHOLDER · MX-MEDIA-4100 · [SCREENSHOT]
[SCREENSHOT_DESCRIPTION]: The ModuleX-managed (modulexai) entry in the model selector.
[SCREENSHOT_DETAILS]: Capture the model picker in chat (or Assistant settings) showing the ModuleX-managed models grouped together (for example Claude Sonnet 4.6, GPT 5.5, Claude Opus 4.6) with no API-key/connect prompt, signalling they are ready to use. Light theme, 16:9, crop to the dropdown.
How managed models work
A managed model is one whose provider wire name is modulexai. When you pick such a model, the request runs through ModuleX-provisioned upstream keys and the call is costed in credits:
You select a model id
Choose a modulexai model id (for example claude-sonnet-4.6 or gpt-5.5) anywhere a model can be picked. No credential is required: modulexai is exempt from credential validation because it uses ModuleX’s own model-level keys.
ModuleX resolves and routes the model
ModuleX maps the model id to its upstream wire model and request settings. Several managed models are routed through an aggregator with a Bedrock-then-Anthropic fallback order; you select by ModuleX model_id and the routing is handled for you. Deprecated ids are resolved to their replacement automatically; see Deprecated models route automatically.
The billing gate admits and meters the call
Every managed turn passes through the usage gate: it reserves credit before any work, charges on success, and denies with a DenialEnvelope on 402 / 403 / 429 when your allowance and wallet are exhausted. See Credit cost.
🎬 MEDIA PLACEHOLDER · MX-MEDIA-4101 · [IMAGE]
[IMAGE_DESCRIPTION]: Request path for a managed (modulexai) model call versus a BYOK call.
[IMAGE_DETAILS]: Show two lanes from a single “model call” box. Managed lane: ModuleX-provisioned key, routed through the aggregator, passing the credit usage gate (reserve, charge, settle) and metered in credits. BYOK lane: your own credential, routed straight to the upstream vendor, billed by the vendor, not credited. Label which lane consumes credits. 16:9, light and dark variants, ModuleX brand palette, no UI chrome.
No key required: the modulex_key auth schema
Managed access uses a single auth schema, modulex_key. Unlike BYOK providers, it has no fields for you to fill in — its setup_environment_variables array is empty. ModuleX supplies the upstream key automatically and tracks usage against your subscription plan’s monthly credit allowance.
- Scheme: modulex_key, the managed-key scheme. ModuleX provides the API keys automatically; no key configuration is needed.
- Display name: ModuleX Managed Key.
- setup_environment_variables: empty ([]). There is nothing to enter to use managed models.
Because modulexai needs no credential, you do not call POST /credentials for it. Managed models are usable as soon as your organization is on any plan (Free included). You still need to be an owner or admin to manage the integration catalog or change defaults; the member role is retired.
Available models
ModuleX serves the managed model list from the catalog at GET /integrations/llm-providers/modulexai. The catalog is read-only metadata used by the model selectors and the builder. It is not the execution path, and it never returns or stores a secret key. The catalog read is cached for up to 10 minutes.
The active models below are the ones ModuleX currently advertises for modulexai. There are six deprecated bedrock-* ids in addition; see Deprecated models.
Chat and reasoning models
| Model id | Display name | Upstream | Max input | Max output | Vision | Input $/1M | Output $/1M |
|---|---|---|---|---|---|---|---|
| claude-sonnet-4.6 | Claude Sonnet 4.6 | Anthropic | 200,000 | 64,000 | Yes | $3.00 | $15.00 |
| claude-opus-4.6 | Claude Opus 4.6 | Anthropic | 200,000 | 128,000 | Yes | $5.00 | $25.00 |
| claude-haiku-4.5 | Claude Haiku 4.5 | Anthropic | 200,000 | 64,000 | Yes | $0.80 | $4.00 |
| gpt-5.5 | GPT 5.5 | OpenAI | 1,050,000 | 128,000 | Yes | $5.00 | $30.00 |
| gpt-5.4 | GPT 5.4 | OpenAI | 1,050,000 | 128,000 | Yes | $2.50 | $15.00 |
| gpt-5.4-mini | GPT 5.4 mini | OpenAI | 400,000 | 128,000 | Yes | $0.75 | $4.50 |
| gpt-5.4-nano | GPT 5 nano | OpenAI | 400,000 | 128,000 | Yes | $0.20 | $1.25 |
| gpt-5-chat-latest | GPT 5 chat latest | OpenAI | 128,000 | 16,384 | Yes | $1.25 | $10.00 |
| gpt-5.3-codex | GPT 5.3 codex | OpenAI | 400,000 | 128,000 | Yes | $1.75 | $14.00 |
| o3 | o3 | OpenAI | 200,000 | 100,000 | Yes | $2.00 | $8.00 |
| gemma-3-27b | Gemma 3 27B | Google | 128,000 | 8,192 | Yes | $0.10 | $0.10 |
| gemma-3-12b | Gemma 3 12B | Google | 128,000 | 8,192 | Yes | $0.05 | $0.05 |
| gemma-3-4b | Gemma 3 4B | Google | 128,000 | 8,192 | No | $0.02 | $0.02 |
Embedding models
| Model id | Display name | Upstream | Max input | Dimensions | Input $/1M | Output $/1M |
|---|---|---|---|---|---|---|
| text-embedding-3-large | Text Embedding 3 Large | OpenAI | 8,191 | 3,072 | $0.13 | $0.00 |
| text-embedding-3-small | Text Embedding 3 Small | OpenAI | 8,191 | 1,536 | $0.02 | $0.00 |
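Because the tables above are a snapshot, a client can derive its model list from the catalog response instead. The following is a minimal sketch, not ModuleX SDK code. The base URL, the helper names, the sample payload, and the model_id key on each record are assumptions made for illustration; the endpoint path, the two request headers, models, status, replacement_id, and embedding_dimension are taken from this page.

```python
import urllib.request

BASE_URL = "https://api.modulex.dev"  # assumed host; confirm for your deployment
CATALOG_PATH = "/integrations/llm-providers/modulexai"


def build_catalog_request(api_key: str, organization_id: str) -> urllib.request.Request:
    """Build (but do not send) the owner/admin-gated catalog read."""
    return urllib.request.Request(
        BASE_URL + CATALOG_PATH,
        headers={
            "Authorization": f"Bearer {api_key}",
            "X-Organization-ID": organization_id,
        },
        method="GET",
    )


def active_model_ids(catalog: dict, embeddings: bool = False) -> list:
    """Return ids of non-deprecated models: chat/reasoning by default, or embedding."""
    ids = []
    for model in catalog.get("models", []):
        if model.get("status") == "deprecated":
            continue  # retired bedrock-* ids carry status="deprecated"
        is_embedding = bool(model.get("embedding_dimension"))
        if is_embedding == embeddings:
            ids.append(model["model_id"])  # key name is an assumption
    return ids


# Hypothetical, heavily trimmed payload for illustration only.
sample = {
    "models": [
        {"model_id": "claude-sonnet-4.6"},
        {"model_id": "bedrock-claude-sonnet-4.6", "status": "deprecated",
         "replacement_id": "claude-sonnet-4.6"},
        {"model_id": "text-embedding-3-small", "embedding_dimension": 1536},
    ]
}

chat_ids = active_model_ids(sample)
embedding_ids = active_model_ids(sample, embeddings=True)
```

Keeping the request builder separate from the filter lets the filter be exercised against a saved response without any network access.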
The model list is served from the catalog and changes as ModuleX adds, routes, or deprecates models. Read GET /integrations/llm-providers/modulexai for the current set rather than hard-coding ids; catalog reads are cached for up to 10 minutes.
The model record
Each entry in the provider’s models[] array carries the fields below.
- The ModuleX model_id you select (for example claude-sonnet-4.6).
- Human-readable name shown in selectors.
- Display vendor: Anthropic, OpenAI, or Google on active managed models.
- Routing provider slug. Managed chat/reasoning models route through openrouter; the OpenAI embedding models route through openai.
- Maximum context window in tokens.
- max_output_tokens: maximum tokens the model can generate. 0 for embedding models.
- Knowledge-cutoff date (for example 2026-01-01). This field, not a knowledge_cutoff key, carries the cutoff.
- Quality score, 1–5.
- Latency score, 1–5 (higher is faster).
- Whether the model accepts image input.
- input_usd_per_1m_tokens: upstream input price per 1M tokens, as a flat number (not a nested pricing object).
- output_usd_per_1m_tokens: upstream output price per 1M tokens. 0.0 on embedding models.
- Embedding flag: true on the two embedding models. Embedding models also report embedding_dimension and a zero output price.
- embedding_dimension: vector dimension for embedding models (3072 for text-embedding-3-large, 1536 for text-embedding-3-small).
- status: lifecycle state. Present and set to deprecated on the retired bedrock-* ids; active models omit it (treated as healthy).
- replacement_id: on a deprecated model, the successor model_id ModuleX routes the call to instead.
Read the model catalog
The response carries a models array with the entries above, plus an auth_schemas array describing the single modulex_key scheme. A truncated example:
Catalog detail (truncated)
The catalog detail endpoints are owner/admin-gated and org-scoped: every request needs Authorization: Bearer mx_live_… plus X-Organization-ID, and the caller must be an owner or admin. See Authentication.
Credit cost
Managed (modulexai) usage is the only LLM-provider usage that consumes credits. A credit is the managed-usage billing unit: **1000 credits = $1.00** (1 credit = $0.001). BYOK calls are never credited.
A managed call is metered in two parts.
Per-turn run charge
Each logical run or chat turn is charged a flat RUN_CREDIT of 1 credit, recorded once and deduplicated by an idempotency key so a resumed turn is not double-charged.
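The deduplication described above can be pictured with a small in-memory ledger. This is a minimal sketch under stated assumptions: RunLedger, charge_run, and the key format are hypothetical names, and the real billing store is not described on this page. Only the flat 1-credit charge and the once-per-idempotency-key behavior come from the text.

```python
RUN_CREDIT = 1  # flat per-turn charge stated on this page


class RunLedger:
    """In-memory stand-in for a billing ledger (hypothetical)."""

    def __init__(self) -> None:
        self._charged = {}  # idempotency key -> credits recorded

    def charge_run(self, idempotency_key: str) -> int:
        """Record the run charge once per key; return the credits newly charged."""
        if idempotency_key in self._charged:
            return 0  # resumed turn: already recorded, so nothing is added
        self._charged[idempotency_key] = RUN_CREDIT
        return RUN_CREDIT

    def total(self) -> int:
        """Sum of all run charges recorded so far."""
        return sum(self._charged.values())


ledger = RunLedger()
first = ledger.charge_run("turn-123")    # first attempt is charged
resumed = ledger.charge_run("turn-123")  # same key after a resume is not
```

Keying the charge on the logical turn rather than on each HTTP attempt is what makes retries and resumes safe.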
Token metering
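The model’s tokens are converted to credits using the upstream per-token rates plus a system margin:
Token-to-credit formula
credits = (prompt_tokens · input_rate / 1e6 + completion_tokens · output_rate / 1e6) · MARGIN · SCALE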
| Symbol | Value | Meaning |
|---|---|---|
| input_rate / output_rate | the model’s input_usd_per_1m_tokens / output_usd_per_1m_tokens | upstream USD rates from the model record |
| MARGIN | 1.05 | a 5% system margin applied to managed usage |
| SCALE | 1000 | 1000 credits = $1.00, so 1 credit = $0.001 |
For embedding models, completion_tokens = 0, so only input tokens are metered. If a model id is unknown to the pricing table, it meters to 0 rather than a silent fallback rate.
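The symbols in the table combine as tokens times rate, then margin, then scale. The function below is a minimal sketch of that arithmetic, assuming the argument names prompt_tokens and completion_tokens and a small hand-copied RATES table; it is not ModuleX's metering code. Whether ModuleX rounds the result is not stated here, so the sketch returns an unrounded float.

```python
MARGIN = 1.05  # 5% system margin
SCALE = 1000   # 1000 credits = $1.00

# Upstream USD per 1M tokens as (input, output), copied from the tables on this page.
RATES = {
    "claude-sonnet-4.6": (3.00, 15.00),
    "text-embedding-3-small": (0.02, 0.00),
}


def token_credits(model_id: str, prompt_tokens: int, completion_tokens: int = 0) -> float:
    """Convert token counts to credits: USD cost times margin times scale."""
    rates = RATES.get(model_id)
    if rates is None:
        return 0.0  # unknown ids meter to 0 rather than a fallback rate
    input_rate, output_rate = rates
    usd = (prompt_tokens * input_rate + completion_tokens * output_rate) / 1_000_000
    return usd * MARGIN * SCALE


# 10,000 prompt tokens and 1,000 completion tokens on claude-sonnet-4.6.
sonnet_turn = token_credits("claude-sonnet-4.6", 10_000, 1_000)
# Embedding call: completion_tokens stays 0, so only input is metered.
embed_call = token_credits("text-embedding-3-small", 1_000_000)
```

Up to floating-point rounding, sonnet_turn comes out at 47.25 credits and embed_call at 21 credits; the flat per-turn RUN_CREDIT is charged separately.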
Other managed operations carry their own flat base charges, also billed in credits: a managed knowledge retrieval reserves RETRIEVAL_BASE = 1 credit, a managed document ingest reserves FILE_INGEST_BASE = 1 credit, and an integration tool call has a TOOL_BASE of 10 credits ($0.01). These apply to the corresponding managed surfaces, not to a plain LLM call. See Credits & metering.
Worked example
A single claude-sonnet-4.6 turn that reads 10,000 prompt tokens and writes 1,000 completion tokens:
- Token cost = (10000 · 3.0 / 1e6 + 1000 · 15.0 / 1e6) · 1.05 · 1000 = (0.03 + 0.015) · 1.05 · 1000 ≈ 47.25 credits (≈ $0.047).
- Plus the flat per-turn RUN_CREDIT of 1 credit.
When a managed call is denied
When managed usage exhausts your plan allowance and wallet, the billing admission gate denies the call with a flat DenialEnvelope, {code, layer, key, current, limit, reason}, returned on:
| Status | layer | When |
|---|---|---|
| 402 | credit / wallet | Plan credits exhausted, overage disabled, or wallet balance insufficient. |
| 403 | quota | A plan quota is exceeded. |
| 429 | rate | A per-plan rate class (for example sync_exec) is exceeded; includes Retry-After and X-RateLimit-* headers. |
Errors that do not come from the billing gate use the standard {detail} HTTPException shape instead. See Errors & status codes and Usage gating & limits.
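A client usually needs to tell a billing denial apart from an ordinary error on the same status code, and to decide whether a retry makes sense. The following is a minimal sketch, assuming the response body has already been parsed into a dict. The Denial class, both function names, the 1-second default delay, and the sample field values are assumptions; the six envelope fields, the three status codes, and the Retry-After header come from this page.

```python
from dataclasses import dataclass
from typing import Any, Optional

ENVELOPE_FIELDS = ("code", "layer", "key", "current", "limit", "reason")


@dataclass
class Denial:
    code: str
    layer: str
    key: str
    current: Any
    limit: Any
    reason: str


def parse_denial(status: int, body: dict) -> Optional[Denial]:
    """Return a Denial when the response looks like a billing-gate envelope."""
    if status not in (402, 403, 429):
        return None
    if not all(field in body for field in ENVELOPE_FIELDS):
        return None  # for example a {detail} error from an owner/admin role check
    return Denial(**{field: body[field] for field in ENVELOPE_FIELDS})


def retry_delay_seconds(status: int, headers: dict) -> Optional[float]:
    """Suggest a wait only for 429; 402 and 403 need a plan or wallet change."""
    if status != 429:
        return None
    lowered = {name.lower(): value for name, value in headers.items()}
    try:
        return max(0.0, float(lowered.get("retry-after", "")))
    except (TypeError, ValueError):
        return 1.0  # assumed default when the header is absent or not numeric


# Hypothetical envelope values, for illustration only.
denial = parse_denial(402, {
    "code": "credits_exhausted", "layer": "credit", "key": "plan_credits",
    "current": 1000, "limit": 1000, "reason": "Plan credits exhausted",
})
```

Checking for the full field set rather than the status alone matters because a 403 can be either a quota denial or a role check.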
Where you select a managed model
A managed model id is selectable anywhere a model is chosen, with no connection step:
Chat
Pick a managed model in the chat composer.
Assistant
Set a managed model for the Assistant to reason and act with.
LLM node
Set model_id on the node to a managed id.
Agent node
Run an autonomous step on a managed model.
Deprecated models route automatically
When you select a model whose status is deprecated (or maintenance), ModuleX resolves it to its replacement_id before the call. For example, a bedrock-claude-sonnet-4.6 id routes to claude-sonnet-4.6. Routing is cycle- and depth-guarded, so a chain of replacements terminates at a live model. For a managed integration, if a requested id is unknown or its chain cannot resolve, ModuleX softens to the integration’s first serving (healthy, non-deprecated) model rather than failing. You do not need to update saved selections immediately, but prefer a live id for clarity.
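The resolution rule can be illustrated with a short chain-following routine. This is a sketch of the behavior described above, not ModuleX's router: resolve_model, MAX_DEPTH and its value, and the dict-of-records catalog shape are assumptions, and treating maintenance as non-serving is an interpretation of the text. The status and replacement_id keys come from this page.

```python
MAX_DEPTH = 8  # assumed guard value; the real limit is not documented here


def is_serving(entry: dict) -> bool:
    """A model with no status, or any status other than these two, is treated as live."""
    return entry.get("status") not in ("deprecated", "maintenance")


def resolve_model(model_id: str, models: dict) -> str:
    """Follow replacement_id links to a serving model, else soften to the first serving one."""
    seen = set()
    current = model_id
    for _ in range(MAX_DEPTH):          # depth guard
        entry = models.get(current)
        if entry is None or current in seen:
            break                       # unknown id, missing replacement, or a cycle
        if is_serving(entry):
            return current
        seen.add(current)
        current = entry.get("replacement_id")
    for candidate, entry in models.items():
        if is_serving(entry):
            return candidate            # soften to the first serving model in catalog order
    raise LookupError("no serving model in catalog")


# Hypothetical catalog; the loop-* ids exist only to exercise the cycle guard.
catalog = {
    "claude-sonnet-4.6": {},
    "bedrock-claude-sonnet-4.6": {"status": "deprecated", "replacement_id": "claude-sonnet-4.6"},
    "loop-a": {"status": "deprecated", "replacement_id": "loop-b"},
    "loop-b": {"status": "deprecated", "replacement_id": "loop-a"},
}

routed = resolve_model("bedrock-claude-sonnet-4.6", catalog)
softened = resolve_model("loop-a", catalog)
```

Here routed follows one replacement link, while softened hits the cycle guard and falls back to the first serving entry.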
Deprecated models
These six ids exist only for backward compatibility and route to their replacements. Do not select them for new work.
| Deprecated id | Routes to |
|---|---|
| bedrock-claude-sonnet-4.6 | claude-sonnet-4.6 |
| bedrock-claude-opus-4.6 | claude-opus-4.6 |
| bedrock-claude-haiku-4.5 | claude-haiku-4.5 |
| bedrock-gemma-3-27b | gemma-3-27b |
| bedrock-gemma-3-12b | gemma-3-12b |
| bedrock-gemma-3-4b | gemma-3-4b |
Switching to BYOK
Managed and BYOK are a per-selection choice: you switch by changing which model a surface uses, not by a global toggle. To move a chat, Assistant, or node off managed credits and onto a vendor-direct bill, select a model from a connected BYOK provider instead of a modulexai model.
Connect a BYOK provider
Create a credential for a first-party provider (openai, anthropic, gemini, or xai) with POST /credentials, supplying your own provider key. You must be an owner or admin. See Connect LLM providers and the per-provider pages, for example Anthropic and OpenAI.
Select a BYOK model where you want vendor-direct billing
In the chat model selector, Assistant settings, or a node, choose a model from the connected provider in place of the modulexai model.
Connect LLM providers
All providers, and how managed vs BYOK selection works.
Credits & metering
What a credit is and exactly what consumes credits.
Errors
Catalog reads for modulexai return ModuleX’s standard error shapes. Managed model calls can additionally return the billing DenialEnvelope described in When a managed call is denied.
| Status | When it happens | Shape |
|---|---|---|
| 400 | Asking a typed endpoint for the wrong integration type (for example modulexai is not a tool). | {detail: string} |
| 401 | Missing or invalid Authorization, or missing X-Organization-ID. | {detail: string} |
| 403 | Caller is not an owner or admin in the organization. | {detail: string} |
| 404 | Unknown provider, for example GET /integrations/llm-providers/<unknown> returns LLM provider not found. | {detail: string} |
| 422 | Query/parameter validation error. | {detail: [ ... ]} |
| 402 / 403 / 429 | A managed model call hits the billing gate (credit, quota, or rate). | DenialEnvelope: {code, layer, key, current, limit, reason} |
| 500 | Unhandled server error. | {detail: string} |
Related
Connect LLM providers
Managed vs BYOK, and the full provider list.
Credits & metering
The credit unit and what consumes credits.
Usage gating & limits
The billing admission gate and its 402/403/429 responses.
Anthropic (BYOK)
The bring-your-own-key alternative for Claude models.
Open questions (TBD)
- Upstream routing detail. Several managed models route through an aggregator (openrouter) with a Bedrock-then-Anthropic fallback order, and the OpenAI embedding models route through openai. The exact upstream selected for a given call is handled internally and is not exposed in any catalog API response; treat the routing as an implementation detail subject to change.
- Public catalog base URL. The catalog and SDK examples use https://api.modulex.dev per the repo-wide convention; the exact public host for these endpoints is not pinned from source and should be confirmed against the deployed environment.
- Per-model max-output overrides. ModuleX applies scenario-specific maximum-output-token limits to managed calls (for example a smaller cap for simple chat than for an agent step). These limits are backend-only and are never returned in a catalog response, so the catalog’s max_output_tokens is the model ceiling, not necessarily the per-scenario limit applied at run time.