POST /api/v1/recommend runs the tenant’s published decision flow for one customer and returns a ranked list of offers, each tagged with an interactionId and recommendationId for downstream attribution. The route is the production hot path for next-best-action delivery.
What it does
The route resolves a decision flow for the (tenant, channel, placement) tuple, then hands execution to the decision-flow engine. The engine walks the flow’s nodes (Enrich → Qualify → Score → Rank → Compute) and returns the ranked candidates plus a compact trace summary. Resolution preference is explicit > routed > auto-selected: an explicit decisionFlowKey (or legacy blueprintKey) wins; if absent, a flow-route lookup keyed by channel and placement runs; if no route matches, the most recently updated published or active flow is picked; if no flows exist, a base flow is lazy-created on the first call.
The route has two synchronous side effects per call. Every returned decision is written to the interaction history as a recommendation row so POST /api/v1/respond can look it up by recommendationId + rank. Decisions whose channel does not require explicit impression tracking are also auto-recorded as impression rows in the same partitioned table. Both writes use raw SQL because the standard batch-insert path emits an ON CONFLICT (id) clause that is invalid against a composite primary key on a partitioned table.
Quick start
Live sample — captured from playground
The request and response for this sample were captured verbatim from the 2026-05-05 functional test against https://playground.kaireonai.com. They show differential propensity scoring for customer C1 (high credit, short tenure) when the published flow’s score node points at Scorecard A (credit-first weights). Pointing the score node at Scorecard B (loyalty-first weights) and repeating the same request returns score: 0.6460 for the same customer, which confirms that the algorithm swap takes effect end-to-end (see the functional test report).
How it works
Authentication and quota
Every call resolves a tenant before any work runs. Requests carrying an X-API-Key that starts with krn_ are validated against the database and the bound tenant id is used (header X-Tenant-Id is ignored to prevent spoofing); other requests fall back to a tenant id read from the request headers. Missing tenant context returns 401; a tenant id that doesn’t exist in the database returns 403.
After auth, the handler enforces per-window rate limits and a lifetime decision quota. Playground tenants are capped at 5,000 lifetime decisions; past that the route returns 429 with error code "PLAYGROUND_QUOTA_EXCEEDED". Non-playground tenants face no decision quota.
Anonymous-customer derivation
When customerId is missing or set to "anonymous", the route derives a stable surrogate. With a sessionId present, the surrogate is anon-{sessionId} after validating the session id against ^[a-zA-Z0-9_-]+$ and capping at 64 chars. Without a session id, the surrogate is anon-{8-hex} derived from an FNV-1a hash of x-forwarded-for + user-agent. The GET endpoint runs the same derivation.
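A minimal sketch of the derivation described above, not the route's actual code. It assumes a 32-bit FNV-1a (which matches the 8-hex suffix), plain concatenation of the two header values, truncation as the meaning of "capping at 64 chars", and a fall-through to the hash when the session id fails validation. The function names are hypothetical.

```typescript
// 32-bit FNV-1a. Hashing the low byte of each UTF-16 code unit is an
// assumption; it matches byte-wise FNV-1a only for ASCII input.
function fnv1a32(input: string): number {
  let hash = 0x811c9dc5; // FNV offset basis
  for (let i = 0; i < input.length; i++) {
    hash ^= input.charCodeAt(i) & 0xff;
    hash = Math.imul(hash, 0x01000193) >>> 0; // multiply by FNV prime, keep unsigned
  }
  return hash >>> 0;
}

const SESSION_ID_PATTERN = /^[a-zA-Z0-9_-]+$/;

// Hypothetical helper: returns the supplied id or a stable anonymous surrogate.
function deriveCustomerId(
  customerId: string | undefined,
  sessionId: string | undefined,
  forwardedFor: string,
  userAgent: string,
): string {
  if (customerId && customerId !== "anonymous") return customerId;
  if (sessionId && SESSION_ID_PATTERN.test(sessionId)) {
    return `anon-${sessionId.slice(0, 64)}`; // assumption: cap by truncating
  }
  const hex = ("00000000" + fnv1a32(forwardedFor + userAgent).toString(16)).slice(-8);
  return `anon-${hex}`;
}
```

Because the hash input is only the forwarded address and user agent, two anonymous visitors behind the same proxy with the same browser would share a surrogate under this sketch; supplying a sessionId avoids that.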
Decision-flow resolution
Resolution order:
- Explicit decisionFlowKey (or legacy blueprintKey) in the body. If the value is not a key, the route attempts a case-insensitive name lookup and rewrites it to a key.
- Flow-route lookup keyed by (tenantId, channelId | channel, placement), cached for 120s under route:{tenantId}:{channelId}:{placement}.
- Most recently updated published or active flow.
- If no flow exists, a base flow is lazy-created on the first call.

When no flow can be resolved, the route returns 400 No published decision flow found.
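The precedence above can be sketched as a pure function over an in-memory list of flows. This is an illustration under stated assumptions: the Flow fields and the resolveFlow name are hypothetical, the flow-route lookup is reduced to an already-resolved key, lazy base-flow creation is left out, and an explicit value that matches nothing is assumed not to fall through to the later steps.

```typescript
// Hypothetical in-memory model of a tenant's flows.
interface Flow {
  key: string;
  name: string;
  status: "published" | "active" | "draft";
  updatedAt: number; // epoch millis
}

type Resolved =
  | { via: "explicit" | "routed" | "auto"; flow: Flow }
  | { via: "none"; flow: null };

function resolveFlow(
  flows: Flow[],
  explicit: string | undefined, // decisionFlowKey or legacy blueprintKey
  routedKey: string | undefined, // outcome of the flow-route lookup, if any
): Resolved {
  if (explicit !== undefined) {
    // Exact key first, then the case-insensitive name lookup.
    const wanted = explicit.toLowerCase();
    const flow =
      flows.find((f) => f.key === explicit) ??
      flows.find((f) => f.name.toLowerCase() === wanted);
    // Assumption: an unmatched explicit value does not fall through.
    return flow ? { via: "explicit", flow } : { via: "none", flow: null };
  }
  if (routedKey !== undefined) {
    const flow = flows.find((f) => f.key === routedKey);
    if (flow) return { via: "routed", flow };
  }
  // Most recently updated published or active flow.
  const newest = flows
    .filter((f) => f.status === "published" || f.status === "active")
    .sort((a, b) => b.updatedAt - a.updatedAt)[0];
  return newest ? { via: "auto", flow: newest } : { via: "none", flow: null };
}
```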
Kill switch and control group
The tenant kill switch reads tenant.settings.nbaEnabled and caches the result for 60s under nba-enabled:{tenantId}. When the flag is false, the route bypasses flow execution and returns a fallback priority response (offers sorted by priority descending, with nbaEnabled: false and meta.fallbackMode = "priority_only").
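As a rough sketch of the fallback response, assuming the offers come back under a decisions array with a 1-based rank and that the request limit still applies (neither is stated above; the Offer type and function name are hypothetical):

```typescript
interface Offer {
  offerId: string;
  priority: number;
}

// Builds the priority-only response used when nbaEnabled is false.
function priorityFallback(offers: Offer[], limit: number) {
  const decisions = [...offers] // copy so the caller's array is not reordered
    .sort((a, b) => b.priority - a.priority) // priority descending
    .slice(0, limit)
    .map((offer, index) => ({ ...offer, rank: index + 1 }));
  return {
    nbaEnabled: false,
    decisions,
    meta: { fallbackMode: "priority_only" },
  };
}
```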
Control-group bucketing runs an FNV-1a hash of control:{customerId}:{YYYY-MM-DD} to deterministically bucket the customer for the day. The percentage is read from tenant.settings.controlGroupPercent (default 2%, cached 60s under control-group-pct:{tenantId}). Control-group decisions keep qualification and contact-policy filtering but get scores randomized via the same hash function so the rank order is independent of the model.
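A hedged sketch of the daily bucketing. The hash key format and the 2% default come from the paragraph above; reducing the hash with modulo 100, comparing it to the percentage, and the way the randomized score is derived are assumptions made for illustration, as are the function names.

```typescript
// 32-bit FNV-1a over the low byte of each code unit (ASCII-safe assumption).
function fnv1a32(input: string): number {
  let hash = 0x811c9dc5;
  for (let i = 0; i < input.length; i++) {
    hash ^= input.charCodeAt(i) & 0xff;
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return hash >>> 0;
}

// YYYY-MM-DD in UTC, so a customer keeps the same bucket for the whole UTC day.
function utcDay(now: Date): string {
  return now.toISOString().slice(0, 10);
}

function isControlGroup(customerId: string, now: Date, percent = 2): boolean {
  const bucket = fnv1a32(`control:${customerId}:${utcDay(now)}`) % 100; // assumption
  return bucket < percent;
}

// Assumed scheme: a per-offer pseudo-random score in [0, 1) from the same hash,
// so the resulting rank order does not depend on the model.
function controlScore(customerId: string, now: Date, offerId: string): number {
  return fnv1a32(`control:${customerId}:${utcDay(now)}:${offerId}`) / 0x100000000;
}
```

The point of hashing rather than calling a random generator is repeatability: the same customer gets the same answer on every request within the day.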
Engine execution
The engine returns a result envelope. The scoreExplanation block is set on every decision and is asserted to be present by the integration test suite.
Realtime EXP3-IX bandit
When tenantSettings.aiAnalyzerSettings.ranking.exp3IxEnabled is true and arms are configured, the route samples one arm before flow execution. The picked armIndex and armId thread into the auto-impression’s response JSON and into the top-level response body. When the flag is off or no arms are configured, the response omits banditArmIndex. See EXP3-IX Ranking for arm configuration.
Side effects
The auto-impression block keeps only decisions whose channel impressionMode !== "explicit" and inserts one impression row per decision via raw SQL. The recommendation-recording block then inserts one recommendation row per decision. Both blocks loop sequentially, so N decisions produce up to 2N round-trips against the partitioned interaction-history table. This is intentional: a single batch INSERT … ON CONFLICT (id) DO NOTHING cannot be expressed against a composite primary key on a partitioned table.
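To make the 2N bound concrete, the following sketch lists the writes a request would issue, in order. It models only the counting logic described above; the Decision shape (with impressionMode flattened onto the decision) and the planWrites name are hypothetical, and no SQL is executed.

```typescript
interface Decision {
  offerId: string;
  rank: number;
  impressionMode: "explicit" | "implicit"; // hypothetical flattened channel setting
}

type PlannedWrite = {
  kind: "impression" | "recommendation";
  offerId: string;
  rank: number;
};

// One impression row per non-explicit decision, then one recommendation row
// per decision. Each entry stands for one sequential round-trip.
function planWrites(decisions: Decision[]): PlannedWrite[] {
  const impressions = decisions
    .filter((d) => d.impressionMode !== "explicit")
    .map((d): PlannedWrite => ({ kind: "impression", offerId: d.offerId, rank: d.rank }));
  const recommendations = decisions.map(
    (d): PlannedWrite => ({ kind: "recommendation", offerId: d.offerId, rank: d.rank }),
  );
  return [...impressions, ...recommendations];
}
```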
Reference
Request body
The POST body is read field-by-field rather than validated against a single Zod schema. Per-field shape requirements are listed below. The batch endpoint at /api/v1/recommend/batch does use a single Zod schema.
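A minimal sketch of what field-by-field reading can look like for three of the fields documented in this section: the limit clamp to [1, 50], the session-id check, and the decisionFlowKey / blueprintKey alias. The helper names, the default limit of 10, and truncation of long session ids are assumptions for illustration, not the route's actual behavior.

```typescript
// Clamp the requested limit into [1, 50]. The fallback for a missing or
// non-numeric value is a hypothetical default.
function readLimit(raw: unknown, fallback = 10): number {
  const n = typeof raw === "number" && Number.isFinite(raw) ? Math.trunc(raw) : fallback;
  return Math.min(50, Math.max(1, n));
}

// Accept alphanumeric/-/_ only; cap at 64 chars (assumed to mean truncation).
function readSessionId(raw: unknown): string | undefined {
  if (typeof raw !== "string") return undefined;
  return /^[a-zA-Z0-9_-]+$/.test(raw) ? raw.slice(0, 64) : undefined;
}

// Prefer decisionFlowKey, fall back to the legacy blueprintKey alias.
function readFlowKey(body: Record<string, unknown>): string | undefined {
  const raw = body["decisionFlowKey"] ?? body["blueprintKey"];
  return typeof raw === "string" && raw.length > 0 ? raw : undefined;
}
```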
- Unique customer identifier. When omitted or set to "anonymous", the route derives a stable surrogate from sessionId (preferred) or x-forwarded-for + user-agent.
- Filter candidates to creatives whose channel matches this channel type or name.
- Channel ID (UUID) used for flow-route lookup. When supplied, takes precedence over channel for routing.
- Filter candidates to creatives bound to this placement.
- Multi-placement request. Each entry is { placementId: string, limit?: number }. When present, the response is the multi-placement shape and the single-placement fields are omitted.
- Multi-placement only. When true, placements are resolved sequentially and each placement excludes offers already returned by earlier placements.
- Maximum decisions returned. Clamped to [1, 50].
- Session identifier. Used both for anonymous-customer derivation and for echoing back into the response and the auto-impression’s context. Validated as alphanumeric/-/_, max 64 chars.
- Free-form real-time context (device, page URL, etc.) merged into the auto-impression’s context JSON.
- Customer segment ids passed through to the engine for qualification rules that match against segments.
- Per-request customer attributes. Available to the Compute stage as attributes.<key> variables when evaluating computed-field formulas.
- Locale code (e.g. en-US). Echoed back in the response; reserved for content selection in future stages.
- ISO 4217 currency code. Echoed back in the response.
- inbound or outbound. Stored on the recommendation interaction row.
- Offer IDs to exclude from candidates. Legacy alias excludeActions is also accepted.
- Creative IDs to exclude. Legacy alias excludeTreatments is also accepted.
- Explicit flow key (or name; a case-insensitive name lookup also resolves). Legacy alias blueprintKey is also accepted.
- When true, the engine attaches a debugTrace block (per-rule pass/fail reasons) to the response.
- When true, the route adds an explanation object to each decision and a rejectedOffers[] block built from the debug trace. Implies debug: true.
Response (single-placement)
This is the shape returned for both the explicit-key path and the auto-resolve path.
- UUID minted at the start of the request. Echoed in the auto-impression’s response.interactionId and the recommendation row’s response.interactionId.
- Same value as interactionId. Use either when calling POST /api/v1/respond.
- Either the supplied customerId or the derived anonymous surrogate.
- Echoed back from the request body.
- Key of the flow that ran. In the explicit-key path the value reflects the raw request input (after name-to-key translation); in the auto-resolve path it reflects the resolved key.
- Version number of the flow’s published snapshot, or null when running from a draft configuration.
- Variant name when the flow has an experiment node and the customer was assigned a variant.
- True when the customer was bucketed into the always-on control group for the current UTC day.
- Echoed from the request body, default "inbound".
- ISO timestamp from bpResult.now, set when the engine started.
- Echoed from the channel request field, or "all" when none was supplied.
- Echoed from the placement request field, or "all" when none was supplied.
- Echoed from the request body.
- Echoed from the request body.
- Number of items in decisions[] after the per-request limit was applied.
- Ranked offers. See the per-decision sub-fields below.
- Present only when explain=true and the debug trace contains rejection reasons. Each entry is { offerId, offerName, stage: "eligibility" | "contact_policy", reason }.
- Present only when EXP3-IX is enabled for the tenant AND tenantSettings.aiAnalyzerSettings.ranking has configured arms. The route picks one arm before flow execution and threads its index into the response.
- Companion to banditArmIndex. Echo this value back into interaction.response when calling /respond so the arm’s log-weight gets updated on outcome.
- Trace counters from the engine. See sub-fields below.
- Number of offers that entered the pipeline.
- Candidates remaining after qualification rules ran.
- Candidates remaining after contact policies ran.
- Candidates remaining after suppression rules ran. Present in the explicit-key path response; see Honest limits for the auto-resolve path’s omission.
- True when at least one scorer threw an error and a fallback score was used.
- Present only when the realtime negotiation-apply pass ran and produced a non-noop result. Shape: { applied: number, rejected: number }. See Constraints and Negotiation.
- Present only when debug=true or explain=true.
decisions[] per-item shape
- 1-based rank after sorting and any control-group reshuffle.
- Final score from the scorer (or randomized in control group).
- Joined from creative.channel.name.
- Joined from creative.channel.channelType.
- Joined from creative.placement.name.
- Joined from offer.categoryRef.name with fallbacks.
- Joined from offer.subCategoryRef.name with fallbacks.
- Mirrored from offer.mandatory.
- Mirrored from the candidate’s priority (driven by offer.priority).
- Mirrored from the candidate’s weight.
- From creative.templateType.
- From creative.content.
- Per-candidate property bag from the engine.
- From creative.abTestVariant.
- From creative.constraints.
- ISO timestamp from offer.expiresAt.
- From offer.metadata.
- Set on every decision; the integration test suite asserts it must be present. Shape: { method, priority, weight, fitMultiplier, finalScore }.
- Free-form Record<string, any>. Keys are tenant-defined: they come from the category’s customFields of type computed, plus optional flow-level extras and per-flow overrides. Standard examples include personalized_rate or greeting, but the field set is open.
- Present only when the candidate’s channel uses implicit impression tracking and the auto-impression insert succeeded. Looked up by deduplication id and threaded into the decision.
- Present only when the realtime apply-mode wire ran and the candidate was accepted. Shape: { sessionId, proposal }.
- Present only when the realtime apply-mode wire rejected the candidate. Shape: { sessionId, reason }.
Response (multi-placement)
Returned when the request body supplies a placements[] array.
- Map of placementId → { offers: [...], count: number }. As of the multi-placement fix (#158), each offers[] entry includes the same render-essential creative fields as the single-placement shape (specifically templateType and content) so consumers can render the creative without a follow-up fetch. The fields that remain single-placement-only are weight, scoreExplanation, impressionId, and appliedNegotiation (these come from stages that don’t fire in the multi-placement code path).
- Same value as interactionId.
- Same value as interactionId and recommendationId. Multi-placement only; kept for legacy callers that key on requestId.
- ISO timestamp set when the response was assembled.
- Present only when the multi-placement request crossed at least one channel with couplingMode = "atomic" (or a DecisionFlow.couplingOverride = "atomic"). One entry per channel touched by the request. Shape: [{ channelId, channelName, mode, cascaded, emptyPlacements: [placementId, ...] }]. When cascaded = true, the channel’s other placements were emptied because at least one placement in the same channel had no candidates. Surface this in the consumer UI to distinguish “we suppressed this because the sibling was empty” from “this placement just wasn’t configured.” Cross-channel coupling is intentionally NOT supported; different channels are independent attention surfaces.
How the multi-placement engine runs the flow
When placements: [...] is provided AND every placement resolves to the same decision flow (via resolveFlowRoute → channel+placement → channel-only → tenant default), the recommend route runs executeDecisionFlow ONCE with placementFilters: [<all requested placement ids>]. The match_creatives node keeps candidates whose placementId is in the requested set (plus wildcards when allowWildcard is true). The group node’s allocator (Hungarian or Greedy) sees the full set of slots across all placements in a single cost matrix, so Hungarian enforces per-offer uniqueness across placements within the channel. Each surviving candidate is stamped with its assigned placementId, and the route splits the flat result list back into per-placement buckets in the response.
When placements resolve to different flows, the route falls back to per-placement execution (one executeDecisionFlow call per placement, each scoped to a single placement). In that fallback, Hungarian uniqueness only holds within a single placement; the channel-coupling pass still runs at the route boundary across the aggregated placementResults.
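The two route-side steps described above, deciding whether a single engine run is possible and splitting the flat result back into per-placement buckets, can be sketched as follows. The types and function names are hypothetical, the allocator itself is not modeled, and returning an empty bucket for a requested placement with no candidates is an assumption.

```typescript
interface Candidate {
  offerId: string;
  placementId: string; // stamped on the candidate by the allocator
}

interface Bucket {
  offers: Candidate[];
  count: number;
}

// True when every placement resolved to the same flow key, so one engine run suffices.
function canRunOnce(flowKeyByPlacement: Record<string, string>): boolean {
  const keys = Object.keys(flowKeyByPlacement).map((id) => flowKeyByPlacement[id]);
  return new Set(keys).size <= 1;
}

// Split the flat, already-ranked list into one bucket per requested placement,
// preserving rank order inside each bucket.
function splitByPlacement(requested: string[], ranked: Candidate[]): Map<string, Bucket> {
  const buckets = new Map<string, Bucket>();
  for (const id of requested) buckets.set(id, { offers: [], count: 0 });
  for (const candidate of ranked) {
    const bucket = buckets.get(candidate.placementId);
    if (!bucket) continue; // not a requested placement
    bucket.offers.push(candidate);
    bucket.count = bucket.offers.length;
  }
  return buckets;
}
```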
Response (engine-emitted grouped)
Returned when the executed flow’s last node is a Group node that emits placements (V2 grouped response). This shape includes a meta block; the request-driven multi-placement response (when the caller supplies placements: [...]) does not.
GET endpoint
GET /api/v1/recommend accepts a subset of POST body fields as query-string parameters: customerId, channel, placement, limit, decisionFlowKey, explain, debug. The GET response omits sessionId, locale, and currency because no request body carries them.
The GET endpoint returns 400 with the message "decisionFlowKey is required. Multiple active flows exist: …" when more than one published or active flow exists and no decisionFlowKey query parameter is set. POST auto-selects in the same situation.
Status codes
| Code | When |
|---|---|
| 200 | Successful recommendation |
| 400 | Missing required body fields, invalid JSON, or unresolvable flow |
| 400 | GET only — multiple active flows exist and no decisionFlowKey query param |
| 401 | Missing tenant context |
| 403 | Invalid tenant identifier |
| 429 | Rate limit exceeded OR playground 5,000-decision quota exhausted |
| 500 | Unexpected server error |
| 504 | Request exceeded the 30s timeout |

Error envelopes differ by status:
- Standard API-error envelope, used by 400 and 500: { error: { code, message, status, traceId, timestamp } }.
- Tenant-error envelope, used by 401 and 403: { title, detail }.
- Timeout envelope: { error: { code: "TIMEOUT", message: "Request timed out", status: 504 } }. Omits traceId and timestamp.
- Playground-quota envelope (429): { error: { code, message, used, limit } }; the rate-limiter’s 429 envelope is also distinct.
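Because the error bodies do not share one shape, a client usually needs a small normalizer. The sketch below is one possible client-side helper (the describeError name and the output format are hypothetical); it relies only on the field names listed above and degrades to the bare status for anything else, including the rate-limiter's 429 body, whose shape is not documented here.

```typescript
// Summarize any of the documented error bodies as a single string.
function describeError(status: number, body: unknown): string {
  if (typeof body !== "object" || body === null) return `HTTP ${status}`;
  const record = body as Record<string, unknown>;

  const error = record["error"];
  if (typeof error === "object" && error !== null) {
    // Standard, timeout, and playground-quota envelopes all nest under "error".
    const e = error as Record<string, unknown>;
    const code = e["code"];
    const message = e["message"];
    const used = e["used"];
    const limit = e["limit"];
    const codeText = typeof code === "string" ? code : "UNKNOWN";
    const messageText = typeof message === "string" ? message : "";
    // Only the quota envelope carries used/limit.
    const quota =
      typeof used === "number" && typeof limit === "number" ? ` (${used}/${limit})` : "";
    return `HTTP ${status} ${codeText}: ${messageText}${quota}`;
  }

  const title = record["title"];
  if (typeof title === "string") {
    // Tenant-error envelope used by 401 and 403.
    const detail = record["detail"];
    return `HTTP ${status} ${title}: ${typeof detail === "string" ? detail : ""}`;
  }
  return `HTTP ${status}`;
}
```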
Required headers
| Header | Required | Purpose |
|---|---|---|
| Content-Type | POST only | application/json for POST bodies |
| X-API-Key | Yes (one of the two) | API key (krn_…); also used as the rate-limit identifier |
| X-Tenant-Id | Yes (one of the two) | Direct tenant id; ignored when X-API-Key resolves a tenant |
| X-Forwarded-For | No | Fallback input for anonymous-id derivation and for the rate-limit identifier |
| User-Agent | No | Used in the FNV-1a hash for anonymous customers |
| x-user-id | No | Triggers onboarding-step tracking only |
Authorization: Bearer … is not a supported authentication mode on this route. The middleware reads the Authorization header only to gate CSRF; tenant resolution only verifies X-API-Key (when prefixed with krn_) and X-Tenant-Id.
Configuration
Environment variables
| Variable | Effect |
|---|---|
| NODE_ENV=test | Disables rate limiting in the test runner |

Other configuration is per-tenant and lives in tenant.settings and tenantSettings.aiAnalyzerSettings.
Caches
| Cache key | TTL | What it caches |
|---|---|---|
| route:{tenantId}:{channelId}:{placement} | 120s | Flow-route resolution by channel + placement |
| flowkey:{tenantId}:{flowId} | 120s | Flow id → key lookup |
| flow:{tenantId}:{flowKey} | 60s | Compiled decision-flow object |
| nba-enabled:{tenantId} | 60s | Tenant kill-switch check |
| control-group-pct:{tenantId} | 60s | Control-group percentage |
| shap-enabled:{tenantId} | 60s | Whether to compute SHAP in the hot path |
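
The key formats and TTLs in the table can be exercised with a small in-process TTL cache. This is a generic sketch with an injectable clock, not the cache the route uses; the TtlCache class and the key-builder names are hypothetical, and treating an entry as expired exactly at its TTL is an assumption.

```typescript
class TtlCache<V> {
  private readonly entries = new Map<string, { value: V; expiresAt: number }>();
  private readonly now: () => number;

  // The clock is injectable so expiry can be tested without waiting.
  constructor(now: () => number = Date.now) {
    this.now = now;
  }

  get(key: string): V | undefined {
    const hit = this.entries.get(key);
    if (!hit) return undefined;
    if (hit.expiresAt <= this.now()) {
      this.entries.delete(key); // expired: evict and report a miss
      return undefined;
    }
    return hit.value;
  }

  set(key: string, value: V, ttlSeconds: number): void {
    this.entries.set(key, { value, expiresAt: this.now() + ttlSeconds * 1000 });
  }
}

// Key builders following the formats in the table above.
const nbaEnabledKey = (tenantId: string): string => `nba-enabled:${tenantId}`;
const routeKey = (tenantId: string, channelId: string, placement: string): string =>
  `route:${tenantId}:${channelId}:${placement}`;
```

One consequence of these TTLs is that flipping the kill switch or changing a route can take up to 60s or 120s respectively to reach a warm process.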
Rate limits
| Tenant type | Per-window | Window | Lifetime decision quota |
|---|---|---|---|
| Playground (tenant.isPlayground = true) | 100 | 60s | 5,000 |
| Non-playground | 1,000 | 60s | None |

The rate-limit identifier is resolved in the order X-API-Key > X-Forwarded-For > "anonymous".
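
A short sketch of the identifier precedence and the per-window limits from the table. The function names are hypothetical, and the header map is assumed to be lower-cased by the caller with empty values treated as absent.

```typescript
// Pick the rate-limit identifier: API key, then forwarded address, then "anonymous".
function rateLimitIdentifier(headers: Record<string, string | undefined>): string {
  return headers["x-api-key"] || headers["x-forwarded-for"] || "anonymous";
}

// Per-window limits from the table: 100 per 60s for playground tenants, 1,000 otherwise.
function perWindowLimit(isPlayground: boolean): { max: number; windowSeconds: number } {
  return { max: isPlayground ? 100 : 1000, windowSeconds: 60 };
}
```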
Request timeout
The POST handler is wrapped in a 30-second request timeout. The GET handler is not wrapped. On timeout the response is 504 { error: { code: "TIMEOUT", message: "Request timed out", status: 504 } }.
Honest limits
- The auto-resolve fallthrough response returns a meta block that omits afterSuppression; the explicit-key path includes all five counters. Tracked as a code-side cleanup.
- Auto-impression and recommendation writes are up to 2N round-trips per request (one raw insert per decision in the impression loop, plus one per decision in the recommendation loop) because the interaction-history table is partitioned and a batch INSERT cannot use ON CONFLICT (id) against a composite primary key. A request with limit=50 produces up to 100 sequential SQL round-trips.
- The POST body is not validated against a single Zod schema. Field validation is per-read in the handler. The batch endpoint at /api/v1/recommend/batch does use a single Zod schema.
- Authorization: Bearer … is not a supported auth mode. The middleware reads the Authorization header only to gate CSRF; tenant resolution only verifies X-API-Key and X-Tenant-Id.
- The 504 timeout envelope shape diverges from the standard API-error envelope: it omits traceId and timestamp. The 401/403 envelope uses { title, detail } and is also distinct.
- Bandit arm-index threading fires only when the tenant has EXP3-IX enabled AND configured banditConfig.arms in tenantSettings.aiAnalyzerSettings.ranking. Without arms it is a structured no-op (no banditArmIndex in the response). See EXP3-IX Ranking for arm configuration.
Related
- Respond API — record the outcome of a recommendation.
- Decision Flows — the engine that backs this route.
- Decisioning Gates — qualification + contact-policy stages.
- Ranking Profiles — multi-objective scoring weights.