RAG API · Production-ready

Skip the six-month RAG build

A production REST API with an OpenAPI spec, real auth, rate limits that scale, and source citations on every response. Start with curl. Ship to production. The infrastructure your team would have built, without the six months of building it.

OpenAPI 3.0 · EU-hosted · 99.9% uptime SLA
# Ask any question of your indexed docs
curl https://app.biel.ai/api/v1/chats/ \
  -H "Authorization: Token $BIEL_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "project_id": "prj_2K4hXp",
    "message": "How do I rotate API keys?",
    "stream": true
  }'

# Streamed response, with sources cited
{
  "answer": "Rotate keys from Settings → API
            Keys → Rotate...",
  "sources": [
    "keys/rotation.md",
    "security/best-practices.md"
  ],
  "chat_id": "cht_9X8Yqp"
}

Used by teams that take their documentation seriously

GrepTime
ScyllaDB
SecuroSys
Katalon
Talon.one
Tezos
CrazyGames
The build vs. buy moment

What it costs to build this yourself

Standing up production RAG on your own is a real engineering project: ingestion pipelines, vector store, embedding refresh, prompt templates, hallucination guards, citation rendering, rate limits, observability, on-call. Most teams realize the cost six months in.

Build it yourself

Six months and a dedicated team

  • i. Build a doc ingestion pipeline that handles markdown, code blocks, and OpenAPI specs.
  • ii. Pick and run a vector database. Manage embedding refresh on every doc change.
  • iii. Engineer prompts that resist hallucination. Test against real questions. Iterate.
  • iv. Build citation rendering. Make sure links survive every model upgrade.
  • v. Set up rate limits, auth, monitoring. Pager rotation. Cost dashboards.
  • vi. Maintain it forever. Every model change, every doc platform update, every edge case.
~6 months to a v1 you’d actually ship, plus ongoing maintenance.
Use the Biel RAG API

One afternoon and a curl command

  • i. Point us at your docs URL or repo. We crawl, parse, embed, index.
  • ii. One REST endpoint. OpenAPI 3.0 spec. SDKs for the languages your team writes.
  • iii. Streaming responses by default. Real-time tokens, real-time citations.
  • iv. Source citations on every answer. Built-in. No template engineering.
  • v. Auth, rate limits, monitoring, status page. EU-hosted with a 99.9% SLA.
  • vi. Model and prompt upgrades land on our side. Your code doesn’t change.
From signup to first grounded answer in 15 minutes. From $50/mo.
Endpoints

The endpoints you actually need

A short, focused API surface. Six endpoints cover ingestion, retrieval, conversational chat, and analytics. No SDK lock-in: any HTTP client works, and the OpenAPI spec generates clients in any language.

POST /v1/chats

Ask anything

Send a question and get a grounded, source-linked answer. Streaming or buffered. Multi-turn aware: pass a chat_id to continue an existing conversation with full context.
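For example, a follow-up in the same conversation reuses the fields from the hero example above, plus the chat_id returned in the first response:

# Continue an existing conversation
curl https://app.biel.ai/api/v1/chats/ \
  -H "Authorization: Token $BIEL_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "project_id": "prj_2K4hXp",
    "chat_id": "cht_9X8Yqp",
    "message": "And how often should I rotate them?"
  }'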

POST /v1/search

Retrieve, don’t generate

Pure retrieval. Pass a query, get ranked chunks back with relevance scores and citations. For when you want to plug Biel into your own LLM stack and control generation yourself.
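A minimal sketch, assuming the same /api/v1 base path as the chat example; the "query" field name is our shorthand here, not a documented contract:

# Retrieve ranked chunks, skip generation
# (the "query" field name is assumed; check the OpenAPI spec for the real shape)
curl https://app.biel.ai/api/v1/search/ \
  -H "Authorization: Token $BIEL_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "project_id": "prj_2K4hXp",
    "query": "rotate API keys"
  }'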

POST /v1/sources

Manage what’s indexed

Add a docs URL, a GitHub repo, a Notion workspace, a sitemap. Trigger a re-index, set a schedule, exclude paths. Webhook callbacks let you wire ingestion into CI/CD.
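A sketch of adding a docs site, under the same base-path assumption; the "url" field name is illustrative:

# Add a docs site as a source
# ("url" field name assumed; see the OpenAPI spec for the real shape)
curl https://app.biel.ai/api/v1/sources/ \
  -H "Authorization: Token $BIEL_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "project_id": "prj_2K4hXp",
    "url": "https://docs.example.com"
  }'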

GET /v1/projects/<id>/analytics

What users actually asked

Every question, every answer, every cited source. Aggregate by intent, source, language, or unanswered queries. The same data the dashboard renders, available programmatically.
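The shape of the call, assuming the same base path as the chat example:

# Pull the question/answer log for a project
curl https://app.biel.ai/api/v1/projects/prj_2K4hXp/analytics \
  -H "Authorization: Token $BIEL_KEY"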

POST /v1/feedback

Close the loop

Record thumbs-up, thumbs-down, or freeform feedback on any response. Surfaces in your dashboard, feeds back into ranking signals, helps you spot which sources actually answer well.
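A sketch of recording a rating; the chat_id comes back with every answer, but the "rating" field name is an assumption:

# Record a thumbs-up on an answer
# ("rating" field name assumed)
curl https://app.biel.ai/api/v1/feedback/ \
  -H "Authorization: Token $BIEL_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "chat_id": "cht_9X8Yqp",
    "rating": "up"
  }'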

GET /v1/openapi.json

Generate clients automatically

Full OpenAPI 3.0 specification. Use openapi-generator, Stainless, or your favorite SDK builder to scaffold typed clients in any language your team writes in.
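For instance, with the openapi-generator CLI (the spec path is assumed to live under the same /api/v1 base as the chat example):

# Download the spec and scaffold a typed TypeScript client
curl https://app.biel.ai/api/v1/openapi.json -o openapi.json
openapi-generator-cli generate -i openapi.json -g typescript-fetch -o ./biel-client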

What’s included

The infrastructure your team would have built

i.

Streaming first

Server-sent events out of the box. Tokens stream as the model produces them, with citations attached as soon as sources resolve. Buffered responses available for batch jobs.
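With plain curl, the -N flag disables output buffering so tokens print as the server sends them; the request is the same as the hero example:

# Stream tokens as server-sent events
curl -N https://app.biel.ai/api/v1/chats/ \
  -H "Authorization: Token $BIEL_KEY" \
  -H "Content-Type: application/json" \
  -d '{"project_id": "prj_2K4hXp", "message": "How do I rotate API keys?", "stream": true}'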

ii.

Real auth, real keys

Token-based auth with scoped permissions. Rotate, revoke, scope by project. Per-key rate limits, audit logs, and IP allowlisting on higher plans.

iii.

Rate limits that scale

Generous defaults. Burst-tolerant. Higher limits available for production deployments without renegotiating your contract every quarter.

iv.

Observability built in

Every request gets a trace ID. Every response carries timing, model version, and source provenance. Wire the API logs to your existing observability stack via webhooks.

v.

Webhooks for ingestion

Trigger re-indexes from CI/CD. Subscribe to ingestion-complete events. Get notified when a source fails to crawl. Everything event-driven, nothing manual.
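A sketch of the CI side, assuming re-indexes are triggered through the sources endpoint above; the source id and /reindex path are illustrative, not documented here:

# CI step after a docs deploy: trigger a re-index
# (source id and /reindex path are illustrative)
curl -X POST https://app.biel.ai/api/v1/sources/src_4F7kQw/reindex/ \
  -H "Authorization: Token $BIEL_KEY"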

vi.

Multi-tenancy ready

One account, many projects. Each project is its own isolated index, its own permissions, its own ranking rules. Built for agencies, platforms, and multi-product orgs.

Use cases

What teams build with the API

Custom in-product chat

An assistant inside your app

Build the chat experience that fits your product. The widget is a starting point; the API lets you control everything: layout, persona, scope, escalation paths, follow-up actions, billable events.

Internal copilots

For sales, success, and support

Power internal tools with grounded answers from your knowledge base. Let your team query past tickets, runbooks, deal notes, and product docs from one prompt.

Pipelines and automations

Wire docs into your workflows

Auto-respond to GitHub issues with linked docs. Pre-populate Zendesk replies. Generate context-aware tooltips at build time. Anywhere your docs are useful, the API is.

They turn our feature requests into reality in no time.

Lars Wilhelmer · Documentation Engineer at Talon.One

Things teams ask before integrating

Which AI models do you support?

We run OpenAI and Anthropic models behind the API, alongside retrieval and embedding models tuned for technical documentation. We upgrade models on our side without breaking your integration. The API contract stays stable; the models behind it get better.

Can I bring my own LLM?

Yes, on Enterprise plans. Use the /v1/search endpoint to retrieve grounded chunks and feed them into your own model. You get the indexing and ranking infrastructure, you keep control of generation.

What about latency?

Streaming responses start within a few hundred milliseconds. Full answers complete in 1 to 8 seconds for most queries.

Is anything used for training?

No. Conversations and indexed content are never used to train AI models. You own your data and can export or delete it at any time.

Ship grounded answers this afternoon

14-day free trial · No credit card required · From $50/mo
Try me ↓