A2RAG - Stop Your RAG From Guessing

The problem

Confident answers from incomplete knowledge.

Standard RAG systems are built to answer. They don't know when to stop. When your knowledge base doesn't cover the full picture, they fill in the gaps - confidently, incorrectly.

In regulated industries - insurance, legal, healthcare, HR - a wrong answer isn't just unhelpful. It's a liability.

A2RAG gives your pipeline three options instead of one: answer safely, ask for clarification, or refuse to guess.

Live example

// Without A2RAG

user › Can I return this item?

rag › Yes, returns accepted within 14 days. (WRONG)

User has a digital product - non-refundable.

// With A2RAG

user › Can I return this item?

a2rag › CLARIFY SAFE

"Was this a physical product or a digital download?"

User confirms → correct policy applied ✓

How it works

Three outcomes instead of one.

A2RAG evaluates every query against two independent scores and routes accordingly - in under 2ms, with no changes to your existing pipeline.

STEP 01

Your pipeline runs unchanged

Pass the user query, retrieved documents, and draft answer to A2RAG. Your retrieval and LLM stay exactly as-is.

STEP 02

Two scores computed

Evidence score - does the corpus support this answer?

Completeness score - is there enough context for a specific answer?

STEP 03

Routing decision returned

One of three actions returned instantly. You define what happens next - show it, ask a follow-up, or escalate to a human.
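The three steps above can be sketched as a small routing function. This is only an illustration of the idea of mapping two scores to three actions: the threshold values, the `Scores` type, and the `route` function are assumptions made for this example, not A2RAG's actual scoring or API.

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    ANSWER = "answer"
    CLARIFY = "clarify"
    ABSTAIN = "abstain"


@dataclass(frozen=True)
class Scores:
    evidence: float      # 0..1: does the corpus support the draft answer?
    completeness: float  # 0..1: is there enough context for a specific answer?


# Hypothetical thresholds, chosen only for illustration.
EVIDENCE_MIN = 0.5
COMPLETENESS_MIN = 0.7


def route(scores: Scores) -> Action:
    """Map the two scores to one of three actions."""
    if scores.evidence < EVIDENCE_MIN:
        return Action.ABSTAIN   # corpus does not support the draft
    if scores.completeness < COMPLETENESS_MIN:
        return Action.CLARIFY   # supported, but the query is underspecified
    return Action.ANSWER


print(route(Scores(evidence=0.91, completeness=0.94)))  # Action.ANSWER
```

Checking evidence first means an unsupported draft is never rescued by a follow-up question; whether the real system orders the checks this way is not stated here.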

Conversation examples

User

How fast will support respond to my ticket?

Instance-specific - depends on plan

A2RAG

→ CLARIFY (conf: 0.85)

"What is your current plan? (Free / Starter / Growth)"

User

Growth plan

A2RAG

→ ANSWER (conf: 0.97)

Growth plan: 4-hour response SLA for priority tickets.

❓ Clarify first → context provided → ✓ Safe answer
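A clarify-then-answer exchange like the one above could be driven by a small loop in your application. The sketch below is a minimal version under stated assumptions: `Decision`, `resolve`, and `fake_decide` are hypothetical stand-ins written for this example, and the way the user's reply is folded back into the query is just one possible choice.

```python
from typing import Callable, NamedTuple, Optional


class Decision(NamedTuple):
    action: str                          # "answer", "clarify", or "abstain"
    clarification: Optional[str] = None  # follow-up question when action == "clarify"


def resolve(
    query: str,
    decide: Callable[[str], Decision],
    ask_user: Callable[[str], str],
    max_turns: int = 2,
) -> Decision:
    """Re-run the decision after each clarification, up to max_turns follow-ups."""
    decision = decide(query)
    turns = 0
    while decision.action == "clarify" and turns < max_turns:
        reply = ask_user(decision.clarification or "")
        # Fold the user's reply into the query so the next decision sees it.
        query = f"{query}\n(User clarified: {reply})"
        decision = decide(query)
        turns += 1
    return decision


# Stand-in decide function that mirrors the support-ticket conversation.
def fake_decide(query: str) -> Decision:
    if "Growth" in query:
        return Decision("answer")
    return Decision("clarify", "What is your current plan? (Free / Starter / Growth)")


final = resolve(
    "How fast will support respond to my ticket?",
    decide=fake_decide,
    ask_user=lambda question: "Growth plan",
)
print(final.action)  # answer
```

If the decision is still "clarify" after `max_turns` follow-ups, the loop returns it unchanged, so the caller can treat that case like an abstention instead of questioning the user indefinitely.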

User

What is the refund window for physical items?

Generic policy - directly in knowledge base

A2RAG

→ ANSWER (conf: 0.94, evidence: 91%)

Corpus fully supports draft. Safe to present to user.

✓ Direct answer: no friction added

User

What is your chargeback review timeline?

Topic not covered in knowledge base

RAG

Draft: "30-45 business days." (hallucinated)

A2RAG

→ ABSTAIN (evidence: 0%)

Topic not in corpus. Configured action fires → escalate / message / webhook.

🚫 Abstain: hallucination blocked before the user sees it

Use cases

Built for high-stakes industries.

A2RAG is particularly valuable where wrong answers carry real consequences. Pre-tuned domain profiles available for each.

🛡️

Insurance

Prevent coverage misstatements and liability from hallucinated policy answers.

Coverage eligibility questions
Claim filing timelines
Policy exclusion queries
Deductible calculations

⚖️

Legal & Compliance

Stop jurisdiction-specific speculation and unsupported legal interpretations.

Contract term questions
NDA enforceability
IP ownership queries
Regulatory compliance

🏥

Medical & Clinical

Intercept unsupported clinical information before it reaches patients or staff.

Drug interaction queries
Treatment eligibility
Dosage questions
Clinical protocol lookups

👥

HR & Employee Support

Avoid wrong policy answers that create entitlement disputes or legal exposure.

Leave entitlement queries
Parental leave policy
Remote work requests
Benefits & expense policy

🎧

Customer Support

Route plan-specific questions correctly. Stop bots from making commitments they can't keep.

Plan-specific feature queries
SLA & support questions
Billing & cancellation
Integration availability

🏦

Financial Services

Prevent speculative financial guidance and product misrepresentation.

Product eligibility
Fee & rate questions
Regulatory disclosures
Transaction dispute handling

Integration

3 lines. Any pipeline.

Works with LangChain, LlamaIndex, OpenAI, Anthropic, or any custom RAG setup. No infrastructure changes. Your retrieval and LLM stay exactly as-is.

⚡

pip install a2rag

One package. No required infrastructure changes. Works alongside your existing setup.

🎛

Configurable actions

Define what happens on abstain - escalate, custom message, webhook, or silent fallback.

📊

Local analytics dashboard

client.dashboard() opens a private browser dashboard. Your data stays on your machine.

🌍

Multilingual

Automatic language detection. English, Hebrew, Arabic, French, Spanish - no configuration needed.

integration.py

from a2rag import A2RAGClient

client = A2RAGClient(api_key="your_key")

# Your existing pipeline - unchanged
contexts     = rag.retrieve(user_query)
draft_answer = llm.generate(user_query, contexts)

# Add A2RAG - 3 lines
decision = client.decide(
    query=user_query,
    contexts=contexts,
    draft_answer=draft_answer,
    domain="insurance",  # optional preset
)
if decision.should_answer:
    show_to_user(draft_answer)

elif decision.should_clarify:
    # Generated follow-up question
    ask_user(decision.clarification)

elif decision.should_abstain:
    # Topic not in corpus
    escalate_to_human()
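
The abstain branch in integration.py calls a single escalate_to_human() helper. One way to make that step configurable on the application side is a small dispatch table, sketched below. Everything here (the handler names, the mode strings, the messages) is hypothetical application code written for illustration; it is not A2RAG's configuration API.

```python
from typing import Callable, Dict, Optional

# Each handler receives the user query and returns the text to show the user,
# or None to show nothing (silent fallback).
AbstainHandler = Callable[[str], Optional[str]]


def escalate(query: str) -> Optional[str]:
    # Placeholder: a real system would open a ticket or page an agent here.
    return "I've passed your question to a human agent."


def custom_message(query: str) -> Optional[str]:
    return "I don't have verified information on that topic."


def silent(query: str) -> Optional[str]:
    return None


ABSTAIN_HANDLERS: Dict[str, AbstainHandler] = {
    "escalate": escalate,
    "message": custom_message,
    "silent": silent,
}


def on_abstain(query: str, mode: str = "escalate") -> Optional[str]:
    """Run the configured abstain action; unknown modes fall back to escalation."""
    handler = ABSTAIN_HANDLERS.get(mode, escalate)
    return handler(query)


print(on_abstain("What is your chargeback review timeline?", mode="message"))
```

A webhook handler would fit the same signature: post the query to your internal endpoint, then return whatever text the user should see. Falling back to escalation for unknown modes keeps a misconfiguration from silently dropping a question.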


Early results

Tested across domains and languages.

Controlled evaluation across 200+ decision points, including edge cases, partial queries, multilingual inputs, and contradictory corpora.

Unsafe answer rate

Never answers confidently when the corpus cannot support it.

91% decision accuracy

Correct answer / clarify / abstain decisions across all test scenarios.

100% abstain precision

Every abstention was correct. Zero false refusals on answerable queries.

<2ms added latency

Decision overhead is negligible relative to RAG retrieval and LLM inference.

⚠ A2RAG is a statistical system. Results reflect controlled test sets and may vary by domain, corpus quality, and query type. See Terms of Service for full disclaimer.
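
To check figures like these against your own corpus, the metrics can be computed from a hand-labelled test set. The sketch below assumes each test case is an (expected action, actual action) pair. The three definitions are plausible readings of the metric names, not the exact formulas used in the evaluation above, and the sample data is made up.

```python
from typing import List, Tuple

# (expected_action, actual_action) pairs from a hand-labelled test set.
Pair = Tuple[str, str]


def decision_accuracy(pairs: List[Pair]) -> float:
    """Share of test cases where the returned action matches the label."""
    return sum(expected == actual for expected, actual in pairs) / len(pairs)


def abstain_precision(pairs: List[Pair]) -> float:
    """Of all abstentions returned, the share that were labelled abstain."""
    abstained = [expected for expected, actual in pairs if actual == "abstain"]
    if not abstained:
        return 1.0  # no abstentions, so none were wrong (a convention)
    return sum(expected == "abstain" for expected in abstained) / len(abstained)


def unsafe_answer_rate(pairs: List[Pair]) -> float:
    """Share of cases that should not be answered where the system answered anyway."""
    should_not_answer = [actual for expected, actual in pairs if expected != "answer"]
    if not should_not_answer:
        return 0.0
    return sum(actual == "answer" for actual in should_not_answer) / len(should_not_answer)


# Made-up example data, not A2RAG results.
sample = [
    ("answer", "answer"),
    ("clarify", "clarify"),
    ("abstain", "abstain"),
    ("clarify", "answer"),  # unsafe: answered when it should have asked
]
print(decision_accuracy(sample))   # 0.75
print(abstain_precision(sample))   # 1.0
print(unsafe_answer_rate(sample))
```

Note that accuracy and abstain precision can both look strong while the unsafe answer rate is non-zero, which is why it is worth tracking all three separately.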

Security & privacy

Your data never leaves your pipeline.

A2RAG is architecturally designed so user query content never reaches our servers. We return a routing decision - that's all.

🔒

Query content never stored

Queries, documents, and answers are processed in-memory and immediately discarded. We cannot reconstruct any conversation.

📦

Local deployment available

Docker image available for teams that need private infrastructure. Enterprise tier includes full on-premise deployment support.

📊

Optional anonymized analytics

Free tier sends anonymous decision metadata (action, confidence, latency - no content) to improve accuracy. Opt-out available.
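
As a rough picture of what "metadata, no content" could look like, the sketch below models a record that carries only the three fields named above. The class name, field names, and opt-out flag are assumptions for illustration; the actual payload format is not documented here.

```python
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class DecisionMetadata:
    action: str        # "answer", "clarify", or "abstain"
    confidence: float  # e.g. 0.94
    latency_ms: float  # time spent on the decision


def to_payload(meta: DecisionMetadata, opted_out: bool) -> dict:
    """Return the telemetry payload, or an empty dict when the user opted out."""
    return {} if opted_out else asdict(meta)


meta = DecisionMetadata(action="abstain", confidence=0.97, latency_ms=1.4)
print(to_payload(meta, opted_out=False))
# {'action': 'abstain', 'confidence': 0.97, 'latency_ms': 1.4}
```

Because the record type has no field for the query, the retrieved documents, or the draft answer, content cannot end up in the payload by accident.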

🛡️

EU infrastructure

Hosted on AWS EU (Frankfurt). GDPR compliant. Israeli Privacy Protection Law (Amendment 13) compliant.

🔑

API key security

Keys are one-way hashed (SHA-256). Plaintext keys are never stored. Rate limiting and automatic expiry available.
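
For readers unfamiliar with the pattern, one-way key hashing generally works as sketched below using Python's standard library. This shows the general technique only; the function names are made up for the example, and it is not a description of A2RAG's server code.

```python
import hashlib
import hmac
import secrets


def new_api_key() -> str:
    """Generate a random key; shown to the user once and never stored."""
    return secrets.token_urlsafe(32)


def hash_key(api_key: str) -> str:
    """One-way SHA-256 digest; only this value would be persisted."""
    return hashlib.sha256(api_key.encode("utf-8")).hexdigest()


def verify_key(presented_key: str, stored_hash: str) -> bool:
    """Hash the presented key and compare in constant time."""
    return hmac.compare_digest(hash_key(presented_key), stored_hash)


key = new_api_key()
stored = hash_key(key)
print(verify_key(key, stored))          # True
print(verify_key("wrong-key", stored))  # False
```

A plain SHA-256 digest is a reasonable fit for long random keys like these; low-entropy secrets such as passwords would call for a slow, salted hash instead.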

🚫

No training on your data

Customer data is never used for model training by default. Anonymized aggregate patterns are used only with explicit opt-in.

Early access

Join the private beta.

We're working with a small group of developers and teams to validate A2RAG on real production corpora. No pricing yet - this is about learning together.

For developers

Developer Access

Builders · indie AI · experimentation

Get started immediately. No review required.

500 decisions / month
Sandbox API access
Python SDK + demo notebooks
Anonymous usage analytics
Community feedback program

Free during early access · Terms of Use apply

Request Access →

PILOT PROGRAM

For teams

Startup / Team Pilot

Startups · SMBs · internal AI tools

Real workflow testing with guided onboarding. We review each application.

10,000 decisions / month
Guided onboarding call
Workflow reliability evaluation
Priority support
Pilot deployment support

Pilot Agreement required · We'll schedule a call

Apply for Pilot →

For enterprise

Enterprise

Insurance · Legal · Healthcare · Enterprise AI

Private infrastructure, custom integrations, full on-premise deployment.

Docker / local deployment
Private infrastructure
Custom integrations
Security-focused architecture
Custom decision volumes

Always starts with a conversation

FAQ

Common questions.

What counts as a decision?

Each call to client.decide() is one decision. A2RAG evaluates the query, retrieved contexts, and draft answer - and returns one of three actions: answer, clarify, or abstain. One API call = one decision, regardless of outcome.

Do you store my queries or documents?

No. Query content, documents, and answers are processed in-memory and immediately discarded. We never store, log, or train on the content passing through the API. We only retain anonymous decision metadata (action, confidence, latency) to improve accuracy - and only with your consent on the free tier.

Does A2RAG replace my LLM or RAG pipeline?

No. A2RAG sits between your RAG retrieval and your users. Your LLM still generates the draft answer. A2RAG decides whether that answer is safe to show, needs clarification, or should be withheld. Your existing LLM and RAG setup stay completely unchanged.

What happens when A2RAG abstains?

You define what happens. Options include: route to a human agent, return a custom predefined message, fire a webhook to your internal systems, or handle it silently in your application. Every behavior is fully configurable per customer and per domain.

Can A2RAG make mistakes?

Yes - A2RAG is a statistical system and decisions may be incorrect. It may answer when it should abstain (false positive) or abstain when an answer was available (false negative). Performance varies by domain, language, and corpus quality. We recommend testing on your specific corpus before production deployment. A2RAG reduces risk - it does not eliminate it.

Which languages are supported?

A2RAG includes automatic language detection and works across English, Hebrew, Arabic, French, Spanish, and other languages. Performance is best on English corpora. Hebrew and Arabic operate in a specialized heuristic mode that handles right-to-left text and morphology.

Can I deploy A2RAG on my own infrastructure?

Yes. Docker deployment is available for teams that need private infrastructure. Enterprise tier includes full on-premise support with no data leaving your environment. Contact us to discuss your requirements.

How long does integration take?

Most developers are up and running in under an hour. Install with pip install a2rag, pass your existing RAG output to client.decide(), and handle the three possible outcomes. No infrastructure changes required. Demo notebooks are included for Insurance, HR, Legal, and Support domains.

Is my data used for training?

No. Query content is never used for training. The free developer tier optionally shares anonymous decision metadata (not content) to help improve model accuracy - this is disclosed at signup and can be disabled. Pilot and Enterprise tiers have no telemetry by default.

Your RAG answers when it shouldn't.
