AI that only knows
what you tell it.
InsightsAssist embeds a citation-first AI into any website. Users explore your data through natural language — every answer grounded in your dataset, never hallucinated, never invented.
→ “Which departments led external funding from 2019–2024, and who were the top principal investigators?”
Across 4,847 funded research initiatives in this dataset, three departments account for 62% of total awarded funding. DEPT-ENG-042 leads with $284.3M across 1,203 active grants — PI-0891 is the highest-funded investigator at $47.3M in awarded federal funding. DEPT-MED-017 follows at $198.1M, up 43% from the prior five-year period. DEPT-ENV-008 recorded a 340% increase since 2019, driven by 14 new federal climate initiatives tracked under GRANT-FED-2023-C. Full rankings available across all 4,847 funded projects.
AI normally answers from everything it has ever learned.
InsightsAssist answers only from you.
Every mainstream AI — ChatGPT, Gemini, Claude — was trained on enormous amounts of internet text. When you ask a question, it searches its training memory for a plausible answer. Most of the time this works fine. But when you ask about your specific data — your products, your records, your database — the model is guessing from general knowledge, not reading your files. That's why it confidently gets things wrong.
RAG — Retrieval-Augmented Generation — solves this by changing the order of operations. Before the AI writes a single word, it first searches your actual dataset for the relevant records. Only then does it compose an answer — and only from what it retrieved. Think of it as the difference between asking a colleague to answer from memory versus asking them to look it up first and show you their sources.
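The retrieve-then-answer flow can be sketched in a few lines of Python. This is an illustrative toy, not InsightsAssist's implementation: the records, IDs, and keyword-overlap scoring below are made up (real retrieval uses semantic embeddings), but the order of operations is the point: retrieve first, compose only from what was retrieved, refuse when nothing matches.

```python
# Toy sketch of the RAG order of operations. Record contents and IDs are
# invented for illustration; the scoring here is simple word overlap,
# standing in for real semantic embedding search.

RECORDS = {
    "rec_001": "DEPT-ENG-042 holds $284.3M across 1,203 active grants.",
    "rec_002": "DEPT-MED-017 received $198.1M, up 43% over five years.",
    "rec_003": "DEPT-ENV-008 grew 340% since 2019 on climate initiatives.",
}

def retrieve(question: str, top_k: int = 2) -> list[str]:
    """Score each record by word overlap with the question; keep the best."""
    q_words = set(question.lower().split())
    ranked = sorted(
        RECORDS,
        key=lambda rid: len(q_words & set(RECORDS[rid].lower().split())),
        reverse=True,
    )
    # Drop records with zero overlap so irrelevant text never reaches the model.
    return [rid for rid in ranked[:top_k]
            if q_words & set(RECORDS[rid].lower().split())]

def build_prompt(question: str) -> str:
    """Compose the model prompt from retrieved records only. No fallback:
    if retrieval comes back empty, the system refuses instead of guessing."""
    hits = retrieve(question)
    if not hits:
        return "The dataset does not contain data to answer that."
    context = "\n".join(f"[{rid}] {RECORDS[rid]}" for rid in hits)
    return (
        "Answer ONLY from these records, citing record IDs:\n"
        f"{context}\n\nQuestion: {question}"
    )
```

The refusal branch is the part that matters: when retrieval returns nothing, no prompt is sent at all, so there is nothing for the model to invent.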
InsightsAssist is pre-built RAG for any website. You connect your dataset. We handle everything that makes RAG work — the indexing, the search, the grounding rules, the citation layer. Your users get accurate, sourced answers. You get a product that reflects your data exactly as it is, not as an AI imagines it might be.
The reference librarian.
Imagine a reference librarian at a highly specialized library — one that holds only your collection. When a visitor asks a question, the librarian searches the shelves, pulls the relevant volumes, reads the relevant passages, and delivers a precise, sourced answer.
She does not answer from personal opinion. She does not guess what a missing book might say. If the library does not have a text that addresses the question, she says exactly that — and points to the gap.
Standard AI: searches its training memory. Answers from general knowledge. Sounds confident. May be wrong.
InsightsAssist: searches your dataset first. Answers only from retrieved records. Shows its sources. Says so when it doesn't know.
This is what RAG — Retrieval-Augmented Generation — means in practice. InsightsAssist is RAG, pre-built and ready for any dataset.
From dataset to live AI in under ten minutes.
Connect your dataset
Upload a CSV or JSON file, or connect a live database. InsightsAssist indexes your records, runs semantic embeddings, and builds SQL aggregation queries automatically. Supports Google Datasets, Supabase, PostgreSQL, and custom uploads, with Airtable on the roadmap.
Drop one script tag
Paste a single line of code onto any page. Works on WordPress, Webflow, Shopify, React, Framer, or plain HTML — no rebuild required.
<script src="insightsassist.ai/embed.js" data-key="your-api-key" data-dataset="ds_abc123"></script>
Users explore. Facts stay grounded.
Every answer the AI gives cites the exact record IDs it drew from. Users can verify any claim in one click. If your dataset doesn't support an answer, InsightsAssist says so — it never fills the gap with a guess.
Built for accuracy. Designed for users.
Grounded by architecture, not instruction
The AI is structurally prohibited from using training data. Your records are the only source. There is no fallback, no invented answer — because there's no pathway for hallucination to occur.
Citations on every single claim
Record IDs appear inline with every statement. Tap any citation to see the source record in full. Verification is one click — not one Google search.
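As a rough illustration of how inline citations can be resolved, the sketch below extracts record-ID tags from an answer and maps each one to its source record. The tag format, record IDs, and lookup table are hypothetical, not InsightsAssist's actual wire format.

```python
import re

# Hypothetical citation extraction: answers arrive with record IDs tagged
# inline like [rec_001]; each tag is resolved to its full source record so
# the UI can render a one-click citation. SOURCES is a made-up lookup table.

SOURCES = {
    "rec_001": "DEPT-ENG-042 holds $284.3M across 1,203 active grants.",
}

def extract_citations(answer: str) -> list[tuple[str, str]]:
    """Return (record_id, source_text) pairs for every citation in an answer."""
    return [(rid, SOURCES.get(rid, "<missing record>"))
            for rid in re.findall(r"\[(rec_\w+)\]", answer)]
```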
Semantic + SQL dual-mode search
"How many X?" runs an exact SQL count. "Tell me about Y" runs semantic search. The engine picks the right mode automatically, so both exploratory and analytical questions return accurate results.
One embed tag. Any platform.
Shadow DOM widget deploys on WordPress, Webflow, Shopify, React, Framer, or raw HTML — without touching your CSS or breaking your existing layout.
Full admin dashboard
Upload datasets, configure appearance, set scope restrictions, manage API keys, monitor usage. Built for non-technical site owners.
Query analytics and coverage gaps
See exactly what your users are asking. Questions your dataset can't answer appear in a gap report — a direct signal for what content to add next.
Instant RAG.
No infrastructure required.
Retrieval-Augmented Generation — RAG — is the technique that makes AI answers trustworthy: instead of asking the model to answer from memory, you first retrieve the relevant records from your dataset, then instruct the model to reason only from those records. Every serious AI data product is built on RAG. Building it yourself takes a senior engineer four to eight weeks. InsightsAssist is pre-built RAG infrastructure — connect your dataset and it's live in minutes.
Building RAG yourself
estimated: 4–8 weeks, senior engineer
- Choose and configure an embedding model
- Stand up a vector database (Pinecone, Weaviate, pgvector)
- Build an ingestion and chunking pipeline
- Write retrieval logic with similarity thresholds
- Engineer a hallucination-prevention prompt layer
- Handle semantic vs. structured query routing
- Build citation extraction and rendering
- Design and ship an end-user chat UI
- Wire up usage analytics and monitoring
- Maintain all of the above as models and APIs evolve
...then repeat this for every new dataset.
InsightsAssist
estimated: under 10 minutes
- Upload your dataset (CSV, JSON, or DB connection)
- Done.
Embedding model selection, vector storage, retrieval tuning, hallucination prevention, citation rendering, analytics, and UI — all pre-built, pre-tuned, and maintained by InsightsAssist. Every new dataset you connect gets the same production-grade RAG pipeline in the same ten minutes.
<!-- That's the entire integration. -->
<script src="insightsassist.ai/embed.js" data-key="ia_live_••••••••" data-dataset="ds_••••••••"></script>
What makes RAG work
When a user submits a question, InsightsAssist converts it into a vector embedding using the same model that indexed your dataset. It then runs a cosine similarity search across your stored record embeddings to find the most semantically relevant matches. Those top records — and only those records — are passed to Claude as context. The model is given a strict system prompt: answer only from the provided records, cite every claim with a record ID, and respond ‘the dataset does not contain data to answer that’ if the records don't support the question. Claude's training knowledge is not a fallback. It has no access pathway.
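A toy version of that retrieval step, with made-up 3-dimensional vectors standing in for the high-dimensional embeddings a real model would produce:

```python
import math

# Illustrative cosine-similarity retrieval: the query vector is compared to
# every stored record vector, and only the closest matches above a threshold
# become the model's context. The 3-d vectors below are invented for the sketch.

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity: dot product divided by the product of norms."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

INDEX = {
    "rec_101": [0.9, 0.1, 0.0],   # hypothetical "funding" record
    "rec_102": [0.1, 0.9, 0.1],   # hypothetical "personnel" record
    "rec_103": [0.8, 0.2, 0.1],   # hypothetical "grants" record
}

def top_k(query_vec: list[float], k: int = 2, threshold: float = 0.5) -> list[str]:
    """Rank records by similarity to the query; drop anything below threshold."""
    scored = [(cosine(query_vec, vec), rid) for rid, vec in INDEX.items()]
    scored.sort(reverse=True)
    return [rid for score, rid in scored[:k] if score >= threshold]
```

The similarity threshold is what enables the honest refusal: if nothing in the index scores above it, the system reports that the dataset does not contain an answer rather than passing weak matches to the model.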
Why dual-mode retrieval matters
Pure semantic search — the approach most RAG implementations use — performs well for open-ended, exploratory questions. But it breaks down on analytical queries. Ask ‘how many members traded technology stocks in Q1?’ and a vector search returns similar records, not an exact count. InsightsAssist detects query intent and routes accordingly: semantic questions go through vector similarity, counting and comparison questions go through parameterized SQL against your indexed dataset. Both paths enforce the same citation requirement. The result is a system that handles the full range of questions real users actually ask — not just the ones RAG demos are designed to look good on.
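A minimal sketch of the intent routing described above. The regex heuristic below is illustrative only; a production classifier would be more robust, but the shape is the same: counting and comparison phrasing goes to SQL, everything else to semantic search.

```python
import re

# Illustrative query-intent router. The pattern list is an assumption made
# for this sketch, not InsightsAssist's actual classifier: analytical
# phrasing routes to parameterized SQL, open-ended questions to vector search.

ANALYTICAL = re.compile(
    r"\b(how many|count|total|average|compare|most|least|top \d+)\b",
    re.IGNORECASE,
)

def route(question: str) -> str:
    """Return which retrieval path a question should take: 'sql' or 'semantic'."""
    return "sql" if ANALYTICAL.search(question) else "semantic"
```

Both paths would then feed the same citation layer, so the grounding guarantee holds regardless of which mode answers the question.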
Real data. Real questions. Right now.
Two very different datasets. The same grounded AI.
Congressional Stock Trade Tracker
VoterHQ tracks STOCK Act filings, congressional trading activity, and member financial disclosures in real time. InsightsAssist lets any voter ask natural-language questions across the full trading database — 'Which members traded pharmaceutical stocks before drug pricing votes?' — with every answer linked directly to the filing.
Notable Italian Americans Database
A verified database of 250+ notable Italian Americans across entertainment, sports, science, business, law, and public service — spanning 1776 to today. InsightsAssist enables rich exploration: 'Who were the first Italian Americans in each field?' or 'Compare contributions by decade.' All citations trace to verified records.
Simple, transparent pricing. Lock in beta rates today.
All beta participants receive permanent rate lock — your price never increases as long as your account is active.
- ✓ 1 dataset
- ✓ Up to 1,000 records
- ✓ 500 queries/month
- ✓ Citation engine
- ✓ Standard embed widget
- ✓ Community support

- ✓ 3 datasets
- ✓ Up to 10,000 records
- ✓ 5,000 queries/month
- ✓ Citation engine
- ✓ Custom widget appearance
- ✓ Query analytics
- ✓ Email support

- ✓ Unlimited datasets
- ✓ Up to 100,000 records
- ✓ 25,000 queries/month
- ✓ Citation engine
- ✓ Full widget customization
- ✓ Advanced analytics + coverage gaps
- ✓ Google Datasets connector
- ✓ Priority support

- ✓ Unlimited everything
- ✓ Private cloud deployment
- ✓ Guaranteed SLA
- ✓ Oracle/Salesforce/SAP connectors
- ✓ AWS/Azure Marketplace billing
- ✓ White-label option
- ✓ Dedicated account manager
Nonprofits and 501(c)(3) organizations receive the Free tier permanently — no application required, no time limit.
Coming to every major marketplace.
Install InsightsAssist from the platforms you already use. One-click integrations. No custom development needed.
Join the waitlist. Shape the product.
Beta is limited to 50 sites. Participants get permanent rate lock, direct access to the founding team, and their use case prioritized in the development roadmap.
No credit card required. We respond within 48 hours. Questions? [email protected]
Common questions.
RAG stands for Retrieval-Augmented Generation. The name is technical but the idea is simple, and it directly determines whether an AI product is trustworthy for real business use.
Standard AI models — ChatGPT, Gemini, Claude, and every other large language model — were trained on vast amounts of text from the internet and other sources. When you ask them a question, they generate an answer by predicting what a plausible response looks like based on that training. This works remarkably well for general knowledge. It fails badly for specific, proprietary, or structured data — your product catalog, your database, your records — because the model has to guess at specifics it was never trained on. It often guesses confidently and incorrectly. That’s hallucination.
RAG changes the process entirely. Before the model writes a single word, it retrieves the relevant records from your actual dataset. The model then answers only from what it retrieved — not from training memory. The result is an AI that is accurate on your data for the same reason a human expert is accurate when they look something up: they are reading the source, not recalling an impression of it.
For your business, this means three things. First, your data is the authority — not the AI’s approximation of it. Second, every answer is auditable: users see exactly which records were used, and can verify any claim in one click. Third, when your dataset doesn’t support an answer, the system says so explicitly — it does not fill the gap with a guess that could mislead a customer, damage your credibility, or expose you to liability.
InsightsAssist is pre-built RAG infrastructure. You do not need to understand the technology to use it. You connect your dataset, embed one script tag, and your users get a grounded, cited AI explorer that reflects your data exactly as it is — immediately.
Most AI tools use your data as context but can still fall back on training knowledge when that data doesn't answer — that's where hallucination happens. InsightsAssist is architecturally constrained: if the answer isn't in your dataset, it responds "The dataset doesn't contain data to answer that" explicitly, every time. There is no fallback, no invented answer, because the architecture doesn't allow one.
Your data never trains any AI model. It is indexed in an isolated environment scoped to your account, retrieved only at query time, never shared with other customers or Anthropic. Enterprise customers can deploy entirely within their own AWS or Azure infrastructure with zero data leaving their environment.
Currently: CSV, JSON, Supabase/PostgreSQL direct connection, and Google BigQuery public datasets (Pro and Enterprise). Q3 2026 roadmap includes Airtable, Notion databases, Salesforce objects, and Snowflake.
One natural language question from a user equals one query, regardless of how many records were searched to formulate the answer. Follow-up questions in the same conversation count individually. Page loads and widget opens do not count.
Datasets under 10,000 records are fully indexed within 2–5 minutes of upload. The widget is available for testing immediately while indexing completes in the background.
Yes — white-label is available on Enterprise plans. Fully customize widget branding, remove all InsightsAssist attribution, deploy under your own domain, and manage multiple client datasets from one admin account.