Why this exists
Most "AI chatbots" are bare wrappers around the model, which means they confidently invent answers the moment a question goes off-topic. For support use cases that's worse than no bot at all. This one is built around the opposite default: answer only from indexed source documents, cite which file the answer came from, and refuse the question if no relevant context is found.
What's built
- Drag-and-drop ingestion. The admin uploads a PDF, TXT, or MD file; the server parses it, chunks it, embeds each chunk, and writes vectors to Pinecone with the source filename in the metadata (a condensed sketch follows this list). Chunk count is tracked in Postgres so the admin UI can show "doc X — 42 chunks" and remove the doc cleanly.
- Retrieval-augmented prompting. Each user message triggers a top-5 vector lookup; the matching chunks are stitched into the system prompt as a `## Knowledge base` block before the model sees the question.
- Streaming chat UI. Tokens stream from the API route to the browser via the Vercel AI SDK, so users see the answer build in real time.
- Persona with rules of engagement. The system prompt defines an `Aria` persona, but more importantly defines what not to do: no guessing, no competitor talk, no asking for credentials, escalate after three failed attempts.
- Admin console. `/admin` lists indexed documents with chunk counts, supports delete, and validates upload types before ingestion.
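In outline, the ingestion path might look like the sketch below. It assumes Pinecone's hosted llama-text-embed-v2 via the SDK's inference API; the index name (`support-bot`), the `documents` table, and the naive fixed-width chunker are placeholders, not the project's real implementations.

```ts
import { Pinecone } from "@pinecone-database/pinecone";
import { neon } from "@neondatabase/serverless";

const pc = new Pinecone({ apiKey: process.env.PINECONE_API_KEY! });
const sql = neon(process.env.DATABASE_URL!);

// Placeholder chunker: fixed-width character windows, no overlap.
function chunkText(text: string, size = 2000): string[] {
  const chunks: string[] = [];
  for (let i = 0; i < text.length; i += size) chunks.push(text.slice(i, i + size));
  return chunks;
}

export async function ingestDocument(docId: string, filename: string, text: string) {
  const chunks = chunkText(text);

  // Embed every chunk with the hosted 1024-dim model.
  const embeddings = await pc.inference.embed("llama-text-embed-v2", chunks, {
    inputType: "passage",
  });

  // Upsert with the source filename in metadata so answers can cite it.
  // The `${docId}#${i}` ID prefix makes per-document deletion cheap later.
  await pc.index("support-bot").upsert(
    chunks.map((chunk, i) => ({
      id: `${docId}#${i}`,
      // Dense output assumed; the SDK types embeddings as a union.
      values: (embeddings[i] as { values: number[] }).values,
      metadata: { source: filename, text: chunk },
    })),
  );

  // Canonical record behind the admin UI's chunk counts.
  await sql`INSERT INTO documents (id, filename, chunk_count, uploaded_at)
            VALUES (${docId}, ${filename}, ${chunks.length}, now())`;
}
```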
Technical choices worth calling out
Pinecone serverless, 1024-dim cosine
The embedding model is llama-text-embed-v2 (1024 dimensions). Cosine distance over normalised vectors gives stable nearest-neighbour scores, and Pinecone Serverless means there's no idle pod cost between conversations.
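For reference, a minimal bootstrap of such an index with the Pinecone TypeScript SDK; the index name, cloud, and region here are placeholder assumptions:

```ts
import { Pinecone } from "@pinecone-database/pinecone";

const pc = new Pinecone({ apiKey: process.env.PINECONE_API_KEY! });

// One-time setup: a serverless index sized for llama-text-embed-v2.
await pc.createIndex({
  name: "support-bot", // placeholder name
  dimension: 1024,     // must match the embedding model's output
  metric: "cosine",
  spec: { serverless: { cloud: "aws", region: "us-east-1" } },
});
```

Serverless indexes bill per read/write and storage rather than per pod-hour, which is where the no-idle-cost claim comes from.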
"Don't know" is the default, not the fallback
If retrieval returns zero relevant chunks, the prompt builder strips the knowledge-base block entirely, and the persona's rule kicks in: "Never guess. If you are unsure, say so and suggest the user contact a human agent." This pushes the model toward refusing a question rather than confabulating — the most common production failure mode for support bots.
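A sketch of that conditional assembly, folding in the top-5 lookup from the chat flow. The persona text, the 0.5 relevance cutoff, and the index name are assumptions; `formatKnowledgeBlock` is sketched in the next section.

```ts
import { Pinecone } from "@pinecone-database/pinecone";
import { formatKnowledgeBlock } from "./format"; // sketched in the next section

const pc = new Pinecone({ apiKey: process.env.PINECONE_API_KEY! });

// Hypothetical persona text; the real prompt's rules are longer.
const PERSONA =
  "You are Aria, a support agent. Never guess. If you are unsure, " +
  "say so and suggest the user contact a human agent.";

export async function buildSystemPrompt(queryVector: number[]): Promise<string> {
  const res = await pc.index("support-bot").query({
    vector: queryVector,
    topK: 5,
    includeMetadata: true,
  });

  // Below the (assumed) cutoff, a match counts as "no relevant context".
  const relevant = (res.matches ?? []).filter((m) => (m.score ?? 0) >= 0.5);

  // Zero relevant chunks: strip the knowledge-base block entirely, so the
  // persona's never-guess rule is the only instruction the model gets.
  if (relevant.length === 0) return PERSONA;

  return `${PERSONA}\n\n## Knowledge base\n${formatKnowledgeBlock(relevant)}`;
}
```

The returned string then feeds the chat route as the system message, e.g. via the Vercel AI SDK's `streamText`.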
Source attribution baked into the prompt
Every retrieved chunk is prefixed with `[1] (from: returns-policy.pdf)` etc. The prompt instructs the model to mention file names naturally ("According to our returns policy…"), so users see where an answer came from without needing a UI for it.
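The prefixing itself is a small helper; the names here are hypothetical:

```ts
import type { ScoredPineconeRecord } from "@pinecone-database/pinecone";

// Prefix each chunk with its rank and source file so the model can say
// "according to returns-policy.pdf" instead of answering without provenance.
export function formatKnowledgeBlock(matches: ScoredPineconeRecord[]): string {
  return matches
    .map((m, i) => `[${i + 1}] (from: ${m.metadata?.source})\n${m.metadata?.text}`)
    .join("\n\n");
}
```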
Postgres tracks documents, Pinecone tracks vectors
Two-store design: Pinecone holds the vectors and chunk text; Neon Postgres holds the canonical document record (filename, chunk count, upload time). Listing docs in the admin doesn't hit Pinecone, and deleting a doc cascades both stores in one server action.
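A sketch of that cascade as a server action. It assumes vector IDs carry a `${docId}#` prefix (as in the ingestion sketch above), the usual route on serverless indexes, where metadata-filtered deletes aren't available; the `documents` table is a placeholder schema.

```ts
"use server";

import { Pinecone } from "@pinecone-database/pinecone";
import { neon } from "@neondatabase/serverless";

const pc = new Pinecone({ apiKey: process.env.PINECONE_API_KEY! });
const sql = neon(process.env.DATABASE_URL!);

export async function deleteDocument(docId: string) {
  const index = pc.index("support-bot");

  // Page through every vector ID carrying this document's prefix
  // and delete them in batches.
  let token: string | undefined;
  do {
    const page = await index.listPaginated({
      prefix: `${docId}#`,
      paginationToken: token,
    });
    const ids = (page.vectors ?? []).map((v) => v.id!);
    if (ids.length > 0) await index.deleteMany(ids);
    token = page.pagination?.next;
  } while (token);

  // Then drop the canonical record from Postgres.
  await sql`DELETE FROM documents WHERE id = ${docId}`;
}
```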
Outcome
The same RAG pipeline drops into client products that need grounded AI Q&A — internal knowledge bases, customer support, document portals — without rewriting the persona logic, retrieval pattern, or two-store data model.