EventQuote AI
Retrieval-augmented quoting for enterprise event vendors

- Role: ML Engineer & Backend Lead
- Stack: FastAPI, PostgreSQL, Redis, Docker, Cloudflare, OpenAI, pgvector
- Metrics: 24k requests/day, p99 180ms, 99.8% uptime over 90 days
Event vendors spend hours assembling quotes manually — pulling specs from PDFs, checking vendor availability, formatting line items. The process is slow and inconsistent enough to lose deals to competitors who respond faster. EventQuote AI replaces that workflow with a retrieval-augmented pipeline that reads a client brief and generates a structured, catalog-grounded quote in under two minutes.
What I built
The core is a three-stage RAG pipeline.

- Ingestion: vendor rate sheets and product catalogs are chunked, embedded, and stored in PostgreSQL with pgvector; a background worker keeps embeddings current as vendors update pricing.
- Retrieval: at query time the client brief is embedded, top-k chunks are fetched by cosine similarity, and a reranking pass filters noise before generation.
- Generation: an LLM with a structured output schema assembles the quote; line items that don’t trace back to retrieved catalog entries are flagged for human review rather than silently hallucinated.
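The ingestion and retrieval stages can be sketched as follows. The chunker, table name, and column names are illustrative assumptions, not the production code; the `<=>` operator is pgvector's cosine-distance operator, so ordering ascending returns the most similar chunks first.

```python
def chunk_rate_sheet(text: str, max_tokens: int = 200, overlap: int = 40) -> list[str]:
    """Split a vendor rate sheet into overlapping word-window chunks
    (a simple stand-in for whatever tokenizer-aware chunking is used)."""
    words = text.split()
    if not words:
        return []
    step = max_tokens - overlap
    return [" ".join(words[i:i + max_tokens]) for i in range(0, len(words), step)]

# Top-k retrieval by cosine similarity with pgvector. `catalog_chunks`
# and the parameter names are hypothetical.
TOP_K_SQL = """
SELECT id, content, 1 - (embedding <=> %(query_vec)s) AS cosine_sim
FROM catalog_chunks
ORDER BY embedding <=> %(query_vec)s
LIMIT %(k)s;
"""
```

The retrieved rows would then go through the reranking pass before being handed to the generation stage.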
FastAPI serves the API layer. Redis caches embedding lookups for repeat queries. The services run in Docker behind Cloudflare, backed by a managed Postgres instance.
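The embedding cache can be sketched like this. The key scheme, TTL, and model name are assumptions for illustration; `embed_fn` stands in for the actual OpenAI embedding call.

```python
import hashlib
import json

def embedding_cache_key(text: str, model: str = "text-embedding-3-small") -> str:
    # Key on a hash of the normalized text plus the model name, so a
    # model upgrade can never serve stale vectors. Model name is assumed.
    digest = hashlib.sha256(text.strip().lower().encode()).hexdigest()
    return f"emb:{model}:{digest}"

def get_embedding_cached(redis_client, embed_fn, text: str) -> list[float]:
    """Return a cached embedding if present, else compute and cache it."""
    key = embedding_cache_key(text)
    cached = redis_client.get(key)
    if cached is not None:
        return json.loads(cached)
    vec = embed_fn(text)
    redis_client.setex(key, 3600, json.dumps(vec))  # 1h TTL, illustrative
    return vec
```

Any client exposing Redis's `GET`/`SETEX` works here, which also makes the cache easy to fake in tests.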
Hard parts
Catalog freshness vs. query latency. Vendors update pricing frequently, so stale embeddings produce wrong quotes. The naive fix — re-embed on every update — blocked reads. Decoupling ingestion into an async worker with a write-ahead log solved this: the query path always reads from a consistent snapshot while the worker catches up.
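The snapshot/worker split above can be sketched in memory. This is a minimal illustration of the idea, assuming a single ingestion worker; the production version lives in Postgres, and `SnapshotStore` and its method names are hypothetical.

```python
import threading

class SnapshotStore:
    """Readers always see a complete, consistent embedding snapshot.
    The ingestion worker builds the next version off to the side
    (replaying pending vendor updates from its log) and swaps it in
    atomically, so the query path is never blocked by re-embedding."""

    def __init__(self):
        self._lock = threading.Lock()
        self._active: dict[str, list[float]] = {}

    def read(self) -> dict[str, list[float]]:
        # A single reference read; readers keep whatever snapshot
        # they grabbed even if a new one is published mid-query.
        return self._active

    def publish(self, new_snapshot: dict[str, list[float]]) -> None:
        # Single writer: the async ingestion worker.
        with self._lock:
            self._active = new_snapshot
```

The same shape maps onto SQL as a version column plus a pointer to the active version, with the worker publishing by advancing the pointer.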
Hallucination containment. LLMs will confidently invent plausible-sounding line items. A post-generation validation pass checks every quoted item against the retrieval context; anything ungrounded gets quarantined. This added ~30ms to p99 latency but eliminated the category of error that most eroded vendor trust.
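The validation pass can be sketched as a token-overlap check between each generated line item and the retrieval context. The field names, threshold, and matching rule are illustrative; the real check may well be stricter (e.g., exact SKU lookup).

```python
def validate_line_items(
    quote_items: list[dict],
    retrieved_chunks: list[str],
    min_overlap: float = 0.6,
) -> tuple[list[dict], list[dict]]:
    """Split generated items into grounded vs. quarantined.

    An item is grounded when enough of its name tokens appear in the
    retrieval context; anything else is quarantined for human review
    instead of being shipped. Substring matching is deliberately crude
    here -- a sketch, not the production matcher.
    """
    context = " ".join(retrieved_chunks).lower()
    grounded, quarantined = [], []
    for item in quote_items:
        tokens = item["name"].lower().split()
        hits = sum(1 for t in tokens if t in context)
        if tokens and hits / len(tokens) >= min_overlap:
            grounded.append(item)
        else:
            quarantined.append(item)
    return grounded, quarantined
```

Because the check runs after generation, it adds a fixed cost per quote rather than slowing the LLM call itself.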
Retrieval quality floor. Dense retrieval alone sometimes surfaced semantically similar but wrong catalog entries when a brief named a specific product. Adding a lexical search leg alongside the dense one (hybrid search) put a floor under retrieval quality; the A/B comparison against dense-only retrieval is in the results below.
Result
Quote turnaround dropped from roughly 45 minutes of manual work to under 2 minutes.
24k requests/day at p99 180ms end-to-end. 99.8% uptime over 90 days in production. The retrieval-quality floor from hybrid search reduced quote revision requests by roughly a third compared to dense-only retrieval in A/B testing.