2025

ToS RAG Analyzer

Grounded legal answers
Retrieval-first AI
Overview

A Retrieval-Augmented Generation system that makes Terms of Service and privacy policies transparent. Every answer is grounded in real policy text through a vector retrieval pipeline, so plain-English explanations always trace back to the clauses they came from.

Year

2025

Role

AI Systems Engineer

Featured Stack
  • Python
  • Streamlit
  • FAISS
  • Hugging Face

Tech Stack

Technologies and tools used to bring this project to life.

  • Python
  • Streamlit
  • FAISS
  • Hugging Face
  • Sentence Transformers (all-MiniLM-L6-v2)
  • OpenAI GPT-3.5
  • OPP-115
  • RAG

My Role

AI Systems Engineer

A few of the surfaces I shaped on this project

Vector pipeline and retrieval

Policy chunks are embedded with the all-MiniLM-L6-v2 Sentence Transformers model from Hugging Face and indexed with FAISS, so each user query is matched against semantic vectors rather than by keyword overlap. Top-k retrieval returns in under a second and stays accurate when the same question is phrased in different ways.
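As a rough illustration of what top-k semantic retrieval computes, here is a minimal sketch in plain Python. It is not the project's code: the three-dimensional vectors are hand-made stand-ins for real embeddings (all-MiniLM-L6-v2 produces 384-dimensional vectors), the brute-force loop stands in for the FAISS index, and the names `normalize` and `top_k` are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def normalize(vec: Sequence[float]) -> List[float]:
    """Scale a vector to unit length so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


def top_k(query_vec: Sequence[float],
          chunk_vecs: Sequence[Sequence[float]],
          k: int = 3) -> List[Tuple[int, float]]:
    """Return (chunk index, cosine similarity) pairs for the k closest chunks."""
    q = normalize(query_vec)
    scored = []
    for idx, vec in enumerate(chunk_vecs):
        score = sum(a * b for a, b in zip(q, normalize(vec)))
        scored.append((idx, score))
    # Highest similarity first.
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy data (assumption): in a real pipeline these vectors would come from the
# embedding model, one per policy chunk.
chunks = [
    "We may share your information with advertising partners.",
    "We retain account data for as long as your account is active.",
    "You can opt out of marketing emails at any time.",
]
toy_vectors = [[0.9, 0.1, 0.0], [0.1, 0.9, 0.1], [0.0, 0.2, 0.9]]
query_vector = [0.8, 0.2, 0.1]  # stand-in for an embedded user question

results = top_k(query_vector, toy_vectors, k=2)
for idx, score in results:
    print(f"{score:.3f}  {chunks[idx]}")
```

Because similarity is measured between vectors rather than between words, two differently worded questions that embed close together retrieve the same chunks.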


Common-concerns templates

A curated set of the questions users actually have about a policy (data sharing, retention, third parties, opt-outs) gives them a one-click way to interrogate any document without having to formulate the query from scratch.
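One simple way to model such templates is a mapping from a short label to a fully formed question. The sketch below assumes that shape; the four labels come from the concerns listed above, while the question wording and the names `CONCERN_TEMPLATES` and `query_for` are invented for illustration.

```python
from typing import Dict

# Label shown to the user -> question sent to the retrieval pipeline.
# The question wording is hypothetical.
CONCERN_TEMPLATES: Dict[str, str] = {
    "Data sharing": "Does this policy allow my personal data to be shared or sold?",
    "Retention": "How long does the company keep my data after I stop using the service?",
    "Third parties": "Which third parties can receive my data, and for what purposes?",
    "Opt-outs": "What can I opt out of, and how do I do it?",
}


def query_for(concern: str) -> str:
    """Look up the canned question for a concern label."""
    try:
        return CONCERN_TEMPLATES[concern]
    except KeyError:
        known = ", ".join(sorted(CONCERN_TEMPLATES))
        raise ValueError(f"unknown concern {concern!r}; expected one of: {known}") from None


print(query_for("Retention"))
```

In a Streamlit interface, each label could be rendered as a button whose click passes the matching question to the same code path as a typed query.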


Grounded answers with citations

Every response surfaces the plain-English answer alongside the policy chunks it was derived from. Because generation is constrained to the retrieved context, the citations are not decorative; they are the only source the LLM is allowed to reason from.
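Constraining generation to retrieved context usually comes down to how the prompt is assembled. The following is a minimal sketch of one such prompt builder, under the assumption that chunks are numbered so the answer can cite them; the instruction wording and the name `build_grounded_prompt` are hypothetical, not the project's actual prompt.

```python
from typing import Sequence


def build_grounded_prompt(question: str, chunks: Sequence[str]) -> str:
    """Assemble a prompt that restricts the model to the numbered policy excerpts."""
    if not chunks:
        raise ValueError("at least one retrieved chunk is required")
    # Number each excerpt so the answer can cite it as [1], [2], ...
    numbered = "\n\n".join(f"[{i}] {chunk}" for i, chunk in enumerate(chunks, start=1))
    return (
        "Answer the question using ONLY the policy excerpts below.\n"
        "Cite the excerpt numbers you relied on, for example [1].\n"
        "If the excerpts do not contain the answer, say that the policy does not address it.\n\n"
        f"Policy excerpts:\n{numbered}\n\n"
        f"Question: {question}\n"
        "Answer in plain English:"
    )


prompt = build_grounded_prompt(
    "Can the company share my data with advertisers?",
    [
        "We may share your information with advertising partners.",
        "You can opt out of marketing emails at any time.",
    ],
)
print(prompt)
```

Keeping the excerpt numbers in the prompt lets the interface show the cited chunks next to the answer.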


Pluggable generation backends

The same retrieved context can be routed through GPT-3.5 or through an open-source Hugging Face model, making the system useful both for higher-quality answers and for fully self-hosted deployments with no per-request API cost.


Validated on live policies

Tested end-to-end against real-world Terms of Service the system was not pre-loaded with (UberEats among others) to confirm that retrieval quality and grounding hold up when the corpus is swapped at runtime.

Notes

The ToS RAG Analyzer is a retrieval-augmented AI system designed to make Terms of Service and privacy policies easier to understand through grounded, citation-backed answers.

I developed the full RAG pipeline, including document chunking, semantic retrieval, vector indexing, and constrained generation workflows. A key challenge was reducing hallucination while keeping responses understandable for non-technical users.
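For the chunking step, a common approach is fixed-size word windows with a small overlap, so a clause that straddles a boundary still appears whole in at least one chunk. The sketch below assumes that approach; the notes do not say which strategy or sizes were used, so `chunk_words` and its defaults are illustrative only.

```python
from typing import List


def chunk_words(text: str, size: int = 120, overlap: int = 20) -> List[str]:
    """Split text into windows of `size` words, each sharing `overlap` words with the previous one."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("require size > 0 and 0 <= overlap < size")
    words = text.split()
    step = size - overlap
    chunks: List[str] = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + size]))
        # Stop once a window has reached the end of the text.
        if start + size >= len(words):
            break
    return chunks


# Tiny example with placeholder words so the overlap is easy to see.
sample = " ".join(f"w{i}" for i in range(10))
pieces = chunk_words(sample, size=4, overlap=1)
for piece in pieces:
    print(piece)
```

Smaller chunks make citations more precise, while larger ones keep more of a clause's surrounding context; the right balance depends on the policies being indexed.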

The system uses FAISS vector search, Hugging Face sentence transformers, and interchangeable GPT and open-source generation backends, all deployed through a Streamlit interface.
