Project • 2025

ToS RAG Analyzer

Grounded legal answers

Retrieval-first AI


Overview

A Retrieval-Augmented Generation system that makes Terms of Service and privacy policies transparent. Every answer is grounded in real policy text through a vector retrieval pipeline, so plain-English explanations always trace back to the clauses they came from.


Year

2025

Role

AI Systems Engineer

Featured Stack

Python
Streamlit
FAISS
Hugging Face

Tech Stack

Technologies and tools used to bring this project to life.

Python
Streamlit
FAISS
Hugging Face
Sentence Transformers (all-MiniLM-L6-v2)
OpenAI GPT-3.5
OPP-115
RAG

My Role

AI Systems Engineer • A few of the surfaces I shaped on this project

Vector pipeline and retrieval

Policy chunks are embedded with Hugging Face all-MiniLM-L6-v2 and indexed with FAISS, so each user query is matched against semantic vectors rather than keyword overlap. The result is sub-second top-k retrieval that stays accurate across phrasing variations.
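The retrieval step described above can be sketched without downloading any model. This is a minimal, pure-Python illustration under stated assumptions: `toy_embed` is a hypothetical hashed bag-of-words stand-in for all-MiniLM-L6-v2 (it only captures word overlap, so the semantic matching in the real system comes from the actual model), and `top_k` performs the exhaustive inner-product search over unit-length vectors that a flat FAISS index would otherwise handle. The function names, the toy dimensionality, and the sample chunks are invented for the example.

```python
import math
import re
import zlib

DIM = 256  # toy dimensionality; all-MiniLM-L6-v2 produces 384-dimensional vectors


def toy_embed(text: str) -> list[float]:
    """Hypothetical stand-in for a sentence embedding model (hashed bag of words)."""
    vec = [0.0] * DIM
    for token in re.findall(r"[a-z0-9]+", text.lower()):
        # crc32 is deterministic across runs, unlike Python's built-in hash() for str
        vec[zlib.crc32(token.encode("utf-8")) % DIM] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]  # unit length, so dot product equals cosine similarity


def top_k(query: str, chunks: list[str], k: int = 3) -> list[tuple[float, str]]:
    """Exhaustive search: score every chunk against the query and keep the k best."""
    q = toy_embed(query)
    scored = [(sum(a * b for a, b in zip(q, toy_embed(c))), c) for c in chunks]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)[:k]


# Illustrative policy chunks, not taken from any real document
chunks = [
    "We may share your personal data with third-party advertising partners.",
    "You can cancel your subscription at any time from the account page.",
    "Order history is retained for seven years to meet tax obligations.",
]
results = top_k("Is my personal data shared with advertising partners?", chunks, k=2)
for score, chunk in results:
    print(f"{score:.2f}  {chunk}")
```

Swapping `toy_embed` for a real sentence encoder and the list scan for a vector index changes the quality and speed of the search, but not the shape of the pipeline: embed, normalize, score, keep the top k.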

Common-concerns templates

A curated set of the questions users actually have about a policy (data sharing, retention, third parties, opt-outs) gives them a one-click way to interrogate any document without having to formulate the query from scratch.
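One simple way to represent such templates is a mapping from a button label to a canned query. The sketch below is an assumption about the shape of that data, not the project's actual template set: the labels follow the concerns listed above, while the question wording and the `query_for` helper are hypothetical.

```python
# Hypothetical template set: labels mirror the concerns named in the text,
# the question wording is invented for illustration.
COMMON_CONCERNS: dict[str, str] = {
    "Data sharing": "What personal data does this policy say is shared, and with whom?",
    "Retention": "How long is my data kept after I stop using the service?",
    "Third parties": "Which third parties receive or process my data?",
    "Opt-outs": "How can I opt out of data collection, tracking, or marketing?",
}


def query_for(label: str) -> str:
    """Return the canned query behind a one-click button; fail loudly on unknown labels."""
    try:
        return COMMON_CONCERNS[label]
    except KeyError:
        raise ValueError(f"Unknown concern: {label!r}") from None


print(query_for("Retention"))
```

Because each template resolves to an ordinary query string, it can go through the same retrieval path as a question the user typed by hand.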

Grounded answers with citations

Every response surfaces the plain-English answer alongside the policy chunks it was derived from. Because generation is constrained to the retrieved context, the citations are not decorative; they are the only source the LLM is allowed to reason from.
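Constraining generation to the retrieved context is usually done in the prompt. The following is a minimal sketch of that idea, assuming a hypothetical `build_grounded_prompt` helper: it numbers each retrieved chunk so the answer can cite it, and refuses to build a prompt when nothing was retrieved. The instruction wording is an assumption, not the project's actual prompt.

```python
def build_grounded_prompt(question: str, retrieved_chunks: list[str]) -> str:
    """Assemble a prompt that restricts the model to numbered policy excerpts."""
    if not retrieved_chunks:
        # Without context there is nothing to ground on, so do not ask the model at all.
        raise ValueError("No retrieved context; refuse rather than answer ungrounded.")
    context = "\n".join(
        f"[{i}] {chunk.strip()}" for i, chunk in enumerate(retrieved_chunks, start=1)
    )
    return (
        "Answer the question using ONLY the numbered policy excerpts below.\n"
        "Cite the excerpt numbers you relied on, for example [1].\n"
        "If the excerpts do not contain the answer, say that the policy text "
        "provided does not cover it.\n\n"
        f"Policy excerpts:\n{context}\n\n"
        f"Question: {question}\n"
        "Answer in plain English:"
    )


prompt = build_grounded_prompt(
    "Is my data shared with advertisers?",
    ["We may share your personal data with advertising partners.",
     "You may opt out of personalized ads in your account settings."],
)
print(prompt)
```

Numbering the excerpts lets the interface show the cited chunks next to the answer. A prompt like this narrows what the model draws on, though it remains worth checking answers against the cited text.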

Pluggable generation backends

The same retrieved context can be routed through GPT-3.5 or through an open-source Hugging Face model, making the system useful both for higher-quality answers and for fully self-hosted deployments with no per-request API cost.
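A small registry is one way to make backends interchangeable. The sketch below assumes a backend is simply a callable from prompt text to answer text; `register_backend`, `generate`, and the `echo` stub are hypothetical names, and the stub stands in for a real GPT-3.5 or Hugging Face call so the example runs offline.

```python
from typing import Callable

# Assumption: a backend is any callable that turns a prompt into answer text.
Backend = Callable[[str], str]

_BACKENDS: dict[str, Backend] = {}


def register_backend(name: str) -> Callable[[Backend], Backend]:
    """Decorator that adds a generation function to the registry under `name`."""
    def decorator(fn: Backend) -> Backend:
        _BACKENDS[name] = fn
        return fn
    return decorator


@register_backend("echo")
def echo_backend(prompt: str) -> str:
    """Placeholder for local testing; a real backend would call a hosted or local model."""
    return f"(stub answer based on {len(prompt)} prompt characters)"


def generate(prompt: str, backend: str = "echo") -> str:
    """Route the same prompt to whichever registered backend the caller selects."""
    if backend not in _BACKENDS:
        raise ValueError(f"Unknown backend {backend!r}; available: {sorted(_BACKENDS)}")
    return _BACKENDS[backend](prompt)


print(generate("Is my data shared?"))
```

Since retrieval and prompt construction happen before `generate` is called, adding a backend means registering one more function and leaves the rest of the pipeline untouched.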

Validated on live policies

Tested end-to-end against real-world Terms of Service the system was not pre-loaded with (Uber Eats among others) to confirm that retrieval quality and grounding hold up when the corpus is swapped at runtime.

Notes

The ToS RAG Analyzer is a retrieval-augmented AI system designed to make Terms of Service and privacy policies easier to understand through grounded, citation-backed answers.

I developed the full RAG pipeline, including document chunking, semantic retrieval, vector indexing, and constrained generation workflows. A key challenge was reducing hallucination while keeping responses understandable for non-technical users.
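Document chunking is the first step in that pipeline. As a minimal sketch, assuming fixed-size word windows with overlap (the project's actual chunking strategy and sizes are not stated here, so `chunk_words` and its defaults are hypothetical):

```python
def chunk_words(text: str, size: int = 120, overlap: int = 20) -> list[str]:
    """Split text into windows of `size` words, each sharing `overlap` words with the previous one."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("Require size > 0 and 0 <= overlap < size.")
    words = text.split()
    step = size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + size]))
        if start + size >= len(words):
            break  # this window already reached the end of the text
    return chunks


sample = " ".join(f"w{i}" for i in range(10))
for chunk in chunk_words(sample, size=4, overlap=1):
    print(chunk)
```

The overlap keeps a clause that straddles a window boundary retrievable from at least one chunk, at the cost of indexing some words twice.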

The system uses FAISS vector search, Hugging Face sentence transformers, and interchangeable GPT and open-source generation backends, all deployed through a Streamlit interface.
