Product requirements

The full product vision for ReviewOS, MediGen's semantic search engine.

The PRD's first build target is MVP Stage 0: a Corpus API + MCP server. It should prove retrieval quality, citation coverage, permissions, and auditability before ReviewOS expands into ingestion, synthesis, and workflow.


MVP Stage 0: Corpus API + MCP server

Given an approved corpus and a high-stakes question, ReviewOS should return ranked passages with exact source text, document metadata, scores, filters, and permission-safe citations through a read-only API and MCP tool.
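
To make the shape of a returned result concrete, here is a minimal Python sketch of a citation-first evidence object. The class and field names (`Citation`, `EvidencePassage`, `filters_applied`) are assumptions for illustration; the PRD names the kinds of data returned but does not define a schema.

```python
from dataclasses import asdict, dataclass, field
from typing import Optional


@dataclass(frozen=True)
class Citation:
    """Where a passage came from (field names are illustrative assumptions)."""
    document_id: str
    title: str
    page: Optional[int] = None
    section: Optional[str] = None


@dataclass(frozen=True)
class EvidencePassage:
    """One ranked result: exact source text plus retrieval metadata."""
    rank: int
    source_text: str          # verbatim text from the source document
    score: float              # retrieval score used for ranking
    citation: Citation
    filters_applied: dict = field(default_factory=dict)


# Hypothetical example values, not real corpus data.
passage = EvidencePassage(
    rank=1,
    source_text="Placeholder passage text copied verbatim from the source.",
    score=0.87,
    citation=Citation(document_id="DOC-0001", title="Example document", page=12),
    filters_applied={"type": "protocol"},
)

# asdict() converts nested dataclasses, giving a JSON-ready payload.
payload = asdict(passage)
```

Keeping the citation as a nested object means the same structure could be serialized for both the API response and the MCP tool result.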

Out of scope

User document upload, answer synthesis, contradiction detection, review queues, approvals, memo templates, multimodal search, fine-tuning, broad eDiscovery replacement, and autonomous legal or regulatory judgment.

Release tags

Tag	Release	Product form
MVP Stage 0	Corpus API + MCP server	Read-only retrieval substrate over the existing corpus.
MVP Stage 1	Document ingestion	Controlled corpus maintenance and refresh workflows.
MVP Stage 2	Citation + cross-reference agent	Grounded synthesis, contradiction checks, and evidence packets.
MVP Stage 3	ReviewOS application	Projects, queues, approvals, templates, exports, analytics, and audit.
X1-X3	Extensions	Multimodal search, thinking-model expansion, and recursive retrieval optimization.

MVP Stage 0 functional requirements

ID	Requirement	Priority
FR-001	Ingest the existing ~50K-document corpus.	Must
FR-002	Parse PDF, DOCX, TXT, and spreadsheet files into text plus structure.	Must
FR-003	Chunk documents into retrievable passages.	Must
FR-004	Extract and store document ID, title, type, source, date, page, and section metadata.	Must
FR-005	Create vector embeddings for semantic retrieval.	Must
FR-006	Maintain a keyword/BM25 index for exact terms, IDs, clauses, dates, and references.	Must
FR-007	Perform hybrid retrieval with configurable weighting.	Must
FR-008	Support metadata filters for type, date range, source, project, confidentiality, study, and compound.	Must
FR-009	Expose a read-only retrieval API.	Must
FR-010	Expose at least one MCP-compatible read-only retrieval tool.	Must
FR-011	Return structured citation-first evidence objects with source text and retrieval metadata.	Must
FR-012	Enforce document-level permissions with no cross-scope citation leakage.	Must
FR-013	Log query, retrieved chunks, latency, errors, tool used, and session.	Must
FR-014	Log every failed parse with reason.	Must
FR-015	Ship a golden-set eval harness with 30-50 known-answer questions.	Must
FR-016	Ship an adversarial prompt set that must fail safely.	Must
FR-017	Capture usefulness rating and structured failure reason when offered.	Should
FR-018	Provide indexing-health status reporting.	Should
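
As an illustration of FR-007, the sketch below fuses vector and keyword scores with a single configurable weight. Min-max normalization followed by a weighted sum is one common fusion approach and is an assumption here; the PRD does not prescribe a fusion method. The function names and the `vector_weight` parameter are hypothetical.

```python
def min_max(scores: dict[str, float]) -> dict[str, float]:
    """Rescale scores to [0, 1] so the two retrievers are comparable."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        # All scores equal: treat every candidate as equally strong.
        return {chunk_id: 1.0 for chunk_id in scores}
    return {chunk_id: (s - lo) / (hi - lo) for chunk_id, s in scores.items()}


def hybrid_rank(
    vector_scores: dict[str, float],
    keyword_scores: dict[str, float],
    vector_weight: float = 0.5,
    top_k: int = 5,
) -> list[tuple[str, float]]:
    """Return the top_k chunk IDs by weighted sum of normalized scores."""
    if not 0.0 <= vector_weight <= 1.0:
        raise ValueError("vector_weight must be between 0 and 1")
    v = min_max(vector_scores)
    k = min_max(keyword_scores)
    fused = {
        chunk_id: vector_weight * v.get(chunk_id, 0.0)
        + (1.0 - vector_weight) * k.get(chunk_id, 0.0)
        for chunk_id in set(v) | set(k)
    }
    # Sort by score descending, then by ID so ties are deterministic.
    return sorted(fused.items(), key=lambda item: (-item[1], item[0]))[:top_k]


# Hypothetical scores: "a" is a semantic match, "b" is an exact-term match.
vector_scores = {"a": 0.9, "b": 0.1}
keyword_scores = {"b": 12.0, "c": 3.0}
semantic_first = hybrid_rank(vector_scores, keyword_scores, vector_weight=1.0)
keyword_first = hybrid_rank(vector_scores, keyword_scores, vector_weight=0.0)
```

Setting `vector_weight` toward 0 favors exact terms, IDs, and clause references; setting it toward 1 favors semantic similarity.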

Evaluation

MVP Stage 0 ships with a golden set of 30-50 known-answer questions and an adversarial prompt set. Release is gated on top-5 retrieval accuracy (>80%), citation metadata coverage (>95% of chunks), retrieval latency (<10 sec), and safe-fail behavior on the adversarial set.
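
A golden-set harness can be quite small. The sketch below computes top-5 retrieval accuracy, assuming each golden item lists the document IDs that count as a correct answer and that a `retrieve` callable returns ranked document IDs. Both the item layout and the callable are hypothetical stand-ins for the real harness and API.

```python
from typing import Callable


def top_k_accuracy(
    golden_set: list[dict],
    retrieve: Callable[[str], list[str]],
    k: int = 5,
) -> float:
    """Fraction of questions with at least one expected document in the top k."""
    if not golden_set:
        return 0.0
    hits = 0
    for item in golden_set:
        top_ids = retrieve(item["question"])[:k]
        if set(top_ids) & set(item["expected_doc_ids"]):
            hits += 1
    return hits / len(golden_set)


def passes_gate(accuracy: float, threshold: float = 0.80) -> bool:
    """The target is strictly greater than 80% top-5 accuracy."""
    return accuracy > threshold


# Hypothetical golden set and a stub retriever for demonstration only.
golden = [
    {"question": "q1", "expected_doc_ids": ["DOC-1"]},
    {"question": "q2", "expected_doc_ids": ["DOC-9"]},
]
canned = {"q1": ["DOC-3", "DOC-1"], "q2": ["DOC-4", "DOC-5"]}
accuracy = top_k_accuracy(golden, lambda q: canned[q])
```

With the stub data, one of two questions is answered within the top 5, so the accuracy is 0.5 and the gate does not pass.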

Security

Document-level permissions are enforced before retrieval returns any source text. MVP Stage 0 stays read-only and logs queries, returned chunks, latency, errors, tools used, and sessions.
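
One way to read "enforced before retrieval returns any source text" is as a filter applied to candidate chunks before their text leaves the service. The sketch below assumes a group-based access list per document and fails closed when a document has no entry; the data shapes and names (`doc_acl`, `user_groups`) are hypothetical, since the PRD does not specify the permission model.

```python
def permitted_chunks(
    candidates: list[dict],
    doc_acl: dict[str, frozenset],
    user_groups: frozenset,
) -> list[dict]:
    """Keep only chunks whose parent document the caller may read.

    Documents missing from doc_acl are denied (fail closed), so an
    unclassified document can never surface through a citation.
    """
    allowed = []
    for chunk in candidates:
        readers = doc_acl.get(chunk["document_id"], frozenset())
        if readers & user_groups:
            allowed.append(chunk)
    return allowed


# Hypothetical ACL and candidates for demonstration only.
doc_acl = {
    "DOC-1": frozenset({"clinical"}),
    "DOC-2": frozenset({"legal"}),
}
candidates = [
    {"chunk_id": "c1", "document_id": "DOC-1", "text": "..."},
    {"chunk_id": "c2", "document_id": "DOC-2", "text": "..."},
    {"chunk_id": "c3", "document_id": "DOC-3", "text": "..."},  # no ACL entry
]
visible = permitted_chunks(candidates, doc_acl, frozenset({"clinical"}))
```

Filtering on the parent document rather than the chunk keeps the check at document level, which matches the stated requirement.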

Analytics

Capture query text, filters, chunks, scores, latency, errors, most queried documents, failed retrievals, usefulness ratings, and opened citations.
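
A structured log record makes most of these analytics derivable after the fact. The sketch below builds one JSON-serializable event per query; the function name, field names, and the `failed_retrieval` rule (an error or an empty result list) are assumptions for illustration.

```python
import json
import time
import uuid
from typing import Optional


def build_query_event(
    session_id: str,
    tool: str,
    query: str,
    filters: dict,
    results: list[dict],
    latency_ms: float,
    error: Optional[str] = None,
) -> dict:
    """Assemble one audit/analytics record for a retrieval call."""
    return {
        "event_id": str(uuid.uuid4()),
        "logged_at": time.time(),
        "session_id": session_id,
        "tool": tool,
        "query": query,
        "filters": filters,
        # Store chunk IDs and scores only; source text stays in the corpus.
        "chunks": [
            {"chunk_id": r["chunk_id"], "score": r["score"]} for r in results
        ],
        "latency_ms": latency_ms,
        "error": error,
        "failed_retrieval": error is not None or not results,
    }


# Hypothetical call for demonstration only.
event = build_query_event(
    session_id="sess-1",
    tool="mcp.search",
    query="example question",
    filters={"type": "protocol"},
    results=[{"chunk_id": "c1", "score": 0.87, "text": "..."}],
    latency_ms=412.0,
)
line = json.dumps(event)  # one JSON line per query
```

Counts such as most queried documents or failed retrievals could then be computed by aggregating these lines rather than by adding separate counters.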

Success metrics

Metric	Baseline	Target	Release
Time to first useful source	Hours-days	<120 sec	MVP Stage 0
Known-answer retrieval accuracy	Uncaptured	>80% top-5	MVP Stage 0
Citation metadata coverage	Variable	>95% of chunks	MVP Stage 0
Retrieval latency	n/a	<10 sec	MVP Stage 0
Factual-claim citation rate	Uncaptured	>95%	MVP Stage 2
End-to-end cycle time	Days	>20% faster	MVP Stage 3
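
The citation metadata coverage metric can be computed directly from indexed chunks. The sketch below assumes a chunk counts as covered only when every metadata field listed in FR-004 is present and non-empty; that definition, and the field names, are assumptions rather than something the PRD states.

```python
# Field names mirror FR-004; the exact keys are illustrative assumptions.
REQUIRED_FIELDS = ("document_id", "title", "type", "source", "date", "page", "section")


def citation_metadata_coverage(chunks: list[dict]) -> float:
    """Fraction of chunks whose required metadata fields are all populated."""
    if not chunks:
        return 0.0
    complete = sum(
        1
        for chunk in chunks
        if all(chunk.get(name) not in (None, "") for name in REQUIRED_FIELDS)
    )
    return complete / len(chunks)


# Hypothetical chunks for demonstration only.
full = {
    "document_id": "DOC-1", "title": "Example", "type": "protocol",
    "source": "archive", "date": "2024-01-01", "page": 3, "section": "2.1",
}
missing_page = {**full, "page": None}
coverage = citation_metadata_coverage([full, missing_page])
```

Here one of two chunks is complete, so coverage is 0.5, well below the >95% target.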

Non-functional requirements

Every returned passage must trace to an exact source document and location.
Permissions are enforced at retrieval; restricted documents cannot leak through citations.
MVP Stage 0 is read-only: no writes, no workflow app, no autonomous synthesis ownership.
Failed parses are logged and visible; documents are never silently dropped.
Model confidence is separate from evidence correctness and citation coverage.
Provider abstraction prevents lock-in and keeps architecture model-agnostic.
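
The provider-abstraction requirement can be sketched as a small interface that indexing code depends on instead of a specific vendor SDK. `EmbeddingProvider`, `HashingEmbeddingProvider`, and `index_passages` are hypothetical names; the hashing provider is a toy stand-in used only to show that implementations are swappable, not a real embedding model.

```python
from typing import Protocol, Sequence


class EmbeddingProvider(Protocol):
    """Anything that can turn texts into fixed-length vectors."""

    def embed(self, texts: Sequence[str]) -> list[list[float]]: ...


class HashingEmbeddingProvider:
    """Toy provider: buckets tokens by a deterministic byte-sum hash."""

    def __init__(self, dims: int = 8) -> None:
        self.dims = dims

    def embed(self, texts: Sequence[str]) -> list[list[float]]:
        vectors = []
        for text in texts:
            vec = [0.0] * self.dims
            for token in text.lower().split():
                # Byte sum is stable across runs, unlike the built-in hash().
                vec[sum(token.encode("utf-8")) % self.dims] += 1.0
            vectors.append(vec)
        return vectors


def index_passages(
    provider: EmbeddingProvider, passages: Sequence[str]
) -> dict[str, list[float]]:
    """Embed passages through the interface, with no vendor-specific calls."""
    return dict(zip(passages, provider.embed(passages)))


vectors = index_passages(HashingEmbeddingProvider(dims=8), ["alpha beta", "alpha"])
```

Swapping in a different model would then mean writing another class with the same `embed` method, leaving `index_passages` unchanged.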