Product requirements
The full product vision for MediGen's Semantic Search Engine: ReviewOS.
The PRD's first build target is MVP Stage 0: a Corpus API + MCP server. It should prove retrieval quality, citation coverage, permissions, and auditability before ReviewOS expands into ingestion, synthesis, and workflow.
MVP Stage 0: Corpus API + MCP server
Given an approved corpus and a high-stakes question, ReviewOS should return ranked passages with exact source text, document metadata, scores, filters, and permission-safe citations through a read-only API and MCP tool.
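The Stage 0 contract returns citation-first evidence rather than answers. Below is a minimal sketch of what one response might look like. The class and field names (`Citation`, `Evidence`, `RetrievalResponse`, `chunk_id`, `doc_type`) are hypothetical; the PRD lists the metadata to carry but does not fix a schema.

```python
from dataclasses import asdict, dataclass, field
from typing import Optional


@dataclass(frozen=True)
class Citation:
    # Hypothetical field names for the metadata listed in FR-004.
    document_id: str
    title: str
    doc_type: str
    source: str
    date: Optional[str]        # ISO 8601 string, e.g. "2021-03-15"
    page: Optional[int]
    section: Optional[str]


@dataclass(frozen=True)
class Evidence:
    chunk_id: str
    text: str                  # exact source text, never paraphrased
    citation: Citation
    score: float               # retrieval score used for ranking
    rank: int


@dataclass
class RetrievalResponse:
    query: str
    filters: dict[str, object]
    results: list[Evidence] = field(default_factory=list)

    def to_dict(self) -> dict:
        # asdict recurses into nested dataclasses, including lists of them,
        # so the whole response serializes to plain JSON-compatible dicts.
        return asdict(self)
```

Keeping `text` verbatim and nesting the citation inside each result means a caller (API client or MCP tool) can never receive a passage without the location it came from.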
Out of scope
User document upload, answer synthesis, contradiction detection, review queues, approvals, memo templates, multimodal search, fine-tuning, broad eDiscovery replacement, and autonomous legal or regulatory judgment.
Release tags
| Tag | Release | Product form |
|---|---|---|
| MVP Stage 0 | Corpus API + MCP server | Read-only retrieval substrate over the existing corpus. |
| MVP Stage 1 | Document ingestion | Controlled corpus maintenance and refresh workflows. |
| MVP Stage 2 | Citation + cross-reference agent | Grounded synthesis, contradiction checks, and evidence packets. |
| MVP Stage 3 | ReviewOS application | Projects, queues, approvals, templates, exports, analytics, and audit. |
| X1-X3 | Extensions | Multimodal search, thinking-model expansion, and recursive retrieval optimization. |
MVP Stage 0 functional requirements
| ID | Requirement | Priority |
|---|---|---|
| FR-001 | Ingest the existing ~50K-document corpus. | Must |
| FR-002 | Parse PDF, DOCX, TXT, and spreadsheet files into text plus structure. | Must |
| FR-003 | Chunk documents into retrievable passages. | Must |
| FR-004 | Extract and store document ID, title, type, source, date, page, and section metadata. | Must |
| FR-005 | Create vector embeddings for semantic retrieval. | Must |
| FR-006 | Maintain a keyword/BM25 index for exact terms, IDs, clauses, dates, and references. | Must |
| FR-007 | Perform hybrid retrieval with configurable weighting. | Must |
| FR-008 | Support metadata filters for type, date range, source, project, confidentiality, study, and compound. | Must |
| FR-009 | Expose a read-only retrieval API. | Must |
| FR-010 | Expose at least one MCP-compatible read-only retrieval tool. | Must |
| FR-011 | Return structured citation-first evidence objects with source text and retrieval metadata. | Should |
| FR-012 | Enforce document-level permissions with no cross-scope citation leakage. | Must |
| FR-013 | Log query, retrieved chunks, latency, errors, tool used, and session. | Must |
| FR-014 | Log every failed parse with reason. | Must |
| FR-015 | Ship a golden-set eval harness with 30-50 known-answer questions. | Must |
| FR-016 | Ship an adversarial prompt set on which the system must fail safely. | Must |
| FR-017 | Capture a usefulness rating and a structured failure reason when the user provides them. | Should |
| FR-018 | Provide indexing-health status reporting. | Should |
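FR-007 and FR-008 ask for hybrid retrieval with configurable weighting plus metadata filters, but do not name a fusion method. The sketch below assumes candidates have already been pulled from the BM25 and vector indexes, applies exact-match and date-range filters, min-max normalizes each score list, and fuses them linearly with a weight `alpha`. `Candidate`, `hybrid_rank`, and the metadata keys are hypothetical; reciprocal rank fusion is a common alternative to linear weighting.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Optional


@dataclass
class Candidate:
    chunk_id: str
    bm25: float                 # keyword score from the BM25 index (FR-006)
    vector: float               # similarity score from the vector index (FR-005)
    metadata: dict = field(default_factory=dict)


def _min_max(values: list[float]) -> list[float]:
    """Rescale scores to [0, 1] so BM25 and vector scores are comparable."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [1.0] * len(values)  # all equal: treat as equally relevant
    return [(v - lo) / (hi - lo) for v in values]


def _matches(meta: dict, filters: dict,
             date_from: Optional[date], date_to: Optional[date]) -> bool:
    # Exact-match filters, e.g. {"type": "protocol", "project": "P1"}
    if any(meta.get(key) != value for key, value in filters.items()):
        return False
    d = meta.get("date")        # assumed to be a datetime.date
    if date_from is not None and (d is None or d < date_from):
        return False
    if date_to is not None and (d is None or d > date_to):
        return False
    return True


def hybrid_rank(cands: list[Candidate], alpha: float = 0.5, k: int = 5,
                filters: Optional[dict] = None,
                date_from: Optional[date] = None,
                date_to: Optional[date] = None) -> list[tuple[float, str]]:
    """Fuse scores as alpha * vector + (1 - alpha) * BM25 after filtering."""
    kept = [c for c in cands
            if _matches(c.metadata, filters or {}, date_from, date_to)]
    if not kept:
        return []
    bm25 = _min_max([c.bm25 for c in kept])
    vec = _min_max([c.vector for c in kept])
    fused = [(alpha * v + (1 - alpha) * b, c.chunk_id)
             for c, v, b in zip(kept, vec, bm25)]
    fused.sort(key=lambda pair: pair[0], reverse=True)
    return fused[:k]
```

Filtering before normalization keeps excluded documents from distorting the score scale. Setting `alpha` near 0 favors exact terms such as IDs and clause numbers; near 1 favors semantic matches.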
Evaluation
MVP Stage 0 ships with a 30-50-question golden set and an adversarial prompt set. Gate on top-5 retrieval accuracy, citation metadata coverage, latency, and safe-fail behavior.
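A minimal sketch of the retrieval gate, assuming each golden-set question lists the document IDs that count as a correct hit and that a returned chunk counts toward coverage only when every required metadata field is present. The thresholds mirror the Stage 0 targets (>80% top-5, >95% coverage, <10 sec); treating latency as a per-query maximum rather than a percentile is an assumption. `EvalCase`, `REQUIRED_FIELDS`, and `gate` are hypothetical names, and safe-fail checks on the adversarial set are not covered here.

```python
from dataclasses import dataclass

# Assumed required citation fields (from FR-004); adjust per document type.
REQUIRED_FIELDS = ("document_id", "title", "doc_type", "source", "date", "page", "section")


@dataclass
class EvalCase:
    question: str
    expected_doc_ids: set[str]   # any of these in the top k counts as a hit
    returned: list[dict]         # ranked results, each a metadata dict
    latency_sec: float


def top_k_hit(case: EvalCase, k: int = 5) -> bool:
    top = [r.get("document_id") for r in case.returned[:k]]
    return any(doc in case.expected_doc_ids for doc in top)


def has_full_metadata(result: dict) -> bool:
    return all(result.get(f) not in (None, "") for f in REQUIRED_FIELDS)


def gate(cases: list[EvalCase]) -> dict:
    accuracy = sum(top_k_hit(c) for c in cases) / len(cases)
    chunks = [r for c in cases for r in c.returned]
    coverage = sum(has_full_metadata(r) for r in chunks) / len(chunks) if chunks else 0.0
    worst_latency = max(c.latency_sec for c in cases)
    return {
        "top5_accuracy": accuracy,
        "citation_coverage": coverage,
        "max_latency_sec": worst_latency,
        "passed": accuracy > 0.80 and coverage > 0.95 and worst_latency < 10.0,
    }
```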
Security
Document-level permissions are enforced before retrieval returns any source text. MVP Stage 0 stays read-only and logs queries, returned chunks, latency, errors, and sessions.
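One way to keep restricted text out of results is to drop unauthorized chunks before ranking truncation and before any source text is attached. The sketch assumes a simple group-based ACL keyed by document ID, with deny-by-default for documents that have no ACL entry. `Chunk`, `doc_acl`, and `permitted_chunks` are hypothetical; the PRD does not specify the permission model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    chunk_id: str
    document_id: str
    text: str


def permitted_chunks(chunks: list[Chunk], doc_acl: dict[str, set[str]],
                     user_groups: set[str]) -> list[Chunk]:
    """Keep only chunks whose parent document the user may read.

    doc_acl maps document_id -> groups allowed to read it (hypothetical model).
    A document missing from doc_acl is denied by default, so a gap in the
    ACL never turns into a leaked citation.
    """
    allowed = []
    for ch in chunks:
        groups = doc_acl.get(ch.document_id)
        if groups and groups & user_groups:
            allowed.append(ch)
    return allowed
```

Running this check before taking the top k also means a user still receives k permitted results instead of fewer results with restricted ones silently removed afterward.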
Analytics
Capture query text, filters, chunks, scores, latency, errors, most-queried documents, failed retrievals, usefulness ratings, and opened citations.
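A minimal sketch of one structured log event, written as a JSON line so it can feed both the FR-013 audit log and the analytics above. The field names and the rule for flagging a failed retrieval (an error or zero results) are assumptions. Aggregates such as most-queried documents and opened citations would be derived from these events downstream rather than stored per record.

```python
import json
import time
import uuid
from typing import Optional


def log_event(query: str, filters: dict, results: list[tuple[str, float]],
              latency_ms: float, tool: str, session_id: str,
              error: Optional[str] = None, rating: Optional[int] = None) -> str:
    """Serialize one retrieval event as a JSON line (hypothetical schema)."""
    record = {
        "event_id": str(uuid.uuid4()),
        "ts": time.time(),
        "session_id": session_id,
        "tool": tool,                      # e.g. "api" or the MCP tool name
        "query": query,
        "filters": filters,
        "chunks": [{"chunk_id": cid, "score": s} for cid, s in results],
        "latency_ms": latency_ms,
        "error": error,
        "usefulness_rating": rating,       # optional, per FR-017
        "failed_retrieval": error is not None or not results,
    }
    return json.dumps(record, sort_keys=True)
```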
Success metrics
| Metric | Baseline | Target | Release |
|---|---|---|---|
| Time to first useful source | Hours-days | <120 sec | MVP Stage 0 |
| Known-answer retrieval accuracy | Uncaptured | >80% top-5 | MVP Stage 0 |
| Citation metadata coverage | Variable | >95% of chunks | MVP Stage 0 |
| Retrieval latency | n/a | <10 sec | MVP Stage 0 |
| Factual-claim citation rate | Uncaptured | >95% | MVP Stage 2 |
| End-to-end cycle time | Days | >20% faster | MVP Stage 3 |
Non-functional requirements
- Every returned passage must trace to an exact source document and location.
- Permissions are enforced at retrieval; restricted documents cannot leak through citations.
- MVP Stage 0 is read-only: no writes, no workflow app, no autonomous synthesis ownership.
- Failed parses are logged and visible; documents are never silently dropped.
- Model confidence is separate from evidence correctness and citation coverage.
- Provider abstraction prevents lock-in and keeps architecture model-agnostic.
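The provider-abstraction requirement can be met by having indexing and retrieval code depend only on a narrow interface. Below is a minimal sketch using `typing.Protocol`. `EmbeddingProvider`, `HashingEmbedder`, and `index_chunks` are hypothetical, and the hashing embedder is a toy stand-in for tests, not a semantic model.

```python
import hashlib
import math
from typing import Protocol


class EmbeddingProvider(Protocol):
    """Anything that turns text into fixed-length vectors."""
    dim: int

    def embed(self, texts: list[str]) -> list[list[float]]: ...


class HashingEmbedder:
    """Toy, dependency-free provider: hashes tokens into buckets, L2-normalized."""

    def __init__(self, dim: int = 8) -> None:
        self.dim = dim

    def embed(self, texts: list[str]) -> list[list[float]]:
        out = []
        for t in texts:
            vec = [0.0] * self.dim
            for tok in t.lower().split():
                h = int(hashlib.sha256(tok.encode()).hexdigest(), 16)
                vec[h % self.dim] += 1.0
            norm = math.sqrt(sum(v * v for v in vec)) or 1.0  # avoid divide-by-zero
            out.append([v / norm for v in vec])
        return out


def index_chunks(provider: EmbeddingProvider, chunks: list[str]) -> list[list[float]]:
    # Depends only on the protocol, so a hosted or local model can be swapped
    # in by configuration without changing indexing or retrieval code.
    vectors = provider.embed(chunks)
    if any(len(v) != provider.dim for v in vectors):
        raise ValueError("provider returned vectors of unexpected dimension")
    return vectors
```

Because `Protocol` uses structural typing, a new provider only needs a matching `dim` attribute and `embed` method; it does not have to inherit from any base class.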