Product requirements

The full product vision for ReviewOS, MediGen's semantic search engine.

The PRD's first build target is MVP Stage 0: a Corpus API + MCP server. It should prove retrieval quality, citation coverage, permissions, and auditability before ReviewOS expands into ingestion, synthesis, and workflow.

MVP Stage 0: Corpus API + MCP server

Given an approved corpus and a high-stakes question, ReviewOS should return ranked passages with exact source text, document metadata, scores, filters, and permission-safe citations through a read-only API and MCP tool.
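
One possible shape for the response described above is sketched below. It is a minimal sketch, not a committed schema: the class names (`Citation`, `EvidencePassage`, `RetrievalResponse`), the field names, and the ISO date format are assumptions made for illustration, with the citation fields taken from the metadata list in FR-004.

```python
from dataclasses import dataclass, asdict
from typing import Optional
import json


@dataclass(frozen=True)
class Citation:
    """Where a passage came from (fields follow FR-004; names are hypothetical)."""
    document_id: str
    title: str
    doc_type: str
    source: str
    date: str                 # assumed ISO 8601 date string
    page: Optional[int]       # None when the source format has no pages
    section: Optional[str]


@dataclass(frozen=True)
class EvidencePassage:
    rank: int
    source_text: str          # exact text from the source, never paraphrased
    score: float
    citation: Citation


@dataclass(frozen=True)
class RetrievalResponse:
    query: str
    filters: dict             # filters that were applied to this query
    passages: list            # list of EvidencePassage, best first


def to_json(response: RetrievalResponse) -> str:
    """Serialize the nested dataclasses into a JSON document."""
    return json.dumps(asdict(response), indent=2)


# Placeholder values for illustration only; not real corpus data.
example = RetrievalResponse(
    query="example question",
    filters={"doc_type": "protocol"},
    passages=[
        EvidencePassage(
            rank=1,
            source_text="Exact passage text goes here.",
            score=0.87,
            citation=Citation(
                "DOC-0001", "Example title", "protocol",
                "example-source", "2024-01-01", 12, "4.2",
            ),
        )
    ],
)
print(to_json(example))
```

Keeping the citation as its own object means the same structure can be returned unchanged by both the API and the MCP tool.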

Out of scope

User document upload, answer synthesis, contradiction detection, review queues, approvals, memo templates, multimodal search, fine-tuning, broad eDiscovery replacement, and autonomous legal or regulatory judgment.

Release tags

Tag | Release | Product form
MVP Stage 0 | Corpus API + MCP server | Read-only retrieval substrate over the existing corpus.
MVP Stage 1 | Document ingestion | Controlled corpus maintenance and refresh workflows.
MVP Stage 2 | Citation + cross-reference agent | Grounded synthesis, contradiction checks, and evidence packets.
MVP Stage 3 | ReviewOS application | Projects, queues, approvals, templates, exports, analytics, and audit.
X1-X3 | Extensions | Multimodal search, thinking-model expansion, and recursive retrieval optimization.

MVP Stage 0 functional requirements

ID | Requirement | Priority
FR-001 | Ingest the existing ~50K-document corpus. | Must
FR-002 | Parse PDF, DOCX, TXT, and spreadsheet files into text plus structure. | Must
FR-003 | Chunk documents into retrievable passages. | Must
FR-004 | Extract and store document ID, title, type, source, date, page, and section metadata. | Must
FR-005 | Create vector embeddings for semantic retrieval. | Must
FR-006 | Maintain a keyword/BM25 index for exact terms, IDs, clauses, dates, and references. | Must
FR-007 | Perform hybrid retrieval with configurable weighting. | Must
FR-008 | Support metadata filters for type, date range, source, project, confidentiality, study, and compound. | Must
FR-009 | Expose a read-only retrieval API. | Must
FR-010 | Expose at least one MCP-compatible read-only retrieval tool. | Must
FR-011 | Return structured citation-first evidence objects with source text and retrieval metadata. | Should
FR-012 | Enforce document-level permissions with no cross-scope citation leakage. | Should
FR-013 | Log query, retrieved chunks, latency, errors, tool used, and session. | Must
FR-014 | Log every failed parse with reason. | Must
FR-015 | Ship a golden-set eval harness with 30-50 known-answer questions. | Must
FR-016 | Ship an adversarial prompt set that must fail safely. | Must
FR-017 | Capture usefulness rating and structured failure reason when offered. | Should
FR-018 | Provide indexing-health status reporting. | Should
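
To make "hybrid retrieval with configurable weighting" (FR-007) concrete, the sketch below blends keyword and vector scores with a single weight. It is one common approach (min-max normalization followed by a weighted sum), not a decision recorded in this PRD; the function names, the `vector_weight` parameter, and the sample scores are all hypothetical.

```python
def min_max_normalize(scores: dict) -> dict:
    """Rescale scores to [0, 1] so BM25 and cosine scores are comparable."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {chunk_id: 1.0 for chunk_id in scores}
    return {chunk_id: (s - lo) / (hi - lo) for chunk_id, s in scores.items()}


def hybrid_rank(keyword_scores: dict, vector_scores: dict,
                vector_weight: float = 0.5, top_k: int = 5) -> list:
    """Return the top_k (chunk_id, score) pairs by weighted blended score.

    vector_weight=0.0 is pure keyword ranking; 1.0 is pure vector ranking.
    A chunk missing from one index contributes 0.0 for that side.
    """
    if not 0.0 <= vector_weight <= 1.0:
        raise ValueError("vector_weight must be between 0 and 1")
    kw = min_max_normalize(keyword_scores)
    vec = min_max_normalize(vector_scores)
    combined = {
        chunk_id: (1.0 - vector_weight) * kw.get(chunk_id, 0.0)
        + vector_weight * vec.get(chunk_id, 0.0)
        for chunk_id in kw.keys() | vec.keys()
    }
    # Sort by score descending, then by chunk ID so ties are deterministic.
    return sorted(combined.items(), key=lambda item: (-item[1], item[0]))[:top_k]


# Made-up scores for illustration only.
keyword_scores = {"c1": 12.0, "c2": 4.0, "c3": 8.0}   # e.g. raw BM25 scores
vector_scores = {"c2": 0.9, "c3": 0.7, "c4": 0.5}     # e.g. cosine similarities

semantic_heavy = hybrid_rank(keyword_scores, vector_scores, vector_weight=0.75)
keyword_only = hybrid_rank(keyword_scores, vector_scores, vector_weight=0.0)
```

Reciprocal rank fusion is a frequently used alternative when raw scores from the two indexes are hard to normalize; either way the weight should be tuned against the golden set, not fixed up front.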

Evaluation

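MVP Stage 0 ships with a golden set of 30-50 known-answer questions and an adversarial prompt set. The release gates on top-5 retrieval accuracy (>80%), citation metadata coverage (>95% of chunks), retrieval latency (<10 sec), and safe-fail behavior on the adversarial set.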
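
The top-5 accuracy gate could be computed along the lines below. This is a sketch under assumptions: the golden-set record layout (`question`, `expected_chunk_ids`), the `retrieve` callable, and the rule that a question counts as a hit when any expected chunk appears in the top 5 are illustrative choices, not definitions from this PRD.

```python
def top_k_accuracy(golden_set: list, retrieve, k: int = 5) -> float:
    """Fraction of questions with at least one expected chunk in the top k.

    `retrieve` is any callable that maps a question string to a ranked
    list of chunk IDs (best first).
    """
    if not golden_set:
        raise ValueError("golden set is empty")
    hits = 0
    for item in golden_set:
        returned = retrieve(item["question"])[:k]
        if set(returned) & set(item["expected_chunk_ids"]):
            hits += 1
    return hits / len(golden_set)


# Tiny made-up golden set and a stub retriever, for illustration only.
golden_set = [
    {"question": "q1", "expected_chunk_ids": ["c7"]},
    {"question": "q2", "expected_chunk_ids": ["c9"]},
]
stub_results = {
    "q1": ["c1", "c7", "c3"],                       # expected chunk found at rank 2
    "q2": ["c2", "c4", "c5", "c6", "c8", "c9"],     # expected chunk only at rank 6
}

accuracy = top_k_accuracy(golden_set, lambda q: stub_results[q], k=5)
passes_gate = accuracy > 0.80   # target from the success metrics table
```

With only 30-50 questions, each miss moves the score by two to three points, so the harness should report the failing questions as well as the aggregate.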

Security

Document-level permissions are enforced before retrieval returns any source text. MVP Stage 0 stays read-only and logs each query, the returned chunks, latency, errors, the tool used, and the session.
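
A minimal sketch of this ordering, filter first and then return and log, is shown below. The `scope` field, the `allowed_scopes` set, the logger name, and the `search` callable are hypothetical; a production design would more likely push the permission filter into the index query itself so restricted text is never loaded at all.

```python
import json
import logging
import time

audit_log = logging.getLogger("reviewos.audit")   # hypothetical logger name


def filter_permitted(candidates: list, allowed_scopes: set) -> list:
    """Drop every chunk whose document scope the caller may not read."""
    return [c for c in candidates if c["scope"] in allowed_scopes]


def retrieve_with_audit(query: str, session_id: str,
                        allowed_scopes: set, search) -> list:
    """Run a search, enforce permissions, and write one audit record.

    The audit record is written in `finally`, so failed queries are
    logged too. Only permitted chunk IDs are ever logged or returned.
    """
    start = time.perf_counter()
    permitted, error = [], None
    try:
        permitted = filter_permitted(search(query), allowed_scopes)
        return permitted
    except Exception as exc:
        error = repr(exc)
        raise
    finally:
        audit_log.info(json.dumps({
            "session": session_id,
            "query": query,
            "chunk_ids": [c["chunk_id"] for c in permitted],
            "latency_ms": round((time.perf_counter() - start) * 1000, 2),
            "error": error,
        }))


def stub_search(query: str) -> list:
    """Stand-in for the real index; returns made-up chunks."""
    return [
        {"chunk_id": "c1", "scope": "public", "text": "visible passage"},
        {"chunk_id": "c2", "scope": "restricted", "text": "hidden passage"},
    ]


results = retrieve_with_audit("example question", "session-1", {"public"}, stub_search)
```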

Analytics

Capture query text, filters, chunks, scores, latency, errors, most queried documents, failed retrievals, usefulness ratings, and opened citations.
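
Two of these signals, most queried documents and failed retrievals, can be derived from the query log after the fact. The sketch below assumes a hypothetical event layout (`query`, `document_ids`) and treats a query that returned no documents as a failed retrieval; both are illustrative assumptions.

```python
from collections import Counter


def summarize(events: list, top_n: int = 3) -> dict:
    """Aggregate logged query events into simple analytics."""
    doc_counts = Counter(
        doc_id for event in events for doc_id in event["document_ids"]
    )
    failed = [event["query"] for event in events if not event["document_ids"]]
    return {
        "most_queried_documents": doc_counts.most_common(top_n),
        "failed_retrievals": failed,
    }


# Made-up log events for illustration only.
events = [
    {"query": "q1", "document_ids": ["D1", "D2"]},
    {"query": "q2", "document_ids": ["D1"]},
    {"query": "q3", "document_ids": []},
]

summary = summarize(events)
```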

Success metrics

Metric | Baseline | Target | Release
Time to first useful source | Hours-days | <120 sec | MVP Stage 0
Known-answer retrieval accuracy | Uncaptured | >80% top-5 | MVP Stage 0
Citation metadata coverage | Variable | >95% of chunks | MVP Stage 0
Retrieval latency | n/a | <10 sec | MVP Stage 0
Factual-claim citation rate | Uncaptured | >95% | MVP Stage 2
End-to-end cycle time | Days | >20% faster | MVP Stage 3
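
Citation metadata coverage can be measured directly over the indexed chunks. The sketch below assumes a chunk counts as covered only when every FR-004 field is present and non-empty; that definition, the field names, and the sample chunks are assumptions. Page in particular may need a looser rule for formats without pages, such as TXT.

```python
# Field names are hypothetical; the list itself follows FR-004.
REQUIRED_FIELDS = ("document_id", "title", "type", "source", "date", "page", "section")


def is_covered(chunk: dict) -> bool:
    """True when every required metadata field is present and non-empty."""
    return all(chunk.get(field) not in (None, "") for field in REQUIRED_FIELDS)


def metadata_coverage(chunks: list) -> float:
    """Fraction of chunks with complete citation metadata."""
    if not chunks:
        raise ValueError("no chunks to evaluate")
    return sum(1 for chunk in chunks if is_covered(chunk)) / len(chunks)


# Made-up chunks for illustration only.
complete = {
    "document_id": "DOC-0001", "title": "Example title", "type": "protocol",
    "source": "example-source", "date": "2024-01-01", "page": 3, "section": "2.1",
}
missing_page = {**complete, "page": None}

coverage = metadata_coverage([complete, complete, complete, missing_page])
meets_target = coverage > 0.95   # target from the table above
```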

Non-functional requirements

  • Every returned passage must trace to an exact source document and location.
  • Permissions are enforced at retrieval; restricted documents cannot leak through citations.
  • MVP Stage 0 is read-only: no writes, no workflow app, no autonomous synthesis ownership.
  • Failed parses are logged and visible; documents are never silently dropped.
  • Model confidence is separate from evidence correctness and citation coverage.
  • Provider abstraction prevents lock-in and keeps architecture model-agnostic.
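
The provider-abstraction requirement could be met with a narrow interface that indexing code depends on in place of a vendor SDK. The sketch below uses a `typing.Protocol`; the names `EmbeddingProvider`, `HashingStubProvider`, and `index_chunks` are hypothetical, and the stub is a deterministic test double, not a real embedding model.

```python
import hashlib
from typing import Protocol, Sequence


class EmbeddingProvider(Protocol):
    """Anything that turns texts into fixed-length vectors."""

    def embed(self, texts: Sequence[str]) -> list: ...


class HashingStubProvider:
    """Deterministic stand-in for tests; vectors carry no semantic meaning."""

    def __init__(self, dim: int = 8) -> None:
        if not 1 <= dim <= 32:           # a SHA-256 digest has 32 bytes
            raise ValueError("dim must be between 1 and 32")
        self.dim = dim

    def embed(self, texts: Sequence[str]) -> list:
        vectors = []
        for text in texts:
            digest = hashlib.sha256(text.encode("utf-8")).digest()
            vectors.append([byte / 255.0 for byte in digest[: self.dim]])
        return vectors


def index_chunks(chunks: dict, provider: EmbeddingProvider) -> dict:
    """Map chunk ID to vector using whichever provider is supplied."""
    vectors = provider.embed(list(chunks.values()))
    return dict(zip(chunks.keys(), vectors))


# Made-up chunk text for illustration only.
index = index_chunks(
    {"c1": "first passage", "c2": "second passage"},
    HashingStubProvider(dim=8),
)
```

Swapping vendors then means writing one new class with an `embed` method; `index_chunks` and everything downstream stay unchanged.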