# ReviewOS — Prioritized 3-Month Roadmap

**Audience:** Product Team · **Companion to:** [PRD](tenex_prd.md), [Executive Memo](tenex_executive_memo.md)
**Plan window:** 12 weeks · six 2-week sprints · Fibonacci story points (1/2/3/5/8/13)
**Strategy:** Build the substrate first (`[M0]`→`[M1]`→`[M2]`); productize the app (`[M3]`) only after usage proves the workflows.

---

## 1. Team & Capacity (Lean Pod)

| Role | Alloc. | Code | Focus |
|---|---|---|---|
| AI/ML Engineer | Full-time | **AIE** | Parsing, indexing, hybrid retrieval, MCP, citation agent, evals |
| Full-Stack Engineer | Full-time | **FSE** | API, services, ingestion, permissions, logging, dashboard |
| Product / Strategist (acting PM) | ~50% | **PM** | Eval sets, pilot, prioritization, discovery, go/no-go |

**Velocity assumptions:** The two full-time engineers carry the points. Sprint 1 is planned lighter (25 pts) to absorb setup; steady state is 28–31 pts/sprint. The PM coordinates and owns non-build artifacts (eval design, pilot, discovery). Total planned: **170 points**.

| Sprint | Weeks | Target pts | Theme | Milestone |
|---|---|---|---|---|
| S1 | 1–2 | 25 | Foundations & corpus assessment | Eval set ready |
| S2 | 3–4 | 29 | `[M0]` ingestion & indexing | Corpus indexed |
| S3 | 5–6 | 31 | `[M0]` retrieval, API, MCP | **`[M0]` ships → pilot** |
| S4 | 7–8 | 28 | `[M1]` ingestion workflow | **`[M1]` ships** |
| S5 | 9–10 | 29 | `[M2]` citation agent core | Agent in pilot |
| S6 | 11–12 | 28 | `[M2]` harden + `[M3]` discovery | **`[M2]` ships · `[M3]` go/no-go** |
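
The sprint targets can be cross-checked against the backlog with a few lines of Python. This is a minimal sketch, not part of the plan tooling: the dictionary and function names are hypothetical, and the point values are copied by hand from the capacity table above and the story tables in §2.

```python
# Sprint targets copied from the capacity table (assumed to be kept in sync by hand).
SPRINT_TARGETS = {"S1": 25, "S2": 29, "S3": 31, "S4": 28, "S5": 29, "S6": 28}

# Story sizes per sprint, copied from the backlog tables in section 2.
BACKLOG = {
    "S1": [5, 3, 3, 5, 5, 2, 2],
    "S2": [8, 5, 5, 5, 3, 3],
    "S3": [8, 3, 5, 5, 5, 3, 2],
    "S4": [5, 3, 5, 5, 5, 3, 2],
    "S5": [5, 8, 8, 5, 3],
    "S6": [5, 5, 5, 3, 5, 5],
}

PLANNED_TOTAL = 170  # "Total planned" figure from the velocity assumptions


def check_plan(targets, backlog, planned_total):
    """Return a list of human-readable mismatches; an empty list means consistent."""
    problems = []
    for sprint, target in targets.items():
        actual = sum(backlog[sprint])
        if actual != target:
            problems.append(f"{sprint}: backlog sums to {actual}, target is {target}")
    total = sum(targets.values())
    if total != planned_total:
        problems.append(f"targets sum to {total}, plan states {planned_total}")
    return problems


if __name__ == "__main__":
    issues = check_plan(SPRINT_TARGETS, BACKLOG, PLANNED_TOTAL)
    print("plan is consistent" if not issues else "\n".join(issues))
```

Re-running a check like this after any re-pointing keeps the sprint targets, story tables, and the 170-point total from drifting apart.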

### Timeline (Gantt)

Renders in any Mermaid-capable viewer (GitHub, VS Code, Notion embed). Dates are illustrative (weekends excluded; one sprint = 10 working days); `crit` marks the two gated releases, `[M0]` and `[M2]`.

```mermaid
gantt
    title ReviewOS — 12-Week Delivery Plan (illustrative start)
    dateFormat YYYY-MM-DD
    axisFormat %b %d
    excludes weekends

    section [M0] Substrate
    S1 Foundations & corpus assessment :s1, 2026-06-01, 10d
    Eval set ready                     :milestone, msEval, after s1, 0d
    S2 Ingestion & indexing            :s2, after s1, 10d
    Corpus indexed                     :milestone, msIndex, after s2, 0d
    S3 Retrieval, API, MCP             :s3, after s2, 10d
    M0 ships -> pilot                  :milestone, crit, msM0, after s3, 0d

    section [M1] Ingestion
    S4 Ingestion workflow + pilot      :s4, after s3, 10d
    M1 ships                           :milestone, msM1, after s4, 0d

    section [M2] Citation agent
    S5 Citation agent core             :s5, after s4, 10d
    S6 Harden + M3 discovery           :s6, after s5, 10d
    M2 ships                           :milestone, crit, msM2, after s6, 0d

    section [M3] App (out of window)
    M3 go/no-go decision               :milestone, msGo, after s6, 0d
    M3 app build (deferred)            :m3build, after s6, 20d
```

---

## 2. Sprint Backlog (story-pointed)

The **Ref** column ties each story to its PRD requirement(s) so scope is traceable.

### Sprint 1 — Foundations & Corpus Assessment (25 pts)

| Story | Pts | Owner | Release | Ref |
|---|---|---|---|---|
| Project scaffolding: repo, CI, infra baseline, provider-abstraction interfaces | 5 | FSE | `[M0]` | NFR-009 |
| Provision client-approved hosting / database environment | 3 | FSE | `[M0]` | §6, NFR-003, OpenQ3 |
| Corpus inventory: file types, sizes, dates, metadata availability | 3 | PM/AIE | — | Phase 0 |
| Parsing spike: LlamaParse on document sample, quality assessment | 5 | AIE | `[M0]` | FR-002 |
| Build golden eval set: 30–50 known-answer questions w/ expected passages | 5 | PM/AIE | `[M0]` | FR-015 |
| Adversarial / red-team prompt set | 2 | PM | `[M0]` | FR-016 |
| Access & permission constraints discovery | 2 | PM/FSE | `[M0]` | FR-012, NFR-003 |

### Sprint 2 — `[M0]` Ingestion & Indexing (29 pts)

| Story | Pts | Owner | Release | Ref |
|---|---|---|---|---|
| Document parser pipeline (PDF/DOCX/TXT/spreadsheet) → text + structure | 8 | AIE | `[M0]` | FR-001, FR-002 |
| Section-aware chunking + metadata extraction | 5 | AIE | `[M0]` | FR-003, FR-004 |
| Vector index + embeddings (semantic) | 5 | AIE | `[M0]` | FR-005 |
| Keyword/BM25 index (exact terms, IDs, clauses, dates, refs) | 5 | FSE | `[M0]` | FR-006 |
| Failed-parse logging + indexing-health status | 3 | FSE | `[M0]` | FR-014, FR-018 |
| Permission data model (access_scope, confidentiality_level) | 3 | FSE | `[M0]` | FR-012, NFR-003 |

### Sprint 3 — `[M0]` Retrieval, API, MCP → **ship** (31 pts)

| Story | Pts | Owner | Release | Ref |
|---|---|---|---|---|
| Hybrid retriever (semantic + keyword fusion, configurable weighting) | 8 | AIE | `[M0]` | FR-007 |
| Metadata filters (type, date, source, project, confidentiality) | 3 | FSE | `[M0]` | FR-008 |
| Citation-first evidence object contract + read-only REST API | 5 | FSE | `[M0]` | FR-009, FR-011, NFR-001 |
| MCP server exposing read-only retrieval tool | 5 | AIE | `[M0]` | FR-010, NFR-004 |
| Permission enforcement at retrieval (no cross-scope leakage) | 5 | FSE | `[M0]` | FR-012, NFR-003 |
| Query/retrieval logging + usefulness rating + failure reason | 3 | FSE | `[M0]` | FR-013, FR-017, NFR-008 |
| Eval harness run + tune to ≥80% top-5, ≥95% citation coverage | 2 | AIE/PM | `[M0]` | FR-015, §14 |
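
To make the "semantic + keyword fusion, configurable weighting" story concrete, here is one possible shape for the fusion step. It is a sketch under stated assumptions, not the planned implementation: this roadmap does not name a fusion method, so weighted min-max score fusion is used purely for illustration, and `fuse`, `min_max_normalize`, `semantic_weight`, and the chunk IDs are all hypothetical.

```python
from typing import Dict, List, Tuple


def min_max_normalize(scores: Dict[str, float]) -> Dict[str, float]:
    """Scale raw scores to [0, 1] so the two retrievers are comparable."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        # All candidates scored the same; treat them as equally relevant.
        return {chunk_id: 1.0 for chunk_id in scores}
    return {chunk_id: (s - lo) / (hi - lo) for chunk_id, s in scores.items()}


def fuse(
    semantic: Dict[str, float],
    keyword: Dict[str, float],
    semantic_weight: float = 0.5,
    top_k: int = 5,
) -> List[Tuple[str, float]]:
    """Blend normalized semantic and keyword scores with a configurable weight."""
    if not 0.0 <= semantic_weight <= 1.0:
        raise ValueError("semantic_weight must be between 0 and 1")
    sem = min_max_normalize(semantic)
    kw = min_max_normalize(keyword)
    fused = {
        chunk_id: semantic_weight * sem.get(chunk_id, 0.0)
        + (1.0 - semantic_weight) * kw.get(chunk_id, 0.0)
        for chunk_id in set(sem) | set(kw)
    }
    # Highest score first; ties broken by chunk ID so results are deterministic.
    return sorted(fused.items(), key=lambda item: (-item[1], item[0]))[:top_k]


if __name__ == "__main__":
    semantic_hits = {"c1": 0.9, "c2": 0.7, "c3": 0.2}   # e.g. cosine similarities
    keyword_hits = {"c2": 12.0, "c4": 8.0, "c3": 4.0}   # e.g. BM25 scores
    for chunk_id, score in fuse(semantic_hits, keyword_hits, semantic_weight=0.5):
        print(chunk_id, round(score, 3))
```

A chunk found by only one retriever is scored as 0 by the other, so exact-match hits on IDs or clause numbers still surface when the weight leans toward keyword search. Rank-based fusion would be an equally valid choice; the eval harness is the place to compare them.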

### Sprint 4 — `[M1]` Ingestion Workflow + Pilot (28 pts)

| Story | Pts | Owner | Release | Ref |
|---|---|---|---|---|
| Upload endpoint (supported types) + required metadata capture | 5 | FSE | `[M1]` | FR-101, FR-103 |
| Batch ingestion | 3 | FSE | `[M1]` | FR-102 |
| Incremental index refresh (no full reindex) | 5 | AIE | `[M1]` | FR-104 |
| Duplicate / near-duplicate detection | 5 | AIE | `[M1]` | FR-105 |
| Failed-ingestion queue + deprecation/archive/remove | 5 | FSE | `[M1]` | FR-106, FR-107 |
| Ingestion audit log (uploader, timestamp, source, status) | 3 | FSE | `[M1]` | FR-108 |
| Pilot onboarding + structured feedback capture | 2 | PM | `[M0]`/`[M1]` | FR-017, §16 |

### Sprint 5 — `[M2]` Citation Agent Core (29 pts)

| Story | Pts | Owner | Release | Ref |
|---|---|---|---|---|
| Query decomposition (complex Q → sub-queries) | 5 | AIE | `[M2]` | FR-201 |
| Multi-pass retrieval loop (search→inspect→refine) | 8 | AIE | `[M2]` | FR-202 |
| Cross-reference / contradiction / gap detection | 8 | AIE | `[M2]` | FR-203 |
| Citation validation (material claims → source passages) | 5 | AIE/FSE | `[M2]` | FR-204, NFR-002 |
| Structured evidence packet output | 3 | FSE | `[M2]` | FR-205 |

### Sprint 6 — `[M2]` Harden + `[M3]` Discovery (28 pts)

| Story | Pts | Owner | Release | Ref |
|---|---|---|---|---|
| Confidence scoring + low-confidence/high-risk routing | 5 | AIE | `[M2]` | FR-206, FR-208 |
| Guardrails: safe-fail on adversarial set (100%) | 5 | AIE | `[M2]` | FR-207, FR-016 |
| Security review: permissions, leakage, read/write boundary | 5 | FSE/PM | `[M2]` | NFR-003, NFR-004 |
| `[M2]` eval gate: factual-claim citation >95%, unsupported <5% | 3 | AIE/PM | `[M2]` | §14 |
| Quality / eval dashboard | 5 | FSE | `[M2]` | §16 |
| `[M3]` discovery: query-log analysis, workflow prioritization, `[M3]` PRD + go/no-go | 5 | PM | `[M3]` | FR-301–310, §17 |

---

## 3. What We Cut or Defer (high-cost, low-value first)

| Item | Decision | Why |
|---|---|---|
| `[M3]` ReviewOS application build | **Out of 3-month window** | Custom UI, queues, approvals, templates, exports, RBAC — don't build until `[M0]`–`[M2]` prove which workflows repeat. Discovery + PRD only in S6. |
| Autonomous legal/regulatory judgment | **Cut** | High-risk, unnecessary for first value; humans keep accountability |
| Deep per-DMS integrations | **Defer** | API/upload/export is enough to prove value |
| Multimodal document search `[X1]` | **Defer** | Complexity-heavy; only if corpus demands it |
| Model fine-tuning | **Cut** | RAG + hybrid retrieval + structured prompting suffice |
| AI expanded search w/ Thinking model `[X2]` | **Defer** | Needs `[M0]`–`[M2]` substrate first; adds latency/cost |
| Retrieval optimization agent `[X3]` | **Defer** | Needs query logs + eval failures from `[M0]`/`[M2]` first |
| Full RBAC | **Defer to `[M3]`** | `[M0]` ships with document-level permission scoping; full roles later |
| Output templates / exports | **Defer to `[M3]`** | Need usage evidence before locking formats |

---

## 4. Milestones & Gates

- **End S1:** Golden + adversarial eval sets ready; corpus readiness report; `[M0]` build plan locked.
- **End S3 — `[M0]` ships:** Searchable, citable corpus callable from existing AI tools. **Gate:** top-5 ≥ 80%, citation coverage ≥ 95%, latency < 10s, adversarial safe-fail = 100%. Pilot opens.
- **End S4 — `[M1]` ships:** Corpus maintainable by authorized users.
- **End S6 — `[M2]` ships:** Grounded synthesis with evidence packets. **Gate:** factual-claim citation > 95%, unsupported material claims < 5%. **`[M3]` go/no-go decision** based on real usage data.
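
The gates above can be encoded as data so that an eval run yields an unambiguous pass/fail. The following is a minimal sketch, assuming metrics arrive as fractions between 0 and 1 and latency in seconds; the metric keys, the `Threshold` class, and `failed_checks` are hypothetical names, and the roadmap does not say which latency statistic (mean, p95, and so on) the 10 s limit applies to.

```python
import operator
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Threshold:
    metric: str                               # hypothetical metric key
    compare: Callable[[float, float], bool]   # e.g. operator.ge
    limit: float


# Limits and comparison directions taken from the milestone list above.
M0_GATE = [
    Threshold("top5_hit_rate", operator.ge, 0.80),
    Threshold("citation_coverage", operator.ge, 0.95),
    Threshold("latency_seconds", operator.lt, 10.0),
    Threshold("adversarial_safe_fail_rate", operator.eq, 1.0),
]
M2_GATE = [
    Threshold("factual_claim_citation_rate", operator.gt, 0.95),
    Threshold("unsupported_material_claim_rate", operator.lt, 0.05),
]


def failed_checks(gate: List[Threshold], results: Dict[str, float]) -> List[str]:
    """Return the metrics that miss their limit; a missing metric counts as a failure."""
    failures = []
    for threshold in gate:
        value = results.get(threshold.metric)
        if value is None or not threshold.compare(value, threshold.limit):
            failures.append(threshold.metric)
    return failures


if __name__ == "__main__":
    # Illustrative numbers only, not measured results.
    sample_run = {
        "top5_hit_rate": 0.84,
        "citation_coverage": 0.97,
        "latency_seconds": 6.2,
        "adversarial_safe_fail_rate": 1.0,
    }
    failures = failed_checks(M0_GATE, sample_run)
    print("M0 gate: PASS" if not failures else f"M0 gate: FAIL {failures}")
```

Treating a missing metric as a failure keeps a release from passing simply because part of the eval suite did not run.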

---

## 5. Dependencies & Risks

| Risk / dependency | Impact | Mitigation |
|---|---|---|
| Corpus access & permissions not provisioned by S1 | Blocks ingestion (S2) | Make securing access from the data/IT owner the first PM deliverable; assess in S1 |
| Parse quality poor on scanned/visual docs | Retrieval gaps | S1 parsing spike surfaces early; flag visual-heavy docs, defer to `[X1]` |
| Hybrid retrieval misses 80% target | `[M0]` gate fails | Reranking held as S3 buffer; eval-driven tuning before ship |
| Hosting / data-policy (commercial LLM vs on-prem) undecided | Architecture rework | Open question in PRD §18; resolve before S2 |
| Lean pod has no dedicated QA/security | Quality/defensibility risk | Security review is an explicit 5-pt story in S6; evals gate every release |
| Pilot users unavailable | No adoption signal for `[M3]` | Recruit pilot users during S1–S3, before `[M0]` ships |

---

## 6. Notes

### Story points → effort days (reference for non-agile stakeholders)

Story points measure relative size and uncertainty, not a fixed time. For readers who think in days, the table below converts points to **ideal engineering-days** at this pod's planning velocity: ~30 pts per 2-week sprint across 2 engineers, i.e. 20 engineer-days ÷ 30 pts ≈ **0.67 engineer-day per point**. Treat these as planning estimates, not commitments.

| Points | ≈ Ideal eng-days | Rough meaning |
|---|---|---|
| 1 | ~0.5 day | Trivial, well-understood change |
| 2 | ~1 day | Small, low-risk task |
| 3 | ~2 days | Moderate task, some unknowns |
| 5 | ~3–4 days | Substantial; multiple parts |
| 8 | ~1 week | Large; meaningful complexity/uncertainty |
| 13 | ~1.5–2 weeks | Very large — should usually be split |

**Plan totals:** 170 points ≈ **115 ideal engineer-days**, sequenced across 12 weeks for a 2-engineer build team (plus ~50% PM). This sits just inside the window's ~120 engineer-day capacity (2 engineers × 60 working days), leaving limited slack; see §5 risks.
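
The conversion arithmetic can be reproduced in a few lines, which helps when velocity or team size changes. This is a sketch using only the figures stated in this section; the constant and function names are hypothetical.

```python
ENGINEERS = 2
WORKING_DAYS_PER_SPRINT = 10   # one 2-week sprint, weekends excluded
PLANNING_VELOCITY = 30         # points per sprint, the planning figure used above
SPRINTS = 6
PLANNED_POINTS = 170

# Engineer-days available per sprint divided by points delivered per sprint.
days_per_point = ENGINEERS * WORKING_DAYS_PER_SPRINT / PLANNING_VELOCITY


def ideal_days(points: float) -> float:
    """Convert story points to ideal engineer-days at the planning velocity."""
    return points * days_per_point


capacity_days = ENGINEERS * WORKING_DAYS_PER_SPRINT * SPRINTS
planned_days = ideal_days(PLANNED_POINTS)
slack_days = capacity_days - planned_days

if __name__ == "__main__":
    print(f"days per point:  {days_per_point:.2f}")
    print(f"planned effort:  {planned_days:.0f} engineer-days")
    print(f"window capacity: {capacity_days} engineer-days")
    print(f"slack:           {slack_days:.0f} engineer-days")
```

With the unrounded 2/3 day per point, 170 points come to about 113 engineer-days, in line with the rounded ~115 quoted above, which leaves roughly a week of slack across the whole pod.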

### Porting to Notion / Linear

This markdown is the source of truth and ports cleanly to Notion/Linear: each Sprint = a cycle, each story row = an issue (Title, Points, Owner/Assignee, Release label, Ref, Status). The cut-list (§3) maps to a "Won't Do / Backlog" view. A CSV export can be generated on request for direct import.
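
As an illustration of the CSV export, the sketch below turns story rows from a sprint table into CSV text using Python's standard `csv` module. The column headers follow the field list above plus a `Cycle` column for the sprint; the `Todo` default status, the header spellings, and the function names are assumptions, and a Notion or Linear importer may expect different headers.

```python
import csv
import io
from typing import List

# Hypothetical header names; adjust to whatever the target importer expects.
FIELDS = ["Cycle", "Title", "Points", "Owner", "Release", "Ref", "Status"]


def parse_story_row(line: str) -> List[str]:
    """Split one markdown table row into cells, dropping inline-code backticks."""
    cells = line.strip().strip("|").split("|")
    return [cell.strip().replace("`", "") for cell in cells]


def stories_to_csv(cycle: str, table_rows: List[str], status: str = "Todo") -> str:
    """Render story rows (Story | Pts | Owner | Release | Ref) as CSV text."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    for row in table_rows:
        title, points, owner, release, ref = parse_story_row(row)
        writer.writerow({
            "Cycle": cycle,
            "Title": title,
            "Points": int(points),
            "Owner": owner,
            "Release": release,
            "Ref": ref,
            "Status": status,
        })
    return buffer.getvalue()


if __name__ == "__main__":
    sprint_4_rows = [
        "| Batch ingestion | 3 | FSE | `[M1]` | FR-102 |",
        "| Upload endpoint (supported types) + required metadata capture | 5 | FSE | `[M1]` | FR-101, FR-103 |",
    ]
    print(stories_to_csv("S4", sprint_4_rows), end="")
```

The `csv` writer quotes cells that contain commas, such as multi-requirement `Ref` values, so they survive import as a single field. The simple pipe split assumes no story title contains a `|` character, which holds for the tables in §2.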