
PageIndex vs Vector Database: Why Vectorless RAG Might Be the Smarter Choice in 2026

PageIndex hit 98.7% accuracy on FinanceBench while traditional vector RAG sits at 60-80%. We break down the architecture differences and benchmark numbers, and offer a practical decision framework for choosing the right retrieval approach.

March 15, 2026 · 20 min read

What if the entire foundation of your RAG pipeline — the chunking, the embeddings, the vector similarity search — is the reason your AI keeps hallucinating on complex documents?

If you've spent any time building retrieval-augmented generation systems, you know the pain. You chunk a 200-page SEC filing into neat little pieces, embed them into Pinecone or Qdrant, fire off a query, and get back... fragments that miss the actual answer by three sections. Traditional vector databases have carried RAG this far, but a new open-source project called PageIndex just posted 98.7% accuracy on FinanceBench — while vector-based RAG systems typically land somewhere between 60% and 80% on the same benchmark. That gap is hard to ignore.

In this article, we'll break down how PageIndex actually works, look at where vector databases still shine, and help you figure out which approach fits your use case. We'll cover the core architecture differences, real benchmark numbers, cost and complexity tradeoffs, and practical scenarios where each approach wins.

[Figure: Vector RAG vs. PageIndex. Vector RAG reduces a 10-K filing to disconnected chunks ("context lost"); PageIndex preserves the filing's hierarchy — Financials, Operations, Risk Factors, Income Statement, Balance Sheet, Revenue, Operating Expenses ("structure kept").]

How Vector Database RAG Actually Works (And Where It Breaks)

Before we talk about what PageIndex does differently, let's be honest about what vector database RAG does well — and where it falls apart.

The standard RAG pipeline goes like this: take your documents, split them into chunks (usually 500-1000 tokens), run each chunk through an embedding model to get a numerical vector, store those vectors in a database like Pinecone, Weaviate, Qdrant, or Milvus, and then at query time, embed the user's question and find the most "similar" chunks by cosine distance.
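The steps above can be sketched end to end in a few lines. Everything here is a toy stand-in — `embed` is a bag-of-words counter rather than a real embedding model, and retrieval is a linear scan rather than an ANN index — but the shape of the pipeline is the same:

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    # Toy bag-of-words "embedding"; a real pipeline calls an embedding model.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two sparse word-count vectors.
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def chunk(text: str, size: int = 50) -> list[str]:
    # Fixed-size chunking by word count.
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]

# "Index time": chunk the document and embed each chunk.
document = "... long filing text ..."
index = [(c, embed(c)) for c in chunk(document)]

# "Query time": embed the question, rank chunks by similarity.
def retrieve(query: str, k: int = 3) -> list[str]:
    qv = embed(query)
    ranked = sorted(index, key=lambda pair: cosine(qv, pair[1]), reverse=True)
    return [c for c, _ in ranked[:k]]
```

Swap in a real embedding model and a vector store and you have the standard pipeline — which is exactly why its failure modes are architectural rather than implementation bugs.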

This works surprisingly well for simple lookups. Need to find a paragraph that discusses "employee benefits policy"? Vector similarity nails it. The embedding captures semantic meaning, and the closest vectors usually contain relevant text.

But here's where things get ugly. Take a financial analyst asking: "What was the year-over-year change in operating margin for Q3 2025 compared to Q3 2024?" To answer that, your system needs data from two different sections of a 10-K filing, needs to understand the relationship between those sections, and needs to perform a calculation. A vector similarity search doesn't reason — it just finds text that looks similar to the question. And "looks similar" and "contains the answer" are two very different things.

Marco, a machine learning engineer at a fintech startup in Berlin, learned this the hard way. His team spent four months building a RAG pipeline with Qdrant for analyzing earnings reports. The retrieval accuracy on simple factual questions was around 78%. But when analysts asked comparative or multi-step questions — the kind that actually matter in financial analysis — accuracy dropped to roughly 40%. They tried recursive chunking, semantic chunking, hybrid search with BM25. Each tweak moved the needle by a few percentage points. Nothing came close to the reliability their compliance team needed.

[Figure: The vector RAG pipeline — Document → Chunking → Embedding → Vector DB → Similarity → LLM → Answer — with two failure points: chunking destroys structure (tables, hierarchy, cross-references lost), and semantic ≠ correct ("looks similar" misses answers).]

The Chunking Problem Nobody Wants to Talk About

The dirty secret of vector database RAG is that chunking destroys document structure. A 10-K filing has a carefully designed hierarchy: sections, subsections, tables, footnotes, cross-references. When you chop it into 512-token pieces, you lose all of that.

Stack Overflow's engineering blog put it bluntly: "Breaking up is hard to do." There's no single best chunking strategy. Fixed-size chunking is fast but context-blind. Semantic chunking is smarter but expensive. Recursive chunking tries to respect structure but still fragments tables and multi-page narratives.
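To see the failure concretely, here's a naive fixed-size splitter cutting through a small margin table. This is a minimal illustration, not any particular library's chunker:

```python
def fixed_size_chunks(text: str, size: int) -> list[str]:
    # Naive fixed-size chunking by character count, blind to structure.
    return [text[i:i + size] for i in range(0, len(text), size)]

table = (
    "Operating margin table\n"
    "Q3 2024 | 18.2%\n"
    "Q3 2025 | 21.7%\n"
)
chunks = fixed_size_chunks(table, 30)
# At size 30 the Q3 2024 row is cut in half: no single chunk contains the
# complete "Q3 2024 | 18.2%" line, so single-chunk retrieval can never
# return that figure together with its label.
```

Larger chunks just move the cut somewhere else; any fixed boundary will eventually land mid-row, mid-footnote, or mid-sentence.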

The result? Your vector database faithfully stores thousands of decontextualized fragments. It retrieves the ones that are semantically closest to your query. But "closest" doesn't mean "correct" — especially when the answer requires understanding how pieces of the document relate to each other.

How PageIndex Works: Trees Instead of Vectors

PageIndex takes a fundamentally different approach. Instead of breaking documents into chunks and embedding them, it builds a hierarchical tree index — essentially a machine-readable table of contents — and uses LLM reasoning to navigate that tree.

The process has two phases.

Phase 1 — Index Generation. PageIndex reads your document and constructs a tree structure that mirrors the document's natural organization. Headers become parent nodes. Subsections become children. Each node gets metadata: a title, page range, summary, and unique identifier. Think of it like a librarian building a card catalog, except the catalog preserves the full structure of every book.
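A minimal version of that index-generation phase might look like the following. The `Node` fields mirror the metadata described above (title, page range, summary, identifier), but the schema and the stack-based parsing are illustrative sketches, not PageIndex's actual implementation:

```python
from __future__ import annotations
from dataclasses import dataclass, field

@dataclass
class Node:
    title: str
    node_id: str
    page_range: tuple[int, int] | None = None
    summary: str = ""
    children: list[Node] = field(default_factory=list)

def build_tree(root_title: str, sections: list[tuple[int, str]]) -> Node:
    # sections: (depth, title) pairs in document order, e.g. parsed headings.
    root = Node(root_title, "0000")
    stack = [(0, root)]
    for i, (depth, title) in enumerate(sections, start=1):
        # Pop back up until the node on top of the stack is this
        # section's parent (i.e. strictly shallower).
        while stack and stack[-1][0] >= depth:
            stack.pop()
        node = Node(title, f"{i:04d}")
        stack[-1][1].children.append(node)
        stack.append((depth, node))
    return root

tree = build_tree("SEC 10-K", [
    (1, "Financial Statements"),
    (2, "Income Statement"),
    (3, "Revenue"),
    (3, "Operating Expenses"),
    (2, "Balance Sheet"),
    (1, "Risk Factors"),
])
```

The result is the machine-readable table of contents: `tree.children` holds the top-level sections, and each node keeps its place in the hierarchy rather than floating free as a chunk.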

Phase 2 — Reasoning-Based Retrieval. When a query comes in, instead of doing a nearest-neighbor search in vector space, PageIndex sends the tree index to an LLM and asks it to reason about which sections are relevant. The LLM traverses the tree top-down, making decisions at each level: "This question is about operating margins, so I should look under Financial Statements → Income Statement → Operating Expenses." It navigates the document the same way a human expert would.

No embeddings. No vectors. No chunking. Just structured reasoning over document architecture.
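The Phase 2 traversal can be mocked without any model at all. In the sketch below, `relevant` stands in for the LLM call — PageIndex would show the model each node's title and summary, whereas here a node's "summary" is approximated by pooling the title words of its subtree, purely to make the walk runnable:

```python
# Minimal tree: each node is (title, children).
TREE = ("SEC 10-K", [
    ("Financial Statements", [
        ("Income Statement", [
            ("Revenue", []),
            ("Operating Expenses", []),
        ]),
        ("Balance Sheet", []),
    ]),
    ("Risk Factors", []),
])

def subtree_words(node) -> set[str]:
    # All title words in this node's subtree: a crude proxy for the
    # node summary an LLM would actually read.
    title, children = node
    words = set(title.lower().split())
    for child in children:
        words |= subtree_words(child)
    return words

def relevant(question: str, node) -> bool:
    # Stand-in for the LLM relevance judgment: keyword overlap.
    return bool(set(question.lower().split()) & subtree_words(node))

def traverse(question: str, node) -> list[str]:
    # Top-down walk: descend only into branches judged relevant; the
    # surviving leaves are the sections handed to the answering LLM.
    title, children = node
    picked = [child for child in children if relevant(question, child)]
    if not picked:
        return [title]
    sections = []
    for child in picked:
        sections.extend(traverse(question, child))
    return sections
```

The keyword overlap is exactly where the real system earns its accuracy: an LLM reading titles and summaries can make the "operating margins live under the income statement" leap that no string match can.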

[Figure: The same SEC 10-K as 20 disconnected chunks versus a four-level tree index. Chunking leaves no relationships between fragments, breaks cross-references, splits table data across chunks, and loses the hierarchy entirely. The tree index preserves parent-child relationships (Financial Stmts → Income Stmt → Revenue/Op. Costs, Balance Sheet → Assets/Liabilities; MD&A → Liquidity; Risk Factors), maintains cross-references, keeps tables intact within nodes, and leaves the full document hierarchy navigable.]

Why This Matters: The FinanceBench Results

FinanceBench is the standard benchmark for evaluating how well AI systems answer questions about SEC filings. The questions range from simple factual lookups to multi-step calculations that require cross-referencing multiple sections.

Here's how the numbers stack up:

| System | FinanceBench Accuracy | Approach |
| --- | --- | --- |
| Mafin 2.5 (PageIndex) | 98.7% | Vectorless tree indexing |
| Traditional vector RAG | 60-80% | Chunking + embeddings |
| Perplexity | ~45% | General-purpose retrieval |
| GPT-4o (no RAG) | ~31% | Pure LLM knowledge |

That 98.7% isn't a cherry-picked number — Mafin 2.5 covered 100% of the benchmark questions. And the jump from Mafin 1.0 (38%) to Mafin 2.5 (98.7%) shows how much the tree-indexing approach has matured in a short time.

Want to test PageIndex on your own documents? The open-source repo includes cookbooks and a Colab notebook to get started in under 10 minutes.

Where Vector Databases Still Win

Let's not pretend PageIndex kills vector databases. It doesn't. There are real scenarios where vector search is the better tool.

Scale and Speed

If you're searching across millions of documents — not pages within a single document, but millions of separate documents — vector databases are built for that. Pinecone handles sub-50ms queries at scale. Qdrant, written in Rust, delivers exceptional throughput for on-premises deployments. Milvus can store billions of vectors on NVMe SSDs instead of RAM, cutting infrastructure costs by as much as 10x at massive scale.

PageIndex's LLM reasoning step is inherently slower per query. It's making API calls to GPT-4o (or whatever model you configure) for each retrieval. For a single complex document, that's fine. For a real-time search across a million product descriptions? Vector databases win by a landslide.

Unstructured, Short-Form Content

PageIndex's tree structure shines on long, hierarchical documents — financial filings, legal contracts, technical manuals, academic papers. But if your corpus is a collection of Slack messages, support tickets, or product reviews, there's no inherent hierarchy to exploit. Vector similarity search over short text snippets is still the most practical approach for that kind of data.

Mature Ecosystem and Tooling

Vector databases have years of ecosystem development behind them. Pinecone, Weaviate, Qdrant, Milvus, and Chroma all have robust client libraries, integrations with LangChain and LlamaIndex, managed cloud offerings, monitoring dashboards, and battle-tested production deployments. PageIndex is open-source and growing fast, but it's newer. If you need enterprise support contracts and SLA guarantees today, the vector database ecosystem is further along.

[Figure: Decision matrix. PageIndex better: long structured docs (10-K, legal), multi-step reasoning questions, regulated industry compliance, audit trail for answers. Vector DB better: millions of short documents, sub-50ms latency requirements, Slack / support ticket search.]

Real-World Decision Framework: Which One Should You Use?

Let's cut through the hype and get practical. Here's a framework for choosing.

Choose PageIndex when:

  • Your documents are long and structured (financial reports, legal filings, regulatory documents, technical manuals)
  • Accuracy on complex, multi-step questions is non-negotiable
  • You need an audit trail showing exactly how the system found its answer
  • Your corpus is measured in hundreds or thousands of documents, not millions
  • You're working in regulated industries where "close enough" retrieval isn't acceptable

Choose a vector database when:

  • You're searching across millions of short-form documents or records
  • Latency matters more than per-query accuracy (sub-50ms requirements)
  • Your content lacks clear hierarchical structure
  • You need real-time semantic search at massive scale
  • Your team already has vector database expertise and infrastructure

Consider a hybrid approach when:

  • You need both: fast initial retrieval across a large corpus (vector search) followed by deep reasoning within retrieved documents (PageIndex)
  • Your pipeline handles mixed content types — some structured, some not
  • You're building a system that routes queries to different retrieval strategies based on complexity

Elena, an AI architect at a compliance SaaS company in Amsterdam, landed on exactly this hybrid setup. Her team uses Weaviate to quickly surface the 5-10 most relevant documents from a library of 50,000 regulatory filings. Then PageIndex takes over, building a tree index of each retrieved document and reasoning through it to extract precise answers. The combination gives her team both the breadth of vector search and the depth of structured reasoning. Retrieval accuracy on their internal benchmark went from 71% (vector-only) to 94% (hybrid).
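A first cut at the routing logic in a setup like this can be a plain heuristic. The signal list below is illustrative, and a production router might instead ask a small LLM to classify each question:

```python
MULTI_STEP_SIGNALS = (
    "compare", "year-over-year", "change in", "versus", "difference between",
)

def looks_multi_step(question: str) -> bool:
    # Comparative, delta, and cross-period questions tend to need
    # reasoning across document sections.
    q = question.lower()
    return any(signal in q for signal in MULTI_STEP_SIGNALS)

def route(question: str) -> str:
    # Simple lookups go to cheap vector search; multi-step questions go
    # to tree-based reasoning over the documents vector search surfaced.
    return "pageindex" if looks_multi_step(question) else "vector"
```

Even a crude router like this keeps the expensive reasoning path reserved for the queries that actually need it.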

Cost and Complexity: What Nobody Mentions

There's a practical tradeoff most comparison articles skip: cost.

Vector databases have infrastructure costs that scale with your data. Pinecone's managed service starts affordable but climbs as you add vectors. Weaviate and Qdrant can run self-hosted, but you're paying for compute and memory — and vector search is memory-hungry. At enterprise scale with billions of vectors, you're looking at significant monthly bills.

PageIndex shifts the cost from infrastructure to inference. You're not paying for vector storage or embedding generation. But you are paying for LLM API calls on every retrieval. With GPT-4o, that's roughly $2.50 per million input tokens and $10 per million output tokens. For a corpus of a few thousand documents queried a few hundred times a day, this is often cheaper than running a managed vector database. For millions of queries per day, the LLM costs add up fast.
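As a back-of-envelope check on those numbers — the per-query token counts below are illustrative assumptions, not measurements:

```python
def llm_cost_per_query(input_tokens: int, output_tokens: int,
                       in_price: float = 2.50, out_price: float = 10.00) -> float:
    # Prices in USD per million tokens (the GPT-4o rates quoted above).
    return input_tokens / 1e6 * in_price + output_tokens / 1e6 * out_price

# Assume each tree-traversal retrieval sends ~8k tokens of index plus the
# question and gets ~500 tokens back (illustrative numbers).
per_query = llm_cost_per_query(8_000, 500)   # $0.025 per retrieval
monthly = per_query * 300 * 30               # 300 queries/day ≈ $225/month
```

At a few hundred queries a day that undercuts most managed vector database plans; at millions of queries a day the same arithmetic flips decisively the other way.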

The sweet spot? PageIndex tends to be more cost-effective for low-to-medium query volume on high-complexity documents. Vector databases tend to be more cost-effective for high-volume queries on large, simple corpora.

| Factor | PageIndex | Vector Database |
| --- | --- | --- |
| Infrastructure | Minimal (just LLM API) | Significant (compute + memory + storage) |
| Per-query cost | Higher (LLM inference) | Lower (vector similarity is cheap) |
| Setup complexity | Simple (pip install + API key) | Moderate to high (infra, embeddings, tuning) |
| Scaling cost curve | Linear with queries | Linear with data volume |
| Best economics | Low volume, high accuracy needs | High volume, large corpus |

Getting Started with PageIndex

If you want to try PageIndex, the setup is minimal. Install from pip, point it at your documents, and let it build the tree index. The GitHub repo has a cookbook with a simple RAG notebook you can run in Google Colab right now.

For production use, PageIndex offers MCP (Model Context Protocol) support, a cloud chat platform at chat.pageindex.ai, and an API in beta. Enterprise on-premises deployment is available if your data can't leave your infrastructure.

Start with one document. Take your most problematic PDF — the one your current RAG pipeline keeps getting wrong — and run it through PageIndex. Compare the answers side by side. That single test will tell you more than any benchmark table.

The Bottom Line

PageIndex and vector databases solve retrieval differently, and the right choice depends on what you're actually building.

Vector databases remain the best option for high-volume semantic search across large, unstructured corpora where sub-50ms latency matters. They've earned their place in the stack.

But if your RAG pipeline struggles with long, structured documents — if accuracy on complex questions matters more than raw query speed — PageIndex's vectorless approach delivers results that vector similarity search simply can't match. That 98.7% on FinanceBench isn't theoretical. It's a benchmark number on real financial filings, answering real analyst questions.

The smartest teams in 2026 aren't picking sides. They're using vector search for breadth and PageIndex for depth, routing queries to the right tool based on complexity. That hybrid architecture is where the field is heading.

Ready to test it yourself? Clone the PageIndex repo, run the Colab cookbook on your own documents, and see the difference firsthand. No credit card, no signup — just open source.

Pawel Owerczuk

AI Agent & RAG Developer

AI Agent & RAG Developer with 10+ years of software engineering experience. Specialized in intelligent AI solutions for enterprises in the DACH & Nordic region.