Hydra DB · FinanceBench Benchmark Report

About this report

This report shares the results of running Hydra DB against FinanceBench, an industry benchmark for financial question answering. The goal is a clear view of how Hydra DB performs on real-world financial documents, so you can judge its fit for your use case.

We evaluated two things that matter for production use: how often Hydra DB finds the correct supporting evidence in the document, and how much context it sends to the downstream language model.

The benchmark

FinanceBench is a public benchmark built from 10-K, 10-Q, 8-K, and earnings documents of publicly traded companies. Each question has a human-verified answer and a pointer to the exact passage in the source document that supports it. The open-source sample covers a mix of direct metric lookups, domain knowledge, and reasoning-based queries.

Hydra DB offers two retrieval modes, and we evaluated both:

Fast mode

Optimized for low-cost, high-throughput retrieval. Run against the full set of 150 open-source questions.

Thinking mode

Performs additional reasoning over candidate passages to improve ranking quality. Run against a 120-question subset.

Retrieval accuracy

Accuracy is measured with Recall@K - the share of questions where the correct supporting passage appears somewhere in the top K results Hydra DB returns.
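The definition above can be sketched in a few lines of Python. This is a minimal illustration, not the evaluation code used for this report: the function name `recall_at_k`, the passage IDs, and the toy data are all hypothetical, and it assumes a question counts as a hit when at least one gold passage appears in the top K results.

```python
from typing import Dict, List, Set


def recall_at_k(
    ranked: Dict[str, List[str]],  # question ID -> passage IDs, best first
    gold: Dict[str, Set[str]],     # question ID -> IDs of supporting passages
    k: int,
) -> float:
    """Share of questions whose top-k results contain a gold passage."""
    if not gold:
        raise ValueError("no questions to score")
    hits = 0
    for qid, gold_ids in gold.items():
        top_k = ranked.get(qid, [])[:k]
        if any(pid in gold_ids for pid in top_k):
            hits += 1
    return hits / len(gold)


# Toy data (hypothetical): four questions, three results each.
ranked = {
    "q1": ["p7", "p2", "p9"],
    "q2": ["p4", "p1", "p3"],
    "q3": ["p5", "p6", "p8"],
    "q4": ["p0", "p11", "p12"],
}
gold = {"q1": {"p7"}, "q2": {"p3"}, "q3": {"p10"}, "q4": {"p11"}}

for k in (1, 2, 3):
    print(f"Recall@{k} = {recall_at_k(ranked, gold, k):.2f}")
```

Because a hit at rank K is also a hit at every larger K, Recall@K can only stay flat or rise as K grows, which is the shape Table 1 shows.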

[Figure: Recall@K by mode (series: thinking mode, fast mode).]

Recall@K across both modes. Both converge near Recall@10; thinking mode adds 5–6 points at the top of the ranking.

Top K results	Fast mode	Thinking mode	Improvement
Top 1	44.1%	50.3%	+6.2 pts
Top 3	68.1%	74.3%	+6.2 pts
Top 5	78.9%	84.3%	+5.4 pts
Top 10	89.0%	91.4%	+2.4 pts

Table 1. Recall@K for fast and thinking modes.
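The Improvement column is the percentage-point difference between the two modes at each K. A short sketch that recomputes it from the published figures follows; the variable and function names are illustrative, and the values are copied from Table 1.

```python
# Recall@K values from Table 1, in percent.
FAST = {1: 44.1, 3: 68.1, 5: 78.9, 10: 89.0}
THINKING = {1: 50.3, 3: 74.3, 5: 84.3, 10: 91.4}


def improvement_points(fast: dict, thinking: dict) -> dict:
    """Percentage-point gain of thinking mode over fast mode at each K."""
    # Round to one decimal to match the table and absorb float noise.
    return {k: round(thinking[k] - fast[k], 1) for k in fast}


gains = improvement_points(FAST, THINKING)
for k, pts in gains.items():
    print(f"Top {k}: +{pts} pts")
```

These are absolute percentage points, not relative percentages: a move from 44.1% to 50.3% is +6.2 points.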

In fast mode, Hydra DB surfaces the correct evidence within its top 10 results for nearly 9 out of 10 questions. At top 5, the correct evidence is present ~79% of the time - a practical working range for passing context into a model. Thinking mode improves recall at every level, with the biggest gains at the top of the ranking: Recall@1 rises by more than 6 points and Recall@5 crosses 84%.

Both modes converge at Recall@10 (89.0% and 91.4%) - the underlying retrieval already finds the correct evidence for roughly 9 in 10 questions. Thinking mode primarily reorders results to put the right one higher up; choose it when the first few results must be correct.

Context size

When Hydra DB returns results, it assembles them into a context package for the downstream model. Smaller, more predictable context sizes mean lower inference cost and faster responses. The numbers below are from the fast-mode run.

Metric	Value
Smallest context	5,299 tokens
Average context	7,997 tokens
Largest context	12,175 tokens
Fits every major model window	1×

Table 2. Context size per query (fast mode).

On average, Hydra DB produces about 8,000 tokens of context per query. The largest in the benchmark was around 12,000 tokens - comfortably inside the context window of every major language model on the market. The size is predictable across queries, making cost planning straightforward.
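For cost planning, the token counts in Table 2 can be turned into a rough per-query input cost and a window-fit check. The sketch below is illustrative only: the price of 1.00 USD per million input tokens, the 32,000-token window, and the 4,000-token output reserve are placeholder assumptions, not figures from this report or from any specific model provider.

```python
# Context sizes from Table 2 (fast mode), in tokens.
CONTEXT_TOKENS = {"smallest": 5_299, "average": 7_997, "largest": 12_175}


def input_cost_usd(tokens: int, usd_per_million_tokens: float) -> float:
    """Input-side cost of sending `tokens` to a model at the given price."""
    return tokens * usd_per_million_tokens / 1_000_000


def fits_window(tokens: int, window_tokens: int, reserved_for_output: int = 0) -> bool:
    """True if the context plus an output reserve fits in the model window."""
    return tokens + reserved_for_output <= window_tokens


# Placeholder assumptions: substitute your model's real price and window.
PRICE_USD_PER_MILLION = 1.00
WINDOW_TOKENS = 32_000
OUTPUT_RESERVE = 4_000

for label, tokens in CONTEXT_TOKENS.items():
    cost = input_cost_usd(tokens, PRICE_USD_PER_MILLION)
    ok = fits_window(tokens, WINDOW_TOKENS, OUTPUT_RESERVE)
    print(f"{label}: {tokens} tokens, ~${cost:.6f} input cost, fits={ok}")
```

Budgeting against the largest observed context rather than the average gives a conservative upper bound for the queries in this benchmark.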

Summary

On FinanceBench, Hydra DB retrieves the correct evidence in the top 10 results 89% of the time in fast mode and 91% in thinking mode, with an average context size of roughly 8,000 tokens per query. Thinking mode adds 5–6 points of recall at the top of the ranking - the range that matters most when you want the first result to be correct.

The benchmark shows Hydra DB can reliably find the right information inside long, complex financial filings and deliver it in a form ready to use with any modern language model. The two modes give a direct choice between throughput and ranking precision based on what your application needs.

Reliable evidence retrieval inside dense filings, delivered in a compact, predictable ~8K-token package.

References

Hydra DB: Technical Paper. Hydra DB (2026). benchmarks.hydradb.com/HydraDB.pdf
FinanceBench: A New Benchmark for Financial Question Answering. Islam, P., Kannappan, A., Kiela, D., Qian, R., Scherrer, N., Vidgen, B. (2023). arXiv:2311.11944
Hydra DB FinanceBench Evaluation Code. GitHub. github.com/usecortex/hydradb-bench

Cite this work

@techreport{hydradb-financebench2026,
  title  = {FinanceBench: Retrieval over Dense Financial Filings},
  author = {{Hydra DB Research Team}},
  institution = {Hydra DB},
  address = {San Francisco, California, USA},
  year   = {2026},
  note   = {FinanceBench Recall@10 91.4\% (thinking mode)},
  url    = {https://benchmarks.hydradb.com/financebench.pdf}
}
