AI Still Can’t Replace Financial Analysts: Vals AI’s New Benchmark Stumps Top Models, GPT-5.5 Barely

AI Cannot Replace Financial Analysts Yet – Vals AI’s New Benchmark Shows Top Models Failing Miserably

Vals AI has released the second generation of its Finance Agent benchmark (Finance Agent v2), an end-to-end test designed to simulate the workflow of a junior financial analyst. The benchmark includes 927 expert-verified questions, with a sharp increase in difficulty compared to the prior version.

The top-performing model, GPT-5.5, achieved only 51.76% accuracy, followed closely by Claude Opus 4.7 (51.51%) and Claude Sonnet 4.6 (51.03%). Unlike single-turn Q&A, the test requires models to autonomously search through hundreds of pages of 10-K and 10-Q filings, adjust for cross-year financial statement changes, and perform multi-step calculations with precise intermediate numbers.

Under a strict scoring standard where every part of an answer must be correct, all frontier models scored below 40%. In the most challenging categories, financial modeling and precedent transaction analysis, the highest accuracy was just 23%.
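To see why an all-or-nothing standard pulls scores down so sharply, here is a minimal Python sketch comparing strict scoring with partial credit. It is only an illustration: the `GradedAnswer` structure, the function names, and the sample data are hypothetical and do not reflect how Vals AI actually grades answers.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class GradedAnswer:
    """One model answer graded part by part (hypothetical structure)."""
    question_id: str
    parts_correct: List[bool]  # one flag per required part of the answer


def strict_score(answer: GradedAnswer) -> float:
    """All-or-nothing: credit only if every part is correct."""
    if not answer.parts_correct:
        return 0.0
    return 1.0 if all(answer.parts_correct) else 0.0


def partial_score(answer: GradedAnswer) -> float:
    """Partial credit: the fraction of parts that are correct."""
    if not answer.parts_correct:
        return 0.0
    return sum(answer.parts_correct) / len(answer.parts_correct)


def accuracy(answers: List[GradedAnswer],
             scorer: Callable[[GradedAnswer], float]) -> float:
    """Mean score across all answers, expressed as a percentage."""
    if not answers:
        return 0.0
    return 100.0 * sum(scorer(a) for a in answers) / len(answers)


# Made-up grading results for illustration only, not benchmark data.
answers = [
    GradedAnswer("q1", [True, True, True]),
    GradedAnswer("q2", [True, True, False]),
    GradedAnswer("q3", [True, False]),
    GradedAnswer("q4", [False, False]),
]

strict = accuracy(answers, strict_score)
partial = accuracy(answers, partial_score)
print(f"strict: {strict:.1f}%  partial: {partial:.1f}%")
```

In this toy set, most individual parts are right, yet only one of the four answers is fully correct, so the strict figure comes out well below the partial-credit one. A single wrong intermediate number in a multi-step calculation has the same effect on an otherwise sound answer.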

Among other models, Kimi K2.6 ranked fifth with 44.87%, making it the best-performing Chinese model, followed closely by GLM 5.1 (44.79%) and DeepSeek V4 (44.08%). Claude Opus 4.7 received the "fastest" label (360 seconds per task), while GLM 5.1 was the most budget-friendly at $0.62 per task.

The sharp drop in scores (Opus 4.7 had scored 64.4% on the previous benchmark) demonstrates that while current AI can handle basic retrieval tasks, it remains far from replacing human analysts in the compliance-heavy, precision-critical domain of professional finance.
