An Analysis of the Zartis AI Application Development Experiment
A research-driven exercise to deepen the team’s understanding of what it takes to engineer real business value from LLMs
Executive summary
This whitepaper presents the key learnings from an internal Zartis prototype aimed at building an end-to-end AI application for processing complex Merger & Acquisition (M&A) Confidential Information Memoranda (CIMs). The initiative was designed not as a commercial product, but as a research-driven exercise to deepen the team’s understanding of what it takes to engineer real business value from LLMs.
The experiment focused on a domain-specific problem:
Extracting structured, comparable insights from unstandardised, graphically rich CIM documents. To tackle this, Zartis developed an “agentic system” composed of multiple specialised AI agents with defined roles and autonomy that collaborate to produce a standardised analytical report.
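To make the shape of such an agentic system concrete, here is a minimal sketch in Python. The agent names and roles are illustrative assumptions, not the actual Zartis implementation; in a real system each agent's run method would call an LLM with a role-specific prompt.

```python
from dataclasses import dataclass

@dataclass
class Agent:
    """A specialised agent with a narrow, defined role in the pipeline."""
    name: str
    role: str

    def run(self, payload: dict) -> dict:
        # Placeholder for a role-specific LLM call; here we only record
        # which agent processed the payload, to show the collaboration flow.
        payload.setdefault("trace", []).append(self.name)
        return payload

# Hypothetical pipeline: extract raw figures, normalise them onto a
# standard KPI schema, then review and flag low-confidence fields.
PIPELINE = [
    Agent("extractor", "pull raw figures from the CIM"),
    Agent("normaliser", "map figures onto a standard KPI schema"),
    Agent("reviewer", "cross-check values and flag low-confidence fields"),
]

def analyse_cim(document: str) -> dict:
    payload = {"source": document}
    for agent in PIPELINE:
        payload = agent.run(payload)
    return payload

result = analyse_cim("example_cim.pdf")
```

The design point is that each agent's autonomy is bounded by its role, and the orchestration layer (here, a simple loop) is where the standardised analytical report is assembled.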
Challenges:
This prototype served as a testbed for exploring three of the most persistent challenges in AI development: hallucinations, determinism, and LLM cost effectiveness.
1. Hallucinations: The team explored how to embrace uncertainty while designing robust fallback mechanisms to mitigate their effects. This shift reframes hallucinations not as defects to be eradicated but as predictable behaviours to be detected, managed, and leveraged. The team’s approach includes developing early detection techniques based on uncertainty metrics that signal when to activate verification or review loops.
2. Determinism: The team explored whether repeatable outcomes can be engineered in LLM-based systems. This investigation distinguished between system-level determinism (achieved through architecture, task decomposition, and environment control) and model-level determinism, where variance arises not from the model’s internal logic but from the infrastructure used to serve it.
3. Cost-effectiveness: The experiment highlighted that one of the main challenges in building practical AI systems lies in managing the cost of intelligence. Rather than setting “cost monitoring” as an isolated objective, the focus was on designing a system capable of balancing precision, reliability, and efficiency within real operational limits. The team learned that cost-effectiveness must be engineered holistically — through architecture, model selection, and observability — not by cutting corners on reasoning or over-optimising prompts.
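The uncertainty-based routing described in point 1 can be sketched as follows. This is an assumed, simplified signal (mean token log-probability with a tunable threshold), not the paper's exact method; real detection would combine several metrics.

```python
# Hallucination mitigation sketch: treat low model confidence as a routing
# signal rather than trying to eliminate hallucination outright.
CONFIDENCE_THRESHOLD = -1.0  # assumed value; tuned per task in practice

def mean_logprob(token_logprobs: list[float]) -> float:
    """Average per-token log-probability as a crude uncertainty metric."""
    return sum(token_logprobs) / len(token_logprobs)

def route(answer: str, token_logprobs: list[float]) -> tuple[str, str]:
    """Accept confident answers; send uncertain ones to a review loop."""
    if mean_logprob(token_logprobs) >= CONFIDENCE_THRESHOLD:
        return ("accept", answer)
    # Below threshold: activate a verification pass (second model,
    # retrieval cross-check, or human review).
    return ("verify", answer)
```

Confident outputs flow straight through; uncertain ones trigger the verification loop instead of silently entering the report.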
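The distinction in point 2 between model-level and system-level determinism can be illustrated with two levers. The parameter names below follow common LLM API conventions and are assumptions, not a specific vendor's interface.

```python
# Model-level lever: pin decoding parameters so the model's sampling is
# as repeatable as the serving infrastructure allows.
DECODE_PARAMS = {
    "temperature": 0.0,  # greedy decoding, no sampling randomness
    "top_p": 1.0,
    "seed": 42,          # honoured by some serving stacks, not all
}

# System-level lever: make the surrounding pipeline deterministic by
# canonicalising model output, so equivalent answers compare equal
# downstream even when the raw text varies between runs.
def normalise_output(raw: str) -> str:
    return " ".join(raw.lower().split())
```

Note the asymmetry: even with temperature 0, serving infrastructure can introduce variance, which is why the system-level lever (architecture, task decomposition, output normalisation) carries most of the repeatability burden.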
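The observability side of point 3 amounts to attributing spend to the parts of the system that incur it. A minimal sketch, with placeholder model names and prices (not real rates), could track cost per agent like this:

```python
from collections import defaultdict

# Assumed per-1k-token prices for a cheap and an expensive model tier.
PRICE_PER_1K_TOKENS = {"small-model": 0.0005, "large-model": 0.01}

ledger: dict[str, float] = defaultdict(float)

def record_call(agent: str, model: str,
                prompt_tokens: int, completion_tokens: int) -> float:
    """Attribute the cost of one LLM call to the agent that made it."""
    cost = (prompt_tokens + completion_tokens) / 1000 * PRICE_PER_1K_TOKENS[model]
    ledger[agent] += cost
    return cost

# Example: the extractor uses the expensive model, the reviewer a cheap one.
record_call("extractor", "large-model", 4000, 500)
record_call("reviewer", "small-model", 2000, 200)
```

A per-agent ledger like this is what makes model-selection trade-offs visible: it shows which roles justify an expensive model and which can be served by a cheaper one without cutting corners on reasoning.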

Experiment: Quality & cost-efficiency in multi-agent AI system development
Get the details of our research-driven exercise to deepen your understanding of what it takes to engineer real business value from LLMs
Download the full paper right now!
Take a look at some of the findings from our AI application development experiment:
- AI development is system development, where technology choices must always serve the business problem.
- Moving from a functional prototype to a production-grade, value-generating AI system requires multidisciplinary collaboration.
- Building effective AI applications is about orchestrating hybrid systems that integrate data science, engineering, and product.
- Cost efficiency and control depend on deep observability and continuous optimisation.
- Achieving reliable results at low cost required deliberate engineering choices.
The experiment demonstrated that pragmatism is the key to progress. Complex frameworks such as graph-based retrieval systems proved powerful in certain contexts, yet simple keyword searches outperformed them for specific KPI extractions.
Discover more whitepapers
Whitepaper
AI Solutions: Moving From POC to Production
The “pilot-to-production gap” is where countless hours and investments disappear. Discover insights from a panel of industry leaders who shared their learnings at the 2025 Zartis AI Summit.
Whitepaper
Advanced Techniques for Managing Hallucination and Determinism in LLMs
Only 5% of companies report ROI from genAI; true control lies in managing the main challenges: hallucination and non-determinism.