AI Architecture Strategy for Multi-Agent System to Automate 40% of Customer Support Queries


Executive summary

A leading SaaS platform serving the professional services industry provides appointment management, point-of-sale, marketing, and client relationship tools to thousands of small business owners globally.

The problem

The company’s engineering team had built a multi-agent AI system for business-to-customer interactions but faced critical technical roadblocks: unreliable intent detection in production, inconsistent service recommendations from vector stores, and authentication flow issues that broke conversational experiences.

The solution

Zartis conducted a strategic AI architecture workshop, analysing the client’s existing system and delivering targeted recommendations on information retrieval, agent design, testing infrastructure, and production deployment strategy.

About the client

Industry: SaaS – Professional Services Management

Headquarters: Dublin, Ireland

Global Reach: Serving businesses across North America, UK, Europe, and Australia

The client serves thousands of independent professional service businesses with a comprehensive platform managing the full customer lifecycle—from appointment booking and point-of-sale to marketing automation and client relationships.

As customer expectations evolved toward instant, 24/7 support, the company recognised AI-powered customer interaction as a strategic priority—both to reduce operational burden on business staff and to differentiate their platform in a competitive market.

The problem

Business owners using the platform faced a persistent operational challenge:

Customer enquiries such as booking changes, service questions, and staff availability consumed significant front-desk time during peak hours when staff should focus on in-person clients.

The company’s product vision was clear: build an AI agent that could interact with customers like a human staff member—handling appointment changes, recommending services, matching new customers with the right provider, and maintaining each business’s unique brand voice.

The business case was compelling; however, the engineering team had hit critical technical walls in their initial implementation that threatened to derail the entire initiative.

Why traditional approaches weren’t working

The internal team had already invested significant effort in building a multi-agent AI system using LangGraph and GPT-4, but encountered roadblocks that generic AI best practices couldn’t solve:

Intent detection failed in production despite perfect test results

While their test datasets achieved 100% intent recognition accuracy, real-world customer conversations with unexpected phrasing created inefficient loops between agents. Users would get bounced between agents as the system struggled to understand complex requests like “I need to reschedule my appointment and add an additional service if my usual provider is available.”
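One common fix for compound requests like this is to have the classifier return every intent in a structured payload rather than forcing a single label, so the orchestrator can fan out to all relevant agents instead of bouncing the user between them. A minimal sketch, assuming the model is prompted for JSON output (the prompt wording, intent names, and parsing helper are all illustrative, not the client's actual implementation):

```python
import json

# Hypothetical sketch: ask the model for ALL applicable intents as JSON,
# then route once to every matching agent. Intent names are assumptions.
INTENT_PROMPT = """Classify the customer message into ALL applicable intents
from: reschedule, add_service, check_availability, cancel, other.
Respond with JSON: {{"intents": [...]}}

Message: {message}"""

def parse_intents(raw_response: str) -> list[str]:
    """Parse the model's JSON reply; fall back to 'other' on bad output."""
    try:
        data = json.loads(raw_response)
        intents = data.get("intents", [])
        return intents if intents else ["other"]
    except json.JSONDecodeError:
        return ["other"]

# A compound request yields several intents in one pass instead of
# looping between single-intent agents:
reply = '{"intents": ["reschedule", "add_service", "check_availability"]}'
print(parse_intents(reply))
# ['reschedule', 'add_service', 'check_availability']
```

The fallback to `"other"` matters in production: malformed model output degrades to a safe catch-all route rather than crashing the conversation.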

Vector-based recommendations eroded trust

Initial attempts to use semantic similarity (vector embeddings) for matching customers with services and providers produced illogical suggestions—especially problematic for new customers with no history. Business owners demand predictability, and “black box” AI recommendations erode trust.

Fine-tuning produced brittle, overfitted models

Prior experiments fine-tuning LLMs on conversation data led to overfitting and brittle models that couldn’t generalise. The team wasted weeks on this approach before abandoning it.

Per-agent authentication broke the conversational flow

Per-agent authentication created jarring moments where customers would be asked for phone numbers late in a booking flow, destroying the natural conversation feel.

Agent sprawl made the system hard to maintain

With ~14 agents and overlapping responsibilities, the system was becoming harder to debug and maintain. The team lacked confidence in how to structure agents for maximum reliability.

Why they chose Zartis


The client team knew what they wanted to build but needed specialised AI/LLM architecture expertise to navigate these production challenges and validate their technical approach.

We were brought onto the project for our hands-on experience delivering production AI and LLM systems.

The Zartis approach

Rather than deliver a generic AI strategy deck, Zartis conducted a hands-on architecture workshop directly with the client’s engineering team—treating it as a collaborative problem-solving session with their actual system and real production challenges.


90 minutes

Deep-dive discovery

The workshop began with an intensive discovery session where Zartis reviewed the client’s existing LangGraph architecture and analysed their current agent design and orchestration flow. The team examined actual production failure cases and user conversation logs to understand where the system was breaking down in real-world scenarios. This deep dive also assessed the client’s testing methodology and observability infrastructure to identify gaps in their ability to monitor and improve the system.

90 minutes

Targeted recommendations

With a clear understanding of the challenges, Zartis whiteboarded alternative architectural approaches for key problem areas, discussing the trade-offs between complexity and determinism. The session explored specific tools and techniques applicable to the client’s stack, validating their technical decisions whilst challenging assumptions that were holding the team back. This collaborative approach ensured recommendations were immediately actionable within their existing infrastructure.

60 minutes

Action planning

The workshop concluded with concrete next steps and deliverables, establishing a clear collaboration model for follow-up analysis. The teams discussed production deployment strategy and risk mitigation, ensuring the client had a roadmap they could execute on with confidence.

Our approach

What we delivered

Problem: Vector stores producing inconsistent service/staff recommendations

Recommendation: Replace semantic similarity with graph-based retrieval

Why this matters: The client needed reliability over sophistication. A graph-based approach provides predictable results whilst maintaining the flexibility to handle complex service catalogues.

Problem: Intent detection unreliable in production; agents with overlapping responsibilities

Recommendation: Break agents into smaller, more focused units with clear boundaries

Why this matters: Smaller agents = clearer failure points. When something breaks, the team can debug specific components rather than wrestling with monolithic LLM behaviour.

Problem: Fine-tuning attempts led to overfitting and brittle models

Recommendation: Invest in semantically rich, detailed prompt engineering

Why this matters: Prompt engineering is 10x faster to iterate, doesn't require model retraining when business logic changes, and produces more predictable results.

Problem: Limited visibility into production failures; unclear metrics for success

Recommendation: Build automated evaluation pipelines with perplexity-based diagnostics

Key insight from workshop: Zartis introduced perplexity as a diagnostic metric—analysing the probability distribution of generated tokens reveals where prompts are confusing or context is insufficient.

Why this matters: "Works in testing but fails in production" is solvable with the right observability. Measuring model confidence (perplexity) at each step creates early warning signals.
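The perplexity signal itself is cheap to compute: most LLM APIs can return per-token log-probabilities, and perplexity is the exponential of their negative mean. A minimal sketch (the logprob values below are invented for illustration):

```python
import math

# Perplexity from per-token log-probabilities:
#   perplexity = exp(-mean(logprobs))
# Low values mean the model was confident; high values mean it was
# guessing, a signal that the prompt or retrieved context needs work.
def perplexity(token_logprobs: list[float]) -> float:
    if not token_logprobs:
        raise ValueError("need at least one token logprob")
    return math.exp(-sum(token_logprobs) / len(token_logprobs))

confident = [-0.05, -0.10, -0.02, -0.08]   # near-certain tokens
guessing  = [-2.3, -3.1, -1.9, -2.7]       # flat, uncertain distribution

print(round(perplexity(confident), 2))  # 1.06, healthy
print(round(perplexity(guessing), 2))   # 12.18, flag this step for review
```

Logged per workflow step, this turns “works in testing, fails in production” into a measurable trend rather than a surprise.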

Problem: Using GPT-4 Turbo for everything; occasional hallucination on ID extraction

Recommendation: Task-specific model selection

Why this matters: Right-sizing models to tasks balances performance, cost, and reliability. Using a specialised extraction model for structured data avoids GPT-4's tendency to "hallucinate" valid-looking IDs.
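Task-specific selection can be as simple as a routing table consulted before each model call. A sketch with placeholder model names (illustrative labels, not specific product recommendations):

```python
# Illustrative routing table: pair each task with the cheapest model that
# is reliable for it. A schema-constrained extraction model for IDs avoids
# a general model inventing valid-looking identifiers.
MODEL_FOR_TASK = {
    "intent_detection":  "small-fast-model",
    "id_extraction":     "structured-extraction-model",
    "service_reasoning": "large-general-model",
    "summary":           "small-fast-model",
}

def pick_model(task: str) -> str:
    """Fall back to the large general model for unknown tasks."""
    return MODEL_FOR_TASK.get(task, "large-general-model")

print(pick_model("id_extraction"))   # structured-extraction-model
print(pick_model("new_task"))        # large-general-model
```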

Problem: Unclear path from workshop to production; security concerns

Recommendation: Gradual rollout with risk mitigation

Why this matters: AI in production requires different risk management than traditional software. Phased rollout with kill switches ensures issues don't impact entire customer base.
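A phased rollout of this kind is commonly implemented with stable hash-based bucketing plus a global kill switch. A minimal sketch, assuming each business has a stable identifier (names and thresholds are illustrative):

```python
import hashlib

# Sketch of a percentage rollout with a kill switch. Hash-based bucketing
# keeps a given business in the same cohort across every request.
KILL_SWITCH_ON = False   # flipping this instantly routes everyone to humans
ROLLOUT_PERCENT = 10     # start small, widen as quality metrics hold

def ai_agent_enabled(business_id: str) -> bool:
    if KILL_SWITCH_ON:
        return False
    digest = hashlib.sha256(business_id.encode()).hexdigest()
    bucket = int(digest, 16) % 100   # stable 0-99 bucket per business
    return bucket < ROLLOUT_PERCENT

# Deterministic: the same business gets the same answer on every call.
print(ai_agent_enabled("biz_42"))
```

Because bucketing is deterministic, widening the rollout from 10% to 25% only adds new businesses; nobody who already has the agent loses it mid-conversation.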

What made this work

Key technical decisions

Graph retrieval over vector stores

Whilst vector embeddings are trendy, they’re probabilistic. For matching new customers with the “perfect service and provider,” the client needed explainable, deterministic logic business owners could trust. Graph traversal provides that: “We recommended this provider because they’re certified in this service, have 4.9 star ratings, and have availability Tuesday afternoon.”
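This deterministic style of matching can be sketched as a filter-and-rank traversal over provider attributes, where every recommendation carries its own explanation. The data, field names, and thresholds below are illustrative:

```python
# Minimal sketch of explainable, deterministic provider matching over a
# small attribute graph. Real data would come from the platform's catalogue.
PROVIDERS = {
    "alice": {"certified": {"colour", "cut"}, "rating": 4.9,
              "slots": {"tue_pm", "wed_am"}},
    "bob":   {"certified": {"cut"}, "rating": 4.7, "slots": {"tue_pm"}},
}

def recommend(service: str, slot: str) -> list[tuple[str, str]]:
    """Return (provider, reason) pairs: every match is explainable."""
    matches = []
    for name, p in sorted(PROVIDERS.items(),
                          key=lambda kv: -kv[1]["rating"]):
        if service in p["certified"] and slot in p["slots"]:
            reason = (f"certified in {service}, rated {p['rating']}, "
                      f"free at {slot}")
            matches.append((name, reason))
    return matches

print(recommend("colour", "tue_pm"))
# [('alice', 'certified in colour, rated 4.9, free at tue_pm')]
```

Unlike embedding similarity, the same inputs always produce the same ranked output, and the reason string can be shown verbatim to the business owner.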

Prompt engineering over fine-tuning

Service definitions are diverse and constantly evolving (new services added, descriptions updated). Fine-tuned models become stale and require expensive retraining. Prompt engineering with retrieval stays fresh and adapts to changes immediately.
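The retrieval-plus-prompting pattern can be sketched as a template that injects the current service catalogue at request time, so edits to the catalogue reach the model on the very next conversation with no retraining. Template wording and field names are assumptions:

```python
# Illustrative sketch: service knowledge is injected into the prompt per
# request instead of being baked into model weights via fine-tuning.
TEMPLATE = """You are the booking assistant for {business}.
Only recommend from the services listed below; never invent services.

Services:
{services}

Customer: {message}"""

def build_prompt(business: str, catalogue: list[dict], message: str) -> str:
    services = "\n".join(
        f"- {s['name']} ({s['duration_min']} min, EUR {s['price']}): {s['desc']}"
        for s in catalogue
    )
    return TEMPLATE.format(business=business, services=services,
                           message=message)

catalogue = [{"name": "Deep-tissue massage", "duration_min": 60,
              "price": 80, "desc": "Intensive muscle work."}]
prompt = build_prompt("Harbour Spa", catalogue, "Something for back pain?")
# The freshly retrieved catalogue appears directly in the prompt:
print("Deep-tissue massage" in prompt)  # True
```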

Granular agents over monolithic LLMs

The company’s prototype used a single LLM for all tasks—it was “confusing.” Breaking into specialised agents (intent detection → service reasoning → booking execution → summary) creates clear boundaries and testable units. Each agent does one thing well.
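That staged decomposition can be sketched as a chain of small, independently testable functions. All logic here is illustrative stand-in code; in practice each stage would correspond to a LangGraph node backed by an LLM call:

```python
# Each stage does one thing and can be unit-tested in isolation, so a
# failure points at a specific component rather than one monolithic LLM.
def detect_intent(message: str) -> str:
    return "reschedule" if "reschedule" in message.lower() else "other"

def reason_about_service(intent: str) -> dict:
    return {"intent": intent, "action": "find_new_slot"}

def execute_booking(plan: dict) -> dict:
    return {**plan, "status": "rebooked"}

def summarise(result: dict) -> str:
    return f"Done: {result['intent']} -> {result['status']}"

def pipeline(message: str) -> str:
    # intent detection -> service reasoning -> booking execution -> summary
    return summarise(
        execute_booking(reason_about_service(detect_intent(message))))

print(pipeline("Please reschedule my Tuesday appointment"))
# Done: reschedule -> rebooked
```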

Perplexity as a diagnostic tool

Most teams treat LLMs as black boxes. Zartis’s recommendation to analyse perplexity (model confidence) at each workflow step gives the client a leading indicator of quality. High perplexity = “model is guessing” = prompt or context needs improvement.

The results

Immediate workshop deliverables

The client left the workshop with validated technical direction and confidence in their architecture approach. Key questions that had stalled internal discussions (“Should we use vector stores? How granular should agents be? Is fine-tuning worth it?”) were resolved with specific, actionable answers.

Architectural roadmap

Knowledge transfer

The engineering team gained production-tested patterns and techniques from Zartis experts who had built similar systems.

Tooling recommendations


What this enabled next

From workshop to production

Following the workshop, the engineering team moved forward with implementing Zartis’s recommendations.

The client now has confidence and clarity in their AI roadmap. Internal debates about architectural approaches were resolved with expert validation. The team can execute with speed, knowing they’re building on proven patterns rather than guessing.

The workshop didn’t just solve immediate technical problems; it equipped the team with frameworks and mental models for making future AI architecture decisions independently.

Facing AI architecture challenges?

If your team is building with LLMs but struggling with reliability, testing, or production deployment strategy, Zartis brings hands-on expertise from real implementations. We don’t deliver generic AI roadmaps—we solve actual technical problems alongside your engineers.