
How Dynamiq built a cost-aware, legal research workflow with IBM watsonx

Vitalii Duk
April 15, 2026

Legal teams are expected to deliver fast, defensible answers across contracts, policies and compliance concerns. However, the volume of documents keeps growing while the business increasingly expects near-immediate response times. This was the main problem for a European insurance client working with IBM partner Dynamiq.

To tackle the challenge, Dynamiq used IBM watsonx® to address a problem defined by large volumes of unstructured information, frequent handoffs and heavy manual review effort. Dynamiq implemented a system to support multi-document contract synthesis, policy Q&A, competitive analysis and compliance checks across jurisdictions.

The resulting architecture combined orchestration, low-cost query classification and deep research, reducing contract review time from 90 minutes to 45 and cutting business inquiry response time from two days to an hour. Furthermore, it accelerated clause identification—the process of locating and extracting specific provisions, obligations or conditions buried across lengthy contracts—from 20 minutes to two.

In legal operations, that speed matters. It matters because a single overlooked clause can alter liability, payment terms or compliance obligations, and lawyers frequently need to compare dozens of clauses across multiple agreements under tight deadlines.

A legal use case that demands fast answers with tight control

The client needed one workflow to cover legal jobs that usually sprawl across disconnected tools and manual review. It had to summarize multiple contracts, answer policy questions against internal agreements, compare internal contractual language with external sources and run compliance checks across multiple jurisdictions. The end goal was straightforward: help legal teams deliver faster, better-informed decisions to the business without increasing costs.

What made the challenge harder was a stringent set of functional requirements. The system had to stay tightly cost-optimized. It had to route work through an agentic but partially deterministic flow, with the classification agent categorizing every query (see diagram), and maintain full traceability across each reasoning and tool step. It also had to integrate with existing search and document infrastructure and complete query execution in under four minutes.

The system also had to work with proprietary legal document stores, external legal research through EXA and access controls aligned to SOC 2 requirements. Addressing that combination of constraints is what turns a demo into a production challenge.

For end users, the value of Dynamiq’s solution was immediate. Lawyers could move from manual clause hunting and cross-document comparison to guided, cited outputs. Business stakeholders could get answers faster. From a total cost of ownership (TCO) perspective, the important shift is that expensive reasoning is reserved for the questions that need it, instead of being applied uniformly to every request.

A three-part architecture that routes first and reasons deeply only when needed

At the center of the design is a three-part system: an orchestrator, a low-cost legal query classification agent and an advanced legal research agent. Each component has a distinct role and cost profile, which is what lets the overall workflow stay both practical and scalable.

The orchestrator helps ensure that the lowest-cost agent choice is made. The classifier delivers quick, cost-efficient triage, routing recommendations and direct responses to simple queries. The research agent focuses on advanced legal research, but only when the classifier recommends it.

1. An orchestrator that manages routing, synthesis and budget discipline

Dynamiq built an orchestration agent that enforces a route, decide and synthesize pattern. It refers each request to the classifier before answering directly, then decides whether the lower-cost path is enough or whether the request should escalate. It also monitors token consumption across the run and applies budget controls in two ways.

First, it asks the researcher agent to confirm assumptions when the classifier sees a low-confidence, high-cost inference request. Then, it hard-caps researcher tokens per step when costs spike, returning a constrained summary and a “continue?” control when needed. The orchestrator also defaults phrases such as “our terms,” “my contracts” and “internal policies” to internal queries.
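The routing and budget behavior described above can be sketched roughly as follows. This is an illustrative sketch, not Dynamiq’s actual code: the token cap value, the confidence threshold and all function names are assumptions.

```python
# Sketch of the orchestrator's routing defaults and two budget controls:
# assumption confirmation on low-confidence escalations, and a hard
# per-step token cap that returns a constrained summary when exceeded.

RESEARCHER_STEP_TOKEN_CAP = 4_000  # hypothetical cap, not a documented value

# Phrases that default a request to internal-only handling.
INTERNAL_HINTS = ("our terms", "my contracts", "internal policies")

def route(query: str, classification: dict) -> dict:
    """Decide the execution path for one request."""
    if any(hint in query.lower() for hint in INTERNAL_HINTS):
        return {"path": "internal", "external_calls": False}

    if classification["recommendation"] == "simple":
        # Lower-cost path: the classifier answers directly.
        return {"path": "classifier_answer", "external_calls": False}

    decision = {"path": "research", "step_token_cap": RESEARCHER_STEP_TOKEN_CAP}
    if classification["confidence"] < 0.6:
        # Low-confidence, high-cost inference: confirm assumptions first.
        decision["confirm_assumptions"] = True
    return decision

def enforce_cap(tokens_used: int, cap: int) -> dict:
    """When a step exceeds its budget, constrain output and ask to continue."""
    if tokens_used > cap:
        return {"action": "constrained_summary", "prompt_user": "continue?"}
    return {"action": "proceed"}
```

The key design point is that the cheap checks (phrase defaults, classifier recommendation) run before any expensive reasoning is attempted.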

In evaluating models for the orchestrator, the team found Claude 4 Sonnet and GPT-5 capable, but slower and more expensive to run. The team chose Grok-4-fast because it delivered the needed balance of speed, cost and quality relative to higher-cost options.

2. A Granite-powered classifier that keeps simple work inexpensive

The legal query classification agent (labeled “agent 1” in the diagram) is designed for quick, low-cost triage. It uses IBM® Granite® 4 Small as the first stop for every request and produces structured output across six fields: complexity, data sources, task type, reasoning, recommendation and confidence.

That output governs whether the system stays local, escalates to the research agent or takes a hybrid path. It also determines whether external calls are allowed. If the classifier marks a request as “INTERNAL_ONLY,” access to EXA is disabled.
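A minimal sketch of what such a structured classification could look like. The six field names and the INTERNAL_ONLY/EXTERNAL_ONLY/HYBRID values come from the article; the example values and the helper function are assumptions.

```python
# Illustrative shape of the classifier's six-field structured output.
classification = {
    "complexity": "simple",            # assumed tiers: simple | moderate | complex
    "data_sources": "INTERNAL_ONLY",   # INTERNAL_ONLY | EXTERNAL_ONLY | HYBRID
    "task_type": "policy_question",
    "reasoning": "Policy lookup answerable from internal agreements alone.",
    "recommendation": "stay_local",    # stay local, escalate, or hybrid
    "confidence": 0.87,
}

def external_calls_allowed(classification: dict) -> bool:
    # An INTERNAL_ONLY marking disables access to EXA for the request.
    return classification["data_sources"] != "INTERNAL_ONLY"
```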

Deterministic routing gives teams a predictable way to separate inexpensive work from more costly work before the workflow starts consuming more tokens.

Simple queries cover legal definitions, straightforward policy questions and status inquiries. Moderately complex queries might require the research agent for contract term explanations, single-document analysis and basic compliance checks. Complex queries require escalation for comparative document analysis, clause-level compliance scoring, multi-source legal research and more involved legal reasoning tasks.

In effect, the classifier becomes both a routing layer and an expense control layer. Granite 4 Small offered roughly 3 times better price-performance than the Grok-4-fast model used elsewhere in the system.

3. A research agent that combines internal RAG with external legal sources

The advanced legal research agent (labeled “agent 2” in the preceding diagram) handles the heavier semantic work and activates only when the classifier recommends an EXTERNAL_ONLY or HYBRID path. Powered by Grok, the research agent runs multi-tool searches and more demanding reasoning. It connects to a retrieval-augmented generation (RAG) subsystem built on Milvus for the client’s proprietary legal documents. For context on public and case law, the subagent connects to EXA.

The researcher agent delivers four core outputs:

  1. Semantic contract search to find parallel or related clauses, obligations and rights
  2. Comparative analysis cross-referencing multiple legal documents, differences in material language and business impact
  3. Clause-level compliance scoring on a risk scale of 1–10, including rationale and mitigation advice
  4. Blended summaries that merge Milvus hits with EXA results, including source citations and identification of internal versus external sources

Each response includes an executive summary, detailed analysis, recommendations, source citations and a risk assessment that can flag ambiguities, conflicts, outdated references and coverage gaps. When the system detects contradictory facts, it surfaces the conflict with side-by-side quotes. When the initial retrievals across Milvus and EXA are weak, the agent falls back to broader embeddings or curated queries and logs a coverage_gap flag.
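The fallback behavior can be sketched along these lines, assuming hypothetical retrieval helpers and an assumed relevance threshold; only the coverage_gap flag itself is named in the article.

```python
# Illustrative retrieval fallback: when initial Milvus and EXA hits are
# weak, broaden the search and log a coverage_gap flag for the trace.

MIN_SCORE = 0.5  # assumed relevance threshold, not a documented value

def retrieve_with_fallback(query, search_milvus, search_exa, log):
    """Return strong hits, or broadened results with a logged coverage gap."""
    hits = search_milvus(query) + search_exa(query)
    strong = [h for h in hits if h["score"] >= MIN_SCORE]
    if strong:
        return strong
    # Weak initial retrieval across both sources: flag it, then broaden.
    log({"flag": "coverage_gap", "query": query})
    return search_milvus(query, broad=True) + search_exa(query, broad=True)
```

Logging the gap rather than silently broadening is what lets the system flag uncertainty instead of hiding it, as the article emphasizes later.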

Traceability is built into the runtime, not added later

To meet the client’s requirement for full traceability, the runtime uses an XML-patterned ReAct inference mode to keep the reasoning, action and observation loop uniform and debuggable. That structure creates fine-grained step logs—detailed, timestamped records of every reasoning decision, tool call and model response in a single workflow run. It also enables clearer postmortems across the manager, the agents, the tools and the foundation models.

It also preserves visibility into how the multi-agent system exchanges prompts and responses with watsonx Orchestrate when the workflow is triggered through the API. For auditability, model and embedding versions are stored alongside each trace.
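As a rough illustration of what one XML-patterned reason-act-observe log entry might look like, the sketch below builds a timestamped step record. The tag names and the model string are assumptions, not Dynamiq’s actual trace schema.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone

def log_step(step_id: int, thought: str, action: str, observation: str) -> str:
    """Build one timestamped ReAct step entry as an XML string."""
    step = ET.Element("step", id=str(step_id),
                      ts=datetime.now(timezone.utc).isoformat())
    ET.SubElement(step, "thought").text = thought
    ET.SubElement(step, "action").text = action
    ET.SubElement(step, "observation").text = observation
    # Model (and embedding) versions are stored alongside each trace entry.
    ET.SubElement(step, "model").text = "grok-4-fast"
    return ET.tostring(step, encoding="unicode")
```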

Watsonx Orchestrate turns the workflow into a reusable enterprise tool

When the multi-agent system was built, Dynamiq imported it into IBM watsonx Orchestrate as an external agent through the API by using a bearer token and a service instance URL generated by the Dynamiq platform.
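In outline, the registration call assembles a bearer-token header and points Orchestrate at the Dynamiq service instance URL. The sketch below only builds that request; the field names and any endpoint path are assumptions, not the documented watsonx Orchestrate API.

```python
import json

def build_registration(service_url: str, bearer_token: str) -> dict:
    """Assemble a hypothetical external-agent registration request."""
    headers = {
        "Authorization": f"Bearer {bearer_token}",  # token from the Dynamiq platform
        "Content-Type": "application/json",
    }
    payload = {
        "name": "legal-research-agent",  # assumed display name
        "api_url": service_url,          # service instance URL from Dynamiq
    }
    return {"headers": headers, "body": json.dumps(payload)}
```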

The team chose watsonx Orchestrate because it turns a stand-alone multi-agent workflow into an enterprise-wide capability. Rather than limiting access to a single interface, Orchestrate lets every authorized user invoke the legal research agent through chat. Moreover, users can coordinate it alongside agents connected to systems such as SAP, Salesforce and ServiceNow.

It also provides a governed catalog where the agent appears as a first-class tool, meaning IT and compliance teams retain visibility. The teams can see what is running, who is using it and how it connects to the rest of the enterprise stack—without requiring Dynamiq to rebuild any integration logic.

Where IBM created practical advantages for cost, governance and user trust

The solution addresses three key concerns with maintaining and scaling AI agents. First, the multi-agent system optimizes cost with IBM Granite 4 Small: routine legal triage does not consume higher-cost reasoning resources. Second, with IBM watsonx Orchestrate, legal teams across the organization can use the agent system through their interface of choice.

Finally, the overall architecture of the agent system provides the flexibility to deploy in the cloud or on premises as it moves to production.

The use case isn’t just about answering legal questions faster. It’s about making those answers easier to trust and less expensive to operate at scale. The workflow preserves citation boundaries between internal and external sources, logs reasoning steps for auditability and flags uncertainty instead of hiding it.

In legal operations, that kind of transparency directly affects end-user confidence. A faster answer is useful, but a faster answer with source grounding, visible risk flags and a predictable cost profile is much easier to operationalize.

Curious to find out how Dynamiq can help you extract ROI and boost productivity in your organization?
