By Alastair Doggett, Senior Machine Learning Engineer
In the world of contract negotiation, every clause, word, and revision must reflect legal intent, customer standards, and the precedents set by the customer’s previous negotiations.
Ontra’s products aim to empower legal teams with intelligent, reliable assistance at every step of the contract lifecycle. As one of the few platforms operating at this scale, Ontra has amassed one of the largest bodies of real-world negotiation precedent in the legal tech space, allowing us to refine and calibrate contract edits by leveraging a uniquely rich foundation of customer-specific history.
This article discusses how we’ve combined two complementary approaches into a single, unified contract-editing pipeline: a hybrid precedent-based suggestion system that marries the precision of rule-based transformations with the contextual understanding of large language models (LLMs).
A six-step hybrid pipeline
Contracts demand structured reasoning grounded in precedent and nuanced interpretation that adapts to context.
We will walk through a six-step hybrid pipeline that blends the strengths of both.
1. Counterpart matching
Every edit has a history, and that history matters. When a new clause from a contract pending review or markup enters the system, we treat it as time step 0 in a structured sequence. The corresponding first-markup precedent, usually crafted by a legal professional during a prior negotiation, becomes time step 1. Together, these form a transition pair, or edit event (an observed clause-level edit): a before-and-after snapshot of how similar language was modified in practice.
Think of each clause’s evolution as a trajectory through an edit space. By observing how past clauses moved from t=0 (original) to t=1 (first markup), we can model the kinds of transformations legal professionals routinely make.
To surface these trajectories, we retrieve K nearest-neighbor clause pairs from a vector search index, where each match includes both time steps.
These matches are designed to be semantically aligned with the incoming clause: the system looks for a given customer’s precedents with similar starting conditions, helping it infer how those inputs evolved when handled by domain experts. This customer-specific retrieval process is only possible because of the depth of precedent Ontra has built over years of supporting high-volume, high-stakes negotiations.
This matching mechanism lets us condition future suggestions not just on the current clause, but on realistic edit trajectories observed in historical data.
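In production this retrieval runs against a vector search index, but the core of the matching step can be sketched as a cosine-similarity nearest-neighbor lookup over embedded t=0 clauses. This is a minimal illustration under our own naming (`top_k_counterparts` and the index layout are assumptions, not Ontra’s actual API):

```python
import math

def cosine(u, v):
    """Cosine similarity between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def top_k_counterparts(query_vec, index, k=3):
    """Return the K precedent transition pairs whose t=0 clause embedding
    is most similar to the incoming clause's embedding.

    index: list of (t0_embedding, t0_clause, t1_clause) entries.
    """
    ranked = sorted(index, key=lambda entry: cosine(query_vec, entry[0]),
                    reverse=True)
    return [(t0, t1) for _, t0, t1 in ranked[:k]]
```

Each returned pair carries both time steps, so downstream stages see not just similar language but how that language was actually edited.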
In essence, we’re building a conditional distribution over edits:
Given that a clause looks like this at t=0, and we’ve seen similar clauses evolve to that at t=1, what’s the most likely or appropriate transformation to apply now?
This probabilistic framing enables us to:
- Incorporate temporal structure into editing logic.
- Learn common patterns of legal revisions over time.
- Guide LLM behavior using empirical edit data, rather than just its raw intuition.
By rooting our suggestions in time-aware precedent, we can tap into the editing wisdom embedded in dozens of the customer’s prior negotiations.
2. Non-LLM diff-transfer (structured suggestions)
Once we’ve retrieved semantically similar clause pairs from precedent, we use a diff-transfer algorithm to generate an initial set of structured suggestions, which forms a deterministic transformation grounded in prior human edits.
At a high level, the goal is simple:
Given a new clause c0 that’s similar to a past clause c1, and a known edit transform c1 → ĉ1, how can we infer what that same edit would look like if applied to c0?
For example, let’s say we’ve seen this transformation before:
- c1 (original): “The indemnity cap is set to $1,000,000.”
- ĉ1 (edited): “The indemnity cap is set to 12 months of fees.”
Now we encounter a new clause:
- c0 (incoming): “The indemnity cap shall not exceed $5,000,000.”
Even though the dollar amount differs, the core structure is similar. The algorithm aligns c0 and c1, locates the change in c1 → cˆ1, and attempts to apply it to c0.
The output might be: “The indemnity cap shall not exceed 12 months of fees.”
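A minimal word-level sketch of this diff-transfer idea can be built on Python’s `difflib`: diff the precedent pair to find what changed, align the new clause to the precedent original, and map the edited span across the alignment. The production algorithm is more sophisticated; `transfer_edit` and `map_index` are our illustrative names:

```python
from difflib import SequenceMatcher

def map_index(i, blocks):
    """Map a token index in the precedent clause c1 onto the new clause c0,
    using the matching blocks of the c1/c0 alignment as anchors."""
    prev_a_end = prev_b_end = 0
    for a, b, size in blocks:
        if i < a:  # i falls in an unmatched gap before this block
            return min(prev_b_end + (i - prev_a_end), b)
        if i < a + size:  # i sits inside a matched block
            return b + (i - a)
        prev_a_end, prev_b_end = a + size, b + size
    return prev_b_end + (i - prev_a_end)

def transfer_edit(c0, c1, c1_hat):
    """Apply the word-level edit c1 -> c1_hat onto the new clause c0."""
    w0, w1, w1h = c0.split(), c1.split(), c1_hat.split()
    blocks = SequenceMatcher(None, w1, w0).get_matching_blocks()
    out = list(w0)
    # Apply precedent edits right-to-left so earlier indices stay valid.
    for tag, i1, i2, j1, j2 in reversed(
            SequenceMatcher(None, w1, w1h).get_opcodes()):
        if tag == "equal":
            continue
        k1, k2 = map_index(i1, blocks), map_index(i2, blocks)
        out[k1:k2] = w1h[j1:j2]
    return " ".join(out)
```

Running this on the example above reproduces the expected output: the `$1,000,000.` → `12 months of fees.` replacement lands on `$5,000,000.` in the new clause.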
While this approach provides clear provenance, with edits grounded in precedent examples and reinforced by both language patterns and internal standards, it can be structurally brittle, especially when c0 and c1 differ significantly (large drafting variations). It also lacks context: edits are transferred mechanically, with no true legal understanding. To capture the inherent nuance of contracts, we need something more flexible and dynamic.
3. Unstructured change extraction via LLM
To this end, we apply in parallel an LLM to the same precedent pairs to extract unstructured changes. These aim to capture subtler shifts in tone, obligations, or legal effect that a raw diff might miss.
This step leverages the adaptability of LLMs to infer context and legal nuance, even when the changes aren’t syntactically obvious.
While these suggestions may lack the strict traceability of diff-based ones, they offer a richer understanding of how legal professionals adjust meaning based on deeper institutional knowledge.
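As a rough sketch of what this extraction step might look like, the model can be shown both time steps of a precedent pair and asked to describe the substantive changes. The helper name `build_extraction_prompt` and the prompt wording are our illustrative assumptions, not Ontra’s production prompt:

```python
def build_extraction_prompt(c1, c1_hat):
    """Ask the model to describe, in plain language, what changed between a
    precedent clause (t=1 original) and its first markup -- including shifts
    in tone, obligation, or legal effect a token-level diff would miss."""
    return (
        "Compare the two versions of this contract clause.\n\n"
        f"Original:\n{c1}\n\n"
        f"Edited:\n{c1_hat}\n\n"
        "List every substantive change, including shifts in obligation, "
        "tone, or legal effect that are not visible as literal word swaps."
    )
```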
4. Extract structured changes from diff suggestions
From the diff-transfer output, we extract a refined list of structured change operations expressed in natural language, like “replace indemnity cap with 12 months of fees” or “remove termination for convenience.”
This step provides us with a modular, rule-based view of edits, easily comparable with the LLM’s unstructured suggestions.
These operations can then act as the scaffolding for downstream analysis: discrete, explainable units of legal transformation.
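A minimal sketch of turning a word-level diff into such discrete operations, again using `difflib` (the helper name `extract_change_ops` is ours, and real operations are richer than these strings):

```python
from difflib import SequenceMatcher

def extract_change_ops(original, edited):
    """Describe a word-level diff as discrete, human-readable change
    operations: replace / remove / insert."""
    a, b = original.split(), edited.split()
    ops = []
    for tag, i1, i2, j1, j2 in SequenceMatcher(None, a, b).get_opcodes():
        if tag == "replace":
            ops.append(
                f"replace '{' '.join(a[i1:i2])}' with '{' '.join(b[j1:j2])}'")
        elif tag == "delete":
            ops.append(f"remove '{' '.join(a[i1:i2])}'")
        elif tag == "insert":
            ops.append(f"insert '{' '.join(b[j1:j2])}'")
    return ops
```

Each operation is a self-contained, explainable unit, which is exactly what the downstream comparison with the LLM’s unstructured suggestions needs.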
5. LLM-based change refinement
Now comes the fusion step: We feed both the structured operations and unstructured suggestions into an LLM. The model evaluates overlaps, resolves conflicts, and synthesizes a final list of refined suggestions.
This step acts more like a senior editor by merging edits, adjusting language, and prioritizing changes based on legal context.
This refinement produces an edit plan that tries to bridge the gap between the blindly mechanical and overly creative. Here, the advantage of Ontra’s scale becomes especially clear: the LLM’s decisions are grounded not in abstract training data alone, but in the actual history of a particular customer’s negotiations.
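A hedged sketch of how the fusion prompt might be assembled, showing the model both evidence streams side by side. `build_refinement_prompt` and the wording are our illustration, not the production system:

```python
def build_refinement_prompt(clause, structured_ops, unstructured_notes):
    """Assemble the fusion prompt: the model sees both the structured
    diff-derived operations and the unstructured precedent observations,
    and is asked to reconcile them into one edit plan."""
    ops = "\n".join(f"- {op}" for op in structured_ops)
    notes = "\n".join(f"- {note}" for note in unstructured_notes)
    return (
        "You are refining proposed edits to a contract clause.\n\n"
        f"Clause:\n{clause}\n\n"
        f"Structured edit operations (from precedent diffs):\n{ops}\n\n"
        f"Unstructured observations (from precedent review):\n{notes}\n\n"
        "Merge overlapping suggestions, resolve conflicts, and return a "
        "final ordered list of edits with a one-line rationale for each."
    )
```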
6. LLM application of refined changes
Finally, we apply the refined suggestions back to the original clause. At this stage, the LLM now operates under much tighter guardrails: it has the original text, a structured edit plan, and clear precedent.
This hybrid system looks to offer the best of both worlds:
- Relevancy: Every suggestion is grounded in real contract edits.
- Explainability: Structured changes show why something is being changed.
- Adaptability: LLMs ensure the suggestions make sense in context.
- Consistency: Edits are tethered to known patterns, helping to reduce hallucinations and guesswork in interpretation.
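Putting the six steps together, the orchestration can be sketched as a simple composition. Every name below is illustrative, and each stage is injected as a callable so the sketch stays self-contained and testable with stubs:

```python
def suggest_edits(clause, retrieve, transfer, extract_notes, extract_ops,
                  refine, apply_plan):
    """End-to-end sketch of the six-step hybrid pipeline."""
    pairs = retrieve(clause)                                      # 1. counterpart matching
    drafts = [transfer(clause, c1, c1h) for c1, c1h in pairs]     # 2. diff-transfer
    notes = extract_notes(pairs)                                  # 3. unstructured changes (LLM)
    ops = [op for d in drafts for op in extract_ops(clause, d)]   # 4. structured ops
    plan = refine(clause, ops, notes)                             # 5. LLM refinement
    return apply_plan(clause, plan)                               # 6. LLM application
```

The value of the decomposition is that steps 2 and 4 stay deterministic and auditable, while steps 3, 5, and 6 confine the LLM to roles where its judgment is constrained by precedent.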
Looking ahead
By modeling contracts as time series, evolving across drafts and across deals, we move our systems at Ontra toward being temporally aware. This mirrors a broader shift in how we think about legal language: rather than treating clauses as isolated text spans, we frame them as members of dynamic families, clustered by function, tracked across versions, and shaped by actual negotiation history.
This time-series view opens the door to a richer mathematical toolkit. We can use transition matrices to map how clauses move between functional clusters, survival analysis to measure the velocity of their movements, and autoregressive models to forecast their likely path through the negotiation.
By pairing this temporal modeling with our archive of negotiation precedent, Ontra continues to set the pace in the legal tech industry, offering customers insights that simply aren’t possible without both the data depth and the modeling sophistication.
By leaning into these edit trajectories across our customers’ document lifecycles, we are widening our understanding from anticipating what might change next to mapping the full landscape of how legal clauses truly evolve: their typical paths, the speed of their movement, and the deeper patterns that guide their transformation over time.