AbbVie's first step toward AI-assisted authorship: an intelligent SOP comparison tool that helped Quality Operations authors produce higher-quality Lab Investigation Reports, faster.
Lab Investigation Reports (LIRs) are high-stakes quality documents: every one must be accurate, consistent, and compliant with global SOPs. Before this tool existed, authors spent significant time drafting, self-reviewing, and correcting reports before they were even ready for formal review. The process was slow, error-prone, and relied heavily on individual author expertise.
The AI Auditor was designed to change that. The goal: give authors real-time, SOP-aligned feedback as they work, not after the fact, so they could catch issues earlier, reduce rework, and produce consistently higher-quality LIRs.
Scope and scale: We launched the pilot at one North American site and subsequently expanded to four additional sites. The tool also laid the foundation for a broader AI authorship strategy across CAPA, ER, and NCR workflows.
Because this was an internal MVP, we couldn't conduct broad external research before building. Early design decisions were grounded in SME interviews, direct SOP analysis, and tight feedback loops with site teams during the pilot.
The tool also had to operate in a heavily regulated environment: any AI output needed to be clearly framed as assistive (not authoritative), and the design had to maintain author accountability at every step. This was a compliance requirement.
When we received the brief, we didn't jump straight to wireframes. Our first step was to study AI design patterns across similar document-review tools, understanding how they communicate recommendations, structure feedback, and handle the critical question of user trust.
Because the AI Auditor needed to function as a human-in-the-loop assistant, supplementing the author rather than replacing their judgment, we built our initial design around two core principles:
The initial home screen was designed with a single, clear primary action: start an audit. We also introduced a percentage-based compliance score to help authors quickly gauge the level of revision their document might need.
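To make the score concrete, here is a minimal sketch of how a percentage-based compliance score might be derived from a set of SOP checks. The check names, pass/fail logic, and equal weighting are illustrative assumptions, not the production implementation.

```python
# Hypothetical sketch: a compliance score as the share of SOP checks a draft passes.
# Check names and pass/fail results below are illustrative, not the real rule set.

def compliance_score(check_results: dict[str, bool]) -> float:
    """Return a 0-100 score: the percentage of SOP checks that passed."""
    if not check_results:
        return 0.0
    passed = sum(1 for ok in check_results.values() if ok)
    return 100.0 * passed / len(check_results)

# Example: a draft LIR evaluated against three illustrative SOP checks.
results = {
    "root_cause_documented": True,
    "deviation_timeline_complete": False,
    "approvals_section_present": True,
}
print(f"Compliance score: {compliance_score(results):.0f}%")  # -> 67%
```

Even in a simple form like this, the framing matters: a single percentage invites authors to read it as a pass/fail gate, a risk that surfaced later in the project.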
After the first round of feedback from SMEs, ML engineers, and developers, a few things became clear. Here's how the design evolved:
Usage grew sharply after the pilot launch, expanding from one site to five. By late Q3 2024, adoption was both heavier and more geographically distributed, confirming the tool scaled across diverse teams and contexts.
This was one of the most technically and organizationally complex projects I've worked on, and a genuine first for the organization. A few things stood out.
With AI-generated outputs in a regulated environment, trust isn't implicit; it has to be designed in. Transparency, explainability, and clear disclaimers weren't nice-to-haves; they were what made the tool adoptable.
The percentage score felt like good UX until users started making decisions based on it. That taught me to push harder on how metrics will actually be interpreted, not just how they're intended.
Working without broad user research early on forced us to be more deliberate about SME feedback and iterative testing. It made the design process leaner and the decisions more defensible.
Designing for an AI system where outputs are probabilistic, not deterministic, required new patterns and new conversations with the ML team. This project made me a stronger advocate for AI transparency across the entire product.