AI Coding Quality Engineering

Bridging the Quality Gaps in AI-Assisted Coding

Demo Practices & Future Vision


The Problem: Quality Blind Spots in AI-Assisted Coding

The Rise of AI Coding Tools

  • GitHub Copilot, Claude Code, and Cursor are now mainstream
  • Development velocity has significantly increased
  • Built-in PR Review (e.g., GitHub Copilot) helps catch issues

Limitations of Built-in Review

  • Sees only the diff, not the full codebase context
  • Cannot compare PR code against the existing codebase for similarity
  • Cannot detect cross-file resource conflicts
  • Cannot validate whether the implementation matches the design specs

Real-World Scenario

Developer A writes TodoRepository.

A week later, Developer B (or AI) writes EventLogRepository with 90% similar code.

Built-in PR Review says: "LGTM"

It only sees the new diff and cannot know this code already exists in the codebase.

Quality Blind Spots: What Built-in Review Cannot Detect

Blind Spot | Description | Consequence
Code Duplication | PR introduces code that duplicates existing code | Increased maintenance cost, tech debt accumulation
Semantic Conflicts | Resource name conflicts (table names, cache keys, lock names) | Runtime conflicts, hard to debug
API Contract Breaking | API modifications break dependent code | Runtime errors, integration failures
Architecture Violations | Layer violations, wrong dependency directions | Architecture erosion, decreased maintainability
Implementation Drift | Code doesn't match design documentation | Missing features, unmet requirements
Legacy Issues | Security vulnerabilities, dead code in existing codebase | Security risks, code bloat

Key Insight: All these issues require cross-file or cross-system analysis that diff-only review cannot perform.

Core Methodology: Deterministic Tools + AI Refinement

Why Not Just Use AI?

  • Large codebases don't fit in a model's context window (e.g., a 128k-token limit)
  • Exhaustive AI analysis is expensive and slow
  • Deterministic problems need deterministic tools for reliability

Two-Phase Approach

Phase 1: Deterministic Scanning
Use jscpd, regex, and static analysis to narrow the scope
↓
Phase 2: AI Refinement
Use AI to filter false positives, explain issues, and provide recommendations
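A minimal Python sketch of the two-phase split. Phase 1 is a cheap, reproducible regex pass that turns the whole codebase into a short list of candidates; Phase 2 only ever sees those candidates. The refine callable is a hypothetical stand-in for the AI step (in practice a model call that judges each candidate), and the rule shown is illustrative, not one of the workflow's actual rules.

```python
import re
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass
class Finding:
    rule: str
    path: str
    line: int
    text: str


def phase1_scan(files: dict[str, str], rules: dict[str, str]) -> list[Finding]:
    """Phase 1 (deterministic): apply every regex rule to every line of every file."""
    compiled = {name: re.compile(pattern) for name, pattern in rules.items()}
    findings = []
    for path, source in files.items():
        for lineno, text in enumerate(source.splitlines(), start=1):
            for name, regex in compiled.items():
                if regex.search(text):
                    findings.append(Finding(name, path, lineno, text.strip()))
    return findings


def phase2_refine(findings: Iterable[Finding],
                  refine: Callable[[Finding], bool]) -> list[Finding]:
    """Phase 2 (AI): keep only candidates the hypothetical `refine` judge confirms."""
    return [f for f in findings if refine(f)]


# Usage: a hypothetical hardcoded-password rule; the lambda plays the AI reviewer
# and discards the hit inside a comment.
files = {"Settings.cs": 'var pwd = "secret";\n// password = "example"\n'}
rules = {"HardcodedPassword": r'(?i)\b(password|pwd)\s*=\s*"[^"]+"'}
candidates = phase1_scan(files, rules)
confirmed = phase2_refine(candidates, lambda f: not f.text.startswith("//"))
```

Swapping the lambda for a model call leaves Phase 1 untouched, so AI cost scales with the number of candidates rather than with repository size.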

Tool Selection Matrix

Issue Type | Best Tool | AI Role
Syntactic Duplication | jscpd | Not needed
Semantic Duplication | Code Embedding | Similarity calculation
Resource Conflicts | Regex Extraction | Conflict summarization
Dead Code | Static Index | Filter false positives
Spec Compliance | AI Comparison | Core analysis

Demo Practices Overview


Practice 1

Extended PR Quality Check

PR-triggered cross-file quality analysis

  • Code duplication detection
  • Semantic conflict detection
  • Anti-pattern identification

Implemented


Practice 2

Periodic Repo Scan

Scheduled full codebase scanning

  • Security vulnerability detection
  • Dead code / redundant code
  • Technical debt assessment

Implemented


Practice 3

Feature Implementation Check

Implementation vs spec compliance validation

  • Missing requirement detection
  • Scope creep identification
  • Compliance scoring

Implemented

3 Implemented Workflows
5+ Detection Pattern Types
HTML Email Report Output

Practice 1: Extended PR Quality Check

Workflow

PR Submitted (opened/synchronize)
↓
Get Changed Files (git diff)
↓
Parallel Analysis
  • jscpd Duplication
  • Semantic Conflict Detection
  • Anti-Pattern Rules
↓
Filter: Keep only PR-related results
↓
Generate HTML Report + Email
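The fan-out/fan-in shape of this workflow can be sketched in Python with concurrent.futures. The analyzers are hypothetical stand-ins for the jscpd, semantic-conflict, and anti-pattern steps; each is assumed to return findings as dicts with a "paths" list naming every file involved. The base branch origin/main in changed_files is an assumption about the repository setup.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor
from typing import Callable

# Hypothetical analyzer type: returns findings as dicts with a "paths" list.
Analyzer = Callable[[], list[dict]]


def changed_files(base: str = "origin/main") -> set[str]:
    """Files touched by the PR (assumes the base branch has been fetched)."""
    out = subprocess.run(["git", "diff", "--name-only", f"{base}...HEAD"],
                         capture_output=True, text=True, check=True)
    return {line for line in out.stdout.splitlines() if line}


def run_pr_check(analyzers: dict[str, Analyzer],
                 changed: set[str]) -> dict[str, list[dict]]:
    """Run analyzers in parallel over the full tree, then keep PR-related findings."""
    with ThreadPoolExecutor(max_workers=max(1, len(analyzers))) as pool:
        futures = {name: pool.submit(fn) for name, fn in analyzers.items()}
        results = {name: fut.result() for name, fut in futures.items()}
    # A finding is PR-related if any file it involves was changed in the PR.
    return {
        name: [f for f in findings if any(p in changed for p in f["paths"])]
        for name, findings in results.items()
    }
```

Filtering after the parallel step, rather than restricting each analyzer's input, is what lets a finding that pairs a changed file with an untouched one survive.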

Key Technologies

  • jscpd - Token-level code duplication detection
  • PowerShell Regex - Resource identifier extraction
  • Full Scan + Filter - Scan entire src/, report only PR-related issues

Semantic Conflict Detection Patterns

Pattern Name | Detects
DistributedLock | Distributed lock names
CacheKey | Cache key patterns
ConfigKey | Configuration key names
QueueName | Message queue/topic names
TableName | Storage table names
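A Python sketch of regex-based conflict detection. The three patterns are hypothetical C# heuristics (a TableName = "..." constant, or a GetTableClient("...") / GetQueueClient("...") call) standing in for the PowerShell regexes the workflow uses; real rules would be tuned to the codebase's conventions. An identifier counts as a conflict when a PR file declares it and a file outside the PR already does.

```python
import re
from collections import defaultdict

# Hypothetical C# patterns; capture group 1 is the resource identifier.
PATTERNS = {
    "TableName": re.compile(r'(?:TableName\s*=\s*|GetTableClient\(\s*)"([^"]+)"'),
    "CacheKey": re.compile(r'CacheKey\w*\s*=\s*"([^"]+)"'),
    "QueueName": re.compile(r'(?:QueueName\s*=\s*|GetQueueClient\(\s*)"([^"]+)"'),
}


def extract(files: dict[str, str]) -> dict[tuple[str, str], list[tuple[str, int]]]:
    """Map (pattern, identifier) to every (path, line) where it appears."""
    index = defaultdict(list)
    for path, source in files.items():
        for lineno, text in enumerate(source.splitlines(), start=1):
            for kind, regex in PATTERNS.items():
                for match in regex.finditer(text):
                    index[(kind, match.group(1))].append((path, lineno))
    return index


def conflicts(files: dict[str, str], pr_files: set[str]) -> list[dict]:
    """Report identifiers used in a PR file that already exist elsewhere."""
    report = []
    for (kind, name), sites in extract(files).items():
        in_pr = [s for s in sites if s[0] in pr_files]
        existing = [s for s in sites if s[0] not in pr_files]
        if in_pr and existing:
            report.append({"kind": kind, "name": name, "pr": in_pr, "existing": existing})
    return report
```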

Sample Output

CODE DUPLICATION
#1 - 26 duplicated lines
  AzureTableEventLogRepository.cs (53-78)
  AzureTableTodoRepository.cs (53-78)

SEMANTIC CONFLICTS
#1 - TableName: 'TodoItems'
  PR: TracingService.cs:16
  Existing: TodoRepository.cs:9

Practice 2: Periodic Repo Scan

Trigger Methods

  • Scheduled - Weekly, Mondays at 9am (cron)
  • Manual - workflow_dispatch
  • Configurable scan path (default: src/)

Scan Categories

Category | Detection | Severity
Security | Hardcoded passwords/keys | Critical
Security | SQL injection risks | Critical
Quality | Empty catch blocks | High
Quality | throw ex (loses the stack trace) | Medium
Quality | Console.WriteLine | Low
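The rule table maps naturally onto a list of (severity, name, regex) entries. A Python sketch with hypothetical single-line patterns: they will miss multi-line constructs such as an empty catch block spread over three lines, which is one reason the optional AI pass exists, to catch what regex cannot and to discard what it over-reports.

```python
import re
from collections import Counter

# Hypothetical single-line rules mirroring the table above: (severity, name, regex).
RULES = [
    ("Critical", "Hardcoded password/key",
     re.compile(r'(?i)\b(password|pwd|accountkey)\s*=\s*"?[^;"\s]+')),
    ("Critical", "SQL injection risk",
     re.compile(r'(?i)"\s*(select|insert|update|delete)\b[^"]*"\s*\+')),  # SQL literal + concat
    ("High", "Empty catch block", re.compile(r'catch\s*(\([^)]*\))?\s*\{\s*\}')),
    ("Medium", "throw ex loses stack trace", re.compile(r'\bthrow\s+\w+\s*;')),
    ("Low", "Console.WriteLine", re.compile(r'\bConsole\.WriteLine\s*\(')),
]


def scan(files: dict[str, str]) -> list[tuple[str, str, str, int]]:
    """Return (severity, rule, path, line) for every rule hit."""
    hits = []
    for path, source in files.items():
        for lineno, text in enumerate(source.splitlines(), start=1):
            for severity, name, regex in RULES:
                if regex.search(text):
                    hits.append((severity, name, path, lineno))
    return hits


def summary(hits: list[tuple[str, str, str, int]]) -> Counter:
    """Count findings per severity, as in the report's Summary line."""
    return Counter(hit[0] for hit in hits)
```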

AI Deep Analysis (Optional)

  • Dead code detection
  • Useless code identification
  • Nonsensical logic detection

Sample Report

Repository Code Quality Scan
Critical Issues Found (24 issues)

Summary

Critical: 8 | High: 4 | Medium: 6 | Low: 3 | Duplicates: 3

Static Analysis

[Critical] Hardcoded connection string
  ReportGenerator.cs:11

[Critical] SQL injection risk
  ReportGenerator.cs:84

Alternative: Codex CLI

Replace complex regex with natural language prompts:

codex exec "Scan src/ for security issues:
- Hardcoded credentials
- SQL injection
Output as JSON"
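Free-form prompts still need a deterministic step on the way out, because nothing forces the reply to be valid JSON. A Python sketch, assuming the codex CLI is on PATH and that the prompt asks for an array of objects with file, line, and issue keys (that schema is an assumption, not something the CLI defines). The parser skips any bracketed text that is not valid JSON, such as log prefixes.

```python
import json
import subprocess

PROMPT = (
    "Scan src/ for security issues:\n"
    "- Hardcoded credentials\n"
    "- SQL injection\n"
    'Output as JSON: an array of objects with "file", "line", "issue" keys.'
)


def run_codex(prompt: str = PROMPT) -> str:
    """Run the Codex CLI non-interactively and return its stdout."""
    result = subprocess.run(["codex", "exec", prompt],
                            capture_output=True, text=True, check=True)
    return result.stdout


def parse_findings(output: str) -> list[dict]:
    """Decode the first valid JSON array in the output; keep well-formed items."""
    decoder = json.JSONDecoder()
    required = {"file", "line", "issue"}
    for i, ch in enumerate(output):
        if ch != "[":
            continue
        try:
            items, _ = decoder.raw_decode(output, i)
        except json.JSONDecodeError:
            continue  # e.g. "[info]" log text, not JSON
        if isinstance(items, list):
            return [it for it in items if isinstance(it, dict) and required.issubset(it)]
    raise ValueError("no JSON array found in output")
```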

Practice 3: Feature Implementation Check

Core Concept

Validate that the PR implementation strictly matches the design spec/requirements, to prevent:

  • Missing Requirements - Features specified but not implemented
  • Scope Creep - Features not specified but added
  • Implementation Deviation - Implementation differs from design intent

Spec Sources

  • PR description (title + body)
  • Associated design documents (*design*.md, *spec*.md)
  • docs/ folder within code directories
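Spec collection is plain file discovery. A Python sketch using pathlib globs that mirror the three sources above; the "# PR:" and per-file headings in the combined text are an illustrative convention, and a file matching more than one pattern is included only once.

```python
from pathlib import Path

# Glob patterns mirroring the spec sources listed above.
SPEC_GLOBS = ("**/*design*.md", "**/*spec*.md", "**/docs/*.md")


def collect_specs(root: str, pr_title: str, pr_body: str) -> str:
    """Concatenate the PR description and matching design/spec docs into one text."""
    parts = [f"# PR: {pr_title}\n{pr_body}"]
    seen = set()
    for pattern in SPEC_GLOBS:
        for path in sorted(Path(root).glob(pattern)):
            if path.is_file() and path not in seen:
                seen.add(path)
                rel = path.relative_to(root).as_posix()
                parts.append(f"# {rel}\n{path.read_text(encoding='utf-8')}")
    return "\n\n".join(parts)
```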

AI Analysis Prompt

Uses Azure OpenAI to compare the spec against the implementation and output: missing requirements, extra features, deviations, and a compliance score (1-10).
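A sketch of the prompt/response contract, assuming the model is told to answer with a JSON object. The key names are hypothetical and the Azure OpenAI call itself is omitted; the point is that the 1-10 score and the three lists are validated deterministically before anything reaches the report.

```python
import json
from dataclasses import dataclass

# Hypothetical prompt; the JSON keys are an assumed contract with the model.
PROMPT_TEMPLATE = """Compare the specification with the implementation diff.
Reply with JSON only, using keys:
"missing_requirements", "extra_features", "deviations" (lists of strings)
and "compliance_score" (integer 1-10).

SPECIFICATION:
{spec}

IMPLEMENTATION DIFF:
{diff}"""


@dataclass
class ComplianceResult:
    missing_requirements: list[str]
    extra_features: list[str]
    deviations: list[str]
    compliance_score: int


def build_prompt(spec: str, diff: str) -> str:
    return PROMPT_TEMPLATE.format(spec=spec, diff=diff)


def parse_result(reply: str) -> ComplianceResult:
    """Validate the model's JSON reply; reject out-of-range scores."""
    data = json.loads(reply)
    score = int(data["compliance_score"])
    if not 1 <= score <= 10:
        raise ValueError(f"compliance score out of range: {score}")
    return ComplianceResult(
        missing_requirements=list(data.get("missing_requirements", [])),
        extra_features=list(data.get("extra_features", [])),
        deviations=list(data.get("deviations", [])),
        compliance_score=score,
    )
```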

Workflow

PR Triggered ↓ β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”‚ Collect Inputs β”‚ β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€ β”‚ PR Info β”‚ Design/Spec Files β”‚ β”‚ title β”‚ *design*.md β”‚ β”‚ body β”‚ *spec*.md β”‚ β”‚ diff β”‚ docs/*.md β”‚ β””β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β””β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”˜ ↓ Azure OpenAI Comparison ↓ Generate Compliance Report

Sample Output

Missing Requirements
[Critical] Blob storage write is NOT implemented per design spec

Extra Features
None detected

Compliance Score: 6/10
Core notification implemented, but blob storage requirement missing.

Partially Implemented & Planned Features

Mentioned but Not Fully Implemented

Semantic Code Duplication (Partial)

Current: jscpd token-based syntactic duplication

Planned: Code Embedding + vector similarity to detect logically identical but syntactically different code
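The planned embedding approach reduces to a similarity search over vectors. A Python sketch of the similarity step only: the vectors would come from a code-embedding model (not specified here), and the 0.9 threshold is an assumed starting point to tune, not a recommended value.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two embedding vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def similar_pairs(new: dict[str, list[float]], existing: dict[str, list[float]],
                  threshold: float = 0.9) -> list[tuple[str, str, float]]:
    """Pair each new function with existing ones whose embeddings exceed the threshold."""
    pairs = []
    for new_name, v in new.items():
        for old_name, w in existing.items():
            score = cosine(v, w)
            if score >= threshold:
                pairs.append((new_name, old_name, round(score, 3)))
    return sorted(pairs, key=lambda p: -p[2])
```

With toy three-dimensional vectors, a new repository method whose vector nearly matches an existing one pairs with it, while an unrelated method with an orthogonal vector scores 0.0 and is dropped.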

Architecture Violation Detection (POC)

Current: Example anti-pattern rules (e.g., hardcoded-value detection)

Planned: Dependency graph analysis, layer rules configuration, custom architecture rules
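Layer rules can be expressed as configuration: for each layer, the set of layers it may depend on. A Python sketch with a hypothetical four-layer layout and an assumed naming convention in which the layer appears as a namespace segment; the dependency edges would come from the planned dependency-graph analysis.

```python
from typing import Optional

# Hypothetical layer rules: each layer maps to the layers it may depend on.
ALLOWED = {
    "Api": {"Application"},
    "Application": {"Domain"},
    "Infrastructure": {"Application", "Domain"},
    "Domain": set(),
}


def layer_of(namespace: str) -> Optional[str]:
    """Return the first namespace segment naming a layer, e.g. 'MyApp.Domain.Todo' -> 'Domain'."""
    for part in namespace.split("."):
        if part in ALLOWED:
            return part
    return None


def violations(edges: list[tuple[str, str]]) -> list[tuple[str, str]]:
    """Return dependency edges (source, target) that point in a forbidden direction."""
    bad = []
    for src, dst in edges:
        a, b = layer_of(src), layer_of(dst)
        if a and b and a != b and b not in ALLOWED[a]:
            bad.append((src, dst))
    return bad
```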

API Contract Breaking Detection (Basic)

Current: Detects API route reference conflicts

Planned: Full call chain analysis, breaking change detection
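Breaking-change detection compares the public surface of two versions. A Python sketch over hand-written maps from route to handler signature; extracting those maps from controllers or an OpenAPI document is the hard part and is assumed here. Additions are treated as non-breaking; removals and signature changes are reported.

```python
def breaking_changes(old: dict[str, str], new: dict[str, str]) -> list[str]:
    """Compare API maps (route -> signature); report removals and signature changes."""
    changes = []
    for route, signature in sorted(old.items()):
        if route not in new:
            changes.append(f"removed: {route}")
        elif new[route] != signature:
            changes.append(f"changed: {route}: {signature} -> {new[route]}")
    # Routes only present in `new` are additions and are not breaking.
    return changes
```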

Extension Capabilities (Planned)

Multiple Notification Channels (Planned)

  • PR Comment - Add comments directly on PR
  • Slack/Teams - Team instant notifications
  • Work Item Creation - Azure DevOps/Jira integration
  • Merge Blocking - Quality gates
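A quality gate is a threshold check whose failure becomes a failing CI status. A Python sketch with hypothetical per-severity limits; turning a failing status into an actual merge block relies on the platform's branch protection (for example, a required status check on GitHub).

```python
# Hypothetical limits: the most findings of each severity a PR may carry and still merge.
THRESHOLDS = {"Critical": 0, "High": 0, "Medium": 5, "Low": 20}


def gate(counts: dict[str, int], limits: dict[str, int] = THRESHOLDS) -> list[str]:
    """Return the reasons the gate fails; an empty list means the PR passes."""
    return [
        f"{severity}: {counts.get(severity, 0)} > {limit}"
        for severity, limit in limits.items()
        if counts.get(severity, 0) > limit
    ]


def exit_code(counts: dict[str, int]) -> int:
    """CI-friendly result; a CI step would call sys.exit(exit_code(counts))."""
    return 1 if gate(counts) else 0
```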

Dashboard & Trend Analysis (Planned)

Quality metrics visualization, historical trends, team comparisons

Auto-Fix (Exploring)

AI generates fix code → Auto-commit → Trigger new review

Future Vision: Evolution of AI Coding Quality Engineering

Multi-Agent Self-Review Loop

AI Code → AI Review → AI Fix (Autonomous Loop)

1. Copilot/Cursor generates code
↓
2. Claude/Codex reviews and comments
↓
3. Issues found?
  • Yes → Copilot auto-fixes → back to step 2
  • No → Human reviews the final result

Humans review only the final passing code; AI handles the repetitive review-fix cycles.
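The loop's control flow, sketched in Python. The review and fix callables are hypothetical stand-ins for the reviewer agent and the fixing agent, and max_rounds is an assumed safety budget so two disagreeing agents cannot cycle forever; either way, the result is handed to a human.

```python
from typing import Callable

Review = Callable[[str], list[str]]      # hypothetical: code -> list of issues
Fix = Callable[[str, list[str]], str]    # hypothetical: code, issues -> revised code


def self_review_loop(code: str, review: Review, fix: Fix,
                     max_rounds: int = 3) -> tuple[str, list[str], int]:
    """Alternate review and fix until clean or the round budget runs out.

    Returns the final code, any unresolved issues, and the number of fix rounds.
    In both cases the result goes to a human reviewer next.
    """
    rounds = 0
    issues = review(code)
    while issues and rounds < max_rounds:
        code = fix(code, issues)
        rounds += 1
        issues = review(code)
    return code, issues, rounds
```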

Deep Source Control Integration

  • Auto-add review comments on PR (built-in + extended checks)
  • Auto-commit fix suggestions (with human approval gate)
  • Auto-run tests to verify fixes
  • Auto-merge when all quality gates pass

Evolution: Spec-Driven Development

Stage | Approach | Characteristics
Prompt-Driven | Natural Language | Ambiguous, inconsistent
Context-Driven | Prompt + Codebase | Better but unstructured
Spec-Driven | Structured Specs | Predictable, verifiable

Industry Practice References

  • Design-by-Contract - Preconditions, postconditions, invariants
  • BDD - Given-When-Then as executable tests
  • API-First - OpenAPI specs define contracts
  • Model-Driven - UML/DSL models as source of truth

Vision: Write the spec first, let AI generate the implementation, and rely on automated validation to ensure compliance.
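To make the Design-by-Contract reference concrete, here is a minimal Python sketch of an executable contract, using a hypothetical contract decorator that checks a precondition before the call and a postcondition after it (invariants are omitted). Contracts written this way double as machine-checkable spec fragments.

```python
import functools
from typing import Callable


def contract(pre: Callable[..., bool], post: Callable[..., bool]):
    """Hypothetical decorator: check pre(*args) before the call, post(result, *args) after."""
    def decorate(fn):
        @functools.wraps(fn)
        def wrapper(*args):
            if not pre(*args):
                raise ValueError(f"precondition failed for {fn.__name__}{args}")
            result = fn(*args)
            if not post(result, *args):
                raise AssertionError(f"postcondition failed for {fn.__name__}{args}")
            return result
        return wrapper
    return decorate


@contract(pre=lambda items, n: n >= 0,
          post=lambda result, items, n: len(result) == min(n, len(items)))
def take(items: list, n: int) -> list:
    """Return the first n items."""
    return items[:n]
```

Without the precondition, take([1], -1) would quietly return an empty list; with it, the call fails loudly at the boundary.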

Overall Architecture

AI Coding Quality Engineering
  • PR Quality Check - Trigger: PR | Scope: Changed files
  • Repo Scan - Trigger: Cron/Manual | Scope: Full codebase
  • Feature Check - Trigger: PR | Scope: Spec + Implementation
↓
Deterministic Scanning Layer (Phase 1)
  • jscpd - Duplication
  • Regex Pattern - Resource extraction
  • Static Rule - Security/Quality
  • File/Dependency Analysis - Spec collection
↓
AI Refinement Layer (Phase 2) - Optional: Azure OpenAI / Codex CLI
  • Filter false positives
  • Explain issues
  • Prioritize
  • Compliance
  • Suggest
↓
Output & Actions
  • HTML Email (✓ Implemented)
  • PR Comment (Planned)
  • Work Item (Planned)
  • Merge Block (Planned)
  • Dashboard Update (Planned)

Summary

Problem & Solution

Quality Blind Spots in AI-Assisted Coding

Built-in PR review sees only the diff, so it cannot detect cross-file issues, resource conflicts, spec deviations, and similar problems.

Solution: Two-Phase Approach

  • Phase 1: Deterministic tools to narrow scope
  • Phase 2: AI refinement, explanation, prioritization

Implemented Practices

Practice 1: Extended PR Quality Check
Practice 2: Periodic Repo Scan
Practice 3: Feature Implementation Check

Core Value

Let AI Handle Repetitive Quality Checks
Engineers focus on architecture and business logic
Fill Built-in Review Blind Spots
Cross-file analysis, resource conflicts, spec compliance
Extensible Quality Engineering Framework
Add detection rules, notification channels, action responses

Next Steps

  • Multi-Agent Self-Review Loop
  • Deep Source Control Integration
  • Evolution to Spec-Driven Development

Q & A

Questions & Discussion

Repository: github.com/v1212/aivibingtest

Detailed Documentation: docs/ai-coding-quality-engineering.md
