
feat: add LLMAISmell checker for AI-generated requirement document detection #442

Merged: e06084 merged 9 commits into dev from feature/ai-smell-checker on Jun 18, 2026

e06084 (Collaborator) commented Jun 18, 2026

Summary

Adds LLMAISmell, a new LLM-based checker that detects AI-generated writing patterns in requirement documents.

Background

Requirement docs written (or heavily assisted) by AI tend to share recognizable patterns: hollow truisms, repetitive rephrasing, inflated claims, lack of concrete detail, and buzzword overuse. This checker quantifies those patterns so reviewers can flag and push back on low-quality PRDs.

What's New

dingo/model/llm/llm_ai_smell.py — main checker, registered as LLMAISmell

Evaluates 5 dimensions (each scored 0–10):

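| Dimension | CN label (English gloss) | Description |
|---|---|---|
| correct_nonsense | 💊 正确的废话指数 (correct-nonsense index) | Hollow truisms: "In today's rapidly evolving..." |
| infinite_mirror | 🪞 无限镜像感 (infinite-mirror feel) | Same point rephrased multiple times |
| rainbow_fart | 🌈 彩虹屁密度 (rainbow-fart density, i.e. excessive flattery) | Inflated claims without data: "revolutionize", "industry-leading" |
| detail_vacuum | 🧩 细节真空度 (detail vacuum level) | Structurally complete but nothing is actionable |
| adjective_violence | ✨ 形容词暴力指数 (adjective-violence index) | Buzzword overload: 赋能 (empower), 闭环 (closed loop), 颗粒度 (granularity), 抓手 (lever), 降本增效 (cut costs and raise efficiency) |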

Scoring:

  • Weighted overall score (detail_vacuum ×0.3, correct_nonsense ×0.25, adjective_violence ×0.2, infinite_mirror ×0.15, rainbow_fart ×0.1)
  • Score ≥ 6 → AI_SMELL_DETECTED
  • Score < 6 → AI_SMELL_CLEAN
  • Output includes per-dimension scores, evidence quotes from the document, and a one-line verdict
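
The weighting and threshold listed above can be sketched as follows. This is an illustration only: the names WEIGHTS, THRESHOLD, overall_score and label_for are hypothetical, and the PR text does not say whether the checker computes the weighted score itself or asks the LLM to return it.

```python
# Hypothetical names; the weights and threshold are taken from the PR description.
WEIGHTS = {
    "detail_vacuum": 0.30,
    "correct_nonsense": 0.25,
    "adjective_violence": 0.20,
    "infinite_mirror": 0.15,
    "rainbow_fart": 0.10,
}
THRESHOLD = 6.0


def overall_score(dimensions: dict) -> float:
    """Weighted sum of the five 0-10 dimension scores (missing ones count as 0)."""
    return sum(weight * float(dimensions.get(name, 0)) for name, weight in WEIGHTS.items())


def label_for(score: float) -> str:
    """Map an overall score to the two output labels."""
    return "AI_SMELL_DETECTED" if score >= THRESHOLD else "AI_SMELL_CLEAN"


sample = {
    "detail_vacuum": 8,
    "correct_nonsense": 7,
    "adjective_violence": 6,
    "infinite_mirror": 5,
    "rainbow_fart": 4,
}
print(label_for(overall_score(sample)))
```

Because the weights sum to 1.0, the overall score stays on the same 0–10 scale as the individual dimensions.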

Only the content field is required: pass in any document text and it works.

Files

| File | Purpose |
|---|---|
| dingo/model/llm/llm_ai_smell.py | Checker implementation |
| test/scripts/model/llm/test_llm_ai_smell.py | 17 unit tests (pass/fail logic, score normalization, reason content, markdown cleanup, error handling) |
| examples/llm_ai_smell_example.py | Usage example with high/low AI smell sample documents |

Usage

# Register name: "LLMAISmell"
# Required field: content (document text)
# Output labels: AI_SMELL_DETECTED / AI_SMELL_CLEAN

gemini-code-assist bot (Contributor) left a comment

Code Review

This pull request introduces an AI Smell Detector (LLMAISmell) for requirement documents, evaluating them across five dimensions using an LLM, and includes an example script and unit tests. The review feedback focuses on enhancing the robustness of the LLM response parser, specifically by safely casting scores and dictionary fields to handle string or null values, stripping whitespace before markdown cleanup, validating that the parsed JSON is a dictionary, clamping progress bar scores, and ensuring the example script actually executes the evaluation.

Comment thread dingo/model/llm/llm_ai_smell.py Outdated
Comment thread dingo/model/llm/llm_ai_smell.py Outdated
Comment on lines +174 to +180
if response.startswith("```json"):
    response = response[7:]
elif response.startswith("```"):
    response = response[3:]
if response.endswith("```"):
    response = response[:-3]
response = response.strip()

Priority: medium

The markdown code-block stripping runs before response.strip(). If the LLM response has leading or trailing whitespace or newlines (e.g., '\n```json\n...\n```\n'), the startswith and endswith checks fail to detect the fences, so they are never stripped. Stripping the response before checking resolves this issue.

Suggested change
- if response.startswith("```json"):
-     response = response[7:]
- elif response.startswith("```"):
-     response = response[3:]
- if response.endswith("```"):
-     response = response[:-3]
- response = response.strip()
+ response = response.strip()
+ if response.startswith(chr(96) * 3 + "json"):
+     response = response[7:]
+ elif response.startswith(chr(96) * 3):
+     response = response[3:]
+ if response.endswith(chr(96) * 3):
+     response = response[:-3]
+ response = response.strip()
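
To see why the order matters, here is a minimal standalone sketch of the strip-first approach. The function name strip_code_fence and the FENCE constant are hypothetical and not part of the PR; the fence string is built from single backticks for the same reason the suggestion uses chr(96).

```python
# Hypothetical helper, not the checker's actual code.
FENCE = "`" * 3  # three backticks, built indirectly to avoid a literal fence here


def strip_code_fence(response: str) -> str:
    """Strip surrounding whitespace first, then an optional markdown code fence."""
    text = response.strip()
    if text.startswith(FENCE + "json"):
        text = text[len(FENCE) + len("json"):]
    elif text.startswith(FENCE):
        text = text[len(FENCE):]
    if text.endswith(FENCE):
        text = text[:-len(FENCE)]
    return text.strip()


# A response wrapped in newlines, as an LLM might return it.
raw = "\n" + FENCE + 'json\n{"total_score": 7}\n' + FENCE + "\n"
print(strip_code_fence(raw))
```

Without the initial strip(), the leading newline in raw would make both startswith checks fail and the fence would reach json.loads intact.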

Comment on lines +182 to +185
try:
    data = json.loads(response)
except json.JSONDecodeError:
    raise ConvertJsonError(f"Failed to parse AI smell response as JSON: {response[:200]}")

Priority: medium

If the LLM returns a valid JSON that is not an object/dictionary (e.g., a list or a string), json.loads(response) will succeed but subsequent .get() calls on data will raise an AttributeError. It is safer to explicitly verify that the parsed JSON is a dictionary.

Suggested change
- try:
-     data = json.loads(response)
- except json.JSONDecodeError:
-     raise ConvertJsonError(f"Failed to parse AI smell response as JSON: {response[:200]}")
+ try:
+     data = json.loads(response)
+     if not isinstance(data, dict):
+         raise ConvertJsonError(f"Parsed JSON is not a dictionary: {type(data)}")
+ except json.JSONDecodeError:
+     raise ConvertJsonError(f"Failed to parse AI smell response as JSON: {response[:200]}")
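
The guard can be exercised in isolation with a small sketch. The ConvertJsonError class below is a local stand-in for the project's exception of the same name, and parse_json_object is a hypothetical helper, not the checker's actual method.

```python
import json


class ConvertJsonError(Exception):
    """Local stand-in for the project's exception (assumed to be a plain Exception)."""


def parse_json_object(response: str) -> dict:
    """Parse a JSON string and insist that the top-level value is an object."""
    try:
        data = json.loads(response)
    except json.JSONDecodeError as exc:
        raise ConvertJsonError(
            f"Failed to parse AI smell response as JSON: {response[:200]}"
        ) from exc
    if not isinstance(data, dict):
        # Valid JSON such as a list or a bare string would break later .get() calls.
        raise ConvertJsonError(f"Parsed JSON is not a dictionary: {type(data).__name__}")
    return data


for candidate in ('{"total_score": 7}', "[1, 2, 3]", "not json"):
    try:
        print(parse_json_object(candidate))
    except ConvertJsonError as err:
        print(f"rejected: {err}")
```

Placing the isinstance check after the try block keeps the two failure modes separate; the behavior is the same as in the suggestion, since ConvertJsonError is not caught by the JSONDecodeError handler.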

Comment on lines +226 to +230
def _score_bar(cls, score: int, width: int = 10) -> str:
    """Generate a simple ASCII progress bar for a 0-10 score."""
    filled = round(score)
    empty = width - filled
    return f"[{'█' * filled}{'░' * empty}]"

Priority: medium

If the LLM returns a score outside the expected 0-10 range (e.g., negative or greater than 10), _score_bar can produce malformed progress bars or raise errors. Clamping the score to [0, width] ensures the progress bar is always rendered correctly.

Suggested change
- def _score_bar(cls, score: int, width: int = 10) -> str:
-     """Generate a simple ASCII progress bar for a 0-10 score."""
-     filled = round(score)
-     empty = width - filled
-     return f"[{'█' * filled}{'░' * empty}]"
+ @classmethod
+ def _score_bar(cls, score: float, width: int = 10) -> str:
+     """Generate a simple ASCII progress bar for a 0-10 score."""
+     filled = max(0, min(width, round(score)))
+     empty = width - filled
+     return f"[{'█' * filled}{'░' * empty}]"
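
As a quick check of the clamping behavior, here is a module-level version of the same logic. The free function score_bar is a hypothetical rewrite for illustration; in the PR it is a classmethod on the checker.

```python
def score_bar(score: float, width: int = 10) -> str:
    """Render a fixed-width progress bar, clamping the fill to [0, width]."""
    filled = max(0, min(width, int(round(score))))
    empty = width - filled
    return f"[{'█' * filled}{'░' * empty}]"


# In-range, negative, and oversized scores all yield a bar of the same width.
for value in (7, -3, 15):
    print(value, score_bar(value))
```

Without the clamp, a negative score would give an empty fill but more than width empty cells, and a score above width would give more than width filled cells, so the bar length would drift.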

Comment thread examples/llm_ai_smell_example.py Outdated
Comment on lines +122 to +126
# Use executor.eval_text for quick single-text evaluation
# This is a simplified example showing the checker's usage
print("\nDocument snippet (high AI smell):")
print(SAMPLE_DOC_HIGH_AI_SMELL[:200] + "...")
print("\nExpected: AI_SMELL_DETECTED with high scores on adjective_violence and detail_vacuum")

Priority: medium

The example script instantiates LocalExecutor but never actually calls it to run the evaluation on the sample documents. To make the example fully functional and demonstrate how to use the checker, you should include the actual execution calls (even if commented out or wrapped in a try-except block).

Suggested change
- # Use executor.eval_text for quick single-text evaluation
- # This is a simplified example showing the checker's usage
- print("\nDocument snippet (high AI smell):")
- print(SAMPLE_DOC_HIGH_AI_SMELL[:200] + "...")
- print("\nExpected: AI_SMELL_DETECTED with high scores on adjective_violence and detail_vacuum")
+ # Use executor.eval_text for quick single-text evaluation
+ # This is a simplified example showing the checker's usage
+ print("\nDocument snippet (high AI smell):")
+ print(SAMPLE_DOC_HIGH_AI_SMELL[:200] + "...")
+ print("\nExpected: AI_SMELL_DETECTED with high scores on adjective_violence and detail_vacuum")
+ # To run the evaluation (requires a valid API key):
+ # try:
+ #     result = executor.eval_text(SAMPLE_DOC_HIGH_AI_SMELL)
+ #     print("\nActual Evaluation Result:")
+ #     print(result.reason[0])
+ # except Exception as e:
+ #     print(f"\nCould not run evaluation: {e}")

e06084 added 4 commits June 18, 2026 18:47
Adopt Gemini code-assist suggestions:
- Cast total_score to float() with ValueError/TypeError fallback
- Use 'or {}' for dimensions/evidence to handle null values
- Use str() for verdict to handle null
- Cast per-dimension scores to float() before comparisons
- Clamp _score_bar input with int(round()) + max/min guard

e06084 (Collaborator, Author) commented Jun 18, 2026

Thanks for the thorough review!

All suggestions have been addressed in the latest commit:

High priority (both fixed):

  • total_score now cast to float() with ValueError/TypeError fallback
  • dimensions/evidence use or {} pattern; verdict wrapped in str(... or "")
  • Per-dimension scores also cast to float() before _score_bar and >= 5 comparison

Medium priority (all fixed):

  • response.strip() now runs before the markdown code-block stripping
  • Added isinstance(data, dict) guard after json.loads()
  • _score_bar signature updated to float, clamp uses max(0, min(width, int(round(score))))
  • Example script now includes commented-out actual execution calls
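
The defensive casts listed under "High priority" above can be sketched roughly as follows. The helper names to_float and normalize, the 0.0 fallback, and the exact shape of the parsed response (keys total_score, dimensions, evidence, verdict) are assumptions based on the names mentioned in this thread, not the checker's actual code.

```python
def to_float(value, default: float = 0.0) -> float:
    """Cast to float, falling back to a default for None or non-numeric input."""
    try:
        return float(value)
    except (ValueError, TypeError):
        return default


def normalize(data: dict) -> dict:
    """Coerce a parsed LLM response into predictable types."""
    dimensions = data.get("dimensions") or {}  # a JSON null becomes an empty dict
    evidence = data.get("evidence") or {}
    return {
        "total_score": to_float(data.get("total_score")),
        "dimensions": {name: to_float(score) for name, score in dimensions.items()},
        "evidence": evidence,
        "verdict": str(data.get("verdict") or ""),  # a JSON null becomes ""
    }


messy = {
    "total_score": "7.5",
    "dimensions": {"detail_vacuum": "8", "rainbow_fart": None},
    "evidence": None,
    "verdict": None,
}
print(normalize(messy))
```

The `or {}` pattern matters because dict.get only applies its default when the key is missing; a key that is present with a null value would otherwise come back as None.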

e06084 merged commit 5e84a18 into dev on Jun 18, 2026
3 checks passed