feat: add LLMAISmell checker for AI-generated requirement document detection #442
Code Review
This pull request introduces an AI Smell Detector (LLMAISmell) for requirement documents, evaluating them across five dimensions using an LLM, and includes an example script and unit tests. The review feedback focuses on enhancing the robustness of the LLM response parser, specifically by safely casting scores and dictionary fields to handle string or null values, stripping whitespace before markdown cleanup, validating that the parsed JSON is a dictionary, clamping progress bar scores, and ensuring the example script actually executes the evaluation.
| if response.startswith("```json"): | ||
| response = response[7:] | ||
| elif response.startswith("```"): | ||
| response = response[3:] | ||
| if response.endswith("```"): | ||
| response = response[:-3] | ||
| response = response.strip() |
The markdown code block stripping logic is executed before response.strip(). If the LLM response contains leading or trailing whitespace/newlines (e.g., '\n```json\n...\n```\n'), the startswith and endswith checks will fail to detect and strip the markdown code blocks. Stripping the response before checking resolves this issue.
| if response.startswith("```json"): | |
| response = response[7:] | |
| elif response.startswith("```"): | |
| response = response[3:] | |
| if response.endswith("```"): | |
| response = response[:-3] | |
| response = response.strip() | |
| response = response.strip() | |
| if response.startswith(chr(96) * 3 + "json"): | |
| response = response[7:] | |
| elif response.startswith(chr(96) * 3): | |
| response = response[3:] | |
| if response.endswith(chr(96) * 3): | |
| response = response[:-3] | |
| response = response.strip() |
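For reference, a minimal standalone sketch of the strip-first ordering suggested above. The helper name `strip_code_fence` and the `FENCE` constant are hypothetical and not part of the PR; the fence marker is built with `chr(96) * 3`, as in the suggestion.

```python
FENCE = chr(96) * 3  # three backticks, built indirectly as in the suggestion


def strip_code_fence(response: str) -> str:
    """Remove an optional Markdown code fence wrapped around an LLM reply."""
    # Strip first so leading/trailing newlines cannot hide the fence markers.
    response = response.strip()
    if response.startswith(FENCE + "json"):
        response = response[len(FENCE) + len("json"):]
    elif response.startswith(FENCE):
        response = response[len(FENCE):]
    if response.endswith(FENCE):
        response = response[:-len(FENCE)]
    # Strip again to drop the newlines that sat just inside the fence.
    return response.strip()
```

Deriving the slice offsets from `len(FENCE)` rather than the literals 7 and 3 is a design choice of this sketch; the behavior matches the suggested code.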
| try: | ||
| data = json.loads(response) | ||
| except json.JSONDecodeError: | ||
| raise ConvertJsonError(f"Failed to parse AI smell response as JSON: {response[:200]}") |
If the LLM returns a valid JSON that is not an object/dictionary (e.g., a list or a string), json.loads(response) will succeed but subsequent .get() calls on data will raise an AttributeError. It is safer to explicitly verify that the parsed JSON is a dictionary.
| try: | |
| data = json.loads(response) | |
| except json.JSONDecodeError: | |
| raise ConvertJsonError(f"Failed to parse AI smell response as JSON: {response[:200]}") | |
| try: | |
| data = json.loads(response) | |
| if not isinstance(data, dict): | |
| raise ConvertJsonError(f"Parsed JSON is not a dictionary: {type(data)}") | |
| except json.JSONDecodeError: | |
| raise ConvertJsonError(f"Failed to parse AI smell response as JSON: {response[:200]}") |
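The dictionary check can be exercised in isolation with a sketch like the one below. `ConvertJsonError` is redefined here as a stand-in so the snippet runs on its own (the real class lives in the project), and `parse_json_object` is a hypothetical helper name.

```python
import json


class ConvertJsonError(Exception):
    """Stand-in for the project's exception of the same name (assumption)."""


def parse_json_object(response: str) -> dict:
    """Parse an LLM reply and insist that the top-level JSON value is an object."""
    try:
        data = json.loads(response)
    except json.JSONDecodeError as exc:
        raise ConvertJsonError(
            f"Failed to parse AI smell response as JSON: {response[:200]}"
        ) from exc
    # json.loads also accepts lists, strings, numbers and null; reject those
    # here so later .get() calls cannot fail with AttributeError.
    if not isinstance(data, dict):
        raise ConvertJsonError(f"Parsed JSON is not a dictionary: {type(data)}")
    return data
```

Placing the type check after the `try` block, rather than inside it as in the suggestion, is a stylistic variation; both raise the same error for non-object JSON.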
| def _score_bar(cls, score: int, width: int = 10) -> str: | ||
| """Generate a simple ASCII progress bar for a 0-10 score.""" | ||
| filled = round(score) | ||
| empty = width - filled | ||
| return f"[{'█' * filled}{'░' * empty}]" |
If the LLM returns a score outside the expected 0-10 range (e.g., negative or greater than 10), _score_bar can produce malformed progress bars or raise errors. Clamping the score to [0, width] ensures the progress bar is always rendered correctly.
| def _score_bar(cls, score: int, width: int = 10) -> str: | |
| """Generate a simple ASCII progress bar for a 0-10 score.""" | |
| filled = round(score) | |
| empty = width - filled | |
| return f"[{'█' * filled}{'░' * empty}]" | |
| @classmethod | |
| def _score_bar(cls, score: float, width: int = 10) -> str: | |
| """Generate a simple ASCII progress bar for a 0-10 score.""" | |
| filled = max(0, min(width, round(score))) | |
| empty = width - filled | |
| return f"[{'█' * filled}{'░' * empty}]" |
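A small sketch comparing the two versions as free functions (the names `unclamped_bar` and `clamped_bar` are hypothetical). In Python, multiplying a string by a negative integer yields an empty string, so for numeric input the unclamped version returns a bar of the wrong length rather than raising.

```python
def unclamped_bar(score: float, width: int = 10) -> str:
    """Original logic: trusts that score is already within 0..width."""
    filled = round(score)
    empty = width - filled
    return f"[{'█' * filled}{'░' * empty}]"


def clamped_bar(score: float, width: int = 10) -> str:
    """Suggested logic: clamp the filled count to the range 0..width."""
    filled = max(0, min(width, round(score)))
    empty = width - filled
    return f"[{'█' * filled}{'░' * empty}]"


# Out-of-range scores change the bar length only in the unclamped version.
for score in (-3, 7, 12):
    print(score, len(unclamped_bar(score)), len(clamped_bar(score)))
```

With the clamp in place, the rendered bar always has `width + 2` characters, including the brackets.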
| # Use executor.eval_text for quick single-text evaluation | ||
| # This is a simplified example showing the checker's usage | ||
| print("\nDocument snippet (high AI smell):") | ||
| print(SAMPLE_DOC_HIGH_AI_SMELL[:200] + "...") | ||
| print("\nExpected: AI_SMELL_DETECTED with high scores on adjective_violence and detail_vacuum") |
The example script instantiates LocalExecutor but never actually calls it to run the evaluation on the sample documents. To make the example fully functional and demonstrate how to use the checker, you should include the actual execution calls (even if commented out or wrapped in a try-except block).
| # Use executor.eval_text for quick single-text evaluation | |
| # This is a simplified example showing the checker's usage | |
| print("\nDocument snippet (high AI smell):") | |
| print(SAMPLE_DOC_HIGH_AI_SMELL[:200] + "...") | |
| print("\nExpected: AI_SMELL_DETECTED with high scores on adjective_violence and detail_vacuum") | |
| # Use executor.eval_text for quick single-text evaluation | |
| # This is a simplified example showing the checker's usage | |
| print("\nDocument snippet (high AI smell):") | |
| print(SAMPLE_DOC_HIGH_AI_SMELL[:200] + "...") | |
| print("\nExpected: AI_SMELL_DETECTED with high scores on adjective_violence and detail_vacuum") | |
| # To run the evaluation (requires a valid API key): | |
| # try: | |
| # result = executor.eval_text(SAMPLE_DOC_HIGH_AI_SMELL) | |
| # print("\nActual Evaluation Result:") | |
| # print(result.reason[0]) | |
| # except Exception as e: | |
| # print(f"\nCould not run evaluation: {e}") |
Adopt Gemini code-assist suggestions:
- Cast total_score to float() with ValueError/TypeError fallback
- Use 'or {}' for dimensions/evidence to handle null values
- Use str() for verdict to handle null
- Cast per-dimension scores to float() before comparisons
- Clamp _score_bar input with int(round()) + max/min guard
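The casts listed in this commit message might look roughly like the sketch below. It is not the PR's code: the helper names `to_float` and `normalize`, the JSON key names, and the 0.0 fallback are assumptions made for illustration.

```python
from typing import Any


def to_float(value: Any, default: float = 0.0) -> float:
    """Cast a score to float, falling back when the LLM sent null or text."""
    try:
        return float(value)
    except (ValueError, TypeError):
        return default


def normalize(data: dict) -> dict:
    """Coerce the fields of a parsed LLM reply into predictable types."""
    # 'or {}' covers both a missing key and an explicit JSON null.
    dimensions = data.get("dimensions") or {}
    evidence = data.get("evidence") or {}
    return {
        "total_score": to_float(data.get("total_score")),
        # 'or ""' keeps a null verdict from becoming the string "None".
        "verdict": str(data.get("verdict") or ""),
        "dimensions": {name: to_float(score) for name, score in dimensions.items()},
        "evidence": evidence,
    }
```

For example, `normalize({"total_score": "7.5", "verdict": None, "dimensions": None})` yields a float score, an empty verdict string, and empty dictionaries instead of raising.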
Thanks for the thorough review! All suggestions have been addressed in the latest commit:
High priority (both fixed):
Medium priority (all fixed):
Summary
Adds LLMAISmell, a new LLM-based checker that detects AI-generated writing patterns in requirement documents.
Background
Requirement docs written (or heavily assisted) by AI tend to share recognizable patterns: hollow truisms, repetitive rephrasing, inflated claims, lack of concrete detail, and buzzword overuse. This checker quantifies those patterns so reviewers can flag and push back on low-quality PRDs.
What's New
dingo/model/llm/llm_ai_smell.py: main checker, registered as LLMAISmell
Evaluates 5 dimensions (each scored 0–10):
- correct_nonsense
- infinite_mirror
- rainbow_fart
- detail_vacuum
- adjective_violence
Scoring:
- AI_SMELL_DETECTED
- AI_SMELL_CLEAN
Only requires the content field; drop in any document text and it works.
Files
- dingo/model/llm/llm_ai_smell.py
- test/scripts/model/llm/test_llm_ai_smell.py
- examples/llm_ai_smell_example.py
Usage