OTA-1927: Eval cluster update prompts#2908
Conversation
|
@fao89: This pull request references OTA-1927 which is a valid jira issue. DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
|
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: The full list of commands accepted by this bot can be found here. DetailsNeeds approval from an approver in each of these files:Approvers can indicate their approval by writing |
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
📝 WalkthroughWalkthroughThis PR adds end-to-end infrastructure for cluster-updates evaluation tests. A new YAML configuration file ( ChangesCluster-Updates Evaluation Setup
Estimated code review effort🎯 2 (Simple) | ⏱️ ~12 minutes Important Pre-merge checks failedPlease resolve all errors before merging. Addressing warnings is optional. ❌ Failed checks (1 error, 1 warning, 1 inconclusive)
✅ Passed checks (12 passed)
✨ Finishing Touches🧪 Generate unit tests (beta)
Comment |
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@eval/README.md`:
- Around line 54-63: The cluster-updates example commands in the eval/README.md
reference system_cluster_updates.yaml which uses https://localhost:8080, but the
local setup starts OLS at http://localhost:8080, causing a TLS mismatch. Add a
clarifying note in the README near these example commands explaining that for
local runs, users need to either modify the api_base setting in
system_cluster_updates.yaml to use http instead of https, or provide
instructions pointing to a separate local cluster-updates configuration preset
that uses HTTP. This will prevent users from encountering immediate
connection/TLS failures when attempting to run these commands locally.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: 8d9c1f51-f25b-40f6-b7f8-eba854c9da4a
📒 Files selected for processing (3)
eval/README.mdeval/eval_data_cluster_updates.yamleval/system_cluster_updates.yaml
e1c10db to
47bc7ce
Compare
|
/cc @sriroopar @rioloc |
|
| - Clear recommendation should be provided | ||
| - conversation_group_id: conv_800 | ||
| tag: cluster-updates-scenarios | ||
| turns: |
There was a problem hiding this comment.
Thank you very much for your PR Fabricio,:)
a major bug is that turn metrics is not set up for everyturn which will in turn not capture the metrics we may want to analyze. rest looks okay, dropped a couple minor mismatches in a comment.
|
/cc @wking |
There was a problem hiding this comment.
Actionable comments posted: 3
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@tests/scripts/test-cluster-updates.sh`:
- Line 34: Separate the variable assignment from the export statement on the
line that sets OPENAI_API_KEY to avoid masking errors from the cat command.
First assign the output of cat "$OPENAI_PROVIDER_KEY_PATH" to a temporary
variable or directly capture it, then check that the command succeeded before
exporting OPENAI_API_KEY. This ensures that if the cat command fails due to a
missing file or permission issues, the error is immediately visible rather than
causing a cryptic authentication error later.
- Around line 24-25: The export statements on lines 24 and 25 combine variable
assignment with command substitution, which masks failures if the underlying
commands fail. Separate the command substitution from the export statement for
both ARCH and OS variables. First, assign the result of the command substitution
to the variable without exporting (e.g., ARCH=$(case $(uname -m) in ... esac)),
then add error checking to verify the command succeeded (e.g., using [ -z
"$ARCH" ] or checking the exit code with ||), and only then export the variable.
If the command fails, exit with an error message to prevent incorrect values
from being used when constructing OPERATOR_SDK_DL_URL on line 26.
- Around line 60-63: The export statement combined with the mktemp -d command
substitution masks failures. If mktemp -d fails, the export still succeeds with
an empty or invalid value. Separate the command substitution from the export by
first assigning the mktemp -d result to ARTIFACT_DIR variable, add error
checking to ensure mktemp succeeded before proceeding, and then export the
variable separately. This ensures that if mktemp -d fails (due to permission
issues or lack of disk space), the script properly detects and handles the error
instead of continuing with an invalid ARTIFACT_DIR path.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: 5c5dea0a-d4ff-446e-98f7-eb2bcb4814a1
📒 Files selected for processing (6)
Makefileeval/README.mdeval/eval_data_cluster_updates.yamleval/system_cluster_updates.yamltests/e2e/evaluation/test_cluster_updates.pytests/scripts/test-cluster-updates.sh
✅ Files skipped from review due to trivial changes (1)
- eval/README.md
🚧 Files skipped from review as they are similar to previous changes (1)
- eval/system_cluster_updates.yaml
db695e0 to
d0cc961
Compare
4bb0e70 to
cc7c8aa
Compare
Fix 'set_session cannot be used inside a transaction' error that occurred when storing multi-turn conversation history in PostgreSQL cache. Problem: - insert_or_append() and delete() methods set autocommit=False to start a transaction, then set it back to True in the finally block - If an exception occurs, the connection may still be in a transaction when autocommit=True is called - psycopg2 internally calls set_session() when changing autocommit, which fails if a transaction is active Solution: - Check connection transaction status before setting autocommit=True - Rollback any active transaction before changing autocommit setting - Ensures clean transition from transactional to autocommit mode Impact: - Multi-turn conversations now work correctly with PostgreSQL cache - No functional change for single-turn conversations - Evaluation tests can now test context retention and progressive refinement Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com> rh-pre-commit.version: 2.3.2 rh-pre-commit.check-secrets: ENABLED
|
/retest-required |
Add comprehensive MCP test scenarios to evaluation dataset for validating OpenShift cluster update workflow AI responses. These scenarios establish quality benchmarks for LLM outputs across different update phases. Test Scenarios Added (conv_798-802): - Precheck: Pre-upgrade validation and readiness assessment Comprehensive analysis of cluster health, available updates, and upgrade blockers before initiating updates - Precheck-Specific: Targeted upgrade path validation Validates specific version availability and upgrade feasibility for planned update targets - No-Updates: Cluster health assessment at latest version Health monitoring and operational status when no updates are available in current channel - Progress: Real-time upgrade progress monitoring Tracks upgrade progress with component status, timeline analysis, and ETA calculations during active updates - Troubleshoot: Upgrade failure diagnosis and remediation Root cause analysis and conservative troubleshooting guidance for failed or stuck upgrade scenarios Each scenario includes: - Complete analysis prompts with constraints and requirements - Full ClusterVersion YAML data as attachments - Full ClusterOperator YAML data as attachments - Expected responses with Summary and TL;DR sections - Real cluster data from production-like scenarios These scenarios mirror the CONSOLE-5118 OLS integration workflow phases and provide the evaluation baseline for cluster update AI assistance. Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com> Signed-off-by: Fabricio Aguiar <fabricio.aguiar@gmail.com> rh-pre-commit.version: 2.3.2 rh-pre-commit.check-secrets: ENABLED
|
@fao89: all tests passed! Full PR test history. Your PR dashboard. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |
Add comprehensive MCP test scenarios to evaluation dataset for validating OpenShift cluster update workflow AI responses. These scenarios establish quality benchmarks for LLM outputs across different update phases.
Test Scenarios Added (conv_798-802):
Precheck: Pre-upgrade validation and readiness assessment Comprehensive analysis of cluster health, available updates, and upgrade blockers before initiating updates
Precheck-Specific: Targeted upgrade path validation Validates specific version availability and upgrade feasibility for planned update targets
No-Updates: Cluster health assessment at latest version Health monitoring and operational status when no updates are available in current channel
Progress: Real-time upgrade progress monitoring Tracks upgrade progress with component status, timeline analysis, and ETA calculations during active updates
Troubleshoot: Upgrade failure diagnosis and remediation Root cause analysis and conservative troubleshooting guidance for failed or stuck upgrade scenarios
Each scenario includes:
These scenarios mirror the CONSOLE-5118 OLS integration workflow phases and provide the evaluation baseline for cluster update AI assistance.
Co-Authored-By: Claude Sonnet 4.5 noreply@anthropic.com
Ref: openshift/console#16131
Summary by CodeRabbit
make test-cluster-updatesand a new end-to-end cluster-updates evaluation test, plus a CI script to run the suite and validate generated artifacts and error-free summaries.