OTA-1927: Eval cluster update prompts by fao89 · Pull Request #2908 · openshift/lightspeed-service

fao89 · 2026-04-29T15:43:15Z

Add comprehensive MCP test scenarios to evaluation dataset for validating OpenShift cluster update workflow AI responses. These scenarios establish quality benchmarks for LLM outputs across different update phases.

Test Scenarios Added (conv_798-802):

- Precheck: Pre-upgrade validation and readiness assessment. Comprehensive analysis of cluster health, available updates, and upgrade blockers before initiating updates.
- Precheck-Specific: Targeted upgrade path validation. Validates specific version availability and upgrade feasibility for planned update targets.
- No-Updates: Cluster health assessment at the latest version. Health monitoring and operational status when no updates are available in the current channel.
- Progress: Real-time upgrade progress monitoring. Tracks upgrade progress with component status, timeline analysis, and ETA calculations during active updates.
- Troubleshoot: Upgrade failure diagnosis and remediation. Root cause analysis and conservative troubleshooting guidance for failed or stuck upgrade scenarios.

Each scenario includes:

- Complete analysis prompts with constraints and requirements
- Full ClusterVersion YAML data as attachments
- Full ClusterOperator YAML data as attachments
- Expected responses with Summary and TL;DR sections
- Real cluster data from production-like scenarios
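
Because every expected response is built around Summary and TL;DR sections, a cheap deterministic pre-check can flag responses that skip them before any judge LLM is involved. This is only an illustrative sketch: it assumes the sections appear as line-leading headings such as `## Summary`, `**Summary**`, or `TL;DR:`, and `missing_sections` is a hypothetical helper, not part of this PR or of lightspeed-eval.

```python
import re

# Assumed heading styles: "## Summary", "**Summary**", "Summary:", "TL;DR:" at line start.
SECTION_PATTERNS = {
    "Summary": re.compile(r"^\s*(?:#+\s*|\*\*)?Summary\b", re.IGNORECASE | re.MULTILINE),
    "TL;DR": re.compile(r"^\s*(?:#+\s*|\*\*)?TL;DR\b", re.IGNORECASE | re.MULTILINE),
}


def missing_sections(response: str) -> list[str]:
    """Return the names of required sections that never start a line in the response."""
    return [name for name, pattern in SECTION_PATTERNS.items() if not pattern.search(response)]


if __name__ == "__main__":
    sample = "## Summary\nCluster is healthy.\n\n**TL;DR:** No updates available."
    print(missing_sections(sample))  # -> []
    print(missing_sections("Cluster is healthy."))  # -> ['Summary', 'TL;DR']
```

A regex check like this only verifies structure; judging whether the content of each section is correct is still the job of the GEval/correctness metrics.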

These scenarios mirror the CONSOLE-5118 OLS integration workflow phases and provide the evaluation baseline for cluster update AI assistance.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

Ref: openshift/console#16131

Summary by CodeRabbit

Documentation
- Expanded evaluation setup and usage guidance, including datasets, “What’s Included” details for cluster-updates (tags and conversation ranges), and direct links to the evaluation tool.
Chores
- Added a dedicated cluster-updates evaluation configuration with judge/metric settings and tuned output, CSV, and logging/telemetry behavior.
Tests / CI
- Added make test-cluster-updates and a new end-to-end cluster-updates evaluation test, plus a CI script to run the suite and validate generated artifacts and error-free summaries.

openshift-ci-robot · 2026-04-29T15:43:20Z

@fao89: This pull request references OTA-1927 which is a valid jira issue.


openshift-ci · 2026-04-29T15:44:08Z

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign bparees for approval. For more information see the Code Review Process.


Needs approval from an approver in each of these files:

OWNERS

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

coderabbitai · 2026-06-16T17:41:51Z

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

@coderabbitai resume to resume automatic reviews.
@coderabbitai review to trigger a single review.


📝 Walkthrough


This PR adds end-to-end infrastructure for cluster-updates evaluation tests. A new YAML configuration file (eval/system_cluster_updates.yaml) defines the LightSpeed evaluation framework parameters: OpenAI judge LLM settings, API query configuration, turn- and conversation-level metrics, output/visualization options, and logging control. A pytest test harness (tests/e2e/evaluation/test_cluster_updates.py) bootstraps dependencies, discovers the OLS endpoint, runs the evaluation subprocess, and validates artifact output. A CI shell script (tests/scripts/test-cluster-updates.sh) orchestrates the full pipeline: installing dependencies and operator-sdk, deploying OLS, running the evaluation suite, and managing cleanup. The README documents usage commands, dataset details, test categories, and both system configuration presets. A Makefile target wires the test into the build system.

Changes

Cluster-Updates Evaluation Setup

Layer / File(s)	Summary
Evaluation system configuration `eval/system_cluster_updates.yaml`	Defines LightSpeed evaluation configuration with OpenAI judge LLM (gpt-4-turbo, temperature, token/timeout limits), query-style API targeting local HTTPS server with optional tool and system-prompt overrides, turn-level default correctness metric plus optional GEval criteria for Kubernetes condition interpretation and output format compliance (Summary/TL;DR sections), conversation-level optional DeepEval metrics (completeness, relevancy, knowledge retention disabled by default), CSV output columns and result directory settings, visualization figure sizing and enabled graph types, and environment/logging configuration to suppress telemetry and control per-package log levels.
README: setup and usage documentation `eval/README.md`	Adds evaluation framework prerequisites (Python 3.11+) and setup link to Lightspeed evaluation tool; documents run commands for full, short, and cluster-updates evaluation variants with tag-based filtering example; expands "What's Included" section with explicit dataset file listings (short, full, cluster-updates), maps test-category tags to conversation ranges for cluster-updates, and provides detailed descriptions of both `system.yaml` and `system_cluster_updates.yaml` with their respective metrics and cluster-specific settings.
Pytest evaluation test harness `tests/e2e/evaluation/test_cluster_updates.py`	Implements pytest module that ensures lightspeed-eval binary installation, discovers OLS base URL from pytest config or environment, extracts optional bearer token from pytest client fixture, loads system configuration and overrides API base URL, writes temporary config file, runs lightspeed-eval subprocess against fixed eval data YAML and output directory, and validates success by asserting subprocess exit status, CSV/JSON artifact presence, and zero error count in summary JSON.
CI shell script and orchestration `tests/scripts/test-cluster-updates.sh`	Adds CI orchestration script with strict error handling that installs project/test dependencies, sources shared helper functions, detects host OS/arch and installs operator-sdk v1.36.1, reads OpenAI API key from environment, defines run_suites() function to deploy OLS and execute cluster_updates evaluation suite with OpenAI provider configuration, performs cleanup, manages artifact directories (creates temp directory with LOCAL_MODE=1 when outside Prow), and executes the full flow with cleanup-on-exit trap.
Makefile target and build integration `Makefile`	Updates PHONY target list and adds test-cluster-updates make target that invokes pytest with lseval and evaluation extras against tests/e2e/evaluation, writes JUnit XML to ARTIFACT_DIR keyed by SUITE_ID, and sets eval output mode to cluster_updates with results directory in ARTIFACT_DIR.
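
The final step of the pytest harness described above (artifact presence plus a zero error count in the summary JSON) can be sketched as follows. The file-name patterns (`*.csv`, `*summary*.json`) and the `error_key` name are assumptions for illustration; the actual lightspeed-eval output layout and summary schema may differ.

```python
import json
from pathlib import Path


def validate_eval_output(output_dir: Path, error_key: str = "errors") -> dict:
    """Check that an evaluation run left CSV/JSON artifacts and reported zero errors.

    The glob patterns and error_key are hypothetical placeholders.
    """
    csv_files = sorted(output_dir.glob("*.csv"))
    summaries = sorted(output_dir.glob("*summary*.json"))
    assert csv_files, f"no CSV results found in {output_dir}"
    assert summaries, f"no summary JSON found in {output_dir}"

    # Take the last summary by name, assuming timestamped names sort chronologically.
    summary = json.loads(summaries[-1].read_text())
    assert error_key in summary, f"{summaries[-1].name} has no {error_key!r} field"
    assert summary[error_key] == 0, f"{summary[error_key]} evaluation error(s) reported"
    return summary
```

Failing loudly when the error field is missing, rather than defaulting it to zero, avoids a silent pass if the summary schema changes.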

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~12 minutes

Important

Pre-merge checks failed

Please resolve all errors before merging. Addressing warnings is optional.

❌ Failed checks (1 error, 1 warning, 1 inconclusive)

Check name	Status	Explanation	Resolution
No-Sensitive-Data-In-Logs	❌ Error	The code logs subprocess output (stdout/stderr) that may contain sensitive API keys. In tests/e2e/evaluation/test_cluster_updates.py lines 100-104, the stdout and stderr from the lightspeed-eval su...	Filter sensitive data from subprocess output before printing. Either avoid printing stderr/stdout from subprocesses that receive API_KEY env vars, or redact sensitive patterns like "API_KEY=..." from output before logging.
Docstring Coverage	⚠️ Warning	Docstring coverage is 71.43% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.
Title check	❓ Inconclusive	The title refers to cluster update evaluation but the changeset primarily adds test infrastructure, configuration files, and documentation for cluster-updates evaluation, not just the prompts themselves.	Consider a more specific title like 'Add cluster-updates evaluation test infrastructure and data' or 'OTA-1927: Add cluster-updates evaluation tests and configuration' to accurately reflect the comprehensive nature of the changes.
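
One way to address the No-Sensitive-Data-In-Logs finding is to pass subprocess output through a redaction step before printing it. This is a minimal sketch, not the PR's code: `redact` is a hypothetical helper that masks known secret values (for example the contents of `OPENAI_API_KEY`), `Bearer` tokens, and `NAME=value` / `NAME: value` pairs whose name contains API_KEY, TOKEN, or SECRET. Pattern matching is best-effort, so masking the exact secret values is the more reliable part.

```python
import re
from collections.abc import Iterable

# "Bearer <token>" as used in Authorization headers.
_BEARER = re.compile(r"(?i)\b(Bearer\s+)\S+")
# NAME=value or NAME: value where NAME contains API_KEY, TOKEN, or SECRET (case-insensitive).
_SECRET_ASSIGNMENT = re.compile(
    r"(?i)\b([A-Z0-9_]*(?:API_KEY|TOKEN|SECRET)[A-Z0-9_]*)(\s*[=:]\s*)(\S+)"
)


def redact(text: str, secrets: Iterable[str] = ()) -> str:
    """Mask known secret values and common key/token patterns in subprocess output."""
    for secret in secrets:
        if secret:  # skip empty values, e.g. an unset environment variable
            text = text.replace(secret, "[REDACTED]")
    # Handle Bearer tokens first so "TOKEN=Bearer abc" does not leak "abc".
    text = _BEARER.sub(r"\1[REDACTED]", text)
    return _SECRET_ASSIGNMENT.sub(r"\1\2[REDACTED]", text)
```

In the test, this could wrap the existing prints, e.g. `print(redact(result.stdout, secrets=[os.environ.get("OPENAI_API_KEY", "")]))`, and likewise for `result.stderr`.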

✅ Passed checks (12 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Stable And Deterministic Test Names	✅ Passed	This PR contains no Ginkgo tests. The check for "Stable and Deterministic Test Names" applies to Ginkgo test patterns (It(), Describe(), etc.), but this PR only adds Python pytest tests and bash sc...
Test Structure And Quality	✅ Passed	Custom check for Ginkgo test code quality is not applicable to this PR. The repository is Python-based using pytest for testing, not a Go project using Ginkgo. No Go or Ginkgo tests exist.
Microshift Test Compatibility	✅ Passed	PR adds pytest and shell script tests, not Ginkgo e2e tests. The custom check applies only to Ginkgo tests (It(), Describe(), etc.), which are absent here.
Single Node Openshift (Sno) Test Compatibility	✅ Passed	This PR does not add any Ginkgo e2e tests. All new tests are Python pytest tests (test_cluster_updates.py) and shell scripts (test-cluster-updates.sh), not Go/Ginkgo-based tests. The custom check i...
Topology-Aware Scheduling Compatibility	✅ Passed	This PR adds test infrastructure and evaluation configuration only—no deployment manifests, operators, controllers, or scheduling constraints were introduced or modified.
Ote Binary Stdout Contract	✅ Passed	OTE Binary Stdout Contract check is not applicable: PR adds Python pytest tests and shell CI scripts, not OTE (Go) binaries. Print statements are inside pytest test functions (allowed).
Ipv6 And Disconnected Network Test Compatibility	✅ Passed	No Ginkgo e2e tests added in this PR. Changes include Python pytest tests, shell scripts, and config files without IPv4 assumptions or IPv6-incompatible networking patterns.
No-Weak-Crypto	✅ Passed	No weak cryptographic algorithms (MD5, SHA1, DES, RC4, 3DES, Blowfish, ECB), custom crypto implementations, or non-constant-time secret comparisons detected in PR-modified files.
Container-Privileges	✅ Passed	No privileged container configurations found in any PR files. The modified files contain test infrastructure, documentation, and evaluation data—not Kubernetes manifests with container security con...


coderabbitai

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@eval/README.md`:
- Around line 54-63: The cluster-updates example commands in the eval/README.md
reference system_cluster_updates.yaml which uses https://localhost:8080, but the
local setup starts OLS at http://localhost:8080, causing a TLS mismatch. Add a
clarifying note in the README near these example commands explaining that for
local runs, users need to either modify the api_base setting in
system_cluster_updates.yaml to use http instead of https, or provide
instructions pointing to a separate local cluster-updates configuration preset
that uses HTTP. This will prevent users from encountering immediate
connection/TLS failures when attempting to run these commands locally.
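
For the local-run case, the downgrade could be applied to a loaded copy of the config rather than by editing the shared preset. This sketch assumes only that the URL sits under a key named `api_base` somewhere in the parsed YAML (the setting this comment names); `use_plain_http_for_local` and `LOCAL_HOSTS` are hypothetical, and remote HTTPS endpoints are left untouched.

```python
from urllib.parse import urlsplit, urlunsplit

# Hosts treated as "local"; extend as needed.
LOCAL_HOSTS = {"localhost", "127.0.0.1"}


def use_plain_http_for_local(config):
    """Return a copy of a loaded config where local https api_base URLs become http.

    Assumes the URL lives under a key named "api_base" somewhere in the nested mapping.
    """
    if isinstance(config, dict):
        result = {}
        for key, value in config.items():
            if key == "api_base" and isinstance(value, str):
                parts = urlsplit(value)
                if parts.scheme == "https" and parts.hostname in LOCAL_HOSTS:
                    value = urlunsplit(parts._replace(scheme="http"))
            result[key] = use_plain_http_for_local(value)
        return result
    if isinstance(config, list):
        return [use_plain_http_for_local(item) for item in config]
    return config  # scalars are returned unchanged
```

With PyYAML this could be applied as `yaml.safe_dump(use_plain_http_for_local(yaml.safe_load(text)))` before writing a temporary local config, which keeps the checked-in preset unchanged for CI.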


ℹ️ Review info

⚙️ Run configuration

Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 8d9c1f51-f25b-40f6-b7f8-eba854c9da4a

📥 Commits

Reviewing files that changed from the base of the PR and between a8aa7a8 and 2064cd9.

📒 Files selected for processing (3)

eval/README.md
eval/eval_data_cluster_updates.yaml
eval/system_cluster_updates.yaml

fao89 · 2026-06-17T17:38:15Z

/cc @sriroopar @rioloc

sriroopar · 2026-06-17T20:23:47Z

Turn metrics need to be defined for every turn, as necessary.
The provider name needs to be standardized to openai.
https should be replaced with http.
All conversations have a single tag, but the README suggests otherwise.
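
To check the tag mismatch, it helps to list which conversation groups actually carry each tag. This sketch assumes the eval data has been loaded (for example with PyYAML) into a list of mappings with `conversation_group_id` and `tag` keys, as in the `conv_800` snippet quoted in this review; `conversations_by_tag` and `undocumented_tags` are hypothetical helpers.

```python
from collections import defaultdict


def conversations_by_tag(eval_data: list[dict]) -> dict[str, list[str]]:
    """Group conversation_group_id values by their tag (assumed eval-data layout)."""
    groups: dict[str, list[str]] = defaultdict(list)
    for conversation in eval_data:
        groups[conversation.get("tag", "<untagged>")].append(conversation["conversation_group_id"])
    return dict(groups)


def undocumented_tags(documented: set[str], eval_data: list[dict]) -> list[str]:
    """Return tags the README mentions that no conversation group actually uses."""
    return sorted(documented - set(conversations_by_tag(eval_data)))
```

A non-empty result from `undocumented_tags` means either the README's tag-to-conversation mapping or the dataset needs updating.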

sriroopar · 2026-06-17T20:25:47Z

+      - Clear recommendation should be provided
+- conversation_group_id: conv_800
+  tag: cluster-updates-scenarios
+  turns:


Thank you very much for your PR, Fabricio :)

A major bug is that turn metrics are not set up for every turn, so the metrics we may want to analyze will not be captured. The rest looks okay; I dropped a couple of minor mismatches in a comment.
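
A small lint pass over the eval data can catch turns left without metrics before a run. This is a sketch under assumptions: each conversation group has a `turns` list, and each turn lists its metrics under a `turn_metrics` key and is identified by `turn_id`. Those field names are assumed and should be checked against the dataset schema; `turns_missing_metrics` is a hypothetical helper.

```python
def turns_missing_metrics(eval_data: list[dict], key: str = "turn_metrics") -> list[tuple[str, object]]:
    """List (conversation_group_id, turn id) pairs whose turn has no metrics under `key`."""
    missing = []
    for conversation in eval_data:
        for index, turn in enumerate(conversation.get("turns", [])):
            if not turn.get(key):  # absent, null, or an empty list all count as missing
                # Fall back to the turn's position if it has no turn_id.
                missing.append((conversation["conversation_group_id"], turn.get("turn_id", index)))
    return missing
```

Running this in CI (and failing on a non-empty result) would keep new turns from silently falling back to whatever defaults the system config provides.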

fao89 · 2026-06-18T11:43:04Z

/cc @wking

coderabbitai

Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@tests/scripts/test-cluster-updates.sh`:
- Line 34: Separate the variable assignment from the export statement on the
line that sets OPENAI_API_KEY to avoid masking errors from the cat command.
First assign the output of cat "$OPENAI_PROVIDER_KEY_PATH" to a temporary
variable or directly capture it, then check that the command succeeded before
exporting OPENAI_API_KEY. This ensures that if the cat command fails due to a
missing file or permission issues, the error is immediately visible rather than
causing a cryptic authentication error later.
- Around line 24-25: The export statements on lines 24 and 25 combine variable
assignment with command substitution, which masks failures if the underlying
commands fail. Separate the command substitution from the export statement for
both ARCH and OS variables. First, assign the result of the command substitution
to the variable without exporting (e.g., ARCH=$(case $(uname -m) in ... esac)),
then add error checking to verify the command succeeded (e.g., using [ -z
"$ARCH" ] or checking the exit code with ||), and only then export the variable.
If the command fails, exit with an error message to prevent incorrect values
from being used when constructing OPERATOR_SDK_DL_URL on line 26.
- Around line 60-63: The export statement combined with the mktemp -d command
substitution masks failures. If mktemp -d fails, the export still succeeds with
an empty or invalid value. Separate the command substitution from the export by
first assigning the mktemp -d result to ARTIFACT_DIR variable, add error
checking to ensure mktemp succeeded before proceeding, and then export the
variable separately. This ensures that if mktemp -d fails (due to permission
issues or lack of disk space), the script properly detects and handles the error
instead of continuing with an invalid ARTIFACT_DIR path.
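
The masking described in these comments (ShellCheck reports it as SC2155) is easy to reproduce. The sketch below drives `bash` from Python, so it needs bash on the PATH, and runs each variant under `set -e`: with `export VAR=$(cmd)` the failing `cat` is hidden by `export`'s own zero status and the script keeps going, while a plain assignment followed by a separate `export` lets `set -e` stop the script. The key path is a made-up nonexistent file, and `run_with_errexit` is a hypothetical helper.

```python
import subprocess


def run_with_errexit(snippet: str) -> tuple[int, bool]:
    """Run a bash snippet under `set -e`; return (exit status, whether the next line ran)."""
    script = f"set -e\n{snippet}\necho reached\n"
    result = subprocess.run(["bash", "-c", script], capture_output=True, text=True)
    return result.returncode, "reached" in result.stdout


# Hypothetical path that does not exist, so `cat` fails.
MISSING_KEY = "/nonexistent/openai-key"

masked = run_with_errexit(f'export OPENAI_API_KEY="$(cat {MISSING_KEY})"')
checked = run_with_errexit(f'OPENAI_API_KEY="$(cat {MISSING_KEY})"\nexport OPENAI_API_KEY')

print(masked)   # failure hidden by export: (0, True)
print(checked)  # errexit fires on the assignment: (1, False)
```

The same split applies to the `mktemp -d` case: `ARTIFACT_DIR="$(mktemp -d)"` on its own line, then `export ARTIFACT_DIR`.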


ℹ️ Review info

⚙️ Run configuration

Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 5c5dea0a-d4ff-446e-98f7-eb2bcb4814a1

📥 Commits

Reviewing files that changed from the base of the PR and between 5cc6965 and ce2dccf.

📒 Files selected for processing (6)

Makefile
eval/README.md
eval/eval_data_cluster_updates.yaml
eval/system_cluster_updates.yaml
tests/e2e/evaluation/test_cluster_updates.py
tests/scripts/test-cluster-updates.sh

✅ Files skipped from review due to trivial changes (1)

eval/README.md

🚧 Files skipped from review as they are similar to previous changes (1)

eval/system_cluster_updates.yaml

Fix 'set_session cannot be used inside a transaction' error that occurred when storing multi-turn conversation history in PostgreSQL cache.

Problem:
- insert_or_append() and delete() methods set autocommit=False to start a transaction, then set it back to True in the finally block
- If an exception occurs, the connection may still be in a transaction when autocommit=True is called
- psycopg2 internally calls set_session() when changing autocommit, which fails if a transaction is active

Solution:
- Check connection transaction status before setting autocommit=True
- Rollback any active transaction before changing autocommit setting
- Ensures clean transition from transactional to autocommit mode

Impact:
- Multi-turn conversations now work correctly with PostgreSQL cache
- No functional change for single-turn conversations
- Evaluation tests can now test context retention and progressive refinement

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
rh-pre-commit.version: 2.3.2
rh-pre-commit.check-secrets: ENABLED
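
The fix this commit describes can be illustrated with a small, self-contained sketch. It is not the OLS cache code: `run_in_transaction`, `work`, and `FakeConnection` are hypothetical, and the fake only mimics the psycopg2 behavior described above (refusing to change autocommit while a transaction is open). `connection.get_transaction_status()` and `TRANSACTION_STATUS_IDLE` are real psycopg2 APIs; the fallback constant exists only so the sketch runs without psycopg2 installed.

```python
try:
    from psycopg2.extensions import TRANSACTION_STATUS_IDLE
except ImportError:  # lets the sketch run without psycopg2 installed
    TRANSACTION_STATUS_IDLE = 0  # same value as libpq's PQTRANS_IDLE


def run_in_transaction(conn, work):
    """Run work(conn) in an explicit transaction, then restore autocommit safely."""
    conn.autocommit = False
    try:
        work(conn)
        conn.commit()
    finally:
        # If work() raised, the transaction is still open; changing autocommit now
        # would fail with "set_session cannot be used inside a transaction".
        if conn.get_transaction_status() != TRANSACTION_STATUS_IDLE:
            conn.rollback()
        conn.autocommit = True


class FakeConnection:
    """Stand-in that mimics psycopg2 refusing autocommit changes mid-transaction."""

    def __init__(self):
        self._autocommit = True
        self.in_transaction = False
        self.rollbacks = 0

    @property
    def autocommit(self):
        return self._autocommit

    @autocommit.setter
    def autocommit(self, value):
        if self.in_transaction:
            raise RuntimeError("set_session cannot be used inside a transaction")
        self._autocommit = value

    def execute(self):
        # With autocommit off, the first statement implicitly opens a transaction.
        if not self._autocommit:
            self.in_transaction = True

    def commit(self):
        self.in_transaction = False

    def rollback(self):
        self.in_transaction = False
        self.rollbacks += 1

    def get_transaction_status(self):
        return 2 if self.in_transaction else TRANSACTION_STATUS_IDLE  # 2 = INTRANS


def failing_work(conn):
    conn.execute()
    raise ValueError("insert failed")


conn = FakeConnection()
try:
    run_in_transaction(conn, failing_work)
except ValueError as exc:
    print(f"original error surfaced: {exc}")
print(conn.autocommit, conn.rollbacks)  # True 1
```

Without the status check in `finally`, the `autocommit = True` assignment would raise and replace the original `ValueError`, which is the confusing symptom the commit message reports.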

fao89 · 2026-06-26T16:03:15Z

/retest-required

Add comprehensive MCP test scenarios to evaluation dataset for validating OpenShift cluster update workflow AI responses.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Signed-off-by: Fabricio Aguiar <fabricio.aguiar@gmail.com>
rh-pre-commit.version: 2.3.2
rh-pre-commit.check-secrets: ENABLED

openshift-ci · 2026-06-26T19:23:33Z

@fao89: all tests passed!


openshift-ci-robot added the jira/valid-reference label (Indicates that this PR references a valid Jira ticket of any type.) Apr 29, 2026

openshift-ci bot requested review from blublinsky and raptorsun April 29, 2026 15:44

fao89 force-pushed the OTA-1927 branch from d564306 to 2064cd9 June 16, 2026 17:41

coderabbitai bot reviewed Jun 16, 2026

Comment thread eval/README.md (outdated)

fao89 force-pushed the OTA-1927 branch 2 times, most recently from e1c10db to 47bc7ce June 16, 2026 17:55

openshift-ci bot requested review from cambelem and sriroopar June 17, 2026 17:37

openshift-ci bot requested a review from rioloc June 17, 2026 17:38

sriroopar reviewed Jun 17, 2026

fao89 force-pushed the OTA-1927 branch from 47bc7ce to 5cc6965 June 18, 2026 11:38

openshift-ci bot requested a review from wking June 18, 2026 11:43

fao89 force-pushed the OTA-1927 branch from 5cc6965 to ce2dccf June 18, 2026 17:13

openshift-ci bot added the needs-rebase label (Indicates a PR cannot be merged because it has merge conflicts with HEAD.) Jun 18, 2026

coderabbitai bot reviewed Jun 18, 2026

Comment thread tests/scripts/test-cluster-updates.sh (outdated)

Comment thread tests/scripts/test-cluster-updates.sh (outdated)

Comment thread tests/scripts/test-cluster-updates.sh

fao89 force-pushed the OTA-1927 branch from ce2dccf to 771a779 June 23, 2026 12:53

openshift-ci bot removed the needs-rebase label (Indicates a PR cannot be merged because it has merge conflicts with HEAD.) Jun 23, 2026

fao89 force-pushed the OTA-1927 branch 6 times, most recently from db695e0 to d0cc961 June 26, 2026 08:00

fao89 force-pushed the OTA-1927 branch 2 times, most recently from 4bb0e70 to cc7c8aa June 26, 2026 11:31

fao89 force-pushed the OTA-1927 branch from cc7c8aa to efe3573 June 26, 2026 13:58

fao89 requested a review from sriroopar June 26, 2026 13:58

fao89 force-pushed the OTA-1927 branch from efe3573 to b05a3ad June 26, 2026 17:44
