OLS-3311 Bundle embedding models for hermetic build by blublinsky · Pull Request #2961 · openshift/lightspeed-service

blublinsky · 2026-06-23T09:17:50Z

Description

Summary
Decouples the OLS service image from the lightspeed-rag-content image by bundling
embedding model configs directly in this repo and fetching model binaries via Cachi2
generic prefetch for hermetic builds.

Removes the lightspeed-rag-content multi-stage dependency from Containerfile
Adds sentence-transformers/all-mpnet-base-v2 (BYOK/FAISS) and
ibm-granite/granite-embedding-30m-english (OKP/Solr) model configs and tokenizers
Pre-populates the HuggingFace cache at build time so models can be loaded by ID
with TRANSFORMERS_OFFLINE=1
Adds artifacts.lock.yaml for Cachi2 generic artifact prefetch of model binaries
Updates Tekton pipelines to include generic prefetch stage
Adds make get-embeddings target for local (non-hermetic) development

Build Modes

Hermetic (Konflux/Cachi2): Binaries prefetched to /cachi2/output/deps/generic/,
copied into place during image build
Local (non-hermetic): Run make get-embeddings before docker build; build
fails fast with clear error if binaries are missing

Test plan

Unit tests pass (1058 passed)
Ruff lint clean
MyPy type check clean
Verify Cachi2 prefetch works in Konflux PR pipeline
Verify container image loads both models offline (TRANSFORMERS_OFFLINE=1)

Type of change

Related Tickets & Documents

Related Issue #
https://redhat.atlassian.net/browse/OLS-3311
Closes #
https://redhat.atlassian.net/browse/OLS-3311

Checklist before requesting a review

I have performed a self-review of my code.
PR has passed all pre-merge test jobs.
If it is a core feature, I have added thorough tests.

Testing

Please provide detailed steps to perform tests related to this code change.
How were the fix/results from this change verified? Please provide relevant screenshots or results.

Summary by CodeRabbit

Chores
- Bundle embedding model assets directly in the container for more reliable offline startup.
- Add a locked manifest for embedding weight downloads.
- Replace the old retrieval flow with dedicated embedding download targets and remove the temporary copy approach.
CI/CD
- Adjust dependency prefetch inputs by adding a generic entry before existing entries.
Configuration
- Update the example to point to the specific embedding model subdirectory.
- Refine ignore rules for embedding model artifacts (including versioned folders).

coderabbitai · 2026-06-23T09:18:04Z

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

@coderabbitai resume to resume automatic reviews.
@coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

▶️ Resume reviews
🔍 Trigger review

📝 Walkthrough

Walkthrough

Adds tracked embedding model configs, pins weight artifacts, updates build and pipeline inputs, and switches runtime and Makefile flows to provision embedding files from the build context and HuggingFace downloads.

Changes

Embedding Model Provisioning Overhaul

Layer / File(s)	Summary
Model config files and ignores `embeddings_model/all-mpnet-base-v2/`, `embeddings_model/granite-embedding-30m-english/`, `.gitignore`, `.woke.yaml`	Adds model config JSON for both embedding models and narrows ignore rules so metadata stays tracked while weight files remain excluded from tool processing.
Artifact lock and prefetch inputs `artifacts.lock.yaml`, `.tekton/lightspeed-service-pull-request.yaml`, `.tekton/lightspeed-service-push.yaml`	Adds pinned artifact metadata for two model weight files and inserts a generic prefetch input into both Tekton pipeline definitions.
Runtime image weight handling `Containerfile`	Sets HuggingFace cache location, copies embedding model files from the build context, overlays prefetched weights when available, downloads missing weights, and primes the local HuggingFace cache layout for both models.
Makefile targets and example path `Makefile`, `examples/olsconfig.yaml`	Replaces the old image-copy target with embedding download targets and updates the example configuration to point to the `all-mpnet-base-v2` model directory.

Sequence Diagram

sequenceDiagram
  participant TektonPipeline as Tekton Pipeline
  participant Cachi2 as Generic Prefetch
  participant ContainerBuild as Container Build
  participant HuggingFace as HuggingFace Hub
  participant RuntimeImage as Runtime Image

  TektonPipeline->>Cachi2: prefetch inputs from artifacts.lock.yaml
  Cachi2-->>TektonPipeline: cached model weights
  TektonPipeline->>ContainerBuild: build image with cache volume
  ContainerBuild->>ContainerBuild: copy embeddings_model/
  alt prefetched weights available
    ContainerBuild->>ContainerBuild: overlay .safetensors files
  else missing weights
    ContainerBuild->>HuggingFace: curl resolve URLs
    HuggingFace-->>ContainerBuild: model.safetensors
  end
  ContainerBuild->>RuntimeImage: create HuggingFace cache refs
  ContainerBuild->>RuntimeImage: symlink local model snapshots

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

🚥 Pre-merge checks | ✅ 5

✅ Passed checks (5 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title clearly summarizes the main change: bundling embedding models to support a hermetic build.
Docstring Coverage	✅ Passed	No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands.}

coderabbitai

Actionable comments posted: 2

🧹 Nitpick comments (1)

Containerfile (1)
78-79: 🩺 Stability & Availability | 🔵 Trivial

TRANSFORMERS_OFFLINE is already set in application code; consider making it container-level for unconditional offline mode.

The application sets TRANSFORMERS_OFFLINE=1 at runtime (line 59 of runner.py), but only when embeddings_model_path is configured. Moving this to the container's ENV block would ensure offline mode is enforced regardless of configuration state, improving reliability in disconnected environments. Note: HF_HUB_OFFLINE is not used in this codebase.
Proposed Containerfile update (conditional approach)
     LANG=en_US.UTF-8 \
     LLAMA_INDEX_CACHE_DIR=/tmp/llama_index \
     TIKTOKEN_CACHE_DIR=/app-root/.tiktoken_cache \
-    HF_HOME=/app-root/.cache/huggingface
+    HF_HOME=/app-root/.cache/huggingface \
+    TRANSFORMERS_OFFLINE=1
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@Containerfile` around lines 78 - 79, Add TRANSFORMERS_OFFLINE=1 to the
Containerfile's ENV block alongside the existing environment variables like
TIKTOKEN_CACHE_DIR and HF_HOME. This ensures offline mode is enforced at the
container level unconditionally, rather than being conditionally set in
runner.py based on configuration state. This improves reliability in
disconnected environments by guaranteeing the offline mode is always active
regardless of how the application is configured.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@Makefile`:
- Around line 149-155: The curl commands in the get-embeddings-byok and
get-embeddings-okp targets lack error handling safeguards, allowing HTTP errors
(404/403) to be written to the model.safetensors files while appearing
successful. Add the --fail flag to both curl commands to make them fail on HTTP
errors, and optionally add --retry and --connect-timeout flags to handle
transient failures and make the downloads fail fast and predictably instead of
silently writing corrupt files.
- Around line 147-155: The targets get-embeddings, get-embeddings-byok, and
get-embeddings-okp are phony targets that should be declared in the .PHONY
directive to prevent Make from treating them as file targets. Add these three
target names to the .PHONY declaration in the Makefile (typically near the
beginning of the file where other phony targets are declared) so that Make
correctly handles them as non-file targets regardless of whether files with
those names exist.

---

Nitpick comments:
In `@Containerfile`:
- Around line 78-79: Add TRANSFORMERS_OFFLINE=1 to the Containerfile's ENV block
alongside the existing environment variables like TIKTOKEN_CACHE_DIR and
HF_HOME. This ensures offline mode is enforced at the container level
unconditionally, rather than being conditionally set in runner.py based on
configuration state. This improves reliability in disconnected environments by
guaranteeing the offline mode is always active regardless of how the application
is configured.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 1947a343-4686-4ebf-92a0-9e4b82c866fa

📥 Commits

Reviewing files that changed from the base of the PR and between 724097c and 2a64543.

📒 Files selected for processing (22)

.gitignore
.tekton/lightspeed-service-pull-request.yaml
.tekton/lightspeed-service-push.yaml
Containerfile
Makefile
artifacts.lock.yaml
embeddings_model/all-mpnet-base-v2/1_Pooling/config.json
embeddings_model/all-mpnet-base-v2/config.json
embeddings_model/all-mpnet-base-v2/config_sentence_transformers.json
embeddings_model/all-mpnet-base-v2/modules.json
embeddings_model/all-mpnet-base-v2/sentence_bert_config.json
embeddings_model/all-mpnet-base-v2/special_tokens_map.json
embeddings_model/all-mpnet-base-v2/tokenizer.json
embeddings_model/all-mpnet-base-v2/tokenizer_config.json
embeddings_model/granite-embedding-30m-english/1_Pooling/config.json
embeddings_model/granite-embedding-30m-english/config.json
embeddings_model/granite-embedding-30m-english/modules.json
embeddings_model/granite-embedding-30m-english/sentence_bert_config.json
embeddings_model/granite-embedding-30m-english/special_tokens_map.json
embeddings_model/granite-embedding-30m-english/tokenizer.json
embeddings_model/granite-embedding-30m-english/tokenizer_config.json
examples/olsconfig.yaml

blublinsky · 2026-06-23T12:02:38Z

/retest ci/prow/ols-evaluation

vimalk78 · 2026-06-23T12:07:41Z

are we totally removing rag content from ols service?

blublinsky · 2026-06-23T12:38:16Z

are we totally removing rag content from ols service?

Yes, FAISS based OCP one is being replaced by OKP
The remaining one is BYOK

coderabbitai

Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@Containerfile`:
- Around line 94-96: The urllib.request.urlretrieve call in the python3 -c block
lacks socket timeout and retry logic, making the build vulnerable to network
issues. Add a socket timeout parameter to the urlretrieve call and implement
retry logic with exponential backoff around the download attempt to match the
resilience provided by the curl-based approach in the Makefile get-embeddings
targets. This will prevent transient network hiccups from failing the entire
build.
- Around line 86-96: The Containerfile does not validate SHA-256 checksums for
the safetensors files despite them being pinned in artifacts.lock.yaml, and the
urllib.request.urlretrieve fallback contradicts hermetic-build guarantees by
allowing unverified network downloads. Extract the pinned SHA-256 hashes from
artifacts.lock.yaml for both all-mpnet-base-v2-model.safetensors and
granite-embedding-30m-english-model.safetensors, then modify the RUN block to
compute and verify the SHA-256 hash of each file after the cachi2 copy or
download using sha256sum, comparing against the pinned values and failing the
build if they do not match. Remove the urllib.request.urlretrieve fallback
entirely and replace it with an explicit error that exits the build if the
expected file is not found in the cachi2 output directory.
- Around line 99-108: The cache-priming RUN block that loops through model_dir
variables (all-mpnet-base-v2 and granite-embedding-30m-english) creates a
non-standard Hugging Face cache layout. The refs/main file contains the literal
string "local" instead of a commit hash, and the symlinks are absolute instead
of relative paths pointing to a blobs directory. Either verify that the fallback
case returning "local:sentence-transformers/all-mpnet-base-v2" is never used in
production and remove the cache-priming block, or correct the cache structure by
replacing the literal "local" string with a proper commit hash in refs/main and
changing the symlinks to relative paths that point to a standard blobs directory
layout, aligning with the official Hugging Face Hub cache specification.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 2b6c3dca-84b8-489d-8c83-318e05ac92c1

📥 Commits

Reviewing files that changed from the base of the PR and between bcf19fb and 792e44b.

📒 Files selected for processing (23)

.gitignore
.tekton/lightspeed-service-pull-request.yaml
.tekton/lightspeed-service-push.yaml
.woke.yaml
Containerfile
Makefile
artifacts.lock.yaml
embeddings_model/all-mpnet-base-v2/1_Pooling/config.json
embeddings_model/all-mpnet-base-v2/config.json
embeddings_model/all-mpnet-base-v2/config_sentence_transformers.json
embeddings_model/all-mpnet-base-v2/modules.json
embeddings_model/all-mpnet-base-v2/sentence_bert_config.json
embeddings_model/all-mpnet-base-v2/special_tokens_map.json
embeddings_model/all-mpnet-base-v2/tokenizer.json
embeddings_model/all-mpnet-base-v2/tokenizer_config.json
embeddings_model/granite-embedding-30m-english/1_Pooling/config.json
embeddings_model/granite-embedding-30m-english/config.json
embeddings_model/granite-embedding-30m-english/modules.json
embeddings_model/granite-embedding-30m-english/sentence_bert_config.json
embeddings_model/granite-embedding-30m-english/special_tokens_map.json
embeddings_model/granite-embedding-30m-english/tokenizer.json
embeddings_model/granite-embedding-30m-english/tokenizer_config.json
examples/olsconfig.yaml

✅ Files skipped from review due to trivial changes (13)

embeddings_model/granite-embedding-30m-english/sentence_bert_config.json
examples/olsconfig.yaml
embeddings_model/all-mpnet-base-v2/sentence_bert_config.json
embeddings_model/granite-embedding-30m-english/1_Pooling/config.json
embeddings_model/all-mpnet-base-v2/config_sentence_transformers.json
embeddings_model/all-mpnet-base-v2/tokenizer_config.json
embeddings_model/all-mpnet-base-v2/modules.json
embeddings_model/granite-embedding-30m-english/modules.json
embeddings_model/all-mpnet-base-v2/special_tokens_map.json
embeddings_model/all-mpnet-base-v2/config.json
artifacts.lock.yaml
embeddings_model/granite-embedding-30m-english/config.json
.gitignore

🚧 Files skipped from review as they are similar to previous changes (7)

.tekton/lightspeed-service-push.yaml
embeddings_model/granite-embedding-30m-english/special_tokens_map.json
.woke.yaml
embeddings_model/granite-embedding-30m-english/tokenizer_config.json
embeddings_model/all-mpnet-base-v2/1_Pooling/config.json
Makefile
.tekton/lightspeed-service-pull-request.yaml

coderabbitai · 2026-06-23T18:04:28Z

+RUN if [ -d /cachi2/output/deps/generic ]; then \
+    cp /cachi2/output/deps/generic/all-mpnet-base-v2-model.safetensors embeddings_model/all-mpnet-base-v2/model.safetensors && \
+    cp /cachi2/output/deps/generic/granite-embedding-30m-english-model.safetensors embeddings_model/granite-embedding-30m-english/model.safetensors ; \
+    fi && \
+    for f in \
+      "embeddings_model/all-mpnet-base-v2/model.safetensors|https://huggingface.co/sentence-transformers/all-mpnet-base-v2/resolve/main/model.safetensors" \
+      "embeddings_model/granite-embedding-30m-english/model.safetensors|https://huggingface.co/ibm-granite/granite-embedding-30m-english/resolve/main/model.safetensors" \
+    ; do \
+      path="${f%%|*}" && url="${f##*|}" && \
+      [ -f "$path" ] || python3 -c "import urllib.request,sys; urllib.request.urlretrieve(sys.argv[1], sys.argv[2])" "$url" "$path" ; \
+    done


🔒 Security & Privacy | 🟠 Major | ⚡ Quick win

Pinned SHA-256 checksums are never verified, and the HTTP fallback undermines the hermetic-build guarantee.

Two concerns on this block:

No checksum validation. artifacts.lock.yaml pins sha256 for both safetensors, but neither the cachi2 copy (Lines 87-88) nor the HTTP download (Line 95) validates the bytes against those hashes. The PR summary states the Containerfile "validates checksums," but it does not — this leaves a supply-chain integrity gap for the bundled weights.

Network fallback contradicts the stated build modes. In a hermetic Konflux/Cachi2 build, egress is blocked, so the urllib fallback cannot run; in local builds the PR objective says the build should "fail fast with a clear error if binaries are missing," yet this code instead silently downloads unverified weights from HuggingFace. The fallback both defeats hermeticity and diverges from the documented behavior.

Recommend validating against the pinned hashes and making the missing-binary case an explicit failure rather than a network fetch.

🛡️ Sketch: verify checksums and fail fast instead of fetching

RUN if [ -d /cachi2/output/deps/generic ]; then \ cp /cachi2/output/deps/generic/all-mpnet-base-v2-model.safetensors embeddings_model/all-mpnet-base-v2/model.safetensors && \ cp /cachi2/output/deps/generic/granite-embedding-30m-english-model.safetensors embeddings_model/granite-embedding-30m-english/model.safetensors ; \ fi && \ for f in \ - "embeddings_model/all-mpnet-base-v2/model.safetensors|https://huggingface.co/sentence-transformers/all-mpnet-base-v2/resolve/main/model.safetensors" \ - "embeddings_model/granite-embedding-30m-english/model.safetensors|https://huggingface.co/ibm-granite/granite-embedding-30m-english/resolve/main/model.safetensors" \ + "embeddings_model/all-mpnet-base-v2/model.safetensors|78c0197b6159d92658e319bc1d72e4c73a9a03dd03815e70e555c5ef05615658" \ + "embeddings_model/granite-embedding-30m-english/model.safetensors|3b1fcdc9c5eb954f603bc386474e321505ff29c6c67f21e3aa8db3d2d1a533cf" \ ; do \ - path="${f%%|*}" && url="${f##*|}" && \ - [ -f "$path" ] || python3 -c "import urllib.request,sys; urllib.request.urlretrieve(sys.argv[1], sys.argv[2])" "$url" "$path" ; \ + path="${f%%|*}" && want="${f##*|}" && \ + { [ -f "$path" ] || { echo "Missing $path; run 'make get-embeddings' or provide cachi2 artifacts" >&2; exit 1; } ; } && \ + echo "${want} ${path}" | sha256sum -c - ; \ done

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

RUN if [ -d /cachi2/output/deps/generic ]; then \

cp /cachi2/output/deps/generic/all-mpnet-base-v2-model.safetensors embeddings_model/all-mpnet-base-v2/model.safetensors && \

cp /cachi2/output/deps/generic/granite-embedding-30m-english-model.safetensors embeddings_model/granite-embedding-30m-english/model.safetensors ; \

fi && \

for f in \

"embeddings_model/all-mpnet-base-v2/model.safetensors|https://huggingface.co/sentence-transformers/all-mpnet-base-v2/resolve/main/model.safetensors" \

"embeddings_model/granite-embedding-30m-english/model.safetensors|https://huggingface.co/ibm-granite/granite-embedding-30m-english/resolve/main/model.safetensors" \

; do \

path="${f%%|*}" && url="${f##*|}" && \

[ -f "$path" ] || python3 -c "import urllib.request,sys; urllib.request.urlretrieve(sys.argv[1], sys.argv[2])" "$url" "$path" ; \

done

RUN if [ -d /cachi2/output/deps/generic ]; then \

cp /cachi2/output/deps/generic/all-mpnet-base-v2-model.safetensors embeddings_model/all-mpnet-base-v2/model.safetensors && \

cp /cachi2/output/deps/generic/granite-embedding-30m-english-model.safetensors embeddings_model/granite-embedding-30m-english/model.safetensors ; \

fi && \

for f in \

"embeddings_model/all-mpnet-base-v2/model.safetensors|78c0197b6159d92658e319bc1d72e4c73a9a03dd03815e70e555c5ef05615658" \

"embeddings_model/granite-embedding-30m-english/model.safetensors|3b1fcdc9c5eb954f603bc386474e321505ff29c6c67f21e3aa8db3d2d1a533cf" \

; do \

path="${f%%|*}" && want="${f##*|}" && \

{ [ -f "$path" ] || { echo "Missing $path; run 'make get-embeddings' or provide cachi2 artifacts" >&2; exit 1; } ; } && \

echo "${want} ${path}" | sha256sum -c - ; \

done

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@Containerfile` around lines 86 - 96, The Containerfile does not validate SHA-256 checksums for the safetensors files despite them being pinned in artifacts.lock.yaml, and the urllib.request.urlretrieve fallback contradicts hermetic-build guarantees by allowing unverified network downloads. Extract the pinned SHA-256 hashes from artifacts.lock.yaml for both all-mpnet-base-v2-model.safetensors and granite-embedding-30m-english-model.safetensors, then modify the RUN block to compute and verify the SHA-256 hash of each file after the cachi2 copy or download using sha256sum, comparing against the pinned values and failing the build if they do not match. Remove the urllib.request.urlretrieve fallback entirely and replace it with an explicit error that exits the build if the expected file is not found in the cachi2 output directory.

checksum is done by konflux prefetch task, no need here.

coderabbitai

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@Containerfile`:
- Around line 101-111: Add an integration test that validates offline model
loading functionality works with the pre-populated HuggingFace cache. The test
should verify that models can be successfully loaded with the
TRANSFORMERS_OFFLINE=1 environment variable set, and specifically confirm that
the fallback model ID (local:sentence-transformers/all-mpnet-base-v2) loads
correctly using the cache structure created by the Containerfile cache priming
block shown in the diff. This test should confirm the end-to-end contract
between the cache structure created by the RUN command (which sets up the model
directories, snapshots, and refs/main pointers) and the actual model loading
behavior.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: ef95f73b-a667-499c-82e9-f79cc958adb5

📥 Commits

Reviewing files that changed from the base of the PR and between 792e44b and 1b18b4e.

📒 Files selected for processing (23)

.gitignore
.tekton/lightspeed-service-pull-request.yaml
.tekton/lightspeed-service-push.yaml
.woke.yaml
Containerfile
Makefile
artifacts.lock.yaml
embeddings_model/all-mpnet-base-v2/1_Pooling/config.json
embeddings_model/all-mpnet-base-v2/config.json
embeddings_model/all-mpnet-base-v2/config_sentence_transformers.json
embeddings_model/all-mpnet-base-v2/modules.json
embeddings_model/all-mpnet-base-v2/sentence_bert_config.json
embeddings_model/all-mpnet-base-v2/special_tokens_map.json
embeddings_model/all-mpnet-base-v2/tokenizer.json
embeddings_model/all-mpnet-base-v2/tokenizer_config.json
embeddings_model/granite-embedding-30m-english/1_Pooling/config.json
embeddings_model/granite-embedding-30m-english/config.json
embeddings_model/granite-embedding-30m-english/modules.json
embeddings_model/granite-embedding-30m-english/sentence_bert_config.json
embeddings_model/granite-embedding-30m-english/special_tokens_map.json
embeddings_model/granite-embedding-30m-english/tokenizer.json
embeddings_model/granite-embedding-30m-english/tokenizer_config.json
examples/olsconfig.yaml

✅ Files skipped from review due to trivial changes (14)

embeddings_model/all-mpnet-base-v2/sentence_bert_config.json
examples/olsconfig.yaml
embeddings_model/granite-embedding-30m-english/modules.json
embeddings_model/all-mpnet-base-v2/modules.json
embeddings_model/granite-embedding-30m-english/sentence_bert_config.json
embeddings_model/granite-embedding-30m-english/config.json
embeddings_model/all-mpnet-base-v2/config.json
embeddings_model/all-mpnet-base-v2/1_Pooling/config.json
embeddings_model/all-mpnet-base-v2/tokenizer_config.json
embeddings_model/all-mpnet-base-v2/special_tokens_map.json
.woke.yaml
embeddings_model/granite-embedding-30m-english/tokenizer_config.json
.gitignore
embeddings_model/all-mpnet-base-v2/config_sentence_transformers.json

🚧 Files skipped from review as they are similar to previous changes (6)

.tekton/lightspeed-service-pull-request.yaml
embeddings_model/granite-embedding-30m-english/1_Pooling/config.json
embeddings_model/granite-embedding-30m-english/special_tokens_map.json
artifacts.lock.yaml
Makefile
.tekton/lightspeed-service-push.yaml

coderabbitai · 2026-06-23T21:11:03Z

+# Pre-populate HuggingFace cache so models can be loaded by ID with TRANSFORMERS_OFFLINE=1
+RUN for model_dir in all-mpnet-base-v2 granite-embedding-30m-english; do \
+    case "$model_dir" in \
+      all-mpnet-base-v2) hf_id="sentence-transformers--all-mpnet-base-v2" ;; \
+      granite-embedding-30m-english) hf_id="ibm-granite--granite-embedding-30m-english" ;; \
+    esac && \
+    repo_dir="/app-root/.cache/huggingface/hub/models--${hf_id}" && \
+    mkdir -p "$repo_dir/snapshots/local" "$repo_dir/refs" && \
+    echo "local" > "$repo_dir/refs/main" && \
+    ln -sf "/app-root/embeddings_model/${model_dir}"/* "$repo_dir/snapshots/local/" ; \
+    done


🩺 Stability & Availability | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash # Read-only verification script: confirm fallback path and cache priming expectations align. set -euo pipefail # 1) Confirm fallback model ID used when embeddings_model_path is unset rg -n 'local:sentence-transformers/all-mpnet-base-v2|_get_embed_model|embeddings_model_path' \ ols/src/rag_index/index_loader.py tests/unit/rag_index/test_index_loader.py -C2 # 2) Confirm runtime env wiring relevant to offline behavior rg -n 'HF_HOME|TRANSFORMERS_OFFLINE|TRANSFORMERS_CACHE|configure_hugging_face_envs' \ Containerfile ols/utils/environments.py -C2 # 3) Confirm cache-priming block exists exactly as expected in image build recipe sed -n '96,116p' Containerfile

Repository: openshift/lightspeed-service

Length of output: 3917

Add an explicit integration test validating offline model loading with the pre-populated cache.

The fallback model ID local:sentence-transformers/all-mpnet-base-v2 and environment configuration for TRANSFORMERS_OFFLINE=1 are correctly implemented, and the cache priming block in the Containerfile properly synthesizes the expected HuggingFace hub structure. However, there is no integration test confirming that models can actually be loaded offline using this pre-populated cache. Since the PR marks offline model loading as pending verification, add a test that validates the end-to-end contract: the fallback model loads successfully with TRANSFORMERS_OFFLINE=1 and the cache structure created by the Containerfile.

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@Containerfile` around lines 101 - 111, Add an integration test that validates offline model loading functionality works with the pre-populated HuggingFace cache. The test should verify that models can be successfully loaded with the TRANSFORMERS_OFFLINE=1 environment variable set, and specifically confirm that the fallback model ID (local:sentence-transformers/all-mpnet-base-v2) loads correctly using the cache structure created by the Containerfile cache priming block shown in the diff. This test should confirm the end-to-end contract between the cache structure created by the RUN command (which sets up the model directories, snapshots, and refs/main pointers) and the actual model loading behavior.

blublinsky · 2026-06-24T07:13:46Z

/retest

raptorsun · 2026-06-24T09:01:09Z

      - product_docs_index_path: "./vector_db/user_application_docs/version_1"
        product_docs_index_id: user-application-docs-version_1
-    embeddings_model_path: "./embeddings_model"
+    embeddings_model_path: "./embeddings_model/all-mpnet-base-v2"


a reminder for corresponding change in operator.

raptorsun · 2026-06-24T09:05:42Z

/lgtm
looks good to me.
we may need update the operator to reflect the change of embedding model's path to make sure the bundle work with new embedding model locations.

raptorsun · 2026-06-24T12:38:25Z

/lgtm

blublinsky · 2026-06-24T13:58:24Z

/approve

openshift-ci · 2026-06-24T14:02:36Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: blublinsky

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details

Needs approval from an approver in each of these files:

~~OWNERS~~ [blublinsky]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

openshift-ci · 2026-06-24T14:21:06Z

@blublinsky: all tests passed!

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

openshift-ci Bot requested review from onmete and xrajesh June 23, 2026 09:18

coderabbitai Bot reviewed Jun 23, 2026

View reviewed changes

Comment thread Makefile

Comment thread Makefile

blublinsky force-pushed the embedding-hermetic-build branch 3 times, most recently from 9ee8229 to bcf19fb Compare June 23, 2026 10:28

blublinsky force-pushed the embedding-hermetic-build branch from bcf19fb to 792e44b Compare June 23, 2026 17:53

coderabbitai Bot reviewed Jun 23, 2026

View reviewed changes

blublinsky force-pushed the embedding-hermetic-build branch from 792e44b to 1b18b4e Compare June 23, 2026 21:07

coderabbitai Bot reviewed Jun 23, 2026

View reviewed changes

blublinsky force-pushed the embedding-hermetic-build branch from 1b18b4e to fee7518 Compare June 24, 2026 08:01

raptorsun reviewed Jun 24, 2026

View reviewed changes

openshift-ci Bot assigned raptorsun Jun 24, 2026

openshift-ci Bot added the lgtm Indicates that a PR is ready to be merged. label Jun 24, 2026

blublinsky force-pushed the embedding-hermetic-build branch from fee7518 to 4cb74db Compare June 24, 2026 10:23

openshift-ci Bot removed the lgtm Indicates that a PR is ready to be merged. label Jun 24, 2026

OLS-3311 Bundle embedding models for hermetic build

25a520f

blublinsky force-pushed the embedding-hermetic-build branch from 4cb74db to 25a520f Compare June 24, 2026 12:23

openshift-ci Bot added the lgtm Indicates that a PR is ready to be merged. label Jun 24, 2026

openshift-ci Bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jun 24, 2026

openshift-merge-bot Bot merged commit 8605212 into openshift:main Jun 24, 2026
12 checks passed

coderabbitai Bot mentioned this pull request Jun 26, 2026

Removing granite embedding dependency temporaily for EC fix. #2969

Open

14 tasks

Uh oh!

Conversation

blublinsky commented Jun 23, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Description

Type of change

Related Tickets & Documents

Checklist before requesting a review

Testing

Summary by CodeRabbit

Uh oh!

coderabbitai Bot commented Jun 23, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Reviews paused

Walkthrough

Changes

Sequence Diagram

Estimated code review effort

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

blublinsky commented Jun 23, 2026

Uh oh!

vimalk78 commented Jun 23, 2026

Uh oh!

blublinsky commented Jun 23, 2026

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 23, 2026

Choose a reason for hiding this comment

Uh oh!

raptorsun Jun 24, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 23, 2026

Choose a reason for hiding this comment

Uh oh!

blublinsky commented Jun 24, 2026

Uh oh!

raptorsun Jun 24, 2026

Choose a reason for hiding this comment

Uh oh!

raptorsun commented Jun 24, 2026

Uh oh!

raptorsun commented Jun 24, 2026

Uh oh!

blublinsky commented Jun 24, 2026

Uh oh!

openshift-ci Bot commented Jun 24, 2026

Uh oh!

openshift-ci Bot commented Jun 24, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

blublinsky commented Jun 23, 2026 •

edited by coderabbitai Bot

Loading

coderabbitai Bot commented Jun 23, 2026 •

edited

Loading