OLS-3311 Bundle embedding models for hermetic build #2961
📝 Walkthrough
Adds tracked embedding model configs, pins weight artifacts, updates build and pipeline inputs, and switches runtime and Makefile flows to provision embedding files from the build context and HuggingFace downloads.
Changes: Embedding Model Provisioning Overhaul
Sequence Diagram
sequenceDiagram
participant TektonPipeline as Tekton Pipeline
participant Cachi2 as Generic Prefetch
participant ContainerBuild as Container Build
participant HuggingFace as HuggingFace Hub
participant RuntimeImage as Runtime Image
TektonPipeline->>Cachi2: prefetch inputs from artifacts.lock.yaml
Cachi2-->>TektonPipeline: cached model weights
TektonPipeline->>ContainerBuild: build image with cache volume
ContainerBuild->>ContainerBuild: copy embeddings_model/
alt prefetched weights available
ContainerBuild->>ContainerBuild: overlay .safetensors files
else missing weights
ContainerBuild->>HuggingFace: curl resolve URLs
HuggingFace-->>ContainerBuild: model.safetensors
end
ContainerBuild->>RuntimeImage: create HuggingFace cache refs
ContainerBuild->>RuntimeImage: symlink local model snapshots
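The `alt`/`else` branch in the diagram above can be sketched in Python as a small decision helper. This is only an illustration of the flow: `place_weights` and its return values are hypothetical names, not code from this PR, and the actual build performs these steps in a Containerfile `RUN` block.

```python
import shutil
from pathlib import Path


def place_weights(prefetch_dir: Path, artifact_name: str, target: Path) -> str:
    """Copy a prefetched artifact into place, or report that a download is needed.

    Hypothetical helper. Returns "prefetched" when the file was copied from the
    prefetch directory, "present" when the target already exists, and
    "download" when the weights still have to be fetched.
    """
    source = prefetch_dir / artifact_name
    if source.is_file():
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.copyfile(source, target)  # overlay the prefetched weights
        return "prefetched"
    if target.is_file():
        return "present"
    return "download"
```

Returning a status instead of downloading inside the helper keeps the network step explicit, so a caller can choose between fetching and failing.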
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~25 minutes
🚥 Pre-merge checks: ✅ 5 passed
Actionable comments posted: 2
🧹 Nitpick comments (1)
Containerfile (1)
78-79: 🩺 Stability & Availability | 🔵 Trivial
TRANSFORMERS_OFFLINE is already set in application code; consider making it container-level for unconditional offline mode.
The application sets TRANSFORMERS_OFFLINE=1 at runtime (line 59 of runner.py), but only when embeddings_model_path is configured. Moving this to the container's ENV block would ensure offline mode is enforced regardless of configuration state, improving reliability in disconnected environments. Note: HF_HUB_OFFLINE is not used in this codebase.
Proposed Containerfile update (conditional approach)
     LANG=en_US.UTF-8 \
     LLAMA_INDEX_CACHE_DIR=/tmp/llama_index \
     TIKTOKEN_CACHE_DIR=/app-root/.tiktoken_cache \
-    HF_HOME=/app-root/.cache/huggingface
+    HF_HOME=/app-root/.cache/huggingface \
+    TRANSFORMERS_OFFLINE=1
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@Containerfile` around lines 78 - 79, Add TRANSFORMERS_OFFLINE=1 to the Containerfile's ENV block alongside the existing environment variables like TIKTOKEN_CACHE_DIR and HF_HOME. This ensures offline mode is enforced at the container level unconditionally, rather than being conditionally set in runner.py based on configuration state. This improves reliability in disconnected environments by guaranteeing the offline mode is always active regardless of how the application is configured.
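The difference between the conditional and the unconditional setting can be sketched as follows. The code in runner.py is not shown in this thread, so both function names are hypothetical and only illustrate the two behaviors being compared.

```python
import os
from typing import MutableMapping, Optional


def configure_offline(
    embeddings_model_path: Optional[str],
    env: MutableMapping[str, str] = os.environ,
) -> None:
    """Conditional variant: offline mode only when a model path is configured."""
    if embeddings_model_path:
        env["TRANSFORMERS_OFFLINE"] = "1"


def configure_offline_always(env: MutableMapping[str, str] = os.environ) -> None:
    """Unconditional variant, similar in effect to an image-level ENV entry.

    setdefault keeps a value that was already provided, much as a variable
    passed at container start takes precedence over the image default.
    """
    env.setdefault("TRANSFORMERS_OFFLINE", "1")
```

With the conditional variant, an unset `embeddings_model_path` leaves the variable absent; the unconditional variant always ends with it defined.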
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@Makefile`:
- Around line 149-155: The curl commands in the get-embeddings-byok and
get-embeddings-okp targets lack error handling safeguards, allowing HTTP errors
(404/403) to be written to the model.safetensors files while appearing
successful. Add the --fail flag to both curl commands to make them fail on HTTP
errors, and optionally add --retry and --connect-timeout flags to handle
transient failures and make the downloads fail fast and predictably instead of
silently writing corrupt files.
- Around line 147-155: The targets get-embeddings, get-embeddings-byok, and
get-embeddings-okp are phony targets that should be declared in the .PHONY
directive to prevent Make from treating them as file targets. Add these three
target names to the .PHONY declaration in the Makefile (typically near the
beginning of the file where other phony targets are declared) so that Make
correctly handles them as non-file targets regardless of whether files with
those names exist.
📒 Files selected for processing (22)
- .gitignore
- .tekton/lightspeed-service-pull-request.yaml
- .tekton/lightspeed-service-push.yaml
- Containerfile
- Makefile
- artifacts.lock.yaml
- embeddings_model/all-mpnet-base-v2/1_Pooling/config.json
- embeddings_model/all-mpnet-base-v2/config.json
- embeddings_model/all-mpnet-base-v2/config_sentence_transformers.json
- embeddings_model/all-mpnet-base-v2/modules.json
- embeddings_model/all-mpnet-base-v2/sentence_bert_config.json
- embeddings_model/all-mpnet-base-v2/special_tokens_map.json
- embeddings_model/all-mpnet-base-v2/tokenizer.json
- embeddings_model/all-mpnet-base-v2/tokenizer_config.json
- embeddings_model/granite-embedding-30m-english/1_Pooling/config.json
- embeddings_model/granite-embedding-30m-english/config.json
- embeddings_model/granite-embedding-30m-english/modules.json
- embeddings_model/granite-embedding-30m-english/sentence_bert_config.json
- embeddings_model/granite-embedding-30m-english/special_tokens_map.json
- embeddings_model/granite-embedding-30m-english/tokenizer.json
- embeddings_model/granite-embedding-30m-english/tokenizer_config.json
- examples/olsconfig.yaml
9ee8229 to bcf19fb
/retest ci/prow/ols-evaluation
Are we totally removing RAG content from the OLS service?
Yes, the FAISS-based OCP one is being replaced by OKP.
bcf19fb to 792e44b
Actionable comments posted: 3
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@Containerfile`:
- Around line 94-96: The urllib.request.urlretrieve call in the python3 -c block
lacks socket timeout and retry logic, making the build vulnerable to network
issues. Add a socket timeout parameter to the urlretrieve call and implement
retry logic with exponential backoff around the download attempt to match the
resilience provided by the curl-based approach in the Makefile get-embeddings
targets. This will prevent transient network hiccups from failing the entire
build.
- Around line 86-96: The Containerfile does not validate SHA-256 checksums for
the safetensors files despite them being pinned in artifacts.lock.yaml, and the
urllib.request.urlretrieve fallback contradicts hermetic-build guarantees by
allowing unverified network downloads. Extract the pinned SHA-256 hashes from
artifacts.lock.yaml for both all-mpnet-base-v2-model.safetensors and
granite-embedding-30m-english-model.safetensors, then modify the RUN block to
compute and verify the SHA-256 hash of each file after the cachi2 copy or
download using sha256sum, comparing against the pinned values and failing the
build if they do not match. Remove the urllib.request.urlretrieve fallback
entirely and replace it with an explicit error that exits the build if the
expected file is not found in the cachi2 output directory.
- Around line 99-108: The cache-priming RUN block that loops through model_dir
variables (all-mpnet-base-v2 and granite-embedding-30m-english) creates a
non-standard Hugging Face cache layout. The refs/main file contains the literal
string "local" instead of a commit hash, and the symlinks are absolute instead
of relative paths pointing to a blobs directory. Either verify that the fallback
case returning "local:sentence-transformers/all-mpnet-base-v2" is never used in
production and remove the cache-priming block, or correct the cache structure by
replacing the literal "local" string with a proper commit hash in refs/main and
changing the symlinks to relative paths that point to a standard blobs directory
layout, aligning with the official Hugging Face Hub cache specification.
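The timeout and retry behavior requested in the first comment above could look roughly like the sketch below. It swaps `urllib.request.urlretrieve`, which takes no timeout argument, for `urllib.request.urlopen`, which does. The function name, the default values, and the injectable `sleep` and `opener` parameters are assumptions made for illustration and testing; none of this is code from the PR.

```python
import os
import shutil
import socket
import time
import urllib.error
import urllib.request
from typing import Callable


def download_with_retry(
    url: str,
    dest: str,
    attempts: int = 4,
    timeout: float = 30.0,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
    opener: Callable = urllib.request.urlopen,
) -> None:
    """Download url to dest, retrying failures with exponential backoff."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    tmp = dest + ".part"
    for attempt in range(attempts):
        try:
            # The timeout applies to the connection and to blocking reads.
            with opener(url, timeout=timeout) as response, open(tmp, "wb") as out:
                shutil.copyfileobj(response, out)
            os.replace(tmp, dest)  # publish only a fully written file
            return
        except (urllib.error.URLError, socket.timeout, TimeoutError, ConnectionError):
            if attempt == attempts - 1:
                raise  # out of retries: let the build fail
            sleep(base_delay * 2 ** attempt)  # 1s, 2s, 4s, ...
```

Writing to a `.part` file first means an interrupted transfer never leaves a truncated `model.safetensors` behind. This sketch retries every `URLError`, including HTTP 4xx responses; a stricter version might re-raise those immediately.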
📒 Files selected for processing (23)
- .gitignore
- .tekton/lightspeed-service-pull-request.yaml
- .tekton/lightspeed-service-push.yaml
- .woke.yaml
- Containerfile
- Makefile
- artifacts.lock.yaml
- embeddings_model/all-mpnet-base-v2/1_Pooling/config.json
- embeddings_model/all-mpnet-base-v2/config.json
- embeddings_model/all-mpnet-base-v2/config_sentence_transformers.json
- embeddings_model/all-mpnet-base-v2/modules.json
- embeddings_model/all-mpnet-base-v2/sentence_bert_config.json
- embeddings_model/all-mpnet-base-v2/special_tokens_map.json
- embeddings_model/all-mpnet-base-v2/tokenizer.json
- embeddings_model/all-mpnet-base-v2/tokenizer_config.json
- embeddings_model/granite-embedding-30m-english/1_Pooling/config.json
- embeddings_model/granite-embedding-30m-english/config.json
- embeddings_model/granite-embedding-30m-english/modules.json
- embeddings_model/granite-embedding-30m-english/sentence_bert_config.json
- embeddings_model/granite-embedding-30m-english/special_tokens_map.json
- embeddings_model/granite-embedding-30m-english/tokenizer.json
- embeddings_model/granite-embedding-30m-english/tokenizer_config.json
- examples/olsconfig.yaml
✅ Files skipped from review due to trivial changes (13)
- embeddings_model/granite-embedding-30m-english/sentence_bert_config.json
- examples/olsconfig.yaml
- embeddings_model/all-mpnet-base-v2/sentence_bert_config.json
- embeddings_model/granite-embedding-30m-english/1_Pooling/config.json
- embeddings_model/all-mpnet-base-v2/config_sentence_transformers.json
- embeddings_model/all-mpnet-base-v2/tokenizer_config.json
- embeddings_model/all-mpnet-base-v2/modules.json
- embeddings_model/granite-embedding-30m-english/modules.json
- embeddings_model/all-mpnet-base-v2/special_tokens_map.json
- embeddings_model/all-mpnet-base-v2/config.json
- artifacts.lock.yaml
- embeddings_model/granite-embedding-30m-english/config.json
- .gitignore
🚧 Files skipped from review as they are similar to previous changes (7)
- .tekton/lightspeed-service-push.yaml
- embeddings_model/granite-embedding-30m-english/special_tokens_map.json
- .woke.yaml
- embeddings_model/granite-embedding-30m-english/tokenizer_config.json
- embeddings_model/all-mpnet-base-v2/1_Pooling/config.json
- Makefile
- .tekton/lightspeed-service-pull-request.yaml
RUN if [ -d /cachi2/output/deps/generic ]; then \
        cp /cachi2/output/deps/generic/all-mpnet-base-v2-model.safetensors embeddings_model/all-mpnet-base-v2/model.safetensors && \
        cp /cachi2/output/deps/generic/granite-embedding-30m-english-model.safetensors embeddings_model/granite-embedding-30m-english/model.safetensors ; \
    fi && \
    for f in \
        "embeddings_model/all-mpnet-base-v2/model.safetensors|https://huggingface.co/sentence-transformers/all-mpnet-base-v2/resolve/main/model.safetensors" \
        "embeddings_model/granite-embedding-30m-english/model.safetensors|https://huggingface.co/ibm-granite/granite-embedding-30m-english/resolve/main/model.safetensors" \
    ; do \
        path="${f%%|*}" && url="${f##*|}" && \
        [ -f "$path" ] || python3 -c "import urllib.request,sys; urllib.request.urlretrieve(sys.argv[1], sys.argv[2])" "$url" "$path" ; \
    done
🔒 Security & Privacy | 🟠 Major | ⚡ Quick win
Pinned SHA-256 checksums are never verified, and the HTTP fallback undermines the hermetic-build guarantee.
Two concerns on this block:
- No checksum validation. artifacts.lock.yaml pins sha256 for both safetensors, but neither the cachi2 copy (Lines 87-88) nor the HTTP download (Line 95) validates the bytes against those hashes. The PR summary states the Containerfile "validates checksums," but it does not; this leaves a supply-chain integrity gap for the bundled weights.
- Network fallback contradicts the stated build modes. In a hermetic Konflux/Cachi2 build, egress is blocked, so the urllib fallback cannot run; in local builds the PR objective says the build should "fail fast with a clear error if binaries are missing," yet this code instead silently downloads unverified weights from HuggingFace. The fallback both defeats hermeticity and diverges from the documented behavior.
Recommend validating against the pinned hashes and making the missing-binary case an explicit failure rather than a network fetch.
🛡️ Sketch: verify checksums and fail fast instead of fetching
RUN if [ -d /cachi2/output/deps/generic ]; then \
cp /cachi2/output/deps/generic/all-mpnet-base-v2-model.safetensors embeddings_model/all-mpnet-base-v2/model.safetensors && \
cp /cachi2/output/deps/generic/granite-embedding-30m-english-model.safetensors embeddings_model/granite-embedding-30m-english/model.safetensors ; \
fi && \
for f in \
- "embeddings_model/all-mpnet-base-v2/model.safetensors|https://huggingface.co/sentence-transformers/all-mpnet-base-v2/resolve/main/model.safetensors" \
- "embeddings_model/granite-embedding-30m-english/model.safetensors|https://huggingface.co/ibm-granite/granite-embedding-30m-english/resolve/main/model.safetensors" \
+ "embeddings_model/all-mpnet-base-v2/model.safetensors|78c0197b6159d92658e319bc1d72e4c73a9a03dd03815e70e555c5ef05615658" \
+ "embeddings_model/granite-embedding-30m-english/model.safetensors|3b1fcdc9c5eb954f603bc386474e321505ff29c6c67f21e3aa8db3d2d1a533cf" \
; do \
- path="${f%%|*}" && url="${f##*|}" && \
- [ -f "$path" ] || python3 -c "import urllib.request,sys; urllib.request.urlretrieve(sys.argv[1], sys.argv[2])" "$url" "$path" ; \
+ path="${f%%|*}" && want="${f##*|}" && \
+ { [ -f "$path" ] || { echo "Missing $path; run 'make get-embeddings' or provide cachi2 artifacts" >&2; exit 1; } ; } && \
+ echo "${want} ${path}" | sha256sum -c - ; \
done
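For readers who want the same fail-fast check outside a shell `RUN` step, a minimal Python equivalent of the sketch above might look like this. `sha256_of` and `verify_artifact` are hypothetical helper names; the expected digest would come from artifacts.lock.yaml, whose parsing is deliberately left out.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large weight files are not read into memory at once."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_artifact(path: Path, expected_sha256: str) -> None:
    """Raise if the file is missing or does not match the pinned digest."""
    if not path.is_file():
        raise FileNotFoundError(
            f"Missing {path}; run 'make get-embeddings' or provide cachi2 artifacts"
        )
    actual = sha256_of(path)
    if actual != expected_sha256.lower():
        raise ValueError(
            f"SHA-256 mismatch for {path}: expected {expected_sha256}, got {actual}"
        )
```

Both failure modes raise instead of falling back to a download, matching the "explicit failure rather than a network fetch" recommendation.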
Checksum verification is done by the Konflux prefetch task; no need for it here.
792e44b to 1b18b4e
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@Containerfile`:
- Around line 101-111: Add an integration test that validates offline model
loading functionality works with the pre-populated HuggingFace cache. The test
should verify that models can be successfully loaded with the
TRANSFORMERS_OFFLINE=1 environment variable set, and specifically confirm that
the fallback model ID (local:sentence-transformers/all-mpnet-base-v2) loads
correctly using the cache structure created by the Containerfile cache priming
block shown in the diff. This test should confirm the end-to-end contract
between the cache structure created by the RUN command (which sets up the model
directories, snapshots, and refs/main pointers) and the actual model loading
behavior.
✅ Files skipped from review due to trivial changes (14)
- embeddings_model/all-mpnet-base-v2/sentence_bert_config.json
- examples/olsconfig.yaml
- embeddings_model/granite-embedding-30m-english/modules.json
- embeddings_model/all-mpnet-base-v2/modules.json
- embeddings_model/granite-embedding-30m-english/sentence_bert_config.json
- embeddings_model/granite-embedding-30m-english/config.json
- embeddings_model/all-mpnet-base-v2/config.json
- embeddings_model/all-mpnet-base-v2/1_Pooling/config.json
- embeddings_model/all-mpnet-base-v2/tokenizer_config.json
- embeddings_model/all-mpnet-base-v2/special_tokens_map.json
- .woke.yaml
- embeddings_model/granite-embedding-30m-english/tokenizer_config.json
- .gitignore
- embeddings_model/all-mpnet-base-v2/config_sentence_transformers.json
🚧 Files skipped from review as they are similar to previous changes (6)
- .tekton/lightspeed-service-pull-request.yaml
- embeddings_model/granite-embedding-30m-english/1_Pooling/config.json
- embeddings_model/granite-embedding-30m-english/special_tokens_map.json
- artifacts.lock.yaml
- Makefile
- .tekton/lightspeed-service-push.yaml
# Pre-populate HuggingFace cache so models can be loaded by ID with TRANSFORMERS_OFFLINE=1
RUN for model_dir in all-mpnet-base-v2 granite-embedding-30m-english; do \
        case "$model_dir" in \
            all-mpnet-base-v2) hf_id="sentence-transformers--all-mpnet-base-v2" ;; \
            granite-embedding-30m-english) hf_id="ibm-granite--granite-embedding-30m-english" ;; \
        esac && \
        repo_dir="/app-root/.cache/huggingface/hub/models--${hf_id}" && \
        mkdir -p "$repo_dir/snapshots/local" "$repo_dir/refs" && \
        echo "local" > "$repo_dir/refs/main" && \
        ln -sf "/app-root/embeddings_model/${model_dir}"/* "$repo_dir/snapshots/local/" ; \
    done
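The directory layout this `RUN` block builds can be mirrored in Python, which makes it easier to reproduce in a test fixture. The sketch below follows the shell loop step by step; `prime_cache` and its parameters are hypothetical names, and the roots are passed in rather than hard-coded to `/app-root`.

```python
from pathlib import Path

# Mapping copied from the case statement in the RUN block.
MODELS = {
    "all-mpnet-base-v2": "sentence-transformers--all-mpnet-base-v2",
    "granite-embedding-30m-english": "ibm-granite--granite-embedding-30m-english",
}


def prime_cache(models_root: Path, hub_root: Path, revision: str = "local") -> None:
    """Create models--<id>/refs/main and snapshots/<revision>/ symlinks."""
    for model_dir, hf_id in MODELS.items():
        repo_dir = hub_root / f"models--{hf_id}"
        snapshot = repo_dir / "snapshots" / revision
        snapshot.mkdir(parents=True, exist_ok=True)
        (repo_dir / "refs").mkdir(exist_ok=True)
        # echo writes a trailing newline, so this does too.
        (repo_dir / "refs" / "main").write_text(revision + "\n")
        for entry in sorted((models_root / model_dir).iterdir()):
            if entry.name.startswith("."):
                continue  # the shell glob * skips dotfiles as well
            link = snapshot / entry.name
            if link.is_symlink():
                link.unlink()  # roughly what ln -sf does for an existing link
            link.symlink_to(entry.resolve(), target_is_directory=entry.is_dir())
```

As in the shell version, the links are absolute and `refs/main` holds the literal revision name, not a commit hash.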
🩺 Stability & Availability | 🟡 Minor
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Read-only verification script: confirm fallback path and cache priming expectations align.
set -euo pipefail
# 1) Confirm fallback model ID used when embeddings_model_path is unset
rg -n 'local:sentence-transformers/all-mpnet-base-v2|_get_embed_model|embeddings_model_path' \
ols/src/rag_index/index_loader.py tests/unit/rag_index/test_index_loader.py -C2
# 2) Confirm runtime env wiring relevant to offline behavior
rg -n 'HF_HOME|TRANSFORMERS_OFFLINE|TRANSFORMERS_CACHE|configure_hugging_face_envs' \
Containerfile ols/utils/environments.py -C2
# 3) Confirm cache-priming block exists exactly as expected in image build recipe
sed -n '96,116p' Containerfile
Repository: openshift/lightspeed-service
Length of output: 3917
Add an explicit integration test validating offline model loading with the pre-populated cache.
The fallback model ID local:sentence-transformers/all-mpnet-base-v2 and environment configuration for TRANSFORMERS_OFFLINE=1 are correctly implemented, and the cache priming block in the Containerfile properly synthesizes the expected HuggingFace hub structure. However, there is no integration test confirming that models can actually be loaded offline using this pre-populated cache. Since the PR marks offline model loading as pending verification, add a test that validates the end-to-end contract: the fallback model loads successfully with TRANSFORMERS_OFFLINE=1 and the cache structure created by the Containerfile.
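A lightweight structural check can complement such a test without loading any model. The sketch below resolves a file through `refs/main` into `snapshots/<revision>/`, which is an assumption about how the cache is consulted and not the library's own code; `resolve_cached_file` is a hypothetical helper, and stripping whitespace from the ref is a choice made for this sketch. It does not replace an end-to-end load with TRANSFORMERS_OFFLINE=1.

```python
from pathlib import Path
from typing import Optional


def resolve_cached_file(
    hub_root: Path, repo_id: str, filename: str, ref: str = "main"
) -> Optional[Path]:
    """Return the cached file for repo_id, or None if the layout does not provide it."""
    repo_dir = hub_root / ("models--" + repo_id.replace("/", "--"))
    ref_file = repo_dir / "refs" / ref
    if not ref_file.is_file():
        return None
    revision = ref_file.read_text().strip()  # e.g. "local" in the primed cache
    candidate = repo_dir / "snapshots" / revision / filename
    return candidate if candidate.is_file() else None
```

A test could call this for `sentence-transformers/all-mpnet-base-v2` and each expected file (for example `config.json`) against the image's hub directory and fail on the first `None`.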
/retest
1b18b4e to fee7518
  - product_docs_index_path: "./vector_db/user_application_docs/version_1"
    product_docs_index_id: user-application-docs-version_1
-   embeddings_model_path: "./embeddings_model"
+   embeddings_model_path: "./embeddings_model/all-mpnet-base-v2"
A reminder for the corresponding change in the operator.
/lgtm
fee7518 to 4cb74db
4cb74db to 25a520f
/lgtm
/approve
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: blublinsky
@blublinsky: all tests passed!
Description
Summary
Decouples the OLS service image from the lightspeed-rag-content image by bundling embedding model configs directly in this repo and fetching model binaries via Cachi2 generic prefetch for hermetic builds.
- […] lightspeed-rag-content multi-stage dependency from Containerfile
- […] sentence-transformers/all-mpnet-base-v2 (BYOK/FAISS) and ibm-granite/granite-embedding-30m-english (OKP/Solr) model configs and tokenizers
- […] with TRANSFORMERS_OFFLINE=1
- […] artifacts.lock.yaml for Cachi2 generic artifact prefetch of model binaries
- […] make get-embeddings target for local (non-hermetic) development
Build Modes
- […] /cachi2/output/deps/generic/, copied into place during image build
- […] make get-embeddings before docker build; build fails fast with clear error if binaries are missing
Test plan
- […] TRANSFORMERS_OFFLINE=1)
Type of change
Related Tickets & Documents
https://redhat.atlassian.net/browse/OLS-3311
Checklist before requesting a review
Testing
Summary by CodeRabbit