[Improvement] Separate S3 storage configuration for MLRun and Kubeflow Pipeline #295

Open
GiladShapira94 wants to merge 33 commits into mlrun:development from GiladShapira94:separate-data-kfp

@GiladShapira94 (Collaborator) commented May 12, 2026

📝 Description

This PR separates the storage credential configuration into two distinct paths: storage.local.* for the bundled in-cluster SeaweedFS (always used by SeaweedFS IAM, the bucket-init job, and KFP Pipelines), and storage.s3.* for external AWS S3 (used only by MLRun and Jupyter when storage.mode: s3).

The default storage.mode is changed from s3 to local, reflecting that the default CE installation uses the bundled SeaweedFS rather than external AWS S3.

New dedicated _helpers.tpl partials (mlrun-ce.seaweedfs.s3.* and mlrun-ce.pipelines.s3.*) ensure Pipelines and SeaweedFS always resolve credentials from storage.local.* regardless of the active storage.mode, eliminating the previous credential cross-contamination when switching modes.
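
As an illustration of the layout described above, the storage section of values.yaml might look roughly like the sketch below. The key names are taken from this description; the credential and bucket values are placeholders for illustration, not the chart's actual defaults.

```yaml
# Sketch of the storage section in charts/mlrun-ce/values.yaml.
# All credential and bucket values below are placeholders.
storage:
  # "local" uses the bundled SeaweedFS; "s3" points MLRun and Jupyter
  # at external AWS S3.
  mode: local

  # In-cluster SeaweedFS credentials. Per the description, SeaweedFS IAM,
  # the bucket-init job, and KFP Pipelines read these in every mode.
  local:
    accessKey: "<seaweedfs-access-key>"
    secretKey: "<seaweedfs-secret-key>"
    bucket: "<seaweedfs-bucket>"

  # External AWS S3 credentials. Empty by default and only meaningful
  # when mode is "s3".
  s3:
    accessKey: ""
    secretKey: ""
    bucket: ""
```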


🛠️ Changes Made

  • charts/mlrun-ce/values.yaml:
    • Changed storage.mode default from s3 to local
    • Added new storage.local block (accessKey, secretKey, bucket) as the single source of truth for in-cluster SeaweedFS credentials
    • Cleared storage.s3.accessKey/secretKey/bucket defaults (now empty strings; only meaningful when mode: s3)
  • charts/mlrun-ce/templates/_helpers.tpl:
    • mlrun-ce.s3.accessKey/secretKey/bucket — now branches on storage.mode (local vs s3)
    • Added mlrun-ce.seaweedfs.s3.* helpers — always resolve from storage.local.*
    • Added mlrun-ce.pipelines.s3.* helpers — always delegate to mlrun-ce.seaweedfs.s3.*
    • mlrun-ce.artifactPath, mlrun-ce.featureStore.dataPrefix, mlrun-ce.model-endpoint.monitoring.* — replaced hardcoded global.infrastructure.aws.bucketName | default "mlrun" with mlrun-ce.s3.bucket
  • charts/mlrun-ce/templates/config/storage-secret.yaml: AWS_ENDPOINT_URL_S3 is now injected only when storage.mode: local; storage.s3 no longer sets a custom endpoint
  • charts/mlrun-ce/templates/config/storage-validation.yaml — added fail guard for storage.mode: local with missing storage.local.bucket
  • charts/mlrun-ce/templates/config/mlrun-env-configmap.yaml — updated comment describing per-mode env vars
  • charts/mlrun-ce/templates/pipelines/** — all pipeline templates now use mlrun-ce.pipelines.s3.* helpers
  • charts/mlrun-ce/templates/seaweedfs/** — bucket-init job and IAM config now use mlrun-ce.seaweedfs.s3.* helpers
  • charts/mlrun-ce/templates/NOTES.txt — S3 credentials display updated to reference storage.local.*
  • charts/mlrun-ce/Chart.yaml: version bumped from 0.11.0-rc.36 to 0.11.0-rc.37
  • charts/mlrun-ce/README.md — version matrix updated to 0.11.0-rc.37

✅ Checklist

  • I have tested the changes in this PR
  • I confirmed whether my changes require a change in documentation and if so, I created another PR in MLRun for the relevant documentation.
  • I confirmed whether my changes require changes in QA tests, for example: credentials changes, resources naming change and if so, I updated the relevant Jira ticket for QA.
  • I increased the Chart version in charts/mlrun-ce/Chart.yaml.
  • I confirmed that the installation works both on a local Docker Desktop environment and on a real cluster when using the required prerequisites.
  • If needed, update https://github.com/mlrun/ce/blob/development/charts/mlrun-ce/README.md with the relevant installation instructions and version Matrix.
  • If needed, update the following values files for multi namespace support:

🧪 Testing

  • helm lint charts/mlrun-ce — run locally to catch syntax errors in refactored helpers
  • helm template mlrun charts/mlrun-ce -f charts/mlrun-ce/values.yaml — render all templates to verify helper resolution
  • Verify storage.mode: s3 path: render with --set storage.mode=s3,storage.s3.accessKey=foo,storage.s3.secretKey=bar,storage.s3.bucket=mybucket and confirm storage-secret does not contain AWS_ENDPOINT_URL_S3
  • Verify storage.mode: local path: render with defaults and confirm storage-secret contains AWS_ENDPOINT_URL_S3 pointing at the SeaweedFS service
  • Confirm pipelines secret mlpipeline-seaweedfs-artifact always uses storage.local.* regardless of storage.mode
  • End-to-end cluster install (required before merge)
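
For the storage.mode: s3 check in the list above, the same overrides can be kept in a small values file and passed to helm template with -f instead of --set. The file name below is hypothetical; foo, bar, and mybucket are the dummy values from that test step.

```yaml
# s3-test-values.yaml (hypothetical file name)
# Mirrors the --set flags from the storage.mode: s3 test step.
storage:
  mode: s3
  s3:
    accessKey: foo      # dummy value from the test step
    secretKey: bar      # dummy value from the test step
    bucket: mybucket    # dummy value from the test step
```

With these overrides, the rendered storage-secret is expected to omit AWS_ENDPOINT_URL_S3, as the test step states.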

🔗 References

  • Ticket link: CEML-707
  • External links:
  • Design docs links (Optional):

🚨 Breaking Changes?

  • Yes (explain below)
  • No

Consumers upgrading from a previous release must:

  • Rename storage.s3.accessKey/secretKey/bucket to storage.local.accessKey/secretKey/bucket if they were using the default SeaweedFS-backed installation (i.e., the old default mode: s3 pointed at SeaweedFS with seaweed/seaweed123/mlrun).

  • Set storage.mode: s3 explicitly if they were previously relying on the default mode: s3 to pass external AWS credentials — the new default is local.

  • Users who supply an external AWS S3 configuration no longer need to clear AWS_ENDPOINT_URL_S3 manually; the secret now omits it when mode: s3.


🔍️ Additional Notes

  • The three install-mode values files (admin_installation_values.yaml, non_admin_installation_values.yaml, non_admin_cluster_ip_installation_values.yaml) contain no storage.* overrides, so they correctly inherit the new defaults from values.yaml without modification.
  • KFP Pipelines is intentionally hardwired to SeaweedFS (storage.local.*) in all modes — this is by design and is documented in the updated helper comments.

Warnings

  1. Breaking change — existing storage.s3.* users: Anyone who previously used the default install (which was mode: s3 pointing at SeaweedFS with seaweed/seaweed123) must migrate their overrides to storage.local.*.

    Their upgrade path:

    --set storage.local.accessKey=<old s3.accessKey> \
    --set storage.local.secretKey=<old s3.secretKey> \
    --set storage.local.bucket=<old s3.bucket>
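
The same overrides can also be written as a values file. This is a minimal sketch, assuming a hypothetical file name; the angle-bracket placeholders stand for whatever was previously set under storage.s3.*.

```yaml
# local-migration-values.yaml (hypothetical file name)
storage:
  mode: local   # the new default, shown only for clarity
  local:
    accessKey: "<old s3.accessKey>"
    secretKey: "<old s3.secretKey>"
    bucket: "<old s3.bucket>"
```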
    

Comment thread charts/mlrun-ce/values.yaml
Comment thread charts/mlrun-ce/templates/_helpers.tpl
Comment thread charts/mlrun-ce/templates/config/storage-secret.yaml Outdated
@@ -1,6 +1,9 @@
{{- if and (eq .Values.storage.mode "s3") (not .Values.storage.s3.bucket) }}
Member

storage.s3.{accessKey,secretKey} default to empty strings but only bucket is validated. Switching to s3 mode without creds will silently produce an unusable Secret. Please also fail-fast when accessKey/secretKey are empty (unless global.infrastructure.aws.s3NonAnonymous is true).

Comment thread charts/mlrun-ce/templates/config/storage-validation.yaml Outdated
Comment thread charts/mlrun-ce/templates/_helpers.tpl
…a-kfp

# Conflicts:
#	charts/mlrun-ce/Chart.yaml
#	charts/mlrun-ce/README.md
@GiladShapira94 GiladShapira94 requested a review from yaelgen June 3, 2026 13:34