[CONTP-1598] Add GKE Autopilot e2e test #3158

Open
tbavelier wants to merge 8 commits into main from tbavelier/contp-1598-autopilot-e2e

tbavelier (Member) commented Jun 18, 2026

What does this PR do?

Adds a manually triggered GKE Autopilot E2E test path for the Datadog Operator.

The change introduces:

  • a GKE/Pulumi provisioner that uses the shared datadog-agent e2e framework GCP/GKE resources
  • a GKE Autopilot DatadogAgent manifest annotated with experimental.agent.datadoghq.com/autopilot: "true", with log collection (containerCollectAll) and NPM enabled
  • a TestGKEAutopilotSuite that validates the operator, the node Agent, the Cluster Agent, the presence of the system-probe container, and fakeintake ingestion of Kubernetes metrics, logs, and NPM connection payloads
  • Autopilot pod-template adjustments for GKE WorkloadAllowlist compatibility, including use of the GCR Agent registry
  • a make e2e-gke-autopilot-tests target
  • a manual GitLab job extending a GCP-specific e2e base
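The Autopilot path appears to be opted into through the experimental.agent.datadoghq.com/autopilot: "true" annotation on the DatadogAgent manifest. As a minimal sketch of how such a gate could be checked, the Go below uses a hypothetical isAutopilotEnabled helper. It assumes the key is read from the DatadogAgent's metadata annotations and that only the literal string "true" enables the path; it is not the operator's actual code.

```go
package main

import "fmt"

// autopilotAnnotation is the annotation key used in the PR's DatadogAgent manifest.
const autopilotAnnotation = "experimental.agent.datadoghq.com/autopilot"

// isAutopilotEnabled is a hypothetical helper. Assumption: the key is read from
// the DatadogAgent's metadata annotations and only the literal value "true"
// enables the Autopilot path. Lookups on a nil map are safe in Go.
func isAutopilotEnabled(annotations map[string]string) bool {
	return annotations[autopilotAnnotation] == "true"
}

func main() {
	dda := map[string]string{autopilotAnnotation: "true"}
	fmt.Println(isAutopilotEnabled(dda))                 // true
	fmt.Println(isAutopilotEnabled(map[string]string{})) // false
}
```

Requiring the exact string keeps the gate strict: a typo such as "ture" leaves the standard (non-Autopilot) pod templates in place rather than partially applying Autopilot adjustments.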

Motivation

We already exercise the operator on kind, but GKE Autopilot has provider-specific constraints around privileged workloads, host paths, image allowlists, and pod connectivity. This adds a CI entry point for validating the operator's Autopilot path against real GKE Autopilot infrastructure before wiring it into release branch/tag automation.

Additional Notes

This job is manual-only for now. Release branch/tag automation can be added once the job is stable.

The CI setup follows the existing datadog-agent and helm-charts e2e patterns:

  • GCP infrastructure is provisioned through the shared Pulumi/e2e framework.
  • Pulumi state still uses the existing S3 backend.
  • AWS SSM is used as the CI secret source.
  • ECR auth is still used to pull the operator image built by the existing e2e image pipeline.

Minimum Agent Versions

No minimum runtime Agent or Cluster Agent version change. This is test/CI coverage only.

  • Agent: N/A
  • Cluster Agent: N/A

Describe your test plan

Manual CI validation:

Local validation:

  • go test ./internal/controller/datadogagent/experimental -count=1
  • go test ./internal/controller/datadogagent -run Test_AutopilotOverrides -count=1
  • cd test/e2e && GOWORK=off go test ./tests/k8s_suite -run '^$' -count=1 --tags=e2e
  • make lint-e2e

Checklist

  • PR has at least one valid label: bug, enhancement, refactoring, documentation, tooling, and/or dependencies
  • PR has a milestone or the qa/skip-qa label
  • All commits are signed (see: signing commits)

datadog-datadog-prod-us1-2 Bot commented Jun 18, 2026

⚠️ Warnings

🚦 1 Pipeline job failed

DataDog/datadog-operator | build_operator_image_fips_arm64

ℹ️ Info

🎯 Code Coverage (details)
Patch Coverage: 81.82%
Overall Coverage: 44.49% (+0.22%)

Commit SHA: c9c7026

tbavelier changed the title from "Add GKE Autopilot e2e test" to "[CONTP-1598] Add GKE Autopilot e2e test" on Jun 18, 2026

codecov-commenter commented Jun 18, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 44.03%. Comparing base (c3faf04) to head (c3828de).
⚠️ Report is 1 commit behind head on main.

Additional details and impacted files

@@           Coverage Diff           @@
##             main    #3158   +/-   ##
=======================================
  Coverage   44.03%   44.03%           
=======================================
  Files         377      377           
  Lines       30713    30713           
=======================================
  Hits        13525    13525           
  Misses      16300    16300           
  Partials      888      888           
Flag Coverage Δ
unittests 44.03% <ø> (ø)

Flags with carried forward coverage won't be shown.

Legend:
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update c3faf04...c3828de.

tbavelier force-pushed the tbavelier/contp-1598-autopilot-e2e branch from 88326dd to 5563b55 on June 18, 2026 09:55
tbavelier marked this pull request as ready for review on June 24, 2026 13:00
tbavelier requested a review from a team on June 24, 2026 13:00

chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: c9c702631d

Review comment on .gitlab-ci.yml:
extends: .e2e_gcp_base
stage: e2e
needs:
- "trigger_e2e_operator_image"

P2: Make the GKE job's need on the image trigger optional when the trigger is skipped

On README-only branches, the *.md when: never rule in .on_run_e2e_base excludes trigger_e2e_operator_image, but this new manual job is still added because its rules fall through to when: manual. Because its need on trigger_e2e_operator_image is mandatory, those pipelines reference a job that does not exist and cannot be created. Either mirror the trigger's skip rules here or mark this need as optional.
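The failure mode described here can be modeled in a few lines. The Go below is a simplified, hypothetical model of how GitLab resolves needs, not GitLab's implementation: a required need on a job that the rules excluded leaves the pipeline unresolvable, while a need marked optional: true is dropped when its job is absent. The job name e2e_gke_autopilot is a placeholder, since the thread does not show the real job name.

```go
package main

import "fmt"

// need mirrors a GitLab CI needs entry: the job name plus the optional flag.
type need struct {
	job      string
	optional bool
}

// missingNeeds is a simplified, hypothetical model of needs resolution:
// it returns every required need whose job is not in the pipeline.
// Optional needs on absent jobs are skipped instead of reported.
func missingNeeds(present map[string]bool, needs []need) []string {
	var missing []string
	for _, n := range needs {
		if !present[n.job] && !n.optional {
			missing = append(missing, n.job)
		}
	}
	return missing
}

func main() {
	// README-only branch: the *.md rule excluded the trigger job, but the
	// manual GKE job (placeholder name) was still added to the pipeline.
	present := map[string]bool{"e2e_gke_autopilot": true}

	required := []need{{job: "trigger_e2e_operator_image"}}
	optional := []need{{job: "trigger_e2e_operator_image", optional: true}}

	fmt.Println(missingNeeds(present, required)) // [trigger_e2e_operator_image]
	fmt.Println(missingNeeds(present, optional)) // []
}
```

Marking the need optional is the smaller change; mirroring the trigger's skip rules instead keeps the manual job out of README-only pipelines entirely.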

return image
}

return images.FromString(image).WithRegistry(images.GCRContainerRegistry).ToString()

P2: Preserve digest-pinned image references

When Autopilot users pin a node Agent image as name:tag@sha256:<digest> through the usual overrides, this unconditional registry rewrite passes the reference through images.FromString, whose colon split treats only tag@sha256 as the tag and drops the digest value. The resulting image becomes something like gcr.io/datadoghq/agent:7.80.2@sha256, so the DaemonSet cannot pull the pinned image. Either skip digest references here or parse them with a container-reference parser before changing only the registry.
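A minimal sketch of the second option: split the digest off before looking for a tag, then swap only the registry. The imageRef type, parseImageRef, and withRegistry below are hypothetical helpers, not the operator's images package API. Like the rewrite described in the comment, the sketch assumes everything before the final "/" is the registry (so datadog/agent becomes gcr.io/datadoghq/agent), and the sha256 value is a placeholder rather than a real digest.

```go
package main

import (
	"fmt"
	"strings"
)

// imageRef is a hypothetical, minimal split of an image reference into
// registry (everything before the last "/"), name, tag, and digest.
type imageRef struct {
	registry, name, tag, digest string
}

// parseImageRef removes the digest first, so "name:tag@sha256:<hex>" keeps
// its digest instead of losing it to a naive split on ":".
func parseImageRef(ref string) imageRef {
	var r imageRef
	if at := strings.Index(ref, "@"); at >= 0 {
		ref, r.digest = ref[:at], ref[at+1:]
	}
	// A tag separator must come after the last "/"; an earlier colon
	// belongs to a registry port such as "localhost:5000/agent".
	slash := strings.LastIndex(ref, "/")
	if colon := strings.LastIndex(ref, ":"); colon > slash {
		ref, r.tag = ref[:colon], ref[colon+1:]
	}
	if slash >= 0 {
		r.registry, r.name = ref[:slash], ref[slash+1:]
	} else {
		r.name = ref
	}
	return r
}

// String reassembles the reference, keeping any tag and digest.
func (r imageRef) String() string {
	s := r.name
	if r.registry != "" {
		s = r.registry + "/" + s
	}
	if r.tag != "" {
		s += ":" + r.tag
	}
	if r.digest != "" {
		s += "@" + r.digest
	}
	return s
}

// withRegistry swaps only the registry, leaving name, tag, and digest intact.
func withRegistry(ref, registry string) string {
	r := parseImageRef(ref)
	r.registry = registry
	return r.String()
}

func main() {
	// Placeholder digest for illustration only.
	fmt.Println(withRegistry("datadog/agent:7.80.2@sha256:0123abcd", "gcr.io/datadoghq"))
	// gcr.io/datadoghq/agent:7.80.2@sha256:0123abcd
	fmt.Println(withRegistry("public.ecr.aws/datadog/agent:7.80.2", "gcr.io/datadoghq"))
	// gcr.io/datadoghq/agent:7.80.2
}
```

A production fix would more likely reuse an existing reference parser, such as the github.com/distribution/reference module, than hand-roll one; the sketch only shows why the digest has to be separated before the tag.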
