CI: 030_Classification_Optimization kills the CI kernel and gates the nmisp_py auto-sync #440

@kwlee2025cpp

Symptom

The "test ipynb on Google Colab" job (test_ipynb_colab in conda_env_test.yml) fails while executing 15_optimization/030_Classification_Optimization.ipynb:

nbclient.exceptions.DeadKernelError: Kernel died
E0000 ... cuda_platform.cc:52] failed call to cuInit: INTERNAL: CUDA error: Failed call to cuInit: UNKNOWN ERROR (303)

Root cause

  • The cuInit line is a benign warning — there is no GPU on the GitHub runner, so TensorFlow/Keras logs it and falls back to CPU. It is not the cause.
  • 030 trains a TensorFlow/Keras model (cell ~101) and a PyTorch model (cells ~105–108) in the same kernel. Loading both deep-learning frameworks and then training on a CPU-only runner with ~7 GB of RAM most likely exhausts memory or hits a CPU-only crash path, and the kernel dies.
  • This is a CI-environment limitation, not a logic bug; the notebook runs fine on Colab (more RAM, a GPU).
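
One way to check the memory hypothesis would be to log the kernel's peak memory from a notebook cell after each training step. The following is a minimal sketch using only the standard library; the helper names `peak_rss_mib` and `report` are hypothetical and not part of the repository, and the `resource` module is available on Unix-like runners only.

```python
import resource
import sys


def peak_rss_mib() -> float:
    """Return this process's peak resident set size in MiB."""
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    # ru_maxrss is reported in kilobytes on Linux but in bytes on macOS.
    divisor = 1024 * 1024 if sys.platform == "darwin" else 1024
    return peak / divisor


def report(label: str) -> str:
    """Format a one-line memory report for a notebook cell's output."""
    return f"[mem] {label}: peak RSS {peak_rss_mib():.0f} MiB"


if __name__ == "__main__":
    # In the notebook this would be called after the Keras cell and again
    # after the PyTorch cells; here it only prints the current peak.
    print(report("baseline"))
```

A peak close to the runner's memory limit just before the kernel dies would support the out-of-memory explanation; a much lower peak would point toward the CPU-only crash instead.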

Impact (the important part)

update_nmisp_py declares needs: [test_ipynb_linux, test_ipynb_native, test_ipynb_colab]. Because test_ipynb_colab stays red on 030, the nmisp_py auto-sync never runs, so the Colab helper package (kwlee2025cpp/nmisp_py) silently goes stale — which is what broke the slider notebooks in Colab until a manual push.

Options

  1. Skip 030 (and likely 035/036 Keras-MNIST) in CI — via TEST_IPYNB_IGNORE_FOLDER or a pytest skip/marker. Simplest; accepts that DL-training notebooks aren't CI-testable on free runners.
  2. CI-guard the heavy cells — wrap training in if not os.getenv('CI'): or drop to epochs=1 / tiny data under CI, keeping some smoke coverage.
  3. Decouple update_nmisp_py from test_ipynb_colab so a heavy-notebook failure can't silently leave the Colab package stale.
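
Option 2 could look roughly like the sketch below. It relies on GitHub Actions setting `CI=true` in the job environment; the helper names `running_in_ci` and `training_budget`, and the specific epoch and sample counts, are hypothetical placeholders rather than values taken from the notebook. Unlike a bare `os.getenv('CI')` truthiness check, this version treats `CI=false` as not running in CI.

```python
import os
from typing import Mapping, Optional


def running_in_ci(env: Optional[Mapping[str, str]] = None) -> bool:
    """Return True when the CI environment variable is set to a truthy value."""
    env = os.environ if env is None else env
    return env.get("CI", "").strip().lower() in {"1", "true", "yes"}


def training_budget(env: Optional[Mapping[str, str]] = None) -> dict:
    """Pick a small smoke-test budget under CI and a full budget elsewhere.

    The numbers are illustrative assumptions, not the notebook's settings.
    """
    if running_in_ci(env):
        return {"epochs": 1, "max_samples": 256}
    return {"epochs": 20, "max_samples": None}


if __name__ == "__main__":
    budget = training_budget()
    print(budget)
    # In a training cell, the budget would be applied along these lines:
    #   x_small = x_train[:budget["max_samples"]]  # a None bound keeps all rows
    #   model.fit(x_small, y_small, epochs=budget["epochs"])
```

Keeping the decision in one helper means each heavy cell only reads the budget, so the notebook still runs unchanged on Colab while CI keeps a cheap smoke test.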

Related

A proposed static test (a notebook importing an nmisp_py helper module must include the Google Colab clone cell) would run in this same job — it passes today, but stays masked while the job is red on 030.
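
As a rough illustration of what such a static check might do, the sketch below scans a notebook's code cells for a helper import and for a clone cell. Both regular expressions are assumptions: the issue does not say how the helper modules are imported or what the clone cell contains, so `HELPER_IMPORT` and `CLONE_MARKER` would need to match the repository's real conventions. The function names are hypothetical as well.

```python
import json
import re

# Assumed patterns; the real import style and clone cell may differ.
HELPER_IMPORT = re.compile(r"^\s*(?:import|from)\s+nmisp_py\b", re.MULTILINE)
CLONE_MARKER = re.compile(r"git\s+clone\b.*nmisp_py")


def code_cell_sources(notebook: dict) -> list:
    """Return the source text of every code cell in a parsed notebook."""
    sources = []
    for cell in notebook.get("cells", []):
        if cell.get("cell_type") == "code":
            src = cell.get("source", "")
            # The notebook format allows source as a string or a list of lines.
            sources.append("".join(src) if isinstance(src, list) else src)
    return sources


def missing_clone_cell(notebook: dict) -> bool:
    """True when a notebook imports the helper but has no clone cell."""
    sources = code_cell_sources(notebook)
    imports_helper = any(HELPER_IMPORT.search(s) for s in sources)
    has_clone = any(CLONE_MARKER.search(s) for s in sources)
    return imports_helper and not has_clone


def check_notebook_file(path: str) -> bool:
    """Load an .ipynb file from disk and apply the same check."""
    with open(path, encoding="utf-8") as f:
        return missing_clone_cell(json.load(f))
```

Because the check only parses JSON and never executes a cell, it would not be affected by the kernel crash itself, only by the job being reported red.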

🤖 Filed via Claude Code while debugging the Colab helper-sync gap.
