Symptom
The test ipynb on Google Colab job (conda_env_test.yml → test_ipynb_colab) fails while executing 15_optimization/030_Classification_Optimization.ipynb:
nbclient.exceptions.DeadKernelError: Kernel died
E0000 ... cuda_platform.cc:52] failed call to cuInit: INTERNAL: CUDA error: Failed call to cuInit: UNKNOWN ERROR (303)
Root cause
- The cuInit line is a benign warning: there is no GPU on the GitHub runner, so TensorFlow/Keras logs it and falls back to CPU. It is not the cause.
- 030 trains a TensorFlow/Keras model (cell ~101) and a PyTorch model (cells ~105–108) in the same kernel. Loading both DL frameworks and then training on a CPU-only ~7 GB runner exhausts memory or trips a CPU-only crash, so the kernel dies.
- This is a CI-environment limitation, not a logic bug; the notebook runs fine on Colab (more RAM, a GPU).
Impact (the important part)
update_nmisp_py declares needs: [test_ipynb_linux, test_ipynb_native, test_ipynb_colab]. Because test_ipynb_colab stays red on 030, the nmisp_py auto-sync never runs, so the Colab helper package (kwlee2025cpp/nmisp_py) silently goes stale — which is what broke the slider notebooks in Colab until a manual push.
Options
- Skip 030 (and likely 035/036 Keras-MNIST) in CI, via TEST_IPYNB_IGNORE_FOLDER or a pytest skip/marker. Simplest; accepts that DL-training notebooks aren't CI-testable on free runners.
- CI-guard the heavy cells: wrap training in if not os.getenv('CI'): or drop to epochs=1 / tiny data under CI, keeping some smoke coverage.
- Decouple update_nmisp_py from test_ipynb_colab so a heavy-notebook failure can't silently leave the Colab package stale.
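A minimal sketch of the CI-guard option, assuming the training cells can take their epoch count and sample limit from a small helper. The name training_budget and the defaults (20 epochs, 256 samples) are hypothetical and not taken from the notebook; GitHub Actions sets CI=true on its runners.

```python
import os


def training_budget(full_epochs=20, full_samples=None, environ=None):
    """Return (epochs, max_samples) for the current environment.

    Hypothetical helper: under CI, shrink training to one epoch on a
    small slice of the data; elsewhere keep the notebook's full settings.
    """
    env = os.environ if environ is None else environ
    # Any non-empty CI value counts as "running under CI" in this sketch.
    if env.get("CI"):
        return 1, 256
    return full_epochs, full_samples
```

A training cell could then call epochs, n = training_budget() and fit on x[:n], y[:n]; with n = None the slice keeps the full dataset, so behaviour outside CI is unchanged.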
Related
A proposed static test (a notebook importing an nmisp_py helper module must include the Google Colab clone cell) would run in this same job — it passes today, but stays masked while the job is red on 030.
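One possible shape for that static check, as a sketch only. It assumes helper modules are imported under the nmisp_py name and that the clone cell contains a git clone of a URL ending in nmisp_py; both patterns, and the function names, are guesses to be adjusted to the notebooks' real conventions.

```python
import json
import re

# Assumption: helpers are imported as `import nmisp_py...` or `from nmisp_py... import ...`.
IMPORT_RE = re.compile(r"^\s*(?:import|from)\s+nmisp_py\b", re.MULTILINE)
# Assumption: the Colab clone cell runs `git clone` on a URL containing nmisp_py.
CLONE_RE = re.compile(r"git\s+clone\b.*nmisp_py")


def code_sources(notebook):
    """Yield the source text of every code cell in a parsed .ipynb dict."""
    for cell in notebook.get("cells", []):
        if cell.get("cell_type") == "code":
            src = cell.get("source", "")
            # nbformat stores source as a list of lines or a single string.
            yield "".join(src) if isinstance(src, list) else src


def missing_clone_cell(notebook):
    """True if the notebook imports an nmisp_py helper but never clones it."""
    sources = list(code_sources(notebook))
    imports_helper = any(IMPORT_RE.search(s) for s in sources)
    has_clone = any(CLONE_RE.search(s) for s in sources)
    return imports_helper and not has_clone
```

A pytest wrapper could load each .ipynb with json.load and assert that missing_clone_cell returns False.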
🤖 Filed via Claude Code while debugging the Colab helper-sync gap.