Prototype: phased shutdown for MTP (#5345) #8580
Introduce a two-phase shutdown model for Microsoft.Testing.Platform so
test sessions get a deterministic drain window before being aborted:
- Extend internal ITestApplicationCancellationTokenSource with
DrainingToken (graceful cancel) and AbortingToken (forceful abort),
plus an Abort() entry point. The existing CancellationToken is kept
as a back-compat alias for DrainingToken so current consumers keep
observing graceful cancellation without changes.
- Rewrite CTRLPlusCCancellationTokenSource with a Running -> Draining
-> Aborting state machine:
* 1st Ctrl+C enters Draining, starts a 30s grace timer that
escalates to Aborting on elapse.
* 2nd Ctrl+C escalates to Aborting immediately, starts a 10s
abort-timeout safety net that FailFasts via IEnvironment.
* 3rd Ctrl+C is no longer intercepted - the runtime terminates
the process (matches docker compose / kubectl / npm UX).
- Add 6 unit tests covering initial state, single/idempotent cancel,
abort, grace-period escalation, and zero-grace escalation. Passes
on net9.0 and net462.
CLI options (--shutdown-grace-period, --shutdown-abort-timeout), env-
var propagation to controlled hosts, TerminalOutputDevice UX updates,
and migration of existing token consumers are intentionally deferred
to follow-up PRs and tracked in the RFC.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
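The phase transitions above can be sketched as follows. This is a minimal illustration under stated assumptions, not the PR's `CTRLPlusCCancellationTokenSource`: `PhasedShutdownSketch` is a hypothetical name, an `Action<string>` stands in for `IEnvironment.FailFast`, and the console wiring is left out (the handler would call `Cancel()` on the 1st Ctrl+C and `Abort()` on the 2nd). Transitions use `Interlocked.CompareExchange`, and `Abort()` always passes through Draining first.

```csharp
using System;
using System.Threading;

// Hypothetical sketch of the Running -> Draining -> Aborting phase machine (not the PR's code).
// Disposal of the CancellationTokenSources is omitted for brevity.
public sealed class PhasedShutdownSketch
{
    private const int PhaseRunning = 0;
    private const int PhaseDraining = 1;
    private const int PhaseAborting = 2;

    private readonly CancellationTokenSource _drainingCts = new CancellationTokenSource();
    private readonly CancellationTokenSource _abortingCts = new CancellationTokenSource();
    private readonly TimeSpan _gracePeriod;
    private readonly TimeSpan _abortTimeout;
    private readonly Action<string> _failFast; // stand-in for IEnvironment.FailFast
    private int _phase = PhaseRunning;

    public PhasedShutdownSketch(TimeSpan gracePeriod, TimeSpan abortTimeout, Action<string> failFast)
    {
        _gracePeriod = gracePeriod;
        _abortTimeout = abortTimeout;
        _failFast = failFast;
    }

    public CancellationToken DrainingToken => _drainingCts.Token;
    public CancellationToken AbortingToken => _abortingCts.Token;

    // Back-compat alias: existing consumers keep observing graceful cancellation.
    public CancellationToken CancellationToken => DrainingToken;

    public int Phase => Volatile.Read(ref _phase);

    // 1st Ctrl+C: Running -> Draining, then escalate if draining outlives the grace period.
    public void Cancel()
    {
        if (Interlocked.CompareExchange(ref _phase, PhaseDraining, PhaseRunning) != PhaseRunning)
        {
            return; // already Draining or Aborting: idempotent
        }

        _drainingCts.Cancel();
        Schedule(_gracePeriod, Abort);
    }

    // 2nd Ctrl+C or grace-period elapse: Draining -> Aborting, then arm the fail-fast safety net.
    public void Abort()
    {
        Cancel(); // Running -> Draining first, so no phase is skipped

        if (Interlocked.CompareExchange(ref _phase, PhaseAborting, PhaseDraining) != PhaseDraining)
        {
            return; // already Aborting
        }

        _drainingCts.Cancel(); // idempotent; DrainingToken is signaled before AbortingToken
        _abortingCts.Cancel();
        Schedule(_abortTimeout, () => _failFast("Abort timeout elapsed."));
    }

    private static void Schedule(TimeSpan delay, Action action)
    {
        // Fire-and-forget timer: this sketch never disposes the timer source.
        var timerCts = new CancellationTokenSource(delay);
        timerCts.Token.Register(action);
    }
}
```

With the defaults described above, the instance would be constructed with a 30 s grace period and a 10 s abort timeout.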
Pull request overview
Prototype implementation of a two-phase shutdown model for Microsoft.Testing.Platform (MTP), introducing distinct “Draining” vs “Aborting” cancellation semantics to address Ctrl+C behavior discussed in #5345 while keeping back-compat for existing token consumers.
Changes:
- Extend `ITestApplicationCancellationTokenSource` with `DrainingToken`, `AbortingToken`, and an `Abort()` escalation API (with `CancellationToken` preserved as a Draining alias).
- Rewrite `CTRLPlusCCancellationTokenSource` to implement a `Running → Draining → Aborting` phase model with Ctrl+C escalation and grace/abort timers.
- Add unit tests covering the new phase/token behavior.
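Consumer migration is deferred, so the following is purely illustrative: a hypothetical `TwoPhaseConsumer` (not an MTP type) showing how code might treat the two tokens differently. `DrainingToken` stops it from picking up new work while the item in flight may finish; `AbortingToken` cancels the item in flight.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical consumer (not part of this PR) showing how the two tokens differ:
// DrainingToken means "stop taking new work", AbortingToken means "stop right now".
public static class TwoPhaseConsumer
{
    public static async Task<List<int>> RunAsync(
        IEnumerable<int> items,
        Func<int, CancellationToken, Task> processAsync,
        CancellationToken drainingToken,
        CancellationToken abortingToken)
    {
        var completed = new List<int>();
        foreach (int item in items)
        {
            // Graceful: once draining starts, do not pick up new items.
            if (drainingToken.IsCancellationRequested)
            {
                break;
            }

            try
            {
                // In-flight work observes only the forceful token, so it may finish while draining.
                await processAsync(item, abortingToken).ConfigureAwait(false);
                completed.Add(item);
            }
            catch (OperationCanceledException) when (abortingToken.IsCancellationRequested)
            {
                // Forceful: abandon the in-flight item and return what has completed.
                break;
            }
        }

        return completed;
    }
}
```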
| File | Description |
|---|---|
| src/Platform/Microsoft.Testing.Platform/Services/ITestApplicationCancellationTokenSource.cs | Adds two-phase shutdown tokens and Abort() API while preserving legacy token alias. |
| src/Platform/Microsoft.Testing.Platform/Services/CTRLPlusCCancellationTokenSource.cs | Implements the phased shutdown state machine, Ctrl+C escalation logic, and timers. |
| test/UnitTests/Microsoft.Testing.Platform.UnitTests/Hosts/CommonHostTests.cs | Updates local test stub to satisfy the extended cancellation token source interface. |
| test/UnitTests/Microsoft.Testing.Platform.UnitTests/Services/CTRLPlusCCancellationTokenSourceTests.cs | Adds new unit tests validating draining/aborting semantics and grace escalation. |
Copilot's findings
- Files reviewed: 4/4 changed files
- Comments generated: 5
```csharp
case 2:
    // 2nd Ctrl+C: escalate to abort.
    e.Cancel = true;
    EnterAborting();
```
```csharp
public void Dispose()
{
    _drainingCts.Dispose();
    _abortingCts.Dispose();
}
```
```diff
 private void EnterAborting()
 {
     if (Interlocked.Exchange(ref _phase, PhaseAborting) == PhaseAborting)
     {
         return;
     }

-public void Cancel()
-    => _cancellationTokenSource.Cancel();
     try
     {
         _abortingCts.Cancel();
     }
```
```csharp
private static void ScheduleEscalation(TimeSpan delay, Action action)
{
    // Fire-and-forget timer. We don't dispose: the host is shutting down anyway,
    // and a short-lived CTS is cheaper than holding a Timer reference we'd need
    // to manage across the phase machine.
    var timerCts = new CancellationTokenSource(delay);
    timerCts.Token.Register(action);
}
```
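A later commit in this PR changes `ScheduleEscalation` to retain the timer `CancellationTokenSource` so `Dispose()` can clean it up. One way to do that, sketched with a hypothetical `EscalationTimers` helper (not an MTP type): keep each timer source in a list, arm it with `CancelAfter`, and have `Dispose()` dispose every pending source while a flag drops any callback that fires concurrently with disposal.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

// Hypothetical helper: schedules delayed escalation callbacks but keeps every timer
// CancellationTokenSource so Dispose() can stop the ones still pending.
public sealed class EscalationTimers : IDisposable
{
    private readonly object _gate = new object();
    private readonly List<CancellationTokenSource> _timers = new List<CancellationTokenSource>();
    private bool _disposed;

    public void Schedule(TimeSpan delay, Action action)
    {
        var timerCts = new CancellationTokenSource();

        // Registered before the timer is armed; the flag check drops a callback that races with Dispose().
        timerCts.Token.Register(() =>
        {
            if (!Volatile.Read(ref _disposed))
            {
                action();
            }
        });

        lock (_gate)
        {
            if (_disposed)
            {
                timerCts.Dispose(); // no timers are started after Dispose()
                return;
            }

            _timers.Add(timerCts); // fired sources stay here until Dispose(); fine for a short-lived host
            timerCts.CancelAfter(delay);
        }
    }

    public void Dispose()
    {
        CancellationTokenSource[] toDispose;
        lock (_gate)
        {
            if (_disposed)
            {
                return;
            }

            Volatile.Write(ref _disposed, true);
            toDispose = _timers.ToArray();
            _timers.Clear();
        }

        foreach (CancellationTokenSource timerCts in toDispose)
        {
            // Stops the underlying timer; the _disposed flag covers one that fires concurrently.
            timerCts.Dispose();
        }
    }
}
```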
```csharp
while (!source.AbortingToken.IsCancellationRequested && !waitCts.IsCancellationRequested)
{
    await Task.Delay(10, TestContext.CancellationToken).ConfigureAwait(false);
}
```
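The follow-up commit makes this test event-driven with a `TaskCompletionSource` registered on `AbortingToken`. A minimal sketch of that pattern, using a hypothetical helper `TokenAwaiter.WaitForCancellationAsync` (not an MTP or BCL API) that also enforces a timeout and sticks to APIs available on net462:

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class TokenAwaiter
{
    // Hypothetical helper: completes with true as soon as `token` is canceled,
    // or with false if `timeout` elapses first. No polling loop is involved.
    public static async Task<bool> WaitForCancellationAsync(CancellationToken token, TimeSpan timeout)
    {
        var signaled = new TaskCompletionSource<bool>(TaskCreationOptions.RunContinuationsAsynchronously);
        using (token.Register(() => signaled.TrySetResult(true)))
        using (var timeoutCts = new CancellationTokenSource())
        {
            Task timeoutTask = Task.Delay(timeout, timeoutCts.Token);
            Task winner = await Task.WhenAny(signaled.Task, timeoutTask).ConfigureAwait(false);
            timeoutCts.Cancel(); // stop the delay timer if cancellation won
            return winner == signaled.Task;
        }
    }
}
```

In a test, the polling loop would then collapse to a single awaited call such as `await TokenAwaiter.WaitForCancellationAsync(source.AbortingToken, TimeSpan.FromSeconds(5))`, followed by an assertion on the result.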
When the test-application cancellation token is signalled and shutdown takes longer than expected, MTP now periodically prints a 'Still waiting for: ...' warning listing the extensions and consumers that have not yet returned. This makes a hanging Ctrl+C observable without users having to inspect the process state.

Adds a new internal IShutdownProgressReporter service plus a default ShutdownProgressReporter implementation. The reporter wraps the three known blocking await sites:
* ITestSessionLifetimeHandler.OnTestSessionFinishingAsync (both non-consumer and consumer passes in CommonTestHost)
* IAsyncConsumerDataProcessor.DrainDataAsync per consumer in AsynchronousMessageBus

The watchdog only starts after the cancellation token fires, waits a quiet window (3s) to avoid noise on clean shutdowns, then polls every second. Output goes through IOutputDevice as a WarningMessageOutputDeviceData.

Refs #5345. Complementary to #8580: the same trackers will feed the eventual `force-killed because X didn't drain` message once the phased-shutdown RFC lands.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
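The watchdog behavior described in that commit can be sketched as follows, assuming a hypothetical `PendingShutdownTracker` in place of `IShutdownProgressReporter` and a `report` callback in place of `IOutputDevice`. The quiet window and poll interval are parameters here; the commit uses 3 s and 1 s.

```csharp
using System;
using System.Collections.Concurrent;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical sketch of the shutdown watchdog idea (names are illustrative, not MTP's types).
public sealed class PendingShutdownTracker
{
    private readonly ConcurrentDictionary<long, string> _pending = new ConcurrentDictionary<long, string>();
    private long _nextId;

    // Wraps one blocking await site (for example a consumer's drain call) under a display name.
    public async Task TrackAsync(string name, Func<Task> operation)
    {
        long id = Interlocked.Increment(ref _nextId);
        _pending[id] = name;
        try
        {
            await operation().ConfigureAwait(false);
        }
        finally
        {
            _pending.TryRemove(id, out _);
        }
    }

    // Names of the tracked operations that have not returned yet, oldest first.
    public string[] Snapshot() => _pending.OrderBy(p => p.Key).Select(p => p.Value).ToArray();

    // Idle until `cancellation` fires, stay quiet for `quietWindow`, then report every `interval`
    // until nothing is pending or `stop` is canceled.
    public async Task WatchAsync(
        CancellationToken cancellation,
        TimeSpan quietWindow,
        TimeSpan interval,
        Action<string> report,
        CancellationToken stop)
    {
        try
        {
            await WhenCanceledAsync(cancellation, stop).ConfigureAwait(false);
            await Task.Delay(quietWindow, stop).ConfigureAwait(false);
            while (true)
            {
                string[] pending = Snapshot();
                if (pending.Length == 0)
                {
                    return; // clean shutdown: nothing left to report
                }

                report("Still waiting for: " + string.Join(", ", pending));
                await Task.Delay(interval, stop).ConfigureAwait(false);
            }
        }
        catch (OperationCanceledException) when (stop.IsCancellationRequested)
        {
            // The host finished shutting down; stop reporting.
        }
    }

    private static async Task WhenCanceledAsync(CancellationToken token, CancellationToken stop)
    {
        var fired = new TaskCompletionSource<bool>(TaskCreationOptions.RunContinuationsAsynchronously);
        using (token.Register(() => fired.TrySetResult(true)))
        using (stop.Register(() => fired.TrySetCanceled()))
        {
            await fired.Task.ConfigureAwait(false);
        }
    }
}
```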
- Dispose() now detaches the Console.CancelKeyPress handler and disposes any pending escalation CancellationTokenSources so callbacks don't fire after the source is disposed and we don't keep the instance alive through the static console event.
- OnConsoleCancelKeyPressed guards against post-Dispose invocations and routes the 2nd Ctrl+C through Abort() rather than EnterAborting() directly, keeping the Running -> Draining -> Aborting invariant.
- EnterAborting() now calls EnterDraining() first so the legacy CancellationToken / DrainingToken are always signaled when AbortingToken is canceled, and uses CompareExchange(Draining -> Aborting) instead of an unconditional Exchange.
- ScheduleEscalation retains the timer CancellationTokenSource so it can be disposed in Dispose().
- Test GracePeriodElapse_EscalatesToAborting is now event-driven via a TaskCompletionSource registered on AbortingToken (no more Task.Delay polling loop).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
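A minimal sketch of the Dispose and post-Dispose guard described in that commit, using a hypothetical `CtrlCListener` (not the PR's class). The press-handling logic sits in a public `HandlePress()` so it can be exercised without a real console; the 2nd press is forwarded to a caller-supplied callback, which in the PR's design would be `Abort()`.

```csharp
using System;
using System.Threading;

// Hypothetical sketch (not the PR's code): a Ctrl+C listener that detaches from the static
// Console.CancelKeyPress event in Dispose() and ignores presses that race with disposal.
public sealed class CtrlCListener : IDisposable
{
    private readonly Action _onFirstPress;   // e.g. enter Draining
    private readonly Action _onSecondPress;  // e.g. Abort(), keeping Running -> Draining -> Aborting
    private int _presses;
    private int _disposed;

    public CtrlCListener(Action onFirstPress, Action onSecondPress)
    {
        _onFirstPress = onFirstPress;
        _onSecondPress = onSecondPress;
        Console.CancelKeyPress += OnCancelKeyPress;
    }

    // Returns true when the press was handled and the process should keep running.
    public bool HandlePress()
    {
        if (Volatile.Read(ref _disposed) != 0)
        {
            return false; // disposed: leave the press to the runtime
        }

        switch (Interlocked.Increment(ref _presses))
        {
            case 1:
                _onFirstPress();
                return true;
            case 2:
                _onSecondPress();
                return true;
            default:
                return false; // 3rd press onward: not intercepted, the runtime terminates
        }
    }

    public void Dispose()
    {
        if (Interlocked.Exchange(ref _disposed, 1) != 0)
        {
            return;
        }

        // The static event would otherwise keep this instance, and its callbacks, alive.
        Console.CancelKeyPress -= OnCancelKeyPress;
    }

    private void OnCancelKeyPress(object? sender, ConsoleCancelEventArgs e) => e.Cancel = HandlePress();
}
```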
@copilot resolve the merge conflicts in this pull request
…ased-shutdown

Conflicts:
- src/Platform/Microsoft.Testing.Platform/Services/CTRLPlusCCancellationTokenSource.cs
- test/UnitTests/Microsoft.Testing.Platform.UnitTests/Services/CTRLPlusCCancellationTokenSourceTests.cs

Co-authored-by: Evangelink <11340282+Evangelink@users.noreply.github.com>
Done. Merged
Summary
Prototype of a two-phase graceful shutdown for Microsoft.Testing.Platform, exploring the design discussed in #5345 (see this comment).
Status: draft / RFC — opened for early design feedback. CLI options and most consumer migrations are intentionally deferred.
Motivation
Today MTP exposes a single `ITestApplicationCancellationTokenSource.CancellationToken`. On Ctrl+C, every consumer (tests, extensions, hosts) receives the same signal at the same time, and there is no contract distinguishing "please wind down gracefully" from "stop right now". This conflates the two shutdown modes that virtually every other host-style runtime treats as distinct phases (systemd `EXTEND_TIMEOUT_USEC`, macOS `NSTerminateLater`, AWS Lambda Extensions `SHUTDOWN`, Kubernetes `preStop`/`terminationGracePeriodSeconds`, docker compose 2-press SIGTERM→SIGKILL, Vitest `teardownTimeout`, …).

What's in this PR
A working prototype of Design A from the offline analysis:
- `DrainingToken` (graceful) and `AbortingToken` (forceful) on the internal `ITestApplicationCancellationTokenSource`
- `CancellationToken` is preserved as an alias of `DrainingToken` so the ~14 current consumers keep working unchanged
- `Abort()` API
- `Running → Draining → Aborting` using `Interlocked.CompareExchange` for atomic phase transitions
- `IEnvironment.FailFast` (banned `Environment.FailFast` replaced with the injectable wrapper)

Files
- `src/Platform/Microsoft.Testing.Platform/Services/ITestApplicationCancellationTokenSource.cs`: extended interface
- `src/Platform/Microsoft.Testing.Platform/Services/CTRLPlusCCancellationTokenSource.cs`: full rewrite
- `test/UnitTests/Microsoft.Testing.Platform.UnitTests/Hosts/CommonHostTests.cs`: inline mock updated
- `test/UnitTests/Microsoft.Testing.Platform.UnitTests/Services/CTRLPlusCCancellationTokenSourceTests.cs`: 6 new unit tests

Verification
- `net8.0`/`net9.0`/`netstandard2.0`
- `net9.0`
- `net462` (confirms no DIM/runtime issues for the `MSTest.TestAdapter` consumer of the netstandard2.0 build)
- `CommonHost`/`Cancellation`/`StopPolic*` tests still pass, no regressions

Intentionally deferred (follow-ups)
To keep this PR reviewable, the following are explicitly NOT in scope and will land as separate PRs once the core direction is approved:
- `--shutdown-grace-period` and `--shutdown-abort-timeout` via `PlatformCommandLineProvider` + `HelpInfoTests` acceptance assertions
- `TestHostOrchestratorHost`/`TestHostControllersTestHost` controlled children
- `TerminalOutputDevice` UX line: "Cancelling test session… (press Ctrl+C again to force quit)"
- `DrainingToken`/`AbortingToken` usage
- `HotReload` extension's own `CancelKeyPress` handler
- `IShutdownParticipant` ack/extend protocol: separate RFC

Open design questions
- `HostOptions.ShutdownTimeout` + a new `AbortTimeout`?
- `IEnvironment` injection be threaded explicitly from `TestHostBuilder.CommonServices` now, or stay defaulted to `SystemEnvironment`?
- `Process.Kill` ourselves?

Full RFC (motivation, prior-art table, rollout plan, alternatives) is available in my session notes and will be posted as a comment on #5345 once the prototype direction is acknowledged.
Refs #5345