Skip to content

Add documentation for non-argument module inputs#4210

Open
itrujnara wants to merge 12 commits into
nf-core:mainfrom
itrujnara:main
Open

Add documentation for non-argument module inputs#4210
itrujnara wants to merge 12 commits into
nf-core:mainfrom
itrujnara:main

Conversation

@itrujnara

@itrujnara itrujnara commented May 13, 2026

Copy link
Copy Markdown
Contributor

This PR adds a new section in module input/output specifications to describe best (and discouraged) practices for module inputs that are not tool arguments. This change aims to make existing practices explicit and promote consistency rather than create new rules. It is based on my perception of the consensus, so please let me know if anything is inaccurate.
The changes have been preliminarily checked for compliance with the documentation style guide with GPT-5.4-mini.

@netlify /docs/specifications/components/modules/input-output-options

@netlify

netlify Bot commented May 13, 2026

Copy link
Copy Markdown

Deploy Preview for nf-core-docs ready!

Name Link
🔨 Latest commit 5d1655f
🔍 Latest deploy log https://app.netlify.com/projects/nf-core-docs/deploys/6a154ec586f2e800087135d6
😎 Deploy Preview https://deploy-preview-4210--nf-core-docs.netlify.app/docs/specifications/components/modules/input-output-options
📱 Preview on mobile
Toggle QR Code...

QR Code

Use your smartphone camera to open QR code link.

To edit notification comments on pull requests, go to your Netlify project configuration.

@netlify

netlify Bot commented May 13, 2026

Copy link
Copy Markdown

Deploy Preview for nf-core-main-site ready!

Name Link
🔨 Latest commit 5d1655f
🔍 Latest deploy log https://app.netlify.com/projects/nf-core-main-site/deploys/6a154ec569afe90008ab301a
😎 Deploy Preview https://deploy-preview-4210--nf-core-main-site.netlify.app/docs/specifications/components/modules/input-output-options
📱 Preview on mobile
Toggle QR Code...

QR Code

Use your smartphone camera to open QR code link.

To edit notification comments on pull requests, go to your Netlify project configuration.


Input channel `val` declarations MAY be used to control behaviours of the module that cannot be expressed with arguments of the underlying tool.

- If a module implements multiple subcommands of a tool, the subcommand SHOULD be provided through a channel.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

isn't the rule that subcommands should be their own module?

@itrujnara itrujnara May 13, 2026

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Generally yes, but there are some modules where a subcommand or subsubcommand is passed as an input (specifically, per Codex and confirmed manually: HELITRONSCANNER_SCAN, LONGSTITCH, COOLER_CLOAD, MIDAS_RUN, QSV_CAT, UMICOLLAPSE, SAMBAMBA_DEPTH, VAMB_BIN, ANNOSINE). I am not sure whether this is an accepted practice or unnoticed abuse. I don't feel too strongly about this point, so I am happy to leave it out if there is strong opposition.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would say unnoticed deviations

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll wait a bit for differential opinions, but if others agree with you I'll remove it since I don't want to start a civil war here

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah I agree, this sounds like things that slipped through review.

The only case I can imagine this being valid if it all subcommands share exactly the same options and arguments ,(so they apply to every subcommand), but then this would be very unlikely as they you might as well just control this via another argument.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alright, changed it. I used SHOULD, since there might be very specific cases where this makes sense.

@itrujnara

Copy link
Copy Markdown
Contributor Author

@nf-core-bot fix linting

@itrujnara

Copy link
Copy Markdown
Contributor Author

@nf-core-bot fix linting

@itrujnara

Copy link
Copy Markdown
Contributor Author

@nf-core-bot fix linting

@muffato muffato left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not entirely sold on removing all getExtension(). I feel this proposal would just shift the name parsing to (sub-)workflows.

That said, I agree with removing ambiguity from the modules. Modules should offer choice when there is choice (instead of picking one). And when modules are trying too hard to guess, when there's a risk they may get it wrong, modules should require the caller to be explicit about what they want.

Here are some further comments.

@mahesh-panchal mahesh-panchal left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm good with this. 👍🏽

Input channel `val` declarations MAY be used to control behaviours of the module that cannot be expressed with arguments of the underlying tool.

- If a module can output multiple file formats, the output format SHOULD be provided through a channel. The module SHOULD NOT infer the output format from the input path.
:::info{title="Rationale" collapse}

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
:::info{title="Rationale" collapse}
:::info{title="Rationale" collapse}

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This could be an example moved to the previous section


- If a module can output multiple file formats, the output format SHOULD be provided through a channel. The module SHOULD NOT infer the output format from the input path.
:::info{title="Rationale" collapse}
Modules can encounter numerous input name scenarios. Custom string operations necessarily make assumptions about the name of the file (for example that the name of a compressed file has at least two dots). Providing an explicit format input returns full control to the pipeline developer and reduces the risk of unexpected behaviour.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
Modules can encounter numerous input name scenarios. Custom string operations necessarily make assumptions about the name of the file (for example that the name of a compressed file has at least two dots). Providing an explicit format input returns full control to the pipeline developer and reduces the risk of unexpected behaviour.
Modules can encounter numerous input name scenarios.
Custom string operations necessarily make assumptions about the name of the file (for example that the name of a compressed file has at least two dots).
Providing an explicit format input returns full control to the pipeline developer and reduces the risk of unexpected behaviour.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't fully understand this point - what do you mean by 'nuermous input name scenarios'? and 'custom string operations' - for what? Is this sentence meant to be a justification here why you should have an explicit extension val channel to control this (as in the alternative methods cenarios are bad)?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This stems from my observations while reworking the bgzip modules. Some implementations assumed that the input name would always be something like variants.vcf.gz, so the module could "fish out" the output extension by splitting on dots and taking the second item from the end. However, it is not guaranteed that it will always be the case, and files could have names like database.gz, where database is likely not the desired extension. At least in my opinion, the module should be agnostic to the exact file name and use inputs to delegate the details to the pipeline. I hope I have explained it clearly enough in the draft.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

HRm OK, I think you've made the explainations a bit too generic...

So does it ultimately boil down to:

If a module can emit mutually exclusive output formats of the same output file (e.g. bam/sam/cram), this SHOULD be explicitly defined using a dedicated input val channel?


## Non-argument `val` channel inputs

Input channel `val` declarations MAY be used to control behaviours of the module that cannot be expressed with arguments of the underlying tool.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
Input channel `val` declarations MAY be used to control behaviours of the module that cannot be expressed with arguments of the underlying tool.
Input channel `val` declarations MAY be used to control optional but critical behaviours of the module that cannot be expressed with arguments of the underlying tool.

Is this what you mean?

At the moment reading this I'm not sure how this differs from the much more concise point 1 of the previous section (although we could have a an example):

Mandatory non-file inputs are options that the tool MUST have to be able to be run.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not quite what I mean. This section aims to codify existing practice for inputs like val action, val compress, and val out_fmt, which are common in modules, but lack appropriate documentation. These input are not really "an argument to the tool", but are still an important argument to the module.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah I think I'm starting to follow now.

I guess then just a single additional sentence of:

'This can include parameters to control execution of the tool, outside of the tool parameters itself - such as an optional post-execution compression step (to adhere to nf-core specification XYZ)' or something like this?

Comment on lines +72 to +75

- If the output format of a module is necessarily the same as the input format, the module MAY infer the output format from the input path.

- If a module contains an optional pipe (for example: compression, sorting), the pipe SHOULD be controlled with a Boolean input channel.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK the more I read about this, this feels mostly more about how to define output names, in which case I think the title of the section should be changed accordingly and rephrased to be more active of what you SHOULD do (a lot of this seems to be more defensive).

E.g., a seciton on output naming would be more like:

  1. Should use ${prefix}
  2. Avoid a default prefix as far as possible (pipeline developers should control this)
  3. Should not include custom strings except if necessary to define extension (controlled by input channel)

Or something like that

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not meant to be a section on outputs, but rather on inputs. The current documentation largely suggests that every input to the module should be an input to the tool, which is clearly not the case. The involvement of output names is due to the drama of val out_fmt vs. ext.suffix (the former being the preferred practice according to maintainer discussions).

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Basically what I want here is to move ideas from random Slack threads to the documentation

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The current documentation largely suggests that every input to the module should be an input to the tool, which is clearly not the case.

I don't think this is necessarily true: https://nf-co.re/docs/specifications/components/modules/input-output-options#required-val-channel-inputs ('essential for executuion of the tool' does not necessarily reuqire this to be an input to the tool itself - but we could make this clearer if yo uwanted?)

@itrujnara

Copy link
Copy Markdown
Contributor Author

@jfy133 could you please clarify whether you suggest to:

  1. keep the new section, but phares it differently, or
  2. scrap the new section and instead add information to the section above?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants