Add documentation for non-argument module inputs#4210
Conversation
…tput options documentation
✅ Deploy Preview for nf-core-docs ready!
To edit notification comments on pull requests, go to your Netlify project configuration. |
✅ Deploy Preview for nf-core-main-site ready!
To edit notification comments on pull requests, go to your Netlify project configuration. |
|
|
||
| Input channel `val` declarations MAY be used to control behaviours of the module that cannot be expressed with arguments of the underlying tool. | ||
|
|
||
| - If a module implements multiple subcommands of a tool, the subcommand SHOULD be provided through a channel. |
There was a problem hiding this comment.
isn't the rule that subcommands should be their own module?
There was a problem hiding this comment.
Generally yes, but there are some modules where a subcommand or subsubcommand is passed as an input (specifically, per Codex and confirmed manually: HELITRONSCANNER_SCAN, LONGSTITCH, COOLER_CLOAD, MIDAS_RUN, QSV_CAT, UMICOLLAPSE, SAMBAMBA_DEPTH, VAMB_BIN, ANNOSINE). I am not sure whether this is an accepted practice or unnoticed abuse. I don't feel too strongly about this point, so I am happy to leave it out if there is strong opposition.
There was a problem hiding this comment.
I would say unnoticed deviations
There was a problem hiding this comment.
I'll wait a bit for differential opinions, but if others agree with you I'll remove it since I don't want to start a civil war here
There was a problem hiding this comment.
Yeah I agree, this sounds like things that slipped through review.
The only case I can imagine this being valid if it all subcommands share exactly the same options and arguments ,(so they apply to every subcommand), but then this would be very unlikely as they you might as well just control this via another argument.
There was a problem hiding this comment.
Alright, changed it. I used SHOULD, since there might be very specific cases where this makes sense.
|
@nf-core-bot fix linting |
…for `val` channel inputs
…put options documentation
|
@nf-core-bot fix linting |
|
@nf-core-bot fix linting |
muffato
left a comment
There was a problem hiding this comment.
I'm not entirely sold on removing all getExtension(). I feel this proposal would just shift the name parsing to (sub-)workflows.
That said, I agree with removing ambiguity from the modules. Modules should offer choice when there is choice (instead of picking one). And when modules are trying too hard to guess, when there's a risk they may get it wrong, modules should require the caller to be explicit about what they want.
Here are some further comments.
Co-authored-by: Matthieu Muffato <cortexspam-github@yahoo.fr>
mahesh-panchal
left a comment
There was a problem hiding this comment.
I'm good with this. 👍🏽
| Input channel `val` declarations MAY be used to control behaviours of the module that cannot be expressed with arguments of the underlying tool. | ||
|
|
||
| - If a module can output multiple file formats, the output format SHOULD be provided through a channel. The module SHOULD NOT infer the output format from the input path. | ||
| :::info{title="Rationale" collapse} |
There was a problem hiding this comment.
| :::info{title="Rationale" collapse} | |
| :::info{title="Rationale" collapse} |
There was a problem hiding this comment.
This could be an example moved to the previous section
|
|
||
| - If a module can output multiple file formats, the output format SHOULD be provided through a channel. The module SHOULD NOT infer the output format from the input path. | ||
| :::info{title="Rationale" collapse} | ||
| Modules can encounter numerous input name scenarios. Custom string operations necessarily make assumptions about the name of the file (for example that the name of a compressed file has at least two dots). Providing an explicit format input returns full control to the pipeline developer and reduces the risk of unexpected behaviour. |
There was a problem hiding this comment.
| Modules can encounter numerous input name scenarios. Custom string operations necessarily make assumptions about the name of the file (for example that the name of a compressed file has at least two dots). Providing an explicit format input returns full control to the pipeline developer and reduces the risk of unexpected behaviour. | |
| Modules can encounter numerous input name scenarios. | |
| Custom string operations necessarily make assumptions about the name of the file (for example that the name of a compressed file has at least two dots). | |
| Providing an explicit format input returns full control to the pipeline developer and reduces the risk of unexpected behaviour. |
There was a problem hiding this comment.
I don't fully understand this point - what do you mean by 'nuermous input name scenarios'? and 'custom string operations' - for what? Is this sentence meant to be a justification here why you should have an explicit extension val channel to control this (as in the alternative methods cenarios are bad)?
There was a problem hiding this comment.
This stems from my observations while reworking the bgzip modules. Some implementations assumed that the input name would always be something like variants.vcf.gz, so the module could "fish out" the output extension by splitting on dots and taking the second item from the end. However, it is not guaranteed that it will always be the case, and files could have names like database.gz, where database is likely not the desired extension. At least in my opinion, the module should be agnostic to the exact file name and use inputs to delegate the details to the pipeline. I hope I have explained it clearly enough in the draft.
There was a problem hiding this comment.
HRm OK, I think you've made the explainations a bit too generic...
So does it ultimately boil down to:
If a module can emit mutually exclusive output formats of the same output file (e.g. bam/sam/cram), this SHOULD be explicitly defined using a dedicated input val channel?
|
|
||
| ## Non-argument `val` channel inputs | ||
|
|
||
| Input channel `val` declarations MAY be used to control behaviours of the module that cannot be expressed with arguments of the underlying tool. |
There was a problem hiding this comment.
| Input channel `val` declarations MAY be used to control behaviours of the module that cannot be expressed with arguments of the underlying tool. | |
| Input channel `val` declarations MAY be used to control optional but critical behaviours of the module that cannot be expressed with arguments of the underlying tool. |
Is this what you mean?
At the moment reading this I'm not sure how this differs from the much more concise point 1 of the previous section (although we could have a an example):
Mandatory non-file inputs are options that the tool MUST have to be able to be run.
There was a problem hiding this comment.
This is not quite what I mean. This section aims to codify existing practice for inputs like val action, val compress, and val out_fmt, which are common in modules, but lack appropriate documentation. These input are not really "an argument to the tool", but are still an important argument to the module.
There was a problem hiding this comment.
Ah I think I'm starting to follow now.
I guess then just a single additional sentence of:
'This can include parameters to control execution of the tool, outside of the tool parameters itself - such as an optional post-execution compression step (to adhere to nf-core specification XYZ)' or something like this?
|
|
||
| - If the output format of a module is necessarily the same as the input format, the module MAY infer the output format from the input path. | ||
|
|
||
| - If a module contains an optional pipe (for example: compression, sorting), the pipe SHOULD be controlled with a Boolean input channel. |
There was a problem hiding this comment.
OK the more I read about this, this feels mostly more about how to define output names, in which case I think the title of the section should be changed accordingly and rephrased to be more active of what you SHOULD do (a lot of this seems to be more defensive).
E.g., a seciton on output naming would be more like:
- Should use
${prefix} - Avoid a default prefix as far as possible (pipeline developers should control this)
- Should not include custom strings except if necessary to define extension (controlled by input channel)
Or something like that
There was a problem hiding this comment.
Also technically this also overlaps with this section actually: https://nf-co.re/docs/specifications/components/modules/naming-conventions#command-file-output-naming
There was a problem hiding this comment.
This is not meant to be a section on outputs, but rather on inputs. The current documentation largely suggests that every input to the module should be an input to the tool, which is clearly not the case. The involvement of output names is due to the drama of val out_fmt vs. ext.suffix (the former being the preferred practice according to maintainer discussions).
There was a problem hiding this comment.
Basically what I want here is to move ideas from random Slack threads to the documentation
There was a problem hiding this comment.
The current documentation largely suggests that every input to the module should be an input to the tool, which is clearly not the case.
I don't think this is necessarily true: https://nf-co.re/docs/specifications/components/modules/input-output-options#required-val-channel-inputs ('essential for executuion of the tool' does not necessarily reuqire this to be an input to the tool itself - but we could make this clearer if yo uwanted?)
|
@jfy133 could you please clarify whether you suggest to:
|
This PR adds a new section in module input/output specifications to describe best (and discouraged) practices for module inputs that are not tool arguments. This change aims to make existing practices explicit and promote consistency rather than create new rules. It is based on my perception of the consensus, so please let me know if anything is inaccurate.
The changes have been preliminarily checked for compliance with the documentation style guide with GPT-5.4-mini.
@netlify /docs/specifications/components/modules/input-output-options