feat: Add data-lineage overview #1383
✅ Deploy Preview for seqera-docs ready!
christopher-hakkaart
left a comment
I've made a range of suggestions to make it more concise.
I wasn't sure of the exact status of "Lineage ID" vs "lineage ID". Please confirm and I can update those.
Co-authored-by: Chris Hakkaart <chris.hakkaart@seqera.io> Signed-off-by: Rob Newman <61608+robnewman@users.noreply.github.com>
christopher-hakkaart
left a comment
Looking good - approved, but please check the latest suggestions are correct.
gavinelder
left a comment
Note: Lineage requires additional IAM permissions beyond the set we currently document for the AWS credentials used with Seqera Platform.
Customers who use an existing AWS Batch queue or AWS Cloud compute environment with custom IAM roles will also need to update the relevant service role policies.
The service role needs the following statements added:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "ListObjectsInBucket",
      "Effect": "Allow",
      "Action": [
        "s3:ListBucket"
      ],
      "Resource": "arn:aws:s3:::seqera-lineage-<workspace-id>"
    },
    {
      "Sid": "AllObjectActions",
      "Effect": "Allow",
      "Action": "s3:*Object",
      "Resource": "arn:aws:s3:::seqera-lineage-<workspace-id>/*"
    },
    {
      "Sid": "AllowObjectTagging",
      "Effect": "Allow",
      "Action": [
        "s3:PutObjectTagging",
        "s3:GetObjectTagging"
      ],
      "Resource": "arn:aws:s3:::seqera-lineage-<workspace-id>/*"
    }
  ]
}
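As a rough illustration only (this is not part of the PR), the `<workspace-id>` placeholder in the service role policy could be filled in programmatically. The helper name `lineage_service_role_policy` and the example workspace ID are hypothetical; the statements themselves are copied from the policy above.

```python
import json


def lineage_service_role_policy(workspace_id: str) -> dict:
    """Build the service role policy shown above for one workspace.

    Hypothetical helper: it only substitutes the workspace ID into the
    bucket ARN and does not call AWS.
    """
    bucket_arn = f"arn:aws:s3:::seqera-lineage-{workspace_id}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "ListObjectsInBucket",
                "Effect": "Allow",
                "Action": ["s3:ListBucket"],
                "Resource": bucket_arn,
            },
            {
                "Sid": "AllObjectActions",
                "Effect": "Allow",
                "Action": "s3:*Object",
                "Resource": f"{bucket_arn}/*",
            },
            {
                "Sid": "AllowObjectTagging",
                "Effect": "Allow",
                "Action": ["s3:PutObjectTagging", "s3:GetObjectTagging"],
                "Resource": f"{bucket_arn}/*",
            },
        ],
    }


if __name__ == "__main__":
    # "1234567890" is a made-up workspace ID used purely for illustration.
    print(json.dumps(lineage_service_role_policy("1234567890"), indent=2))
```

The printed JSON can then be pasted into the service role policy, or passed to whatever tooling manages the role.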
The Seqera Platform integration credentials require the following additional permissions:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "sqs:CreateQueue",
        "sqs:GetQueueAttributes",
        "sqs:SetQueueAttributes",
        "sqs:GetQueueUrl",
        "sqs:ReceiveMessage",
        "sqs:DeleteMessage"
      ],
      "Resource": "arn:aws:sqs:*:*:seqera-lineage-*"
    },
    {
      "Effect": "Allow",
      "Action": [
        "s3:CreateBucket",
        "s3:GetBucketNotificationConfiguration",
        "s3:PutBucketNotificationConfiguration",
        "s3:GetBucketLocation"
      ],
      "Resource": "arn:aws:s3:::seqera-lineage-*"
    }
  ]
}
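A minimal sketch of how someone might check an existing credentials policy against the list above. The function name `missing_lineage_actions` is hypothetical, and the check is deliberately naive: it compares exact action strings in `Allow` statements and ignores wildcards, `Deny` statements, and `Resource` scoping, so treat its result as a hint rather than an authoritative evaluation.

```python
# Actions copied from the integration-credentials policy above.
REQUIRED_LINEAGE_ACTIONS = frozenset({
    "sqs:CreateQueue",
    "sqs:GetQueueAttributes",
    "sqs:SetQueueAttributes",
    "sqs:GetQueueUrl",
    "sqs:ReceiveMessage",
    "sqs:DeleteMessage",
    "s3:CreateBucket",
    "s3:GetBucketNotificationConfiguration",
    "s3:PutBucketNotificationConfiguration",
    "s3:GetBucketLocation",
})


def missing_lineage_actions(policy: dict) -> set:
    """Return the required actions not listed verbatim in any Allow statement.

    Naive by design: no wildcard expansion, no Deny handling, and no
    check that the Resource covers the seqera-lineage-* names.
    """
    statements = policy.get("Statement", [])
    if isinstance(statements, dict):  # IAM allows a single statement object
        statements = [statements]

    granted = set()
    for statement in statements:
        if statement.get("Effect") != "Allow":
            continue
        actions = statement.get("Action", [])
        if isinstance(actions, str):  # IAM allows a single action string
            actions = [actions]
        granted.update(actions)

    return set(REQUIRED_LINEAGE_ACTIONS) - granted


if __name__ == "__main__":
    # Example policy that grants only two of the SQS actions.
    existing = {
        "Version": "2012-10-17",
        "Statement": {
            "Effect": "Allow",
            "Action": ["sqs:CreateQueue", "sqs:GetQueueUrl"],
            "Resource": "arn:aws:sqs:*:*:seqera-lineage-*",
        },
    }
    for action in sorted(missing_lineage_actions(existing)):
        print(action)
```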
The new page was not included in any navigation sidebar, so it was impossible to find. I've just pushed a commit to add it to the docs sidebar. Hope that's OK!
Merge changes that had conflicts earlier. Co-authored-by: Chris Hakkaart <chris.hakkaart@seqera.io> Signed-off-by: Chris Hakkaart <chris.hakkaart@seqera.io>
Updated tags formatting for clarity and consistency. Signed-off-by: Chris Hakkaart <chris.hakkaart@seqera.io>
Added warnings about the experimental nature of data lineage. Signed-off-by: Chris Hakkaart <chris.hakkaart@seqera.io>
Signed-off-by: Chris Hakkaart <chris.hakkaart@seqera.io>
Clarified instructions for enabling data lineage in Nextflow. Signed-off-by: Chris Hakkaart <chris.hakkaart@seqera.io>
ewels
left a comment
Couple of minor comments, but generally LGTM!
Signed-off-by: Justine Geffen <justinegeffen@users.noreply.github.com>
@robnewman, I'm happy with this editorially. Thank you!
Assign lineage labels to output files using the `label` directive in your Nextflow process definitions. Labels appear in lineage records and are searchable across your workspace.
Both Seqera Platform labels and Nextflow lineage labels propagate to lineage records. Seqera Platform excludes resource labels as they relate to underlying compute resources, not the data itself.
Do Seqera Platform labels propagate to lineage records?
I tried this and I don't think they propagate to the record. As far as I can see, only the Nextflow lineage labels were there.
Per @bentsherman, provided the Nextflow version used is 26.04, Seqera Platform labels propagate to the lineage WorkflowRun record.
And according to @swampie, the default Nextflow version in v26.1 will be 26.04, so it will be supported.
@robnewman could we move the extra policy required for lineage into the AWS Batch and AWS Cloud docs pages? Keeping it on a separate page, as it is now, means customers must remember to visit the lineage page and extract the required policies whenever they create a new user, which is error-prone and makes management harder for us. Happy to review or help.
Sure - please create a PR!
Add an overview page explaining why data lineage is important and how to access it.
@netlify /platform-cloud/data/data-lineage