Skip to content

rbd: retry lock owner listing with resized buffer#1279

Open
hsjsstn wants to merge 1 commit into
ceph:masterfrom
hsjsstn:fix-rbd-lock-get-owners-erange
Open

rbd: retry lock owner listing with resized buffer#1279
hsjsstn wants to merge 1 commit into
ceph:masterfrom
hsjsstn:fix-rbd-lock-get-owners-erange

Conversation

@hsjsstn

@hsjsstn hsjsstn commented Jun 5, 2026

Copy link
Copy Markdown

Update lock owner listing retry behavior:

  • use retry.WithSizes in LockGetOwners
  • reallocate the lock owner buffer when rbd_lock_get_owners returns ERANGE
  • preserve the existing ENOENT behavior

No new test was added because reproducing this path requires librbd to return ERANGE from rbd_lock_get_owners,
which depends on having more lock owners than the initial buffer size.

This is not a new API, so ceph_preview and make api-update do not apply.

fixes #1269

Checklist

  • Added tests for features and functional changes
  • Public functions and types are documented
  • Standard formatting is applied to Go code
  • Is this a new API? Added a new file that begins with //go:build ceph_preview
  • Ran make api-update to record new APIs

New or infrequent contributors may want to review the go-ceph Developer's Guide including the section on how we track API Status and the API Stability Plan.

The go-ceph project uses mergify. View the mergify command guide for information on how to interact with mergify. Add a comment with @Mergifyio rebase to rebase your PR when github indicates that the PR is out of date with the base branch.

@anoopcs9 anoopcs9 left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have a minor comment, otherwise lgtm.

Comment thread rbd/locks.go Outdated
return retry.Size(int(maxLockOwners)).If(err == errRange)
})
if err != nil {
if err == ErrNotExist {

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

getError(-C.ENOENT) always returns ErrNotFound, so after getErrorIfNegative(), the error value will be ErrNotFound. The comparison err == ErrNotExist works only because ErrNotExist is itself initialized via getError(-C.ENOENT), making them the same pointer.

I wonder if we should use ErrNotFound instead, but at the same time it is confusing because the naming is unfortunate; it says "RBD image notfound", which is semantically wrong for lock owners. @phlogistonjohn wdyt?

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For wrapper APIs, I generally prefer consistency with the underlying C API. So if a C api returns -EFOO we should return ErrFoo. Does that help?

@phlogistonjohn

Copy link
Copy Markdown
Collaborator

Hi, thanks a bunch for the contribution. In these days of agents and stuff we're trying to be a bit more cautious with checking for certain things and that includes the "DCO" - your signed off by line. The Signed-off-by should contain should contain your real name.
Take a look at https://github.com/ceph/ceph/blob/main/SubmittingPatches.rst for more details. Thanks!

@hsjsstn hsjsstn force-pushed the fix-rbd-lock-get-owners-erange branch from ee5ced1 to 2d7c263 Compare June 11, 2026 04:33
@hsjsstn

hsjsstn commented Jun 11, 2026

Copy link
Copy Markdown
Author

Thanks for the review! I updated to compare against ErrNotFound directly, and signed off the commit with my real name.

@anoopcs9

Copy link
Copy Markdown
Collaborator

Thanks for the review! I updated to compare against ErrNotFound directly, and signed off the commit with my real name.

@hsjsstn I see another Signed-off-by: tag (from GitHub probably) in the commit message. Could you please remove it?

Signed-off-by: Soojin Han <sjhan034@naver.com>
@hsjsstn hsjsstn force-pushed the fix-rbd-lock-get-owners-erange branch from 2d7c263 to a106f41 Compare June 11, 2026 06:24
@hsjsstn

hsjsstn commented Jun 11, 2026

Copy link
Copy Markdown
Author

Thanks for the review! I updated to compare against ErrNotFound directly, and signed off the commit with my real name.

@hsjsstn I see another Signed-off-by: tag (from GitHub probably) in the commit message. Could you please remove it?

Sorry, I forgot to remove that earlier. Fixed now!

@anoopcs9 anoopcs9 left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On a second thought, I think even ErrNotFound is also misleading as it says "RBD image not found: . .". I wish we didn't have that early catch for ENOENT in getError() from rbd/errors.go. May be we switch to checking ret instead of err as before?

Otherwise lgtm.

@phlogistonjohn Any further thoughts?

@phlogistonjohn

phlogistonjohn commented Jun 11, 2026

Copy link
Copy Markdown
Collaborator

On a second thought, I think even ErrNotFound is also misleading as it says "RBD image not found: . .". I wish we didn't have that early catch for ENOENT in getError() from rbd/errors.go. May be we switch to checking ret instead of err as before?

Can you elaborate on that a bit? I'm not entirely sure what your concern is... I am probably bouncing between to many projects lately to think deeply about it.

@anoopcs9

Copy link
Copy Markdown
Collaborator

On a second thought, I think even ErrNotFound is also misleading as it says "RBD image not found: . .". I wish we didn't have that early catch for ENOENT in getError() from rbd/errors.go. May be we switch to checking ret instead of err as before?

Can you elaborate on that a bit? I'm not entirely sure what your concern is... I am probably bouncing between to many projects lately to think deeply about it.

Here's how the errors are defined in rbd/errors.go:

ErrNotFound = fmt.Errorf("RBD image not found: %w", errutil.GetError("rbd", -C.ENOENT))
ErrNotExist = getError(-C.ENOENT)

and getError() has an early return for ENOENT:

func getError(err C.int) error {
	if err != 0 {
		if err == -C.ENOENT {
			return ErrNotFound
		}
		return errutil.GetError("rbd", int(err))
	}
	return nil
}

Now the very first version of the change compared against ErrNotExist, which is essentially the same as ErrNotFound. The updated version, however, uses the latter. According to the definition, ErrNotFound specifically refers to a missing RBD image, so I think it is also a mismatch - even though the comparison is internal. I would like to verify whether it makes sense to revert to the existing approach of explicitly checking against -C.ENOENT, which is more straightforward.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Possible infinite loop in LockGetOwners when ERANGE is returned

3 participants