Re: [PATCH v6.5-rc1 1/2] fs: dlm: introduce DLM_PLOCK_FL_NO_REPLY flag

Alexander Aring <aahringo@xxxxxxxxxx> · Fri, 14 Jul 2023 19:21:47 -0400

Hi,

On Fri, Jul 14, 2023 at 9:54 AM Andreas Gruenbacher <agruenba@xxxxxxxxxx> wrote:
>
> On Thu, Jul 13, 2023 at 4:40 PM Alexander Aring <aahringo@xxxxxxxxxx> wrote:
> > This patch introduces a new flag DLM_PLOCK_FL_NO_REPLY in case an dlm
> > plock operation should not send a reply back. Currently this is kind of
> > being handled in DLM_PLOCK_FL_CLOSE, but DLM_PLOCK_FL_CLOSE has more
> > meanings that it will remove all waiters for a specific nodeid/owner
> > values in by doing a unlock operation. In case of an error in dlm user
> > space software e.g. dlm_controld we get an reply with an error back.
> > This cannot be matched because there is no op to match in recv_list. We
> > filter now on DLM_PLOCK_FL_NO_REPLY in case we had an error back as
> > reply. In newer dlm_controld version it will never send a result back
> > when DLM_PLOCK_FL_NO_REPLY is set. This filter is a workaround to handle
> > older dlm_controld versions.
>
> I don't think this makes sense. If dlm_controld understands a
> particular request, it already knows whether or not that request
> should receive a reply. On the other hand, if dlm_controld doesn't
> understand a particular request, it should communicate that fact back
> to the kernel so that the kernel will know. The kernel knows which
> kinds of requests should and shouldn't receive replies, so when it is
> sent a reply it doesn't expect, it knows that dlm_controld didn't
> understand the request and is either outdated or plain broken. The
> kernel doesn't need to pipe a flag through dlm_controld for figuring
> that out.
>

It's already part of UAPI that a flag signals that a reply is not
expected [0]. If this flag is set and current user space software did
not understand the request (or any possible future user space error
handling) it will send a reply back which cannot be matched and with
the current match logic it will match the wrong one.
I would say it is broken and probably we don't care because we never
assume that an error will happen, we just need to be careful to not
add any other possible reply for errors in future.

The bigger problem is to introduce new ops [1] (not flags) on the
kernel side which does not send a reply back. Older user space
software will not understand the optype and will send a reply, newer
software will not send a reply (because it understands the optype).

Therefore I think we should never introduce a new optype which does
not send a reply back. The new DLM_PLOCK_OP_CANCEL was trying to do
that, that's why I tried to fix this behaviour which is another ugly
workaround passing flags around.

- Alex

[0] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/fs/dlm/plock.c#n395
[1] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/include/uapi/linux/dlm_plock.h?h=v6.5-rc1#n21