Re: [PATCHv2 dlm/next] fs: dlm: avoid F_SETLKW plock op lookup collisions

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hi,

On Thu, May 25, 2023 at 11:02 AM Andreas Gruenbacher
<agruenba@xxxxxxxxxx> wrote:
>
> On Wed, May 24, 2023 at 6:02 PM Alexander Aring <aahringo@xxxxxxxxxx> wrote:
> > This patch fixes a possible plock op collisions when using F_SETLKW lock
> > requests and fsid, number and owner are not enough to identify a result
> > for a pending request. The ltp testcases [0] and [1] are examples when
> > this is not enough in case of using classic posix locks with threads and
> > open filedescriptor posix locks.
> >
> > The idea to fix the issue here is to place all lock request in order. In
> > case of non F_SETLKW lock request (indicated if wait is set or not) the
> > lock requests are ordered inside the recv_list. If a result comes back
> > the right plock op can be found by the first plock_op in recv_list which
> > has not info.wait set. This can be done only by non F_SETLKW plock ops as
> > dlm_controld always reads a specific plock op (list_move_tail() from
> > send_list to recv_mlist) and write the result immediately back.
> >
> > This behaviour is for F_SETLKW not possible as multiple waiters can be
> > get a result back in an random order. To avoid a collisions in cases
> > like [0] or [1] this patch adds more fields to compare the plock
> > operations as the lock request is the same. This is also being made in
> > NFS to find an result for an asynchronous F_SETLKW lock request [2][3]. We
> > still can't find the exact lock request for a specific result if the
> > lock request is the same, but if this is the case we don't care the
> > order how the identical lock requests get their result back to grant the
> > lock.
>
> When the recv_list contains multiple indistinguishable requests, this
> can only be because they originated from multiple threads of the same
> process. In that case, I agree that it doesn't matter which of those
> requests we "complete" in dev_write() as long as we only complete one
> request. We do need to compare the additional request fields in
> dev_write() to find a suitable request, so that makes sense as well.
> We need to compare all of the fields that identify a request (optype,
> ex, wait, pid, nodeid, fsid, number, start, end, owner) to find the
> "right" request (or in case there is more than one identical request,
> a "suitable" request).
>

In my "definition" why this works is as you said the "identical
request". There is a more deeper definition of "when is a request
identical" and in my opinion it is here as: "A request A is identical
to request B when they get granted under the same 'time'" which is all
the fields you mentioned.

Even with cancellation (F_SETLKW only) it does not matter which
"identical request" you cancel because the kernel and user
(dlm_controld) makes no relation between a lock request instance. You
need to have at least the same amount of "results" coming back from
user space as the amount you are waiting for a result for the same
"identical request".

> The above patch description doesn't match the code anymore, and the
> code doesn't fully revert the recv_list splitting of the previous
> version.
>

This isn't a revert. Is it a new patch version, I did drop the
recv_setlkw_list here, dropping in means of removing the
recv_setlkw_list and handling everything in the recv_list. Although
there might be a performance impact by splitting the requests in two
lists as we don't need to jump over all F_SETLKW requests.

> > [0] https://gitlab.com/netcoder/ltp/-/blob/dlm_fcntl_owner_testcase/testcases/kernel/syscalls/fcntl/fcntl40.c
> > [1] https://gitlab.com/netcoder/ltp/-/blob/dlm_fcntl_owner_testcase/testcases/kernel/syscalls/fcntl/fcntl41.c
> > [2] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/include/linux/lockd/lockd.h?h=v6.4-rc1#n373
> > [3] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/fs/lockd/svclock.c?h=v6.4-rc1#n731
> >
> > Cc: stable@xxxxxxxxxxxxxxx
> > Signed-off-by: Alexander Aring <aahringo@xxxxxxxxxx>
> > ---
> > change since v2:
> >  - don't split recv_list into recv_setlkw_list
> >
> >  fs/dlm/plock.c | 43 ++++++++++++++++++++++++++++++-------------
> >  1 file changed, 30 insertions(+), 13 deletions(-)
> >
> > diff --git a/fs/dlm/plock.c b/fs/dlm/plock.c
> > index 31bc601ee3d8..53d17dbbb716 100644
> > --- a/fs/dlm/plock.c
> > +++ b/fs/dlm/plock.c
> > @@ -391,7 +391,7 @@ static ssize_t dev_read(struct file *file, char __user *u, size_t count,
> >                 if (op->info.flags & DLM_PLOCK_FL_CLOSE)
> >                         list_del(&op->list);
> >                 else
> > -                       list_move(&op->list, &recv_list);
> > +                       list_move_tail(&op->list, &recv_list);
>
> ^ This should be obsolete, but it won't hurt, either.
>

No it is necessary, I tested it and looked deeper into the reason.
dlm_controld handles the lock requests in an ordered way over a
select() mechanism, but it will not always write a result back when
it's read the request out. This is the case for F_SETLKW but also for
all other plock op requests, such as F_GETLK. Instead of writing the
result back it will send it to corosync and the corosync select()
mechanism will write the result back. Corosync will keep the order to
write the result back. Due the fact that it's going through corosync
multiple non F_SETLKW can be queued up in recv_list and need to be
appended on the tail to later find the first entry which is non
F_SETLKW to find the result.

This ordered lock request read and write the result back (for non
F_SETLKW ops) is not part of UAPI of dlm plock and dlm_controld did it
always this way.

> >                 memcpy(&info, &op->info, sizeof(info));
> >         }
> >         spin_unlock(&ops_lock);
> > @@ -430,19 +430,36 @@ static ssize_t dev_write(struct file *file, const char __user *u, size_t count,
> >                 return -EINVAL;
> >
> >         spin_lock(&ops_lock);
> > -       list_for_each_entry(iter, &recv_list, list) {
> > -               if (iter->info.fsid == info.fsid &&
> > -                   iter->info.number == info.number &&
> > -                   iter->info.owner == info.owner) {
> > -                       list_del_init(&iter->list);
> > -                       memcpy(&iter->info, &info, sizeof(info));
> > -                       if (iter->data)
> > -                               do_callback = 1;
> > -                       else
> > -                               iter->done = 1;
> > -                       op = iter;
> > -                       break;
> > +       if (info.wait) {
>
> We should be able to use the same list_for_each_entry() loop for
> F_SETLKW requests (which have info.wait set) as for all other requests
> as far as I can see.
>

We can't match non F_SETLKW operations on all fields because F_GETLK
will change some fields when it's handled in user space. This is the
whole reason why the ordered handling is done here.

However there can be matched more fields but because F_GETLK we
require that this mechanism works in the above mentioned ordered way.
Those fields are checked by WARN_ON() that we get aware about changes
and "things" doesn't work anymore as they should.

> > +               list_for_each_entry(iter, &recv_list, list) {
> > +                       if (iter->info.fsid == info.fsid &&
> > +                           iter->info.number == info.number &&
> > +                           iter->info.owner == info.owner &&
> > +                           iter->info.pid == info.pid &&
> > +                           iter->info.start == info.start &&
> > +                           iter->info.end == info.end &&
> > +                           iter->info.ex == info.ex &&
> > +                           iter->info.wait) {
> > +                               op = iter;
> > +                               break;
> > +                       }
> >                 }
> > +       } else {
> > +               list_for_each_entry(iter, &recv_list, list) {
> > +                       if (!iter->info.wait) {
> > +                               op = iter;
> > +                               break;
> > +                       }
> > +               }
> > +       }
> > +
> > +       if (op) {
> > +               list_del_init(&op->list);
> > +               memcpy(&op->info, &info, sizeof(info));
> > +               if (op->data)
> > +                       do_callback = 1;
> > +               else
> > +                       op->done = 1;
> >         }
>
> Can't this code just remain in the list_for_each_entry() loop?
>

It can, but we need two of them then in each loop because two loops
are necessary (see above).

- Alex





[Index of Archives]     [Linux Kernel]     [Kernel Development Newbies]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite Hiking]     [Linux Kernel]     [Linux SCSI]

  Powered by Linux