On Wed, May 24, 2023 at 6:02 PM Alexander Aring <aahringo@xxxxxxxxxx> wrote: > This patch fixes a possible plock op collisions when using F_SETLKW lock > requests and fsid, number and owner are not enough to identify a result > for a pending request. The ltp testcases [0] and [1] are examples when > this is not enough in case of using classic posix locks with threads and > open filedescriptor posix locks. > > The idea to fix the issue here is to place all lock request in order. In > case of non F_SETLKW lock request (indicated if wait is set or not) the > lock requests are ordered inside the recv_list. If a result comes back > the right plock op can be found by the first plock_op in recv_list which > has not info.wait set. This can be done only by non F_SETLKW plock ops as > dlm_controld always reads a specific plock op (list_move_tail() from > send_list to recv_mlist) and write the result immediately back. > > This behaviour is for F_SETLKW not possible as multiple waiters can be > get a result back in an random order. To avoid a collisions in cases > like [0] or [1] this patch adds more fields to compare the plock > operations as the lock request is the same. This is also being made in > NFS to find an result for an asynchronous F_SETLKW lock request [2][3]. We > still can't find the exact lock request for a specific result if the > lock request is the same, but if this is the case we don't care the > order how the identical lock requests get their result back to grant the > lock. When the recv_list contains multiple indistinguishable requests, this can only be because they originated from multiple threads of the same process. In that case, I agree that it doesn't matter which of those requests we "complete" in dev_write() as long as we only complete one request. We do need to compare the additional request fields in dev_write() to find a suitable request, so that makes sense as well. We need to compare all of the fields that identify a request (optype, ex, wait, pid, nodeid, fsid, number, start, end, owner) to find the "right" request (or in case there is more than one identical request, a "suitable" request). The above patch description doesn't match the code anymore, and the code doesn't fully revert the recv_list splitting of the previous version. > [0] https://gitlab.com/netcoder/ltp/-/blob/dlm_fcntl_owner_testcase/testcases/kernel/syscalls/fcntl/fcntl40.c > [1] https://gitlab.com/netcoder/ltp/-/blob/dlm_fcntl_owner_testcase/testcases/kernel/syscalls/fcntl/fcntl41.c > [2] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/include/linux/lockd/lockd.h?h=v6.4-rc1#n373 > [3] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/fs/lockd/svclock.c?h=v6.4-rc1#n731 > > Cc: stable@xxxxxxxxxxxxxxx > Signed-off-by: Alexander Aring <aahringo@xxxxxxxxxx> > --- > change since v2: > - don't split recv_list into recv_setlkw_list > > fs/dlm/plock.c | 43 ++++++++++++++++++++++++++++++------------- > 1 file changed, 30 insertions(+), 13 deletions(-) > > diff --git a/fs/dlm/plock.c b/fs/dlm/plock.c > index 31bc601ee3d8..53d17dbbb716 100644 > --- a/fs/dlm/plock.c > +++ b/fs/dlm/plock.c > @@ -391,7 +391,7 @@ static ssize_t dev_read(struct file *file, char __user *u, size_t count, > if (op->info.flags & DLM_PLOCK_FL_CLOSE) > list_del(&op->list); > else > - list_move(&op->list, &recv_list); > + list_move_tail(&op->list, &recv_list); ^ This should be obsolete, but it won't hurt, either. > memcpy(&info, &op->info, sizeof(info)); > } > spin_unlock(&ops_lock); > @@ -430,19 +430,36 @@ static ssize_t dev_write(struct file *file, const char __user *u, size_t count, > return -EINVAL; > > spin_lock(&ops_lock); > - list_for_each_entry(iter, &recv_list, list) { > - if (iter->info.fsid == info.fsid && > - iter->info.number == info.number && > - iter->info.owner == info.owner) { > - list_del_init(&iter->list); > - memcpy(&iter->info, &info, sizeof(info)); > - if (iter->data) > - do_callback = 1; > - else > - iter->done = 1; > - op = iter; > - break; > + if (info.wait) { We should be able to use the same list_for_each_entry() loop for F_SETLKW requests (which have info.wait set) as for all other requests as far as I can see. > + list_for_each_entry(iter, &recv_list, list) { > + if (iter->info.fsid == info.fsid && > + iter->info.number == info.number && > + iter->info.owner == info.owner && > + iter->info.pid == info.pid && > + iter->info.start == info.start && > + iter->info.end == info.end && > + iter->info.ex == info.ex && > + iter->info.wait) { > + op = iter; > + break; > + } > } > + } else { > + list_for_each_entry(iter, &recv_list, list) { > + if (!iter->info.wait) { > + op = iter; > + break; > + } > + } > + } > + > + if (op) { > + list_del_init(&op->list); > + memcpy(&op->info, &info, sizeof(info)); > + if (op->data) > + do_callback = 1; > + else > + op->done = 1; > } Can't this code just remain in the list_for_each_entry() loop? > spin_unlock(&ops_lock); > > -- > 2.31.1 > Andreas