On Tue, May 30, 2023 at 4:08 PM Alexander Aring <aahringo@xxxxxxxxxx> wrote:
> Hi,
>
> On Tue, May 30, 2023 at 7:01 AM Andreas Gruenbacher <agruenba@xxxxxxxxxx> wrote:
> >
> > On Tue, May 30, 2023 at 12:19 AM Alexander Aring <aahringo@xxxxxxxxxx> wrote:
> > > Hi,
> > >
> > > On Thu, May 25, 2023 at 11:02 AM Andreas Gruenbacher
> > > <agruenba@xxxxxxxxxx> wrote:
> > > >
> > > > On Wed, May 24, 2023 at 6:02 PM Alexander Aring <aahringo@xxxxxxxxxx> wrote:
> > > > > This patch fixes possible plock op collisions when using F_SETLKW lock
> > > > > requests and fsid, number and owner are not enough to identify a result
> > > > > for a pending request. The ltp testcases [0] and [1] are examples where
> > > > > this is not enough when using classic posix locks with threads and
> > > > > open filedescriptor posix locks.
> > > > >
> > > > > The idea to fix the issue here is to place all lock requests in order.
> > > > > For non-F_SETLKW lock requests (indicated by whether wait is set), the
> > > > > lock requests are ordered inside the recv_list. When a result comes
> > > > > back, the right plock op is the first plock_op in recv_list which does
> > > > > not have info.wait set. This works only for non-F_SETLKW plock ops, as
> > > > > dlm_controld always reads a specific plock op (list_move_tail() from
> > > > > send_list to recv_mlist) and writes the result back immediately.
> > > > >
> > > > > This is not possible for F_SETLKW, as multiple waiters can get a
> > > > > result back in a random order. To avoid collisions in cases like [0]
> > > > > or [1], this patch compares more fields of the plock operations when
> > > > > fsid, number and owner are the same. The same is done in NFS to find
> > > > > a result for an asynchronous F_SETLKW lock request [2][3]. We still
> > > > > can't find the exact lock request for a specific result if the lock
> > > > > requests are identical, but in that case we don't care in which order
> > > > > the identical lock requests get their results back to grant the lock.
> > > >
> > > > When the recv_list contains multiple indistinguishable requests, this
> > > > can only be because they originated from multiple threads of the same
> > > > process. In that case, I agree that it doesn't matter which of those
> > > > requests we "complete" in dev_write() as long as we only complete one
> > > > request. We do need to compare the additional request fields in
> > > > dev_write() to find a suitable request, so that makes sense as well.
> > > > We need to compare all of the fields that identify a request (optype,
> > > > ex, wait, pid, nodeid, fsid, number, start, end, owner) to find the
> > > > "right" request (or in case there is more than one identical request,
> > > > a "suitable" request).
> > > >
> > >
> > > My "definition" of why this works is, as you said, the "identical
> > > request". There is a deeper definition of when a request is
> > > identical, and in my opinion it is: "a request A is identical to
> > > request B when they get granted at the same 'time'", which covers all
> > > the fields you mentioned.
> > >
> > > Even with cancellation (F_SETLKW only) it does not matter which
> > > "identical request" you cancel, because neither the kernel nor user
> > > space (dlm_controld) keeps a relation to a specific lock request
> > > instance. You just need at least as many "results" coming back from
> > > user space as there are waiters for the same "identical request".
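For illustration, a full-field comparison along these lines could look roughly
like the sketch below; the helper name is made up and this is not the actual
patch code, it only spells out which struct dlm_plock_info fields would have
to match.

static bool plock_ops_match(const struct dlm_plock_info *a,
			    const struct dlm_plock_info *b)
{
	/* compare every field that identifies a request; requests from
	 * different threads of the same process may still be identical,
	 * in which case completing any one of them is fine */
	return a->optype == b->optype &&
	       a->ex == b->ex &&
	       a->wait == b->wait &&
	       a->pid == b->pid &&
	       a->nodeid == b->nodeid &&
	       a->fsid == b->fsid &&
	       a->number == b->number &&
	       a->start == b->start &&
	       a->end == b->end &&
	       a->owner == b->owner;
}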
> >
> > That's not incorrect per se, but cancellations create an additional
> > difficulty: they can either succeed or fail. To indicate that a
> > cancellation has succeeded, a new type of message can be introduced
> > (say, "CANCELLED"), and it's obvious that a CANCELLED message can only
> > belong to a locking request that is being cancelled. When cancelling a
> > locking request fails, the kernel will see a "locking request granted"
> > message though, and when multiple identical locking requests are
> > queued and only some of them have been cancelled, it won't be obvious
> > which locking request a "locking request granted" message should be
> > assigned to anymore. You really don't want to mix things up in that
> > case.
> >
> > This complication makes it a whole lot more difficult to reason about
> > the correctness of the code. All that complexity is avoidable by
> > sticking with a fixed mapping of requests and replies (i.e., a unique
> > request identifier).
> >
> > To put it differently, you can shoot yourself in the foot and still
> > hop along on the other leg, but it may not be the best of all possible
> > ideas.
> >
>
> It makes things more complicated, I agree, and the reason why this
> works now is that there are a lot of "dependencies". I would love to
> have a unique identifier so that we can follow an instance handle of
> the original lock request.
>
> * a unique identifier which also works with the async lock request
>   case of lockd.

What's the lockd case you're referring to here, and why is it relevant
for the problem at hand?

> > > > The above patch description doesn't match the code anymore, and the
> > > > code doesn't fully revert the recv_list splitting of the previous
> > > > version.
> > > >
> > >
> > > This isn't a revert. It is a new patch version; I did drop the
> > > recv_setlkw_list here, dropping in the sense of removing the
> > > recv_setlkw_list and handling everything in the recv_list. Although
> > > splitting the requests into two lists might have a performance
> > > benefit, as we then don't need to jump over all the F_SETLKW requests.
> > >
> > > > > [0] https://gitlab.com/netcoder/ltp/-/blob/dlm_fcntl_owner_testcase/testcases/kernel/syscalls/fcntl/fcntl40.c
> > > > > [1] https://gitlab.com/netcoder/ltp/-/blob/dlm_fcntl_owner_testcase/testcases/kernel/syscalls/fcntl/fcntl41.c
> > > > > [2] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/include/linux/lockd/lockd.h?h=v6.4-rc1#n373
> > > > > [3] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/fs/lockd/svclock.c?h=v6.4-rc1#n731
> > > > >
> > > > > Cc: stable@xxxxxxxxxxxxxxx
> > > > > Signed-off-by: Alexander Aring <aahringo@xxxxxxxxxx>
> > > > > ---
> > > > > change since v2:
> > > > >  - don't split recv_list into recv_setlkw_list
> > > > >
> > > > >  fs/dlm/plock.c | 43 ++++++++++++++++++++++++++++++-------------
> > > > >  1 file changed, 30 insertions(+), 13 deletions(-)
> > > > >
> > > > > diff --git a/fs/dlm/plock.c b/fs/dlm/plock.c
> > > > > index 31bc601ee3d8..53d17dbbb716 100644
> > > > > --- a/fs/dlm/plock.c
> > > > > +++ b/fs/dlm/plock.c
> > > > > @@ -391,7 +391,7 @@ static ssize_t dev_read(struct file *file, char __user *u, size_t count,
> > > > >                 if (op->info.flags & DLM_PLOCK_FL_CLOSE)
> > > > >                         list_del(&op->list);
> > > > >                 else
> > > > > -                       list_move(&op->list, &recv_list);
> > > > > +                       list_move_tail(&op->list, &recv_list);
> > > >
> > > > ^ This should be obsolete, but it won't hurt, either.
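For the non-F_SETLKW case, the changelog relies on reply ordering rather than
a field match: dev_read() now appends ops to the tail of recv_list, so a
result with info.wait cleared belongs to the oldest non-waiting entry. A rough
sketch of that lookup (not the literal patch code; it reuses the iter/op/info
names from the existing dev_write() loop shown further down) might be:

	/* sketch only: pick the oldest entry in recv_list that is not an
	 * F_SETLKW waiter; this assumes results for non-F_SETLKW ops come
	 * back in the same order the requests were read by dlm_controld */
	list_for_each_entry(iter, &recv_list, list) {
		if (!iter->info.wait) {
			list_del_init(&iter->list);
			memcpy(&iter->info, &info, sizeof(info));
			op = iter;
			break;
		}
	}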
> > >
> > > No, it is necessary; I tested it and looked deeper into the reason.
> > > dlm_controld handles the lock requests in an ordered way over a
> > > select() mechanism, but it will not always write a result back when
> > > it has read the request out. This is the case for F_SETLKW but also
> > > for all other plock op requests, such as F_GETLK. Instead of writing
> > > the result back, it sends it to corosync, and the corosync select()
> > > mechanism writes the result back. Corosync keeps the order in which
> > > the results are written back. Because it goes through corosync,
> > > multiple non-F_SETLKW ops can be queued up in recv_list and need to
> > > be appended at the tail so that the result can later be matched to
> > > the first non-F_SETLKW entry.
> > >
> > > This ordered reading of lock requests and writing back of results
> > > (for non-F_SETLKW ops) is not part of the dlm plock UAPI, but
> > > dlm_controld has always done it this way.
> >
> > This sounds pretty confused. Let's look at
>
> As I said, yes, it is a lot of user space specific handling that makes
> this actually work.
>
> > > > >                 memcpy(&info, &op->info, sizeof(info));
> > > > >         }
> > > > >         spin_unlock(&ops_lock);
> > > > > @@ -430,19 +430,36 @@ static ssize_t dev_write(struct file *file, const char __user *u, size_t count,
> > > > >                 return -EINVAL;
> > > > >
> > > > >         spin_lock(&ops_lock);
> > > > > -       list_for_each_entry(iter, &recv_list, list) {
> > > > > -               if (iter->info.fsid == info.fsid &&
> > > > > -                   iter->info.number == info.number &&
> > > > > -                   iter->info.owner == info.owner) {
> > > > > -                       list_del_init(&iter->list);
> > > > > -                       memcpy(&iter->info, &info, sizeof(info));
> > > > > -                       if (iter->data)
> > > > > -                               do_callback = 1;
> > > > > -                       else
> > > > > -                               iter->done = 1;
> > > > > -                       op = iter;
> > > > > -                       break;
> > > > > +       if (info.wait) {
> > > >
> > > > We should be able to use the same list_for_each_entry() loop for
> > > > F_SETLKW requests (which have info.wait set) as for all other requests
> > > > as far as I can see.
> > > >
> > >
> > > We can't match non-F_SETLKW operations on all fields because F_GETLK
> > > will change some fields when it's handled in user space. This is the
> > > whole reason why the ordered handling is done here.
> >
> > I know that F_GETLK uses the l_type field to indicate the outcome of
> > the operation. But that happens in dlm_posix_get() when processing the
> > reply from dlm_controld; it doesn't affect info.optype or any other
> > fields in struct dlm_plock_info. So we actually can compare all of the
> > key fields in struct dlm_plock_info.
>
> F_GETLK also uses start, end, etc. to tell the caller about the region
> which is in conflict. The conflicting region is returned in the result
> info struct. See [0] [1].
> Is it clearer now that other fields are affected?

Ah, that sucks.

Thanks,
Andreas
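To illustrate the F_GETLK behaviour Alexander describes above, here is a small
userspace sketch (the file descriptor is assumed to be open already): on a
conflict, the kernel overwrites l_type, l_pid, l_start and l_len in the
struct flock to describe the conflicting lock, so the values that come back
no longer match the range that was asked about.

#include <fcntl.h>
#include <stdio.h>

/* probe whether a write lock on the whole file would conflict */
static void probe_lock(int fd)
{
	struct flock fl = {
		.l_type   = F_WRLCK,
		.l_whence = SEEK_SET,
		.l_start  = 0,
		.l_len    = 0,	/* 0 means "to the end of the file" */
	};

	if (fcntl(fd, F_GETLK, &fl) == -1) {
		perror("F_GETLK");
		return;
	}

	if (fl.l_type == F_UNLCK) {
		printf("no conflicting lock\n");
	} else {
		/* start/len/pid now describe the conflicting lock,
		 * not the region that was passed in */
		printf("conflict: pid %d holds start=%lld len=%lld\n",
		       (int)fl.l_pid, (long long)fl.l_start,
		       (long long)fl.l_len);
	}
}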