Alexander Aring <aahringo@xxxxxxxxxx> writes:

> Hi,
>
> On Fri, Mar 1, 2024 at 11:25 AM Paulo Alcantara <pc@xxxxxxxxxxxxx> wrote:
>>
>> Hi Zorro,
>>
>> The problem is that cifs.ko is returning -EACCES from fcntl(2) called
>> in do_test_equal_file_lock() but it is expecting -EAGAIN to be
>> returned, so it hangs in wait4(2):
>>
>> ...
>> [pid 14846] fcntl(3, F_SETLK, {l_type=F_WRLCK, l_whence=SEEK_SET, l_start=0, l_len=1}) = -1 EACCES (Permission denied)
>> [pid 14846] wait4(-1,
>>
>> The man page says:
>>
>>   F_SETLK (struct flock *)
>>       Acquire a lock (when l_type is F_RDLCK or F_WRLCK) or release a
>>       lock (when l_type is F_UNLCK) on the bytes specified by the
>>       l_whence, l_start, and l_len fields of lock.  If a conflicting
>>       lock is held by another process, this call returns -1 and sets
>>       errno to EACCES or EAGAIN.  (The error returned in this case
>>       differs across implementations, so POSIX requires a portable
>>       application to check for both errors.)
>>
>> so fcntl_lock_corner_tests should also handle -EACCES.
>>
>
> yes, that is a bug in the test but in my opinion there is still an
> issue. The mentioned fcntl(F_SETLK) above is just a sanity check to
> print out if something is not correct and it will print out that
> something is not correct and fails.

Yes, I agree it might be a cifs.ko issue.  However, it's still important
to make sure that the test exits gracefully and reports an error rather
than hanging.

> The problem is that wait() below, the child processes are not
> returning and are in a blocking state which should not be the case.
>
> What the test is doing is the following:
>
> parent:
>
> 1. lock(A) # should be successful to acquire

Client successfully acquires it.

> child:
> thread0:
> 2. lock(A) # should block
> thread1:
> 3. lock(A) # should block

OK - both threads are blocked.

> parent:
>
> 5. sleep(3) #wait until child are in blocking state of lock(A)

OK.

> 5. unlock(A) # both threads of the child should unlock and exit

At this point, both threads are woken up and one of them acquires the
lock and returns.  The other thread gets blocked again because it finds
a conflicting lock that was taken by the other thread.  The child then
never exits because it is waiting in pthread_join().

> 6. sleep 3 # wait for pending unlock op (not really sure if it's necessary)
> ...
> 7. trylock(A) # mentioned sanity check

Client returns -EACCES because one of the child threads acquired the
lock.

> The unlock(A) should unblock the child threads, it is important to
> mention that this test does a lock corner test and the lock(A) in both
> threads ends in a ->lock() call with a "struct file_lock" that has
> mostly the same fields. We had issues with that in gfs2 and a lookup
> function to find the right request with an async complete handler of
> the lock operation.

Alex, thanks for the explanation!  As we discussed, there might be a
missing check of fl_owner or some sort of protocol limitation while
checking for lock conflicts.

Steve, any thoughts on this?
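
For the test side, the check that the man page asks for could look
roughly like the sketch below.  It is only an illustration, not the
actual fcntl_lock_corner_tests code: is_lock_conflict() and
try_wrlock_first_byte() are hypothetical helper names, and the lock
range simply mirrors the F_SETLK call in the strace output above.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdbool.h>
#include <unistd.h>

/*
 * Hypothetical helper: true if err is one of the two errno values that
 * F_SETLK may use to report a conflicting lock (EACCES or EAGAIN).
 */
static bool is_lock_conflict(int err)
{
	return err == EACCES || err == EAGAIN;
}

/*
 * Hypothetical helper: try to take a write lock on the first byte of fd
 * without blocking.
 *
 * Returns 0 if the lock was acquired, 1 if a conflicting lock is held
 * by another owner, and -1 on any other error (errno is left set).
 */
static int try_wrlock_first_byte(int fd)
{
	struct flock fl = {
		.l_type = F_WRLCK,
		.l_whence = SEEK_SET,
		.l_start = 0,
		.l_len = 1,
	};

	if (fcntl(fd, F_SETLK, &fl) == 0)
		return 0;

	return is_lock_conflict(errno) ? 1 : -1;
}
```

With something like this, a sanity check such as step 7 would treat a
return value of 1 as "lock still held" whether the filesystem reports
-EACCES or -EAGAIN.  It does not address the child hanging in
pthread_join(), which is a separate problem.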