On Fri, Mar 01, 2024 at 08:59:01PM -0300, Paulo Alcantara wrote:
> Alexander Aring <aahringo@xxxxxxxxxx> writes:
>
> > Hi,
> >
> > On Fri, Mar 1, 2024 at 11:25 AM Paulo Alcantara <pc@xxxxxxxxxxxxx> wrote:
> >>
> >> Hi Zorro,
> >>
> >> The problem is that cifs.ko is returning -EACCES from fcntl(2) called
> >> in do_test_equal_file_lock() but it is expecting -EAGAIN to be
> >> returned, so it hangs in wait4(2):
> >>
> >> ...
> >> [pid 14846] fcntl(3, F_SETLK, {l_type=F_WRLCK, l_whence=SEEK_SET, l_start=0, l_len=1}) = -1 EACCES (Permission denied)
> >> [pid 14846] wait4(-1,
> >>
> >> The man page says:
> >>
> >>   F_SETLK (struct flock *)
> >>     Acquire a lock (when l_type is F_RDLCK or F_WRLCK) or release a
> >>     lock (when l_type is F_UNLCK) on the bytes specified by the
> >>     l_whence, l_start, and l_len fields of lock. If a conflicting
> >>     lock is held by another process, this call returns -1 and sets
> >>     errno to EACCES or EAGAIN. (The error returned in this case
> >>     differs across implementations, so POSIX requires a portable
> >>     application to check for both errors.)
> >>
> >> so fcntl_lock_corner_tests should also handle -EACCES.
> >>
> >
> > yes, that is a bug in the test but in my opinion there is still an
> > issue. The mentioned fcntl(F_SETLK) above is just a sanity check to
> > print out if something is not correct and it will print out that
> > something is not correct and fails.
>
> Yes, I agree it might be a cifs.ko issue. However, it's still important
> making sure that the test exits gracefully and then report an error
> rather than hanging.

Thanks to all of you for looking into it! If the C program can deal with
the issue itself (report an error rather than hang), that would be good.
Or how about giving the fcntl testing process a (long enough) timeout, so
that it cannot block the whole fstests run, and reporting an error if it
exits abnormally?
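
Something along these lines, maybe. This is only a rough sketch of the two
ideas (accept both EACCES and EAGAIN from F_SETLK, and bound the wait on
the child); try_wrlock() and wait_child_timeout() are hypothetical helper
names that do not exist in the current test, and the 100ms polling interval
is an arbitrary choice:

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/*
 * Hypothetical helper: non-blocking write lock on [start, start + len).
 * Returns 1 if the lock was acquired, 0 if a conflicting lock is held,
 * -1 on any other error. POSIX allows either EACCES or EAGAIN for the
 * conflict case, so both are treated the same way.
 */
static int try_wrlock(int fd, off_t start, off_t len)
{
	struct flock fl = {
		.l_type = F_WRLCK,
		.l_whence = SEEK_SET,
		.l_start = start,
		.l_len = len,
	};

	if (fcntl(fd, F_SETLK, &fl) == 0)
		return 1;
	if (errno == EAGAIN || errno == EACCES)
		return 0;
	return -1;
}

/*
 * Hypothetical helper: wait for a child for at most timeout_sec seconds.
 * Returns 0 if the child was reaped in time (*status is filled in).
 * On timeout the child is killed and reaped, errno is set to ETIMEDOUT
 * and -1 is returned, so the caller can report a failure instead of
 * hanging in wait().
 */
static int wait_child_timeout(pid_t pid, int timeout_sec, int *status)
{
	struct timespec tick = { .tv_sec = 0, .tv_nsec = 100 * 1000 * 1000 };
	int i;

	assert(timeout_sec > 0);

	/* Poll every 100ms rather than blocking in waitpid(). */
	for (i = 0; i < timeout_sec * 10; i++) {
		pid_t r = waitpid(pid, status, WNOHANG);

		if (r == pid)
			return 0;
		if (r < 0)
			return -1;
		nanosleep(&tick, NULL);
	}

	/* Child is still stuck: kill it so the test run can continue. */
	kill(pid, SIGKILL);
	waitpid(pid, status, 0);
	errno = ETIMEDOUT;
	return -1;
}
```

With something like that, the parent could print an error and exit non-zero
when wait_child_timeout() fails, and the sanity check could treat a return
of 0 from try_wrlock() as "lock still held", whichever errno the filesystem
picked.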

Thanks,
Zorro

>
> > The problem is that wait() below, the child processes are not
> > returning and are in a blocking state which should not be the case.
> >
> > What the test is doing is the following:
> >
> > parent:
> >
> > 1. lock(A) # should be successful to acquire
>
> Client successfully acquires it.
>
> > child:
> > thread0:
> > 2. lock(A) # should block
> > thread1:
> > 3. lock(A) # should block
>
> OK - both threads are blocked.
>
> > parent:
> >
> > 5. sleep(3) #wait until child are in blocking state of lock(A)
>
> OK.
>
> > 5. unlock(A) # both threads of the child should unlock and exit
>
> At this point, both threads are woken up and one of them acquires the
> lock and returns. The other thread gets blocked again because it finds
> a conflicting lock that was taken from the other thread. The child then
> never exits because it is waiting in pthread_join().
>
> > 6. sleep 3 # wait for pending unlock op (not really sure if it's necessary)
>
> ...
>
> > 7. trylock(A) # mentioned sanity check
>
> Client returns -EACCES because one of the child threads acquired the
> lock.
>
> > The unlock(A) should unblock the child threads, it is important to
> > mention that this test does a lock corner test and the lock(A) in both
> > threads ends in a ->lock() call with a "struct file_lock" that has
> > mostly the same fields. We had issues with that in gfs2 and a lookup
> > function to find the right request with an async complete handler of
> > the lock operation.
>
> Alex, thanks for the explanation! As we've talked, there might be a
> missing check of fl_owner or some sort of protocol limitation while
> checking for lock conflicts.
>
> Steve, any thoughts on this?
>