Re: Leaked POSIX lock warning and crash

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Mon, Jul 9, 2018 at 2:10 PM, Jeff Layton <jlayton@xxxxxxxxxx> wrote:
> On Mon, 2018-07-09 at 16:15 +0800, Eddie Horng wrote:
>> 2018-07-09 14:30 GMT+08:00 Amir Goldstein <amir73il@xxxxxxxxx>:
>> > I have no clue.
>> > Is the leaked lock and crash on the client or the server?
>> > If you can get an strace from the process that gets the Leaked message
>> > maybe it will give us a clue to the sort of file descriptor of the leaked
>> > file and how it was opened.
>> > Alternatively print the inode numbers and file types of flock calls to see
>> > where we have a mismatch.
>> >
>> > Thanks,
>> > Amir.
>>
>> Both the leaked lock and crash are on the server.
>>
>> I can emulate one of the lock failure case with a reproducer run along with
>> android building. The reproducer's behavior and result are very similar with
>> out/.lock generated by android build to control only one build process can
>> run on at the same time. In the first time (out/.lock is not exist),
>> flock works but a
>> "Leaked ..." message is supposed caused by it. After a round of build
>> completed, do a second build, the out/.lock is now failed to be locked.
>> The reproducer open and flock another file under out/ can reproduce the case.
>> Can this scenario help us to debug?
>>
>> process 1:                                                          process 2:
>> $ ~/flock/a.out /mnt/n/out/mylock
>> flock succeed, press any key to continue...
>>
>>     $ cd /mnt/n && make -j12  # (build android)
>> close succeed
>> $ ~/flock/a.out /mnt/n/out/mylock
>> failed to lock file '/mnt/n/out/mylock': Resource temporarily unavailable
>> close succeed
>>
>> reproducer:
>> #include <stdio.h>
>> #include <sys/types.h>
>> #include <sys/stat.h>
>> #include <fcntl.h>
>> #include <unistd.h>
>> #include <sys/file.h>
>> #include <errno.h>
>> #include <string.h>
>>
>> int main(int argc, void **argv) {
>> char *filename=argv[1];
>> int fd = open(filename, O_RDWR|O_CREAT, 0666);
>> int flock_result = flock(fd, LOCK_EX | LOCK_NB);
>> int err;
>> if (flock_result != 0) {
>>       printf("failed to lock file '%s': %s\n", filename, strerror(errno));
>>       goto out;
>>     }
>>     printf("flock succeed, press any key to continue...\n");
>>     getchar();
>>
>> out:
>>     err = close(fd);
>>     if (err == 0)
>>     printf("close succeed\n");
>>     else
>>     printf("failed to close %d: %s\n", fd, strerror(errno));
>> }
>>
>
> This setup is pretty complicated. IIUC, you are exporting overlayfs via
> knfsd and then using the NFS client's flock emulation to map flock locks
> to POSIX ones. I think you probably want to simplify this reproducer a
> bit.
>
> Is it possible to reproduce this on a setup that doesn't have overlayfs
> involved, just to rule it in or out as a factor here?
>
> There are also a number of tracepoints in the posix locking code. It
> might be interesting to turn on the ones for posix_lock_inode and
> locks_remove_posix and and then run the reproducer to get a better idea
> of what's happening to those locks.
>

Thanks for the suggestions Jeff.

Eddie,

This is NFS v4. Right?
Do you wait until Android build completes before closing the
first reproducer fd?

I suspect you can replace the effect of Android build with
drop_caches on the server.

Jeff,

Does knfsd hold a reference on the file/dentry/inode when
a lock is taken?

Assuming this is indeed a bug reproduced only with NFS+overlayfs
it sounds like overlay decode file handle fails to return the same
inode that knfd holds with the lock.

Thanks,
Amir.
--
To unsubscribe from this list: send the line "unsubscribe linux-unionfs" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html



[Index of Archives]     [Linux Filesystems Devel]     [Linux NFS]     [Linux NILFS]     [Linux USB Devel]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]

  Powered by Linux