On Mon, 2018-07-09 at 16:15 +0800, Eddie Horng wrote:
> 2018-07-09 14:30 GMT+08:00 Amir Goldstein <amir73il@xxxxxxxxx>:
> > I have no clue.
> > Are the leaked lock and the crash on the client or the server?
> > If you can get an strace from the process that gets the "Leaked" message,
> > maybe it will give us a clue to the sort of file descriptor of the leaked
> > file and how it was opened.
> > Alternatively, print the inode numbers and file types of the flock calls to
> > see where we have a mismatch.
> >
> > Thanks,
> > Amir.
>
> Both the leaked lock and the crash are on the server.
>
> I can emulate one of the lock failure cases with a reproducer run alongside an
> Android build. The reproducer's behavior and result are very similar to those
> of out/.lock, which the Android build creates so that only one build process
> can run at a time. The first time (when out/.lock does not exist yet), flock
> works, but it apparently triggers a "Leaked ..." message. After one build
> completes, a second build fails to lock out/.lock. The reproducer, which opens
> and flocks another file under out/, reproduces this case.
> Can this scenario help us debug?
>
> process 1:                                     process 2:
> $ ~/flock/a.out /mnt/n/out/mylock
> flock succeed, press any key to continue...
>                                                $ cd /mnt/n && make -j12 # (build android)
> close succeed
> $ ~/flock/a.out /mnt/n/out/mylock
> failed to lock file '/mnt/n/out/mylock': Resource temporarily unavailable
> close succeed
>
> reproducer:
> #include <stdio.h>
> #include <sys/types.h>
> #include <sys/stat.h>
> #include <fcntl.h>
> #include <unistd.h>
> #include <sys/file.h>
> #include <errno.h>
> #include <string.h>
>
> int main(int argc, char **argv) {
>     char *filename;
>     int fd, err;
>
>     if (argc < 2) {
>         fprintf(stderr, "usage: %s <file>\n", argv[0]);
>         return 1;
>     }
>     filename = argv[1];
>
>     fd = open(filename, O_RDWR | O_CREAT, 0666);
>     if (fd < 0) {
>         printf("failed to open file '%s': %s\n", filename, strerror(errno));
>         return 1;
>     }
>
>     /* Non-blocking exclusive lock: fails with EWOULDBLOCK if already held. */
>     if (flock(fd, LOCK_EX | LOCK_NB) != 0) {
>         printf("failed to lock file '%s': %s\n", filename, strerror(errno));
>         goto out;
>     }
>     printf("flock succeed, press any key to continue...\n");
>     getchar();
>
> out:
>     /* Closing the last descriptor for the open file releases the flock. */
>     err = close(fd);
>     if (err == 0)
>         printf("close succeed\n");
>     else
>         printf("failed to close %d: %s\n", fd, strerror(errno));
>     return err == 0 ? 0 : 1;
> }

This setup is pretty complicated. IIUC, you are exporting overlayfs via knfsd
and then using the NFS client's flock emulation to map flock locks to POSIX
ones.

I think you probably want to simplify this reproducer a bit. Is it possible to
reproduce this on a setup that doesn't involve overlayfs, just to rule it in
or out as a factor here?

There are also a number of tracepoints in the POSIX locking code. It might be
interesting to turn on the ones for posix_lock_inode and locks_remove_posix
and then run the reproducer to get a better idea of what's happening to those
locks.

Cheers,
-- 
Jeff Layton <jlayton@xxxxxxxxxx>
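Since the NFS client turns flock() into a whole-file POSIX lock, a variant of the reproducer that takes that POSIX lock directly with fcntl(F_SETLK) might help separate the flock emulation from the rest of the path. This is a minimal sketch under that assumption: lock_whole_file() is a hypothetical helper, not an existing API, and it assumes fd was opened read-write (as in the reproducer), because F_WRLCK needs write access.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical helper: try to take a non-blocking, whole-file write lock on
 * fd, i.e. the kind of POSIX lock the NFS client uses when it emulates
 * flock(LOCK_EX | LOCK_NB). Returns 0 on success, or -errno (typically
 * -EAGAIN or -EACCES) if another process holds a conflicting lock.
 */
static int lock_whole_file(int fd)
{
    struct flock fl;

    assert(fd >= 0);
    memset(&fl, 0, sizeof(fl));
    fl.l_type = F_WRLCK;     /* exclusive, like LOCK_EX */
    fl.l_whence = SEEK_SET;
    fl.l_start = 0;
    fl.l_len = 0;            /* 0 means "to end of file": the whole file */

    /* F_SETLK does not wait for a conflicting lock, like LOCK_NB. */
    if (fcntl(fd, F_SETLK, &fl) == -1)
        return -errno;
    return 0;
}
```

In the reproducer it would stand in for the flock(fd, LOCK_EX | LOCK_NB) call; running it once against the overlayfs-backed export and once against a plain export is one way to check whether overlayfs matters. Note that, unlike a flock lock, this lock is dropped when the process closes any descriptor for the file, which is the path that goes through locks_remove_posix.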
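The tracepoints Jeff mentions belong to the kernel's filelock trace event group, so they can be switched on by writing "1" to their enable files in tracefs. A minimal sketch in C, assuming tracefs is mounted at the path passed in (commonly /sys/kernel/tracing, or /sys/kernel/debug/tracing on older setups) and that it runs as root; enable_filelock_event() is a hypothetical helper.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Hypothetical helper: enable one filelock tracepoint by writing "1" to
 * <tracefs>/events/filelock/<event>/enable. Returns 0 on success, or -1 if
 * the path would be truncated or the enable file cannot be written.
 */
static int enable_filelock_event(const char *tracefs, const char *event)
{
    char path[512];
    FILE *f;
    int n;

    assert(tracefs != NULL && event != NULL);
    n = snprintf(path, sizeof(path), "%s/events/filelock/%s/enable",
                 tracefs, event);
    if (n < 0 || (size_t)n >= sizeof(path))
        return -1;              /* path would not fit in the buffer */

    f = fopen(path, "w");
    if (f == NULL)
        return -1;              /* no tracefs there, or not root */
    if (fputs("1", f) == EOF) {
        fclose(f);
        return -1;
    }
    /* The write reaches the kernel when the stream is flushed on close. */
    return fclose(f) == 0 ? 0 : -1;
}
```

Calling it for "posix_lock_inode" and "locks_remove_posix", then reading <tracefs>/trace_pipe while the reproducer runs, should show each lock as it is set and removed on the server.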