Re: Leaked POSIX lock warning and crash

Amir Goldstein <amir73il@xxxxxxxxx> · Mon, 9 Jul 2018 15:17:50 +0300

On Mon, Jul 9, 2018 at 3:16 PM, Amir Goldstein <amir73il@xxxxxxxxx> wrote:
> On Mon, Jul 9, 2018 at 2:52 PM, Jeff Layton <jlayton@xxxxxxxxxx> wrote:
>> On Mon, 2018-07-09 at 14:22 +0300, Amir Goldstein wrote:
>>> On Mon, Jul 9, 2018 at 2:10 PM, Jeff Layton <jlayton@xxxxxxxxxx> wrote:
>>> > On Mon, 2018-07-09 at 16:15 +0800, Eddie Horng wrote:
>>> > > 2018-07-09 14:30 GMT+08:00 Amir Goldstein <amir73il@xxxxxxxxx>:
>>> > > > I have no clue.
>>> > > > Is the leaked lock and crash on the client or the server?
>>> > > > If you can get an strace from the process that gets the Leaked message
>>> > > > maybe it will give us a clue to the sort of file descriptor of the leaked
>>> > > > file and how it was opened.
>>> > > > Alternatively print the inode numbers and file types of flock calls to see
>>> > > > where we have a mismatch.
>>> > > >
>>> > > > Thanks,
>>> > > > Amir.
>>> > >
>>> > > Both the leaked lock and crash are on the server.
>>> > >
>>> > > I can emulate one of the lock failure case with a reproducer run along with
>>> > > android building. The reproducer's behavior and result are very similar with
>>> > > out/.lock generated by android build to control only one build process can
>>> > > run on at the same time. In the first time (out/.lock is not exist),
>>> > > flock works but a
>>> > > "Leaked ..." message is supposed caused by it. After a round of build
>>> > > completed, do a second build, the out/.lock is now failed to be locked.
>>> > > The reproducer open and flock another file under out/ can reproduce the case.
>>> > > Can this scenario help us to debug?
>>> > >
>>> > > process 1:                                                          process 2:
>>> > > $ ~/flock/a.out /mnt/n/out/mylock
>>> > > flock succeed, press any key to continue...
>>> > >
>>> > >     $ cd /mnt/n && make -j12  # (build android)
>>> > > close succeed
>>> > > $ ~/flock/a.out /mnt/n/out/mylock
>>> > > failed to lock file '/mnt/n/out/mylock': Resource temporarily unavailable
>>> > > close succeed
>>> > >
>>> > > reproducer:
>>> > > #include <stdio.h>
>>> > > #include <sys/types.h>
>>> > > #include <sys/stat.h>
>>> > > #include <fcntl.h>
>>> > > #include <unistd.h>
>>> > > #include <sys/file.h>
>>> > > #include <errno.h>
>>> > > #include <string.h>
>>> > >
>>> > > int main(int argc, void **argv) {
>>> > > char *filename=argv[1];
>>> > > int fd = open(filename, O_RDWR|O_CREAT, 0666);
>>> > > int flock_result = flock(fd, LOCK_EX | LOCK_NB);
>>> > > int err;
>>> > > if (flock_result != 0) {
>>> > >       printf("failed to lock file '%s': %s\n", filename, strerror(errno));
>>> > >       goto out;
>>> > >     }
>>> > >     printf("flock succeed, press any key to continue...\n");
>>> > >     getchar();
>>> > >
>>> > > out:
>>> > >     err = close(fd);
>>> > >     if (err == 0)
>>> > >     printf("close succeed\n");
>>> > >     else
>>> > >     printf("failed to close %d: %s\n", fd, strerror(errno));
>>> > > }
>>> > >
>>> >
>>> > This setup is pretty complicated. IIUC, you are exporting overlayfs via
>>> > knfsd and then using the NFS client's flock emulation to map flock locks
>>> > to POSIX ones. I think you probably want to simplify this reproducer a
>>> > bit.
>>> >
>>> > Is it possible to reproduce this on a setup that doesn't have overlayfs
>>> > involved, just to rule it in or out as a factor here?
>>> >
>>> > There are also a number of tracepoints in the posix locking code. It
>>> > might be interesting to turn on the ones for posix_lock_inode and
>>> > locks_remove_posix and and then run the reproducer to get a better idea
>>> > of what's happening to those locks.
>>> >
>>>
>>> Thanks for the suggestions Jeff.
>>>
>>> Eddie,
>>>
>>> This is NFS v4. Right?
>>
>> I think he said v3, which means NLM (lockd).
>>
>>> Do you wait until Android build completes before closing the
>>> first reproducer fd?
>>>
>>> I suspect you can replace the effect of Android build with
>>> drop_caches on the server.
>>>
>>> Jeff,
>>>
>>> Does knfsd hold a reference on the file/dentry/inode when
>>> a lock is taken?
>>>
>>
>> Given that he's using v3, it would actually be lockd in this case, but
>> yes, it should hold a struct file reference by virtue of a nlm_file
>> reference.
>>
>>> Assuming this is indeed a bug reproduced only with NFS+overlayfs
>>> it sounds like overlay decode file handle fails to return the same
>>> inode that knfd holds with the lock.
>>
>> That's a possibility.
>
> Oh oh!
>
> Those should be s/file_inode/locks_inode:
>
> static inline struct inode *nlmsvc_file_inode(struct nlm_file *file)
> {
>         return file_inode(file->f_file);
> }
>
> static inline int nlm_compare_locks(const struct file_lock *fl1,
>                                     const struct file_lock *fl2)
> {
>         return file_inode(fl1->fl_file) == file_inode(fl2->fl_file)
> ...
>
> Probably all the instances under fs/lockd/ as well...
> and probably some in fs/nfsd
>
> Messy.
>
> I have a suggestion.
>
> Eddie,
>
> Do you mind trying out the code in Miklos' overlayfs-next branch

Sorry wrong link:

git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs.git

>
> This code gets rid of the file_inode/locks_inode divergion
> so its a much healthier fix for the problem going forward.
>
> Jeff,
>
> What do you suggest that do with stable >= v4.16 kernels with
> overlayfs NFS export. Do we have an easy way to disable
> knfsd/lockd from using locks on exported overlayfs?
> Maybe a sanity check (file_inode(file) == locks_inode(file))
> to avoid the leaks/crashes.
> Where would be the most strategic point(s) to add such a check?
>
> Thanks,
> Amir.
--
To unsubscribe from this list: send the line "unsubscribe linux-unionfs" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html