How many times does an API need to be misused before we throw it out? When a file is closed, locks_remove_posix() is called and this *must* remove all locks. The 'struct file' will be removed, so ->fl_file will become a dangling link. But the 'struct file_lock' will still be on the global lock list and will trigger and oops if anyone reads /proc/locks. However, locks_remove_posix() will call filp->f_op->lock() if it exists, and there are one or two (or more) filesystems which seem to think that it is OK for unlock to fail due to a signal or network error etc. It isn't that long ago since this sort of problem was fixed in cifs and cephfs (well... it isn't that long since the fix was backported to a SLE kernel and I noticed - I don't recall when the fix hit mainline). It was (partly) fixed in NFS a few years back (f30cb757f) - I just tripped over this again for an SLE-LTS kernel. Just this week I've seen the NFS bug and and also a gpfs (out-of-tree GPL code) bug. I just wandered through the current code and found .... orangefs: This won't remove locks if ORANGEFS_OPT_LOCK_LOCK is not set. It won't add them either so maybe that is OK .... except that I think "mount -o remount" can clear that flag. If you clear it while a lock is held - badness results. fuse - if fc->no_lock is zero - seems to assume that sending a request to the daemon - fuse_simple_request() - *will* be sufficient. I wouldn't trust any daemon not to trigger an oops. It also assumes that "locking is restartable" - (ok to return ERESTARTSYS), but 'locks_remove_posix()' isn't. gfs2, oscfs both use dlk_posix_lock which will fail if kmalloc() fails (which is certainly possible if the close is happening because the process was killed by the OOM killer). I'm not sure if there are other failure modes. NFSv2 and NFSv3 use nlmclnt_proc which can also fail on kmalloc failure. NFSv4 can nominally fail if 'ctx->state' is NULL - but that is probably impossible. A WARN() might no go astray there. NFS will allow do_unlk() to fail if a fatal signal is pending, but not in the FL_CLOSE case, so that protects locks_remove_posix() and locks_remove_flock(). Maybe other calls to unlock don't matter ... if there is a fatal signal pending they definitely don't - which makes the speical-casing of FL_CLOSE rather pointless. locks_remove_posix() and locks_remove_flock() aren't the only problem areas. nfsd will call vfs_setlease(.. F_UNLCK..) and assume that the lease is gone. If the filesystem messes up and doesn't remove the lease, then when the lease subsequently gets broken, it will access some nfsd data structure that has been freed. Oops. I don't think we should just fix all those bugs in those filesystems. I think that F_UNLCK should *always* remove the lock/lease. I imaging this happening by *always* calling posix_lock_file() (or similar) in the unlock case - after calling f_op->lock() first if that is appropriate. What do people think? It there on obvious reason that is a non-starter? Thanks, NeilBrown
Attachment:
signature.asc
Description: PGP signature