On Thu, 2013-07-11 at 10:28 -0400, Jeff Layton wrote: > On Thu, 11 Jul 2013 14:19:10 +0000 > "Myklebust, Trond" <Trond.Myklebust@xxxxxxxxxx> wrote: > > > On Thu, 2013-07-11 at 07:13 -0400, Jeff Layton wrote: > > > On Thu, 13 Jun 2013 13:47:37 -0500 > > > Malahal Naineni <malahal@xxxxxxxxxx> wrote: > > > > > > > Hi Trond, > > > > > > > > I saw Bryan's patches here https://patchwork.kernel.org/patch/987402/ > > > > that fix issues after loss of a lock. What is the status on this patch > > > > set? Do they need more work? We have an application that uses range > > > > locks on a file. Two threads from two different clients end up writing > > > > to the same a file due to this bug after a lease expiry from a client. > > > > > > > > Regards, Malahal. > > > > > > (cc'ing Bryan since he did the original set) > > > > > > Yeah, this set would be a nice thing to have. A couple of comments: > > > > > > - I still think it would be best to make SIGLOST its own signal, but as > > > Bryan points out, it would need to be larger than SIGRTMAX. I'm > > > not sure that's possible on all arches with the way the RT signals > > > were done. It's probably worth investigating that though before > > > settling on SIGIO since it would be hard to change that retroactively. > > > > > > - This is not really a v4.1 specific thing. It should also be done for > > > v4.0 and v2/3, though the latter two really need to be done within > > > lockd. > > > > SIGLOST is not part of any standard. It is a hack that has been adopted > > by IBM and Solaris. > > > > The POSIXly correct way to do this is to use EBADF to warn the > > application that the file descriptor is no longer valid (in the sense > > that the server is no longer honouring the lock) and EIO in order to > > warn it that data may have been lost. > > > > It is a hack...I won't argue there > > I'm not sure that returning errors is really the best approach though. > In some cases, the fd may be fine. It may only be the lock that has > been lost. > > With a signal, the program has more of a choice as to whether it cares > about lost locks and is more immediate when the problem occurs. An > error code seems like it might cause a lot of grief for programs that > aren't expecting that sort of behavior. EBADF is a error that has an obvious meaning in POSIX: you need to reopen the file and re-establish any locks. How is that not better than receiving a signal they won't be expecting? Consider that we'd have to overload SIGIO, which has a completely different meaning in POSIX... -- Trond Myklebust Linux NFS client maintainer NetApp Trond.Myklebust@xxxxxxxxxx www.netapp.com ��.n��������+%������w��{.n�����{��w���jg��������ݢj����G�������j:+v���w�m������w�������h�����٥