Thanks for all of the resources!

I was trying to implement an NFS server, and v3 sounded like an easier
place to start :-) I think I'll move on to v4.

If we're revisiting the past, maybe just one last historical question:
Do either of you know why the Linux Kernel only uses the IP
address/svid to identify the caller? FreeBSD uses the owner field as
well.

Jan

On Sun, Aug 7, 2022 at 8:01 AM Tom Talpey <tom@xxxxxxxxxx> wrote:
>
> On 8/6/2022 3:49 PM, Trond Myklebust wrote:
> > On Sat, 2022-08-06 at 11:03 -0400, Jan Kasiak wrote:
> >> Hi Trond,
> >>
> >> The v4 RFCs do mention protocol design flaws, but don't go into more
> >> detail.
> >>
> >> I was trying to understand those flaws in order to understand how and
> >> why v3 was problematic.
> >>
> >>
> >
> > The main issues derive from the fact that NLM is a side band protocol,
> > meaning that it has no ability to influence the NFS protocol
> > operations. In particular, there is no way to ensure safe ordering of
> > locks and I/O. e.g. if your readahead code kicks in while you are
> > unlocking the file, then there is nothing that guarantees the page
> > reads happened while the lock was in place on the server.
> > The same weakness also causes problems for reboots: if your client
> > doesn't notice that the server rebooted (and lost your locks) because
> > the statd callback mechanism failed, then you're SOL. Your I/O may
> > succeed, but can end up causing problems for another client that has
> > since grabbed the lock and assumes it now has exclusive access to the
> > file.
> >
> > NLM also suffers from intrinsic problems of its own such as lack of
> > only-once semantics. If you send a blocking LOCK request, and
> > subsequently send a CANCEL operation, then who knows whether or not the
> > lock or the cancel get processed first by the server? Many servers will
> > reply LCK_GRANTED to the CANCEL even if they did not find the lock
> > request.
> > Sending an UNLOCK can also cause issues if the lock was
> > granted via a blocking lock callback (NLM_GRANTED) since there is no
> > ordering between the reply to the NLM_GRANTED and the UNLOCK.
> >
> > Finally, as already mentioned, there are multiple issues associated
> > with client or server reboot. The NLM mechanism is pretty dependent on
> > yet another side band mechanism (STATD) to tell you when this occurs,
> > but that mechanism does not work to release the locks held by a client
> > if it fails to come back after reboot. Even if the client does come
> > back, it might forget to invoke the statd process, or it might use a
> > different identifier than it did during the last boot instance (e.g.
> > because DHCP allocated a different IP address, or the IP address is not
> > unique due to use of NAT, or a hostname was used that is non-unique,
> > ...).
> > If the server reboots, then it may fail to notify the client of that
> > reboot through the callback mechanism. Reasons may include the
> > existence of a NAT, failure of the rpcbind/portmapper process on the
> > client, firewalls,...
>
> That brought back memories.
>
> http://www.nfsv4bat.org/Documents/ConnectAThon/2006/talpey-cthon06-nsm.pdf
>
> Here's an even older issues list for nlm on Solaris circa 1996.
> The portrait-mode slides are in reverse order. :)
>
> http://www.nfsv4bat.org/Documents/ConnectAThon/1996/lockmgr.pdf
>
> The NLM protocol is an antique and hasn't been looked at in well
> over a decade (or two!). NLMv4 (circa 1995) widened offsets to
> 64-bit, which was the last innovation it got. None of the RPC
> sideband protocols were ever standardized, btw.
>
> Jan, what are you planning to use it for? Personally I'd advise
> against pretty much anything.
>
> Tom.
>
> >
> >> -Jan
> >>
> >>
> >> On Fri, Aug 5, 2022 at 10:27 PM Trond Myklebust
> >> <trondmy@xxxxxxxxxxxxxxx> wrote:
> >>>
> >>> On Fri, 2022-08-05 at 19:17 -0400, Jan Kasiak wrote:
> >>>> Hi,
> >>>>
> >>>> I was looking at the code for nlmclnt_lock and wanted to ask a
> >>>> question about how the Linux kernel client and the NLM 4 protocol
> >>>> handle some errors around certain edge cases.
> >>>>
> >>>> Specifically, I think there is a race condition around two threads
> >>>> of the same program acquiring a lock, one of the threads being
> >>>> interrupted, and the NFS client sending an unlock when none of the
> >>>> program threads called unlock.
> >>>>
> >>>> On NFS server machine S:
> >>>> there exists an unlocked file F
> >>>>
> >>>> On NFS client machine C:
> >>>> in program P:
> >>>> thread 1 tries to lock(F) with fd A
> >>>> thread 2 tries to lock(F) with fd B
> >>>>
> >>>> The Linux client will issue two NLM_LOCK calls with the same svid
> >>>> and same range, because it uses the program id to map to an svid.
> >>>>
> >>>> For whatever reason, assume the connection is broken (cable gets
> >>>> pulled etc...)
> >>>> and `status = nlmclnt_call(cred, req, NLMPROC_LOCK);` fails.
> >>>>
> >>>> The Linux client will retry the request, but at some point thread 1
> >>>> receives a signal and nlmclnt_lock breaks out of its loop. Because
> >>>> the Linux client request failed, it will fall through and go to the
> >>>> out_unlock label, where it will want to send an unlock request.
> >>>>
> >>>> Assume that at some point the connection is reestablished.
> >>>>
> >>>> The Linux kernel client now has two outstanding lock requests to
> >>>> send to the remote server: one for a lock that thread 2 is still
> >>>> trying to acquire, and one for an unlock of thread 1 that failed
> >>>> and was interrupted.
> >>>>
> >>>> I'm worried that the Linux client may first send the lock request,
> >>>> and tell thread 2 that it acquired the lock, and then send an unlock
> >>>> request from the cancelled thread 1 request.
> >>>>
> >>>> The server will successfully process both requests, because the svid
> >>>> is the same for both, and the true server side state will be that
> >>>> the file is unlocked.
> >>>>
> >>>> One can talk about the wisdom of using multiple threads to acquire
> >>>> the same file lock, but this behavior is weird, because none of the
> >>>> threads called unlock.
> >>>>
> >>>> I have experimented with reproducing this, but have not been
> >>>> successful in triggering this ordering of events.
> >>>>
> >>>> I've also looked at the code in clntproc.c and I don't see a spot
> >>>> where outstanding failed lock/unlock requests are checked while
> >>>> processing lock requests?
> >>>>
> >>>> Thanks,
> >>>> -Jan
> >>>
> >>> Nobody here is likely to want to waste much time trying to 'fix' the
> >>> NLM locking protocol. The protocol itself is known to be extremely
> >>> fragile, and the endemic problems constitute some of the main
> >>> motivations for the development of the NFSv4 protocol
> >>> (See https://datatracker.ietf.org/doc/html/rfc2624#section-8
> >>> and https://datatracker.ietf.org/doc/html/rfc7530#section-9).
> >>>
> >>> If you need more reliable support for POSIX locks beyond what exists
> >>> today for NLM, then please consider NFSv4.
> >>>
> >>> --
> >>> Trond Myklebust
> >>> Linux NFS client maintainer, Hammerspace
> >>> trond.myklebust@xxxxxxxxxxxxxxx
> >>>
> >>>
> >
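
The ordering described in the original question (thread 2's LOCK reaching the server first, followed by the UNLOCK queued for the interrupted thread 1) can be replayed against a toy lock table. This is only a sketch, assuming a server that identifies the lock holder purely by (caller address, svid) as discussed in the thread. ToyNlmServer, replay_race, the address, and the svid value are hypothetical names and values made up for illustration; none of this is taken from the Linux lockd code or any real NLM server. The status string LCK_GRANTED is the one used in the thread, and LCK_DENIED is assumed here as its counterpart.

```python
from dataclasses import dataclass, field


@dataclass
class ToyNlmServer:
    """Toy lock table: the holder of a file lock is just (caller, svid)."""

    locks: dict = field(default_factory=dict)  # file handle -> (caller, svid)

    def lock(self, caller, svid, fh):
        holder = self.locks.get(fh)
        # A repeated LOCK from the same (caller, svid) is granted again.
        if holder is None or holder == (caller, svid):
            self.locks[fh] = (caller, svid)
            return "LCK_GRANTED"
        return "LCK_DENIED"

    def unlock(self, caller, svid, fh):
        # The toy server cannot tell which client thread sent the UNLOCK;
        # it only sees the same (caller, svid) pair.
        if self.locks.get(fh) == (caller, svid):
            del self.locks[fh]
        return "LCK_GRANTED"


def replay_race():
    server = ToyNlmServer()
    caller, svid, fh = "192.0.2.10", 1234, "F"  # hypothetical values

    # Connection is back: thread 2's retried LOCK reaches the server first.
    lock_status = server.lock(caller, svid, fh)
    # Then the UNLOCK queued for the interrupted thread 1 arrives.
    unlock_status = server.unlock(caller, svid, fh)

    # Third element: does the server still consider the file locked?
    return lock_status, unlock_status, fh in server.locks


if __name__ == "__main__":
    print(replay_race())
```

In this model both calls are granted and the file ends up unlocked on the server, even though thread 2 was told it holds the lock. Whether the real client can actually emit the requests in this order is exactly the open question in the thread; the sketch only shows why the server could not detect it.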