> You want to reduce the retransmission timeout on NLM because you receive
> more than 1 email per retransmission timeout? I can't see how the two
> are related.

To explain it, let's look at two examples. In the following examples, I use
this notation:

lro: a Lock Reclaim OK (the server didn't send NLM_BLOCKED)
lre: a Lock RElease (the client doesn't want the lock anymore)
lrb: a Lock Reclaim not OK (the server sent NLM_BLOCKED and the client
     will retry x seconds later)

Example 1

2 e-mail servers, 1 NAS, 1 e-mail every 10 seconds on each server,
2 seconds to store an e-mail
retransmission timeout: 30 seconds (default)

time     000 001 002 010 012 020 022 030 031 032
server1  lro     lre lro lre lro lre lro     lre
server2      lrb                         lrb

Report:
Between t=0 and t=32,
  e-mails processed by server1: 4
  e-mails processed by server2: 0
At t=32,
  e-mails in server1's local queue: 0
  e-mails in server2's local queue: 4

Example 2

2 e-mail servers, 1 NAS, 1 e-mail every 10 seconds on each server,
2 seconds to store an e-mail
retransmission timeout: 3 seconds

time     000 001 002 005 007 010 011 012 013 015...
server1  lro     lre         lro     lre ...
server2      lrb     lro lre     lrb     lro lre...

Report:
Between t=0 and t=15,
  e-mails processed by server1: 2
  e-mails processed by server2: 2
At t=15,
  e-mails in server1's local queue: 0
  e-mails in server2's local queue: 0

Of course, a server never receives EXACTLY 1 e-mail every 10 seconds, but
what we see in a production environment can be summarized by these two
examples.

> Normally, the server should call your client back using an NLM_GRANTED
> call as soon as the lock is available. If that isn't happening, then you
> need to look at why not. The retransmission+timeout is supposed to be a
> failsafe for when the NLM_GRANTED mechanism fails, not the main method
> for grabbing a lock.
>
> For instance, it may be that the server is unable to call the client
> back because you've hidden it behind a firewall or NAT, or perhaps your
> netfilter settings on either the client or the server are blocking the
> callback.

The only reason I propose this short series of patches is that there is one
case in which we cannot use the NLM_GRANTED mechanism and must always rely
on the retransmission+timeout failsafe: when the NFS server runs HPUX.
Let's have a look at the comment above the nlmclnt_lock function:

/*
 * LOCK: Try to create a lock
 *
 *			Programmer Harassment Alert
 *
 * When given a blocking lock request in a sync RPC call, the HPUX lockd
 * will faithfully return LCK_BLOCKED but never cares to notify us when
 * the lock could be granted. This way, our local process could hang
 * around forever waiting for the callback.
 *
 *  Solution A:	Implement busy-waiting
 *  Solution B: Use the async version of the call (NLM_LOCK_{MSG,RES})
 *
 * For now I am implementing solution A, because I hate the idea of
 * re-implementing lockd for a third time in two months. The async
 * calls shouldn't be too hard to do, however.
 *
 * This is one of the lovely things about standards in the NFS area:
 * they're so soft and squishy you can't really blame HP for doing this.
 */

Note that I ran my tests against a NetApp NAS ;) Indeed, Data ONTAP only
sends NLM_GRANTED over UDP (not TCP). So we can reproduce the HPUX
behaviour with "nlm_udpport = 0" on the client.

Cheers,
Mikael

--
Mikael Davranche
System Engineer
Atos Worldline, France
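
For anyone who wants to replay the two examples with other timeouts, here is
a minimal Python sketch of the model they describe. It is only an
illustration, not code from the patches, and it rests on assumptions that
are mine: the names MailServer and simulate are hypothetical, time advances
in 1-second ticks, a release is handled before lock requests in the same
tick, server2's e-mails arrive one second after server1's, and a blocked
client never gets an NLM_GRANTED callback, so it only retries once the
retransmission timeout has elapsed. The timestamps it logs are those of this
simplified model and need not match the hand-written tables tick for tick.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class MailServer:
    name: str
    first_mail: int                    # arrival time of the first e-mail
    queue: int = 0                     # e-mails waiting in the local queue
    processed: int = 0                 # e-mails stored on the NAS
    next_try: Optional[int] = None     # time of the next lock request
    release_at: Optional[int] = None   # time the held lock is released
    log: List[Tuple[int, str]] = field(default_factory=list)


def simulate(retry: int, horizon: int, period: int = 10, store: int = 2):
    """Replay the two-server scenario up to t=horizon (inclusive)."""
    servers = [MailServer("server1", 0), MailServer("server2", 1)]
    holder = None  # server currently holding the single lock
    for t in range(horizon + 1):
        # 1. Releases (lre): the e-mail is stored, the lock is dropped.
        for s in servers:
            if s.release_at == t:
                s.log.append((t, "lre"))
                s.processed += 1
                s.queue -= 1
                s.release_at = None
                holder = None
                if s.queue > 0:
                    s.next_try = t  # more mail queued: ask again at once
        # 2. Arrivals: one e-mail every `period` seconds per server.
        for s in servers:
            if t >= s.first_mail and (t - s.first_mail) % period == 0:
                s.queue += 1
                if s.next_try is None and s.release_at is None:
                    s.next_try = t  # idle server asks for the lock now
        # 3. Lock requests: granted (lro) or blocked (lrb).
        for s in servers:
            if s.next_try != t:
                continue
            if holder is None:
                holder = s
                s.release_at = t + store
                s.next_try = None
                s.log.append((t, "lro"))
            else:
                # No callback: wait a full retransmission timeout.
                s.next_try = t + retry
                s.log.append((t, "lrb"))
    return servers


if __name__ == "__main__":
    for retry in (30, 3):
        print(f"retransmission timeout: {retry} seconds")
        for s in simulate(retry, horizon=32):
            print(f"  {s.name}: processed={s.processed} queued={s.queue}")
```

Under these assumptions, at t=32 the 30-second timeout leaves server2 with 0
e-mails processed and 4 queued, while the 3-second timeout lets it process 3
with 1 still queued; server1 processes 4 in both runs.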