On Tue, Aug 26, 2014 at 2:19 PM, Boaz Harrosh <boaz@xxxxxxxxxxxxx> wrote: > On 08/26/2014 08:54 PM, Trond Myklebust wrote: >> On Tue, Aug 26, 2014 at 1:06 PM, Boaz Harrosh <boaz@xxxxxxxxxxxxx> wrote: > >> >> The deadlock occurs _if_ the above layout commit is unable to get a >> slot. You can't guarantee that it will, because the slot table is a >> finite resource and it can be exhausted > > Yes all I ever seen is 1 slot in any of the clients/servers I've > seen so I assume 1 slot ever > >> if you allow fore channel >> calls to trigger synchronous recalls on the back channel > > Beep! but this is exactly what I'm trying to say. The STD specifically > forbids that. The server is not allowed to wait here, it must return > imitatively, with an error that frees the slot and then later issue the > RECALL. > > This is what I said exactly three times in my mail, and what I have > depicted in my flow: > Server async operation (mandated by the STD) > Client back-channel can be sync with for channel (Not mentioned by the STD) > >> that again trigger synchronous calls on the fore channel. > > >> You're basically saying >> that the client needs to guarantee that it can allocate 2 slots before >> it is allowed to send a layoutget just in case the server needs to >> recall a layout. >> > > No I am not saying that, please count. Since the Server is not allowed > sync operation then one slot is enough and the client can do sync lo_commit > while in recall. > >> If, OTOH, the layoutcommit is asynchronous, then there is no >> serialisation and the back channel thread can happily reply to the >> layout recall even if there are no free slots in the fore channel. >> > > Sure that will work as well, but not optimally, and for no good reason. > > Please go back to my flow with the 3 cases. See how the server never waits > for anything and will always imitatively reply to the layout_get. > Since the server is not allowed a sync operation and is mandated by the > RFC text to not wait, then the client is allowed and can do sync operations > because it is enough that only one do async. > > BTW: If what you are saying is true than there is a bug in the slot code > because this patch does work, and everything flows past this situation. > I have a reproducer test that fails 100% of the time without this patch > and only fails much later at some other place, but not at this deadlock, > with this patch applied. > > Cheers > Boaz > Whether or not your particular server allows it or not is irrelevant. We're not coding the client to a particular implementation. None of the other callbacks do synchronous RPC calls, and that's very intentional. -- Trond Myklebust Linux NFS client maintainer, PrimaryData trond.myklebust@xxxxxxxxxxxxxxx -- To unsubscribe from this list: send the line "unsubscribe linux-nfs" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html