On Tue, Oct 13, 2015 at 12:24 PM, Olga Kornievskaia <aglo@xxxxxxxxx> wrote:
> On Tue, Oct 13, 2015 at 10:13 AM, Olga Kornievskaia <aglo@xxxxxxxxx> wrote:
>> On Tue, Oct 13, 2015 at 9:27 AM, Trond Myklebust
>> <trond.myklebust@xxxxxxxxxxxxxxx> wrote:
>>> On Tue, Oct 13, 2015 at 8:26 AM, Olga Kornievskaia <aglo@xxxxxxxxx> wrote:
>>>>
>>>> On Mon, Oct 12, 2015 at 11:47 PM, Trond Myklebust
>>>> <trond.myklebust@xxxxxxxxxxxxxxx> wrote:
>>>> > On Mon, Oct 12, 2015 at 5:55 PM, Olga Kornievskaia <aglo@xxxxxxxxx> wrote:
>>>> >> It'll be nice to know when we return delegations synchronously or not.
>>>> >
>>>> > Why? This patch forces us to carry an otherwise completely unnecessary
>>>> > parameter, so at the very minimum we should have a discussion of what
>>>> > the real use cases are.
>>>>
>>>> I used it to diagnose the race of open and delegreturn. If it's kept
>>>
>>> How were you using it?
>>
>> I added two more traces points in the beginning of delegreturn and in
>> nfs4_do_open before sending the rpc. I can see that a given file
>> handle:
>> -- delegreturn prepare tracepoint is happening,
>> -- then the tracepoint of before sending the open is logged,
>> -- then delegreturn prepare is logged again,
>> -- then tracepoint for nfs4_open_file which is after receiving reply
>> to the open from the server
>> -- then delegreturn_exit tracepoint
>>
>> kworker/1:0H-14168 [001] .... 576.571636:
>> nfs4_delegreturn_prepare: error=0 (OK) dev=00:2a fhandle=0x84792ca9
>> issync=0
>>
>> hammer-13955 [000] .... 576.942632: nfs4_open_file_begin:
>> flags=32768 (0x8000) fmode=READ|0x801c fileid=00:2a:0
>> fhandle=0x00000000 name=00:2a:904/000002CB.ham
>>
>> hammer-13955 [001] .... 577.043084: nfs4_open_file:
>> error=0 (OK) flags=32768 (0x8000) fmode=READ|0x801c fileid=00:2a:7708
>> fhandle=0x84792ca9 name=00:2a:904/000002CB.ham
>>
>> kworker/0:1H-431 [000] .... 577.064013:
>> nfs4_delegreturn_prepare: error=0 (OK) dev=00:2a fhandle=0x84792ca9
>> issync=0
>>
>> kworker/0:1H-431 [000] .... 577.101076: nfs4_delegreturn_exit:
>> error=0 (OK) dev=00:2a fhandle=0x84792ca9
>>
>> kworker/0:1H-431 [000] .... 577.113021: nfs4_read:
>> error=-10025 (BAD_STATEID) fileid=00:2a:7708 fhandle=0x84792ca9
>> offset=0 count=64
>>
>>
>>>
>>>> that some delegreturns are synchronous and others are not I think the
>>>> information is useful.
>>>
>>> The only difference between synchronous and asynchronous in this case
>>> is whether or not the process that launched the delegreturn actually
>>> waits for it to complete; a signal could easily prevent it from doing
>>> so without interrupting the delegreturn call itself.
>>> IOW: for complete information when debugging races here, you really
>>> need to examine the return value from the wait call.
>>>
>>>> Speaking of there is a race between state manager thread returning
>>>> used delegations and new open. Previously I thought it was evict
>>>> inode...
>>>
>>> Is this with commit 5e99b532bb95 ("nfs4: reset states to use
>>> open_stateid when returning delegation voluntarily") applied?
>>
>> No I have not. I will try that. Thanks.
>
> This patch does not help. The race is still present.

OK. So what are the symptoms? I'm having trouble seeing how a race can
happen, given a correctly coded server.
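
For anyone trying to reproduce this, the ordering in a trace like the one
quoted above can be checked mechanically rather than by eye. What follows is
a minimal sketch, not kernel tooling: it assumes each trace event has been
joined onto a single line in the ftrace layout shown in the quoted output,
and the names parse_trace() and opens_followed_by_delegreturn() are
hypothetical helpers made up for this illustration. It only flags an
nfs4_open_file reply that is followed by an nfs4_delegreturn_exit for the
same fhandle; it does not by itself prove a client or server bug.

```python
import re

# One ftrace-style event per line: "task-pid [cpu] flags timestamp: event: fields".
# Assumption: wrapped mail lines have already been joined into single lines.
LINE_RE = re.compile(
    r"^\s*(?P<task>\S+)\s+\[(?P<cpu>\d+)\]\s+\S+\s+"
    r"(?P<ts>\d+\.\d+):\s+(?P<event>\w+):\s+(?P<fields>.*)$"
)
FH_RE = re.compile(r"fhandle=(0x[0-9a-fA-F]+)")

# Sample input copied from the quoted trace, one event per line.
SAMPLE = """\
kworker/1:0H-14168 [001] .... 576.571636: nfs4_delegreturn_prepare: error=0 (OK) dev=00:2a fhandle=0x84792ca9 issync=0
hammer-13955 [000] .... 576.942632: nfs4_open_file_begin: flags=32768 (0x8000) fmode=READ|0x801c fileid=00:2a:0 fhandle=0x00000000 name=00:2a:904/000002CB.ham
hammer-13955 [001] .... 577.043084: nfs4_open_file: error=0 (OK) flags=32768 (0x8000) fmode=READ|0x801c fileid=00:2a:7708 fhandle=0x84792ca9 name=00:2a:904/000002CB.ham
kworker/0:1H-431 [000] .... 577.064013: nfs4_delegreturn_prepare: error=0 (OK) dev=00:2a fhandle=0x84792ca9 issync=0
kworker/0:1H-431 [000] .... 577.101076: nfs4_delegreturn_exit: error=0 (OK) dev=00:2a fhandle=0x84792ca9
kworker/0:1H-431 [000] .... 577.113021: nfs4_read: error=-10025 (BAD_STATEID) fileid=00:2a:7708 fhandle=0x84792ca9 offset=0 count=64
"""


def parse_trace(text):
    """Return a list of (timestamp, event, fhandle) tuples ordered by timestamp."""
    events = []
    for line in text.splitlines():
        m = LINE_RE.match(line)
        if not m:
            continue  # skip blank lines and anything not in the expected layout
        fh = FH_RE.search(m.group("fields"))
        events.append(
            (float(m.group("ts")), m.group("event"), fh.group(1) if fh else None)
        )
    return sorted(events, key=lambda e: e[0])


def opens_followed_by_delegreturn(events):
    """Return (fhandle, open_ts, exit_ts) for each open reply that precedes
    a later delegreturn exit on the same file handle."""
    last_open = {}
    hits = []
    for ts, event, fh in events:
        if event == "nfs4_open_file":
            last_open[fh] = ts
        elif event == "nfs4_delegreturn_exit" and fh in last_open:
            hits.append((fh, last_open.pop(fh), ts))
    return hits


if __name__ == "__main__":
    for fh, open_ts, exit_ts in opens_followed_by_delegreturn(parse_trace(SAMPLE)):
        print(f"fhandle {fh}: open reply at {open_ts}, delegreturn exit at {exit_ts}")
```

On the quoted trace this flags fhandle 0x84792ca9, which is the same handle
that then sees the BAD_STATEID error on nfs4_read.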