On Wed, 2012-08-08 at 15:45 -0400, Fred Isaman wrote:
> On Wed, Aug 8, 2012 at 2:33 PM, Myklebust, Trond
> <Trond.Myklebust@xxxxxxxxxx> wrote:
> > On Wed, 2012-08-08 at 14:15 -0400, Fred Isaman wrote:
> >> On Wed, Aug 8, 2012 at 2:03 PM, Myklebust, Trond
> >> <Trond.Myklebust@xxxxxxxxxx> wrote:
> >> > On Wed, 2012-08-08 at 13:51 -0400, Fred Isaman wrote:
> >> >> On Wed, Aug 8, 2012 at 1:34 PM, Myklebust, Trond
> >> >> <Trond.Myklebust@xxxxxxxxxx> wrote:
> >> >> > On Wed, 2012-08-08 at 18:48 +0200, Tigran Mkrtchyan wrote:
> >> >> >> Hi,
> >> >> >>
> >> >> >> It's quite some time without kernel panic reports from me ....
> >> >> >>
> >> >> >> Observed on MDS and DS shutdown during IO.
> >> >> >>
> >> >> >> This is with 3.5.0-2.fc17.x86_64 kernel. Line in code:
> >> >> >>
> >> >> >> nfs4proc.c:6252 : BUG_ON(!list_empty(&lo->plh_segs));
> >> >> >>
> >> >> >
> >> >> > If the server doesn't return a stateid, then that is supposed to
> >> >> > indicate that it thinks that it doesn't hold any more layout segments
> >> >> > for this file.
> >> >> > To me, that indicates that we should be calling
> >> >> > mark_matching_lsegs_invalid() rather than Oopsing.
> >> >> >
> >> >> > Any dissenting voices from the pNFS crowd?
> >> >> >
> >> >>
> >> >> But this implies that the client thinks it has a layout which the
> >> >> server does not believe it has, which seems to me to imply an earlier
> >> >> bug. If you change to mark_matching_lsegs_invalid, I would suggest
> >> >> keeping a WARN_ON.
> >> >
> >> > We could possibly add a printk, but I don't see what value a WARN_ON
> >> > would have here: how is a stack dump going to be useful in debugging
> >> > this issue?
> >> >
> >> > Also, don't we sometimes expect this sort of thing to happen on
> >> > occasion? What if our layoutreturn ends up racing with the layout recall
> >> > following a DS shutdown?
> >> >
> >>
> >> Actually, I forgot about the whole LAYOUTRETURN as fencing possibility.
> >> In that case, you can pretty easily hit the BUG_ON. Though I claim
> >> that, while calling mark_matching_lsegs_invalid doesn't hurt, it
> >> should be unnecessary.
> >
> > Right... So maybe just a dprintk() for debugging purposes?
> >
> > BTW: Why shouldn't we do the mark_matching_lsegs_invalid? If not, then
> > we will need either to do an extra layoutreturn or fail a read/write
> > attempt to the DS in order to figure out that the stateid is now
> > invalid.
> >
> >
> They should have already been marked as invalid, and are just waiting
> on io to finish for release.

So fallback to MDS? Why are we issuing a layoutreturn in that case?

-- 
Trond Myklebust
Linux NFS client maintainer

NetApp
Trond.Myklebust@xxxxxxxxxx
www.netapp.com
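
For illustration only, the alternative discussed in this thread (invalidate whatever segments remain and log a debug message, rather than hitting the BUG_ON when the layoutreturn reply carries no stateid) can be sketched as a userspace toy model. This is not the kernel code and not a proposed patch: struct toy_lseg, struct toy_layout, toy_mark_segs_invalid() and toy_layoutreturn_done() are hypothetical names made up for this sketch, and the singly linked list merely stands in for lo->plh_segs.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Toy stand-in for a layout segment (hypothetical, not a kernel struct). */
struct toy_lseg {
	bool valid;              /* cleared once the segment must not be used */
	struct toy_lseg *next;
};

/* Toy stand-in for a layout header holding a list of segments. */
struct toy_layout {
	struct toy_lseg *segs;
};

/* Mark every still-valid segment invalid; return how many were changed. */
static int toy_mark_segs_invalid(struct toy_layout *lo)
{
	int n = 0;

	for (struct toy_lseg *s = lo->segs; s != NULL; s = s->next) {
		if (s->valid) {
			s->valid = false;
			n++;
		}
	}
	return n;
}

/*
 * Hypothetical completion handler for a layoutreturn reply.
 * No stateid in the reply means the server believes the client holds no
 * more segments. Instead of asserting that the list is already empty,
 * invalidate what is left and emit a debug message.
 * Returns the number of segments invalidated by this call.
 */
static int toy_layoutreturn_done(struct toy_layout *lo, bool stateid_present)
{
	int n;

	if (stateid_present)
		return 0;

	n = toy_mark_segs_invalid(lo);
	if (n > 0)
		fprintf(stderr,
			"layoutreturn: no stateid, invalidated %d segment(s)\n",
			n);
	return n;
}
```

If, as Fred argues, the segments have already been marked invalid and are only waiting for outstanding I/O, the second path invalidates nothing and returns 0, which is the "doesn't hurt, but should be unnecessary" case.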