Re: client kernel panic on server restart

On Wed, 2012-08-08 at 15:45 -0400, Fred Isaman wrote:
> On Wed, Aug 8, 2012 at 2:33 PM, Myklebust, Trond
> <Trond.Myklebust@xxxxxxxxxx> wrote:
> > On Wed, 2012-08-08 at 14:15 -0400, Fred Isaman wrote:
> >> On Wed, Aug 8, 2012 at 2:03 PM, Myklebust, Trond
> >> <Trond.Myklebust@xxxxxxxxxx> wrote:
> >> > On Wed, 2012-08-08 at 13:51 -0400, Fred Isaman wrote:
> >> >> On Wed, Aug 8, 2012 at 1:34 PM, Myklebust, Trond
> >> >> <Trond.Myklebust@xxxxxxxxxx> wrote:
> >> >> > On Wed, 2012-08-08 at 18:48 +0200, Tigran Mkrtchyan wrote:
> >> >> >> Hi,
> >> >> >>
> >> >> >> It's been quite some time without kernel panic reports from me ....
> >> >> >>
> >> >> >> Observed on MDS and DS shutdown during IO.
> >> >> >>
> >> >> >> This is with the 3.5.0-2.fc17.x86_64 kernel. Line in code:
> >> >> >>
> >> >> >> nfs4proc.c:6252 :   BUG_ON(!list_empty(&lo->plh_segs));
> >> >> >>
> >> >> >
> >> >> > If the server doesn't return a stateid, then that is supposed to
> >> >> > indicate that it thinks that it doesn't hold any more layout segments
> >> >> > for this file.
> >> >> > To me, that indicates that we should be calling
> >> >> > mark_matching_lsegs_invalid() rather than Oopsing.
> >> >> >
> >> >> > Any dissenting voices from the pNFS crowd?
> >> >> >
> >> >>
> >> >> But this implies that the client thinks it has a layout which the
> >> >> server does not believe it has, which seems to me to imply an earlier
> >> >> bug.  If you change to mark_matching_lsegs_invalid, I would suggest
> >> >> keeping a WARN_ON.
> >> >
> >> > We could possibly add a printk, but I don't see what value a WARN_ON
> >> > would have here: how is a stack dump going to be useful in debugging
> >> > this issue?
> >> >
> >> > Also, don't we sometimes expect this sort of thing to happen on
> >> > occasion? What if our layoutreturn ends up racing with the layout recall
> >> > following a DS shutdown?
> >> >
> >>
> >> Actually, I forgot about the whole LAYOUTRETURN as fencing possibility.
> >>  In that case, you can pretty easily hit the BUG_ON.  Though I claim
> >> that, while calling mark_matching_lsegs_invalid doesn't hurt, it
> >> should be unnecessary.
> >
> > Right... So maybe just a dprintk() for debugging purposes?
> >
> > BTW: Why shouldn't we do the mark_matching_lsegs_invalid? If not, then
> > we will need either to do an extra layoutreturn or fail a read/write
> > attempt to the DS in order to figure out that the stateid is now
> > invalid.
> >
> 
> 
> They should have already been marked as invalid, and are just waiting
> on io to finish for release.

So fall back to the MDS? Why are we issuing a layoutreturn in that case?
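
For anyone following the thread, the alternative to the BUG_ON being
discussed can be sketched roughly as below. This is a userspace model only,
not the kernel code: struct layout_header, struct layout_segment,
mark_all_segments_invalid() and handle_layoutreturn_reply() are hypothetical
stand-ins for the pNFS layout header, its plh_segs list,
mark_matching_lsegs_invalid() and the layoutreturn completion path. The idea
it tries to show: when the reply carries no stateid, invalidate whatever
segments are still listed and log it, rather than asserting that the list is
empty.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical userspace stand-ins; these are NOT the kernel's structures. */
struct layout_segment {
    bool valid;                    /* false once the segment is invalidated */
    struct layout_segment *next;
};

struct layout_header {
    struct layout_segment *segs;   /* models the lo->plh_segs list */
};

/*
 * Models the idea behind mark_matching_lsegs_invalid(): flag every segment
 * still on the list as invalid. Returns how many were newly invalidated.
 */
static int mark_all_segments_invalid(struct layout_header *lo)
{
    int n = 0;

    for (struct layout_segment *s = lo->segs; s != NULL; s = s->next) {
        if (s->valid) {
            s->valid = false;
            n++;
        }
    }
    return n;
}

/*
 * Models handling of a LAYOUTRETURN reply. If the server returned no
 * stateid, it believes it holds no more segments for this file, so any
 * segments the client still lists are invalidated instead of crashing.
 * Returns the number of segments invalidated by this call.
 */
static int handle_layoutreturn_reply(struct layout_header *lo,
                                     bool stateid_present)
{
    int n;

    assert(lo != NULL);
    if (stateid_present)
        return 0;

    n = mark_all_segments_invalid(lo);
    if (n > 0)
        fprintf(stderr,
                "layoutreturn: no stateid, invalidated %d segment(s)\n", n);
    return n;
}
```

The message on stderr plays the role of the dprintk() suggested earlier in
the thread. Whether the invalidation is actually needed, or the segments are
already invalid and only waiting on I/O, is the open question above.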

-- 
Trond Myklebust
Linux NFS client maintainer

NetApp
Trond.Myklebust@xxxxxxxxxx
www.netapp.com
