Re: [PATCH] NFSD/sunrpc: avoid deadlock on TCP connection due to memory pressure.

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Fri, 26 Jul 2013 10:19:16 -0400 "J.Bruce Fields" <bfields@xxxxxxxxxxxxxx>
wrote:

> On Fri, Jul 26, 2013 at 06:33:03AM +1000, NeilBrown wrote:
> > On Thu, 25 Jul 2013 16:18:05 -0400 "J.Bruce Fields" <bfields@xxxxxxxxxxxxxx>
> > wrote:
> > 
> > > On Thu, Jul 25, 2013 at 11:30:23AM +1000, NeilBrown wrote:
> > > > 
> > > > Since we enabled auto-tuning for sunrpc TCP connections we do not
> > > > guarantee that there is enough write-space on each connection to
> > > > queue a reply.
> ...
> > > This is great, thanks!
> > > 
> > > Inclined to queue it up for 3.11 and stable....
> > 
> > I'd agree for 3.11.
> > It feels a bit border-line for stable.  "dead-lock" and "has been seen in the
> > wild" are technically enough justification...
> > I'd probably mark it as "pleas don't apply to -stable until 3.11 is released"
> > or something like that, just for a bit of breathing space.
> > Your call though.
> 
> 
> So my takeaway from http://lwn.net/Articles/559113/ was that Linus and
> Greg were requesting that:
> 
> 	- criteria for -stable and late -rc's should really be about the
> 	  same, and
> 	- people should follow Documentation/stable-kernel-rules.txt.
> 
> So as an exercise to remind me what those rules are:
> 
> Easy questions:
> 
> 	- "no bigger than 100 lines, with context."  Check.
> 	- "It must fix only one thing."  Check.
> 	- "real bug that bothers people".  Check.
> 	- "tested": yep.  It doesn't actually say "tested on stable
> 	  trees", and I recall this did land you with a tricky bug one
> 	  time when a prerequisite was omitted from the backport.
> 
> Judgement calls:
> 
> 	- "obviously correct": it's short, but admittedly subtle, and
> 	  performance regressions can take a while to get sorted out.
> 	- "It must fix a problem that causes a build error (but not for
> 	  things marked CONFIG_BROKEN), an oops, a hang, data
> 	  corruption, a real security issue, or some "oh, that's not
> 	  good" issue.  In short, something critical."  We could argue
> 	  that "server stops responding" is critical, though not to the
> 	  same degree as a panic.
> 	- OR: alternatively: "Serious issues as reported by a user of a
> 	  distribution kernel may also be considered if they fix a
> 	  notable performance or interactivity issue." The only bz I've
> 	  personally seen was the result of artificial testing of some
> 	  kind, and it sounds like your case involved a disk failure?
> 
> --b.

Looks like  good analysis ... except that it doesn't seem conclusive.  Being
conclusive would make it really good. :-)

The case that brought it to my attention doesn't require the fix.
A file system was mis-behaving (blocking when it should return EJUKEBOX) and
this resulted in nfsd behaviour different than my expectation.
I expected nfsd to keep accepting requests until all threads were blocks.
However only 4 requests were accepted (which is actually better behaviour,
but not what I expected).
So I looked into it and thought that what I found wasn't really right.  Which
turned out to be the case, but not the way I thought...

So my direct experience doesn't argue for the patch going to -stable at all.
If the only other reports are from artificial testing then I'd leave it out
of -stable.  I don't feel -rc4 (that's next I think) is too late for it
though.

NeilBrown

Attachment: signature.asc
Description: PGP signature


[Index of Archives]     [Linux Filesystem Development]     [Linux USB Development]     [Linux Media Development]     [Video for Linux]     [Linux NILFS]     [Linux Audio Users]     [Yosemite Info]     [Linux SCSI]

  Powered by Linux