Re: kernel NULL pointer dereference in rpcb_getport_done (2.6.29.4)

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Thu, Oct 15, 2009 at 02:46:38PM -0700, Simon Kirby wrote:
> On Tue, Aug 11, 2009 at 10:17:45AM -0700, Simon Kirby wrote:
> 
> > On Mon, Aug 10, 2009 at 07:55:36PM -0400, J. Bruce Fields wrote:
> > 
> > > Looking back at that commit--I'm now confused about how it was meant to
> > > work.  In the case where the woken-up thread is waiting inside of
> > > svc_recv(), ->nwaking doesn't get decremented at all until the request
> > > is processed and svc_recv() is called again--effectively limiting the
> > > number of concurrent requests to 5 per pool, so, if I read the code
> > > correctly, likely to cause problems if your workload would benefit from
> > > lots of requests being able to wait on io simultaneously (e.g. if you
> > > have a large working set and more than 5 spindles per pool).
> > 
> > Yes, this box is serving about 50 TB of storage space, so there are more
> > than 5 spindles. :)
> > 
> > I can't believe others aren't all complaining about the same problem, but
> > I guess the loads are different.
> > 
> > > I'm inclined to revert the patch and take another look at Greg's
> > > original problem.
> > 
> > I'm inclined to be totally happy with that! :)
> 
> By the way, the original commit for this,
> 59a252ff8c0f2fa32c896f69d56ae33e641ce7ad, still seems to be in the
> kernel.

Apologies, yes, this fell through the cracks.

> Would you like me to make a patch to remove this code?

That would be helpful.

And I think Greg may be back from his travels, so perhaps he'll be able
to comment on it too.

> I suspect it can't just be pulled as-is because the pool stats were
> added later which now export the "overloads-avoided" through
> /proc/fs/nfsd/pool_stats .  I suspect the header is there to make the
> format dynamic (eg: we could kill overloads-avoided) without breaking
> backwards compatibility...?

A line-based instead of column-based format probably would have made it
more likely parsers would get this right.  Hm, I thought 

	http://oss.sgi.com/projects/pcp/

had some code that parsed this?  But a quick "git grep pool_stats" on
the git repo doesn't show me anything.

We could be conservative and just set the field to zero, but it would be
better if the format were flexible--and changing the number of columns
early on might be one way to ensure people know it will be....

Of course, best would be if Greg could explain what we're
misunderstanding about the overload avoidance--perhaps there's a still
a simple bugfix that would make it work for your situation....

> I see overloads-avoided still rising even with SVC_MAX_WAKING set to
> 127 instead of 5.  It seems to happen a lot when storage gets a little
> stuck, but it is probably increasing the latency of other requests that
> could be served from cache.
> 
> Speaking of latency, I was also looking at blktrace output for the
> underlying devices, and it seems like there are cases where knfsd is
> issuing _read_ requests in a way that leave the queue plugged sometimes,
> causing the unplug timer to have to clear it.  I still need to try to
> track this down (see recent linux-btrace thread).

A quick web search failed to turn up that thread for me.--b.
--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[Index of Archives]     [Linux Filesystem Development]     [Linux USB Development]     [Linux Media Development]     [Video for Linux]     [Linux NILFS]     [Linux Audio Users]     [Yosemite Info]     [Linux SCSI]

  Powered by Linux