Re: What's the NFS OOM problem?

Peter Zijlstra <a.p.zijlstra@xxxxxxxxx> · Fri, 11 Aug 2006 10:48:30 +0200

On Fri, 2006-08-11 at 10:33 +1000, Neil Brown wrote:
> On Thursday August 10, w@xxxxxx wrote:
> > 
> > > Can someone help me and give me a brief description on OOM issue?
> > 
> > I don't know about any OOM issue related to NFS. At most it might happen
> > on the client (eg: starting firefox from an NFS root) which might not have
> > enough memory for new network buffers, but I don't even know if it's
> > possible at all.
> 
> We've had reports of OOM problems with NFS at SuSE.
> The common factors seem to be lots of memory (6G+) and very large
> files. 
> Tuning down  /proc/sys/vm/dirty_*ratio seems to avoid the problem,
> but I'm not very close to understanding what the real problem is.

Would it not be related to mmap'ed files, where the client will not
properly track the dirty pages? This will make the reclaim code go crap
itself because suddenly not a single page is easily freeable anymore:
all pages are then found to be dirty and require writeback, which takes
more memory, i.e. allocating network packets and waiting for a proper
answer.
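
To make the mmap case concrete, here is a minimal user-space sketch of
how pages get dirtied with no write() call at all. It is only an
illustration: the function name dirty_via_mmap and the temporary file
are made up for the example, and it says nothing about how the NFS
client accounts for these pages internally.

```python
import mmap
import os
import tempfile

PAGE = mmap.PAGESIZE


def dirty_via_mmap(path, pages):
    """Dirty `pages` pages of the file at `path` using stores through a
    shared mapping only; no write() call touches the data."""
    size = pages * PAGE
    with open(path, "r+b") as f:
        f.truncate(size)  # extend the file so the whole mapping is backed
        with mmap.mmap(f.fileno(), size, access=mmap.ACCESS_WRITE) as m:
            for off in range(0, size, PAGE):
                m[off] = 0x78  # one store per page (the byte 'x')
            m.flush()  # msync(): explicitly push the dirty pages out
    return size


if __name__ == "__main__":
    # Hypothetical scratch file; point this at a file on an NFS mount
    # to exercise the case discussed above.
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        print("dirtied", dirty_via_mmap(path, 16), "bytes via mmap")
    finally:
        os.unlink(path)
```

Each store in the loop dirties a page behind the kernel's back unless
the dirtying is tracked at fault time, which is the situation described
above.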

Andrew is currently carrying some patches that will avoid this problem
by virtue of tracking dirtying of mmap'ed pages. With these patches
nr_dirty is properly incremented and the pdflush logic should kick in
and do its thing.
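
For anyone who wants to watch the counter, a small sketch that pulls
nr_dirty and nr_writeback out of /proc/vmstat. The helper names and the
4096-byte page size are assumptions for the example; check the page
size on your architecture.

```python
def parse_vmstat(text):
    """Parse '/proc/vmstat'-style text ('name value' per line) into a dict."""
    counters = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[1].isdigit():
            counters[parts[0]] = int(parts[1])
    return counters


def dirty_summary(text, page_size=4096):
    """Return nr_dirty and nr_writeback converted from pages to bytes.

    page_size is an assumption; a counter that is absent counts as 0.
    """
    counters = parse_vmstat(text)
    return {
        name: counters.get(name, 0) * page_size
        for name in ("nr_dirty", "nr_writeback")
    }


if __name__ == "__main__":
    try:
        with open("/proc/vmstat") as f:
            print(dirty_summary(f.read()))
    except OSError as exc:
        print("could not read /proc/vmstat:", exc)
```

If mmap'ed dirtying is not tracked, nr_dirty here would stay low even
while the mapping is being scribbled on.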

This would explain why lowering dirty_*ratio would sometimes help: that
would kick off the pdflush thread earlier, which would then detect the
previously unknown dirty pages.
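
As a rough back-of-the-envelope for why the ratios matter on big boxes,
the sketch below turns two percentages into byte thresholds. It assumes
the thresholds are a plain percentage of total memory, which is a
simplification of what the kernel actually computes, and the ratio
values in the example are illustrative, not necessarily what your
kernel ships with.

```python
def dirty_thresholds(mem_bytes, dirty_ratio, background_ratio):
    """Approximate (background, hard) dirty thresholds in bytes.

    Simplification: treats both ratios as a percentage of mem_bytes.
    """
    background = mem_bytes * background_ratio // 100
    hard = mem_bytes * dirty_ratio // 100
    return background, hard


if __name__ == "__main__":
    GiB = 1024 ** 3
    mem = 6 * GiB  # the "6G+" machines mentioned in the reports
    # Illustrative ratio pairs: (dirty_ratio, dirty_background_ratio)
    for ratio, bg in ((40, 10), (10, 2)):
        background, hard = dirty_thresholds(mem, ratio, bg)
        print(f"dirty_ratio={ratio} background={bg}: "
              f"flush starts near {background / GiB:.2f} GiB, "
              f"writers throttle near {hard / GiB:.2f} GiB")
```

The smaller the background figure, the sooner pdflush starts walking
pages, and so the sooner it would stumble over the untracked ones.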
