Re: XFS: possible memory allocation deadlock in kmem_alloc (mode:0x250)

On Thu, Aug 22, 2013 at 12:25:44PM +1000, Dave Chinner wrote:
> On Wed, Aug 21, 2013 at 11:24:58AM -0400, Josef 'Jeff' Sipek wrote:
> > We've started experimenting with larger directory block sizes to avoid
> > directory fragmentation.  Everything seems to work fine, except that the log
> > is spammed with these lovely debug messages:
> > 
> > 	XFS: possible memory allocation deadlock in kmem_alloc (mode:0x250)
> > 
> > From looking at the code, it looks like each of those messages (there
> > are thousands) equates to 100 trips through the loop.  My guess is that the
> > larger blocks require multi-page allocations, which are harder to satisfy.
> > This is with a 3.10 kernel.
> 
> No, larger blocks simply require more single pages. The buffer cache
> does not require multi-page allocation at all. So, mode = 0x250,
> which means ___GFP_NOWARN | ___GFP_IO | ___GFP_WAIT, which is also
> known as a GFP_NOFS allocation context.

Doh!  Not sure why I didn't remember that directories are no different
from regular files here...
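
For reference (mostly my own), the flag arithmetic checks out against the
3.10-era ___GFP_* definitions in include/linux/gfp.h.  A minimal sketch,
with the values copied here purely for illustration:

	/* Decode mode:0x250 using the 3.10-era ___GFP_* values from
	 * include/linux/gfp.h (copied here for illustration only). */
	#include <stdio.h>

	#define ___GFP_WAIT	0x10u
	#define ___GFP_IO	0x40u
	#define ___GFP_FS	0x80u
	#define ___GFP_NOWARN	0x200u

	int main(void)
	{
		unsigned int mode = ___GFP_NOWARN | ___GFP_IO | ___GFP_WAIT;

		/* GFP_NOFS is __GFP_WAIT | __GFP_IO with no __GFP_FS, so
		 * 0x250 is a GFP_NOFS allocation with warnings suppressed. */
		printf("mode = %#x\n", mode);	/* prints mode = 0x250 */
		return 0;
	}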

...
> > /proc/slabinfo: https://www.copy.com/s/1x1yZFjYO2EI/slab.txt
> 
> Hmmm. You're using filestreams. That's unusual.

Right.  I keep forgetting about that.

> > sysrq m output: https://www.copy.com/s/mYfMYfJJl2EB/sysrq-m.txt
> 
> 27764401 total pagecache pages
> 
> which indicates that you've got close to 110GB of pages in the page
> cache. Hmmm, and 24-25GB of dirty pages in memory.
> 
> You know, I'd be suspecting a memory reclaim problem here to do with
> having large amounts of dirty memory in the page cache. I don't
> think the underlying cause is going to be the filesystem code, as
> the warning should never be emitted if memory reclaim is making
> progress. Perhaps you could try lowering all the dirty memory
> thresholds to see if that allows memory reclaim to make more
> progress because there are fewer dirty pages in memory...

Yep.  This makes perfect sense.  Amusingly enough, we don't read much of the
data, so the page cache is really there to buffer the writes because the I/O
is slow.  We'll play with the dirty memory thresholds and see if that helps.
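
In case it helps anyone else hitting this, the knobs in question are the
vm.dirty_* sysctls (dirty_background_ratio/dirty_ratio, or their *_bytes
variants).  A minimal sketch of lowering them from a small helper, where the
ratios are illustrative guesses rather than recommendations; normally you'd
just use sysctl -w or write the files under /proc/sys/vm directly:

	/* Sketch: lower the VM dirty-memory thresholds (needs root).
	 * The values below are illustrative, not tuned recommendations. */
	#include <stdio.h>

	static int write_sysctl(const char *path, const char *val)
	{
		FILE *f = fopen(path, "w");

		if (!f) {
			perror(path);
			return -1;
		}
		fprintf(f, "%s\n", val);
		return fclose(f);
	}

	int main(void)
	{
		/* Start background writeback earlier... */
		write_sysctl("/proc/sys/vm/dirty_background_ratio", "2");
		/* ...and throttle writers at a lower dirty fraction. */
		write_sysctl("/proc/sys/vm/dirty_ratio", "5");
		return 0;
	}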

Thanks!

Jeff.

-- 
All parts should go together without forcing.  You must remember that the
parts you are reassembling were disassembled by you.  Therefore, if you
can’t get them together again, there must be a reason.  By all means, do not
use a hammer.
		— IBM Manual, 1925
