Re: xfs_iext_realloc_indirect and "XFS: possible memory allocation deadlock"

On Mon, Jun 29, 2015 at 07:59:00PM +0200, Alex Lyakas wrote:
> Hi Brian,
> Thanks for your comments.
> 
> Here is the information you asked for:
> 
> meta-data=/dev/dm-147            isize=256    agcount=67, agsize=268435440 blks
>          =                       sectsz=512   attr=2
> data     =                       bsize=4096   blocks=17825792000, imaxpct=5
>          =                       sunit=16     swidth=160 blks
> naming   =version 2              bsize=4096   ascii-ci=0
> log      =internal               bsize=4096   blocks=521728, version=2
>          =                       sectsz=512   sunit=16 blks, lazy-count=1
> realtime =none                   extsz=4096   blocks=0, rtextents=0
> 
> Mount options:
> /dev/dm-147 /export/nfsvol xfs rw,sync,noatime,wsync,attr2,discard,inode64,allocsize=64k,logbsize=64k,sunit=128,swidth=1280,noquota 0 0
> 
> So yes, we are using "allocsize=64k", which influences the speculative
> allocation logic. I did various experiments, and indeed when I remove this
> "allocsize=64k", fragmentation is much lower. (I also tried other things,
> such as using a single nfsd thread, mounting without "sync", and patching
> nfsd to provide a "nicer" IOV to vfs_write, but none of these helped.) On
> the other hand, we started using "allocsize=64k" in the first place to
> prevent the aggressive preallocation that we saw XFS doing on large QCOW
> files (VM images).
> 

What was the problem with regard to preallocation and large VM images?
The preallocation is not permanent and should be cleaned up if the file
is inactive for a period of time (see the other prealloc FAQ entries).
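
As a quick way to observe that locally (a rough sketch using xfs_io and
xfs_bmap from xfsprogs; the path below is just an example), you can look at
the block mapping both while the file descriptor is still open and again
after close:

# blocks mapped beyond EOF at this point, if any, are the speculative
# preallocation
xfs_io -f -c "pwrite 0 16m" -c "bmap -v" /mnt/scratch/prealloc-test

# after close (and possibly some idle time), check whether the post-EOF
# blocks have been trimmed
xfs_bmap -v /mnt/scratch/prealloc-test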

> Still, when doing local IO to a mounted XFS, even with "allocsize=64k", we
> get very few extents. We still don't know why there is this difference
> between local IO and NFS; any clue about that phenomenon would be great.
> 

What exactly is your test in this case? I assume you're also testing
with the same mount options and whatnot. One difference could be that
NFS might involve more open-write-close cycles than a local write test,
which could impact reclaim of preallocation. For example, what happens
if you run something like the following locally?

for i in $(seq 0 2 100); do
	xfs_io -fc "pwrite $((i * 4096)) 4k" /mnt/file
done

This will do the strided writes while opening and closing the file each
time and thus probably more closely matches what might be happening over
NFS. Prealloc is typically trimmed on close, but there is an NFS
specific heuristic that should detect this and let it hang around for
longer in this case. Taking a quick look at that code shows that it is
tied to the existence of delayed allocation blocks at close time,
however. I suppose that might never trigger due to the sync mount
option. What's the reason for using that one?
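
For contrast, roughly the same strided pattern through a single open file
descriptor might look like the following (again just a sketch; xfs_io also
accepts commands on stdin, and the path is an example):

for i in $(seq 0 2 100); do
	echo "pwrite $((i * 4096)) 4k"
done | xfs_io -f /mnt/file.single

# rough extent count (the first line of xfs_bmap output is the file name);
# compare against the open/close loop above
xfs_bmap /mnt/file.single | wc -l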

Brian

> 
> 
> Thanks,
> Alex.
> 
> 
> 
> 
> -----Original Message----- From: Brian Foster
> Sent: 29 June, 2015 1:43 PM
> To: Alex Lyakas
> Cc: Dave Chinner ; Danny Shavit ; Shyam Kaushik ; Yair Hershko ; hch@xxxxxx
> ; xfs@xxxxxxxxxxx
> Subject: Re: xfs_iext_realloc_indirect and "XFS: possible memory allocation
> deadlock"
> 
> On Sat, Jun 27, 2015 at 11:01:30PM +0200, Alex Lyakas wrote:
> >Hi Dave,
> >
> >First, to answer your questions:
> >- this filesystem is accessed only through NFS, so I guess we can say this
> >is a "dedicated" NFS server
> >- yes, both NFSv3 and NFSv4 are used (by different servers), and they may
> >definitely attempt to access the same file
> >
> >I did some further debugging by patching XFS to get some more info on how
> >much memory is requested, and who requests it. I added a kmem_realloc
> >variant that is called only by xfs_iext_realloc_indirect. This new function
> >(kmem_realloc_xfs_iext_realloc_indirect) also prints the number of the
> >inode that needs the memory. This is after some fiddling to check that
> >"ifp" is the one embedded in xfs_inode and not the one allocated through
> >kmem_cache (attr fork). After the memory allocation fails for the first
> >time, I also call handle_sysrq('m'). Finally, I print after how many
> >retries the allocation succeeded. I also added some info to the normal
> >kmem_alloc function.
> >
> >The results are as follows:
> >- memory allocation failures happened only on the
> >kmem_realloc_xfs_iext_realloc_indirect path for now
> >- XFS hits memory re-allocation failures when it needs to allocate about
> >35KB. Sometimes the allocation succeeds after a few retries, but sometimes
> >it takes several thousand retries.
> >- All allocation failures happened on NFSv3 paths
> >- Three inode numbers were reported as failing memory allocations. After
> >several hours, "find -inum" is still searching for these inodes... this is
> >a huge filesystem... Is there a quicker (XFS-specific?) way to find the
> >file based on its inode number?
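
On the inode number question: xfs_ncheck from xfsprogs might be quicker than
a full find. A sketch, using the inode number and device from the log quoted
below (on a mounted, active filesystem the result may not be completely up to
date):

xfs_ncheck -i 35646862875 /dev/dm-147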
> >
> >Please see a typical allocation failure in [1]
> >
> >Any recommendation how to move forward with this issue?
> >
> >An additional observation from my local system: when writing files to XFS
> >locally vs. writing the same files via NFS (both v3 and v4), the number of
> >extents reported by "xfs_bmap" is much higher in the NFS case. For example,
> >creating a new file and writing into it as follows:
> >- write 4KB
> >- skip 4KB (i.e., lseek to 4KB + 4KB)
> >- write 4KB
> >- skip 4KB
> >...
> >Create a file of say 50MB this way.
> >
> >Locally it ends up with very few (1-5) extents, but the same exact
> >workload through NFS results in several thousand extents. The filesystem is
> >mounted as "sync" in both cases. This is currently just an observation,
> >which I will try to debug further. Are you perhaps familiar with such
> >behavior? Can you perhaps try and check whether it also happens in your
> >environment?
> >
> 
> I believe this is usually the work of speculative preallocation[0]. It
> basically should allocate more blocks than necessary for a file-extending
> allocation to avoid this kind of fragmentation. The
> xfs_iomap_prealloc_size tracepoint was added to help observe behavior,
> but it looks like that went in 3.10 and you're on 3.8. It was also more
> recently disabled for files smaller than 64k, but again that happened in
> 3.11.
> 
> Anyways, it seems like perhaps it is working normally locally, but for
> some reason not working appropriately over NFS..? It might be a good
> idea to post xfs_info and mount options if you haven't.
> 
> Brian
> 
> [0] http://xfs.org/index.php/XFS_FAQ#Q:_What_is_speculative_preallocation.3F
> 
> >Thanks for your help,
> >Alex.
> >
> >
> >
> >[1]
> >Jun 26 14:19:08 vsa-00000003-vc-1 kernel: [91809.497033] XFS: pid=12642
> >kmem_alloc failure inum=35646862875 size=35584 flags=0x4 lflags=0x250
> >Jun 26 14:19:08 vsa-00000003-vc-1 kernel: [91809.498407] Pid: 12642, comm:
> >nfsd Tainted: GF          O 3.8.13-030813-generic #201305111843
> >Jun 26 14:19:08 vsa-00000003-vc-1 kernel: [91809.498410] Call Trace:
> >Jun 26 14:19:08 vsa-00000003-vc-1 kernel: [91809.498450]
> >[<ffffffffa07a832e>] kmem_alloc_xfs_iext_realloc_indirect+0x16e/0x1f0
> >[xfs]
> >Jun 26 14:19:08 vsa-00000003-vc-1 kernel: [91809.498465]
> >[<ffffffffa07a86b3>] kmem_realloc_xfs_iext_realloc_indirect+0x33/0x80
> >[xfs]
> >Jun 26 14:19:08 vsa-00000003-vc-1 kernel: [91809.498482]
> >[<ffffffffa07db2a0>] xfs_iext_realloc_indirect+0x40/0x60 [xfs]
> >Jun 26 14:19:08 vsa-00000003-vc-1 kernel: [91809.498498]
> >[<ffffffffa07db2ff>] xfs_iext_irec_new+0x3f/0x180 [xfs]
> >Jun 26 14:19:08 vsa-00000003-vc-1 kernel: [91809.498516]
> >[<ffffffffa07db5b9>] xfs_iext_add_indirect_multi+0x179/0x2b0 [xfs]
> >Jun 26 14:19:08 vsa-00000003-vc-1 kernel: [91809.498532]
> >[<ffffffffa07db9be>] xfs_iext_add+0xce/0x290 [xfs]
> >Jun 26 14:19:08 vsa-00000003-vc-1 kernel: [91809.498539]
> >[<ffffffff81076634>] ? wake_up_worker+0x24/0x30
> >Jun 26 14:19:08 vsa-00000003-vc-1 kernel: [91809.498553]
> >[<ffffffffa07dbbd2>] xfs_iext_insert+0x52/0x100 [xfs]
> >Jun 26 14:19:08 vsa-00000003-vc-1 kernel: [91809.498569]
> >[<ffffffffa07b6ed3>] ? xfs_bmap_add_extent_hole_delay+0xd3/0x6a0 [xfs]
> >Jun 26 14:19:08 vsa-00000003-vc-1 kernel: [91809.498583]
> >[<ffffffffa07b6ed3>] xfs_bmap_add_extent_hole_delay+0xd3/0x6a0 [xfs]
> >Jun 26 14:19:08 vsa-00000003-vc-1 kernel: [91809.498598]
> >[<ffffffffa07da3fc>] ? xfs_iext_bno_to_ext+0x8c/0x160 [xfs]
> >Jun 26 14:19:08 vsa-00000003-vc-1 kernel: [91809.498614]
> >[<ffffffffa07b7719>] xfs_bmapi_reserve_delalloc+0x279/0x2a0 [xfs]
> >Jun 26 14:19:08 vsa-00000003-vc-1 kernel: [91809.498628]
> >[<ffffffffa07be742>] xfs_bmapi_delay+0x122/0x270 [xfs]
> >Jun 26 14:19:08 vsa-00000003-vc-1 kernel: [91809.498642]
> >[<ffffffffa079d703>] xfs_iomap_write_delay+0x173/0x320 [xfs]
> >Jun 26 14:19:08 vsa-00000003-vc-1 kernel: [91809.498656]
> >[<ffffffffa07be42c>] ? xfs_bmapi_read+0xfc/0x2f0 [xfs]
> >Jun 26 14:19:08 vsa-00000003-vc-1 kernel: [91809.498660]
> >[<ffffffff8135d8f3>] ? call_rwsem_down_write_failed+0x13/0x20
> >Jun 26 14:19:08 vsa-00000003-vc-1 kernel: [91809.498665]
> >[<ffffffff81195bbc>] ? lookup_page_cgroup+0x4c/0x50
> >Jun 26 14:19:08 vsa-00000003-vc-1 kernel: [91809.498677]
> >[<ffffffffa078ab40>] __xfs_get_blocks+0x280/0x550 [xfs]
> >Jun 26 14:19:08 vsa-00000003-vc-1 kernel: [91809.498688]
> >[<ffffffffa078ae41>] xfs_get_blocks+0x11/0x20 [xfs]
> >Jun 26 14:19:08 vsa-00000003-vc-1 kernel: [91809.498695]
> >[<ffffffff811cf77e>] __block_write_begin+0x1ae/0x4e0
> >Jun 26 14:19:08 vsa-00000003-vc-1 kernel: [91809.498705]
> >[<ffffffffa078ae30>] ? xfs_get_blocks_direct+0x20/0x20 [xfs]
> >Jun 26 14:19:08 vsa-00000003-vc-1 kernel: [91809.498708]
> >[<ffffffff81135fff>] ? grab_cache_page_write_begin+0x8f/0xf0
> >Jun 26 14:19:08 vsa-00000003-vc-1 kernel: [91809.498720]
> >[<ffffffffa078a09f>] xfs_vm_write_begin+0x5f/0xe0 [xfs]
> >Jun 26 14:19:08 vsa-00000003-vc-1 kernel: [91809.498725]
> >[<ffffffff8113552a>] generic_perform_write+0xca/0x210
> >Jun 26 14:19:08 vsa-00000003-vc-1 kernel: [91809.498729]
> >[<ffffffff811356cd>] generic_file_buffered_write+0x5d/0x90
> >Jun 26 14:19:08 vsa-00000003-vc-1 kernel: [91809.498740]
> >[<ffffffffa07952d5>] xfs_file_buffered_aio_write+0x115/0x1c0 [xfs]
> >Jun 26 14:19:08 vsa-00000003-vc-1 kernel: [91809.498744]
> >[<ffffffff816159f4>] ? ip_finish_output+0x224/0x3b0
> >Jun 26 14:19:08 vsa-00000003-vc-1 kernel: [91809.498758]
> >[<ffffffffa079547c>] xfs_file_aio_write+0xfc/0x1b0 [xfs]
> >Jun 26 14:19:08 vsa-00000003-vc-1 kernel: [91809.498769]
> >[<ffffffffa0795380>] ? xfs_file_buffered_aio_write+0x1c0/0x1c0 [xfs]
> >Jun 26 14:19:08 vsa-00000003-vc-1 kernel: [91809.498773]
> >[<ffffffff8119b8c3>] do_sync_readv_writev+0xa3/0xe0
> >Jun 26 14:19:08 vsa-00000003-vc-1 kernel: [91809.498776]
> >[<ffffffff8119bb8d>] do_readv_writev+0xcd/0x1d0
> >Jun 26 14:19:08 vsa-00000003-vc-1 kernel: [91809.498780]
> >[<ffffffff810877e0>] ? set_groups+0x40/0x60
> >Jun 26 14:19:08 vsa-00000003-vc-1 kernel: [91809.498805]
> >[<ffffffffa033a6b0>] ? nfsd_setuser+0x120/0x2b0 [nfsd]
> >Jun 26 14:19:08 vsa-00000003-vc-1 kernel: [91809.498808]
> >[<ffffffff8119bccc>] vfs_writev+0x3c/0x50
> >Jun 26 14:19:08 vsa-00000003-vc-1 kernel: [91809.498815]
> >[<ffffffffa0333dd2>] nfsd_vfs_write.isra.12+0x92/0x350 [nfsd]
> >Jun 26 14:19:08 vsa-00000003-vc-1 kernel: [91809.498818]
> >[<ffffffff8119a6cb>] ? dentry_open+0x6b/0xd0
> >Jun 26 14:19:08 vsa-00000003-vc-1 kernel: [91809.498825]
> >[<ffffffffa0336679>] nfsd_write+0xf9/0x110 [nfsd]
> >Jun 26 14:19:08 vsa-00000003-vc-1 kernel: [91809.498834]
> >[<ffffffffa0340dd1>] nfsd3_proc_write+0xb1/0x140 [nfsd]
> >Jun 26 14:19:08 vsa-00000003-vc-1 kernel: [91809.498841]
> >[<ffffffffa032fd62>] nfsd_dispatch+0x102/0x270 [nfsd]
> >Jun 26 14:19:08 vsa-00000003-vc-1 kernel: [91809.498865]
> >[<ffffffffa0103b48>] svc_process_common+0x328/0x5e0 [sunrpc]
> >Jun 26 14:19:08 vsa-00000003-vc-1 kernel: [91809.498876]
> >[<ffffffffa0104153>] svc_process+0x103/0x160 [sunrpc]
> >Jun 26 14:19:08 vsa-00000003-vc-1 kernel: [91809.498882]
> >[<ffffffffa032f72f>] nfsd+0xbf/0x130 [nfsd]
> >Jun 26 14:19:08 vsa-00000003-vc-1 kernel: [91809.498888]
> >[<ffffffffa032f670>] ? nfsd_destroy+0x80/0x80 [nfsd]
> >Jun 26 14:19:08 vsa-00000003-vc-1 kernel: [91809.498892]
> >[<ffffffff8107f050>] kthread+0xc0/0xd0
> >Jun 26 14:19:08 vsa-00000003-vc-1 kernel: [91809.498895]
> >[<ffffffff8107ef90>] ? flush_kthread_worker+0xb0/0xb0
> >Jun 26 14:19:08 vsa-00000003-vc-1 kernel: [91809.498900]
> >[<ffffffff816f61ec>] ret_from_fork+0x7c/0xb0
> >Jun 26 14:19:08 vsa-00000003-vc-1 kernel: [91809.498903]
> >[<ffffffff8107ef90>] ? flush_kthread_worker+0xb0/0xb0
> >Jun 26 14:19:08 vsa-00000003-vc-1 kernel: [91809.498905] SysRq : Show
> >Memory
> >Jun 26 14:19:08 vsa-00000003-vc-1 kernel: [91809.499385] Mem-Info:
> >Jun 26 14:19:08 vsa-00000003-vc-1 kernel: [91809.499388] Node 0 DMA
> >per-cpu:
> >Jun 26 14:19:08 vsa-00000003-vc-1 kernel: [91809.499391] CPU    0: hi: 0,
> >btch:   1 usd:   0
> >Jun 26 14:19:08 vsa-00000003-vc-1 kernel: [91809.499392] CPU    1: hi: 0,
> >btch:   1 usd:   0
> >Jun 26 14:19:08 vsa-00000003-vc-1 kernel: [91809.499394] CPU    2: hi: 0,
> >btch:   1 usd:   0
> >Jun 26 14:19:08 vsa-00000003-vc-1 kernel: [91809.499395] CPU    3: hi: 0,
> >btch:   1 usd:   0
> >Jun 26 14:19:08 vsa-00000003-vc-1 kernel: [91809.499396] Node 0 DMA32
> >per-cpu:
> >Jun 26 14:19:08 vsa-00000003-vc-1 kernel: [91809.499399] CPU    0: hi:
> >186,
> >btch:  31 usd:   0
> >Jun 26 14:19:08 vsa-00000003-vc-1 kernel: [91809.499400] CPU    1: hi:
> >186,
> >btch:  31 usd:   0
> >Jun 26 14:19:08 vsa-00000003-vc-1 kernel: [91809.499402] CPU    2: hi:
> >186,
> >btch:  31 usd:   0
> >Jun 26 14:19:08 vsa-00000003-vc-1 kernel: [91809.499403] CPU    3: hi:
> >186,
> >btch:  31 usd:   0
> >Jun 26 14:19:08 vsa-00000003-vc-1 kernel: [91809.499404] Node 0 Normal
> >per-cpu:
> >Jun 26 14:19:08 vsa-00000003-vc-1 kernel: [91809.499406] CPU    0: hi:
> >186,
> >btch:  31 usd:  23
> >Jun 26 14:19:08 vsa-00000003-vc-1 kernel: [91809.499407] CPU    1: hi:
> >186,
> >btch:  31 usd:  25
> >Jun 26 14:19:08 vsa-00000003-vc-1 kernel: [91809.499409] CPU    2: hi:
> >186,
> >btch:  31 usd:   0
> >Jun 26 14:19:08 vsa-00000003-vc-1 kernel: [91809.499410] CPU    3: hi:
> >186,
> >btch:  31 usd:   7
> >Jun 26 14:19:08 vsa-00000003-vc-1 kernel: [91809.499415] active_anon:2143
> >inactive_anon:44181 isolated_anon:0
> >Jun 26 14:19:08 vsa-00000003-vc-1 kernel: [91809.499415]
> >active_file:913373
> >inactive_file:1464930 isolated_file:0
> >Jun 26 14:19:08 vsa-00000003-vc-1 kernel: [91809.499415]  unevictable:8449
> >dirty:6742 writeback:115 unstable:0
> >Jun 26 14:19:08 vsa-00000003-vc-1 kernel: [91809.499415]  free:159418
> >slab_reclaimable:146857 slab_unreclaimable:66681
> >Jun 26 14:19:08 vsa-00000003-vc-1 kernel: [91809.499415]  mapped:5561
> >shmem:383 pagetables:2195 bounce:0
> >Jun 26 14:19:08 vsa-00000003-vc-1 kernel: [91809.499415]  free_cma:0
> >Jun 26 14:19:08 vsa-00000003-vc-1 kernel: [91809.499418] Node 0 DMA
> >free:15884kB min:84kB low:104kB high:124kB active_anon:0kB
> >inactive_anon:0kB
> >active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB
> >isolated(file):0kB present:15628kB managed:15884kB mlocked:0kB dirty:0kB
> >writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB
> >slab_unreclaimable:0kB kernel_stack:0kB pagetables:0kB unstable:0kB
> >bounce:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0
> >all_unreclaimable?
> >yes
> >Jun 26 14:19:08 vsa-00000003-vc-1 kernel: [91809.499423] lowmem_reserve[]:
> >0
> >3512 12080 12080
> >Jun 26 14:19:08 vsa-00000003-vc-1 kernel: [91809.499427] Node 0 DMA32
> >free:240040kB min:19624kB low:24528kB high:29436kB active_anon:8kB
> >inactive_anon:15196kB active_file:1503256kB inactive_file:1624328kB
> >unevictable:0kB isolated(anon):0kB isolated(file):0kB present:3596532kB
> >managed:3546428kB mlocked:0kB dirty:14652kB writeback:0kB mapped:988kB
> >shmem:0kB slab_reclaimable:173044kB slab_unreclaimable:15988kB
> >kernel_stack:248kB pagetables:28kB unstable:0kB bounce:0kB free_cma:0kB
> >writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
> >Jun 26 14:19:08 vsa-00000003-vc-1 kernel: [91809.499432] lowmem_reserve[]:
> >0
> >0 8568 8568
> >Jun 26 14:19:08 vsa-00000003-vc-1 kernel: [91809.499435] Node 0 Normal
> >free:381748kB min:47872kB low:59840kB high:71808kB active_anon:8564kB
> >inactive_anon:161528kB active_file:2150236kB inactive_file:4235392kB
> >unevictable:33796kB isolated(anon):0kB isolated(file):0kB
> >present:8773632kB
> >managed:8715068kB mlocked:33796kB dirty:12316kB writeback:460kB
> >mapped:21256kB shmem:1532kB slab_reclaimable:414384kB
> >slab_unreclaimable:250736kB kernel_stack:8424kB pagetables:8752kB
> >unstable:0kB bounce:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0
> >all_unreclaimable? no
> >Jun 26 14:19:08 vsa-00000003-vc-1 kernel: [91809.499443] lowmem_reserve[]:
> >0
> >0 0 0
> >Jun 26 14:19:08 vsa-00000003-vc-1 kernel: [91809.499446] Node 0 DMA: 1*4kB
> >(U) 1*8kB (U) 0*16kB 0*32kB 2*64kB (U) 1*128kB (U) 1*256kB (U) 0*512kB
> >1*1024kB (U) 1*2048kB (R) 3*4096kB (M) = 15884kB
> >Jun 26 14:19:08 vsa-00000003-vc-1 kernel: [91809.499458] Node 0 DMA32:
> >27954*4kB (UEM) 12312*8kB (UEM) 1713*16kB (UEM) 17*32kB (UMR) 2*64kB (R)
> >2*128kB (R) 1*256kB (R) 1*512kB (R) 1*1024kB (R) 0*2048kB 0*4096kB =
> >240440kB
> >Jun 26 14:19:08 vsa-00000003-vc-1 kernel: [91809.499469] Node 0 Normal:
> >31971*4kB (UEM) 31254*8kB (UEM) 2*16kB (EM) 5*32kB (R) 4*64kB (R) 2*128kB
> >(R) 0*256kB 0*512kB 1*1024kB (R) 1*2048kB (R) 0*4096kB = 381692kB
> >Jun 26 14:19:08 vsa-00000003-vc-1 kernel: [91809.499480] 2380174 total
> >pagecache pages
> >Jun 26 14:19:08 vsa-00000003-vc-1 kernel: [91809.499482] 0 pages in swap
> >cache
> >Jun 26 14:19:08 vsa-00000003-vc-1 kernel: [91809.499483] Swap cache stats:
> >add 0, delete 0, find 0/0
> >Jun 26 14:19:08 vsa-00000003-vc-1 kernel: [91809.499485] Free swap  =
> >522236kB
> >Jun 26 14:19:08 vsa-00000003-vc-1 kernel: [91809.499486] Total swap
> >=522236kB
> >Jun 26 14:19:08 vsa-00000003-vc-1 kernel: [91809.528135] 3145712 pages RAM
> >Jun 26 14:19:08 vsa-00000003-vc-1 kernel: [91809.528135] 69093 pages
> >reserved
> >Jun 26 14:19:08 vsa-00000003-vc-1 kernel: [91809.528135] 3401898 pages
> >shared
> >Jun 26 14:19:08 vsa-00000003-vc-1 kernel: [91809.528135] 1067069 pages
> >non-shared
> >...
> >34501    Jun 26 14:20:03 vsa-00000003-vc-1 kernel: [91864.156051] XFS:
> >pid=12642 kmem_alloc success after 13443 retries inum=35646862875
> >size=35584
> >flags=0x4 lflags=0x250
> >
> >-----Original Message----- From: Dave Chinner
> >Sent: 23 June, 2015 10:18 PM
> >To: Alex Lyakas
> >Cc: xfs@xxxxxxxxxxx ; hch@xxxxxx
> >Subject: Re: xfs_iext_realloc_indirect and "XFS: possible memory
> >allocation
> >deadlock"
> >
> >On Tue, Jun 23, 2015 at 10:18:21AM +0200, Alex Lyakas wrote:
> >>Greetings,
> >>
> >>We are hitting an issue with XFS printing messages like “XFS:
> >>possible memory allocation deadlock in kmem_alloc
> >>(mode:0x250)” and stack trace like in [1]. Eventually,
> >>hung-task panic kicks in with stack traces like [2].
> >>
> >>We are running kernel 3.8.13. I see that in
> >>http://oss.sgi.com/archives/xfs/2012-01/msg00341.html a similar
> >>issue has been discussed, but no code changes followed compared
> >>to what we have in 3.8.13.
> >>
> >>Any suggestion on how to move forward with this problem? For
> >>example, does this memory really have to be allocated with kmalloc
> >>(i.e., physically contiguous), or can vmalloc be used?
> >
> >We left it alone because it is relatively rare for people to hit it,
> >and generally it indicates a severe fragmentation problem when they
> >do hit it (i.e. a file with millions of extents in it). Can you
> >track down the file that this is occurring against and see how many
> >extents it has?
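
As a sketch of getting the extent count once the inode number is known (using
the inode number and device from the log quoted above; xfs_db -r opens the
device read-only, and the value may be slightly stale on a live mount):

xfs_db -r -c "inode 35646862875" -c "print core.nextents" /dev/dm-147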
> >
> >i.e. you may be much better off by taking measures to avoid excessive
> >fragmentation than removing the canary from the mine...
> >
> >>[109626.075483] nfsd            D 0000000000000002     0 20042      2
> >>0x00000000
> >
> >Hmmm - it's also a file written by the NFS server - is this on a
> >dedicated NFS server?
> >
> >>[109626.075483]  [<ffffffffa01be6b0>] ? nfsd_setuser+0x120/0x2b0 [nfsd]
> >>[109626.075483]  [<ffffffff8119bccc>] ? vfs_writev+0x3c/0x50
> >>[109626.075483]  [<ffffffffa01b7dd2>] ? nfsd_vfs_write.isra.12+0x92/0x350
> >>[nfsd]
> >>[109626.075483]  [<ffffffff8119a6cb>] ? dentry_open+0x6b/0xd0
> >>[109626.075483]  [<ffffffffa01ba679>] ? nfsd_write+0xf9/0x110 [nfsd]
> >>[109626.075483]  [<ffffffffa01c4dd1>] ? nfsd3_proc_write+0xb1/0x140
> >>[nfsd]
> >
> >Interesting that this is an NFSv3 write...
> >
> >>[87303.976119] INFO: task nfsd:5684 blocked for more than 180 seconds.
> >>[87303.976976] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
> >>disables
> >>this message.
> >>[87303.978012] nfsd            D 0000000000000003     0  5684      2
> >>0x00000000
> >....
> >>[87303.978174]  [<ffffffffa0269623>] nfsd_write+0xa3/0x110 [nfsd]
> >>[87303.978182]  [<ffffffffa027794c>] nfsd4_write+0x1cc/0x250 [nfsd]
> >>[87303.978189]  [<ffffffffa027746c>] nfsd4_proc_compound+0x5ac/0x7a0
> >>[nfsd]
> >
> >And that is a NFsv4 write. You have multiple clients writing to the
> >same file using different versions of the NFS protocol?
> >
> >Cheers,
> >
> >Dave.
> >-- 
> >Dave Chinner
> >david@xxxxxxxxxxxxx
> >
> >_______________________________________________
> >xfs mailing list
> >xfs@xxxxxxxxxxx
> >http://oss.sgi.com/mailman/listinfo/xfs
> 
> _______________________________________________
> xfs mailing list
> xfs@xxxxxxxxxxx
> http://oss.sgi.com/mailman/listinfo/xfs

_______________________________________________
xfs mailing list
xfs@xxxxxxxxxxx
http://oss.sgi.com/mailman/listinfo/xfs



