Re: xfs_iext_realloc_indirect and "XFS: possible memory allocation deadlock"

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hi Dave,

First, to answer your questions:
- this filesystem is accessed only through NFS, so I guess we can say this is a "dedicated" NFS server - yes, both NFSv3 and NFSv4 are used (by different servers), and definitely they may attempt to access the same file

I did some further debugging by patching XFS to get some more info on how much memory is requested, and who requests it. I added a kmem_realloc variant that is called only by xfs_iext_realloc_indirect. This new function (kmem_realloc_xfs_iext_realloc_indirect) also prints the inode number, which needs the memory. This is after some fiddling to check that "ifp" is the one embedded in xfs_inode and not the one allocated through kmem_cache (attr fork). After memory allocation fails for the first time, I also call handle_sysrq('m'). Finally, I print after how many retries the allocation succeeded. I also added some info to the normal kmem_alloc function.

Results are following:
- memory allocation failures happened only on the kmem_realloc_xfs_iext_realloc_indirect path for now - XFS hits memory re-allocation failures when it needs to allocate about 35KB. Sometimes allocation succeeds after few retries, but sometimes it takes several thousands of retries.
- All allocation failures happened on NFSv3 paths
- Three inode numbers were reported as failing memory allocations. After several hours, "find -inum" is still searching for these inodes...this is a huge filesystem... Is there any other quicker (XFS-specific?) way to find the file based on inode number?

Please see a typical allocation failure in [1]

Any recommendation how to move forward with this issue?

Additional observation that I saw in my local system: writing files to XFS locally vs writing the same files via NFS (both 3 and 4), the amount of extents reported by "xfs_bmap" is much higher for the NFS case. For example, creating a new file and writing into it as follows:
- write 4KB
- skip 4KB (i.e., lseek to 4KB + 4KB)
- write 4KB
- skip 4KB
...
Create a file of say 50MB this way.

Locally it ends up with very few (1-5) extents. But same exact workload through NFS results in several thousands of extents. The filesystem is mounted as "sync" in both cases. This is currently just an observation, which I will try to debug further. Are you perhaps familiar with such behavior? Can you perhaps try and check whether it also happens in your environment?

Thanks for your help,
Alex.



[1]
Jun 26 14:19:08 vsa-00000003-vc-1 kernel: [91809.497033] XFS: pid=12642 kmem_alloc failure inum=35646862875 size=35584 flags=0x4 lflags=0x250 Jun 26 14:19:08 vsa-00000003-vc-1 kernel: [91809.498407] Pid: 12642, comm: nfsd Tainted: GF O 3.8.13-030813-generic #201305111843
Jun 26 14:19:08 vsa-00000003-vc-1 kernel: [91809.498410] Call Trace:
Jun 26 14:19:08 vsa-00000003-vc-1 kernel: [91809.498450] [<ffffffffa07a832e>] kmem_alloc_xfs_iext_realloc_indirect+0x16e/0x1f0 [xfs] Jun 26 14:19:08 vsa-00000003-vc-1 kernel: [91809.498465] [<ffffffffa07a86b3>] kmem_realloc_xfs_iext_realloc_indirect+0x33/0x80 [xfs] Jun 26 14:19:08 vsa-00000003-vc-1 kernel: [91809.498482] [<ffffffffa07db2a0>] xfs_iext_realloc_indirect+0x40/0x60 [xfs] Jun 26 14:19:08 vsa-00000003-vc-1 kernel: [91809.498498] [<ffffffffa07db2ff>] xfs_iext_irec_new+0x3f/0x180 [xfs] Jun 26 14:19:08 vsa-00000003-vc-1 kernel: [91809.498516] [<ffffffffa07db5b9>] xfs_iext_add_indirect_multi+0x179/0x2b0 [xfs] Jun 26 14:19:08 vsa-00000003-vc-1 kernel: [91809.498532] [<ffffffffa07db9be>] xfs_iext_add+0xce/0x290 [xfs] Jun 26 14:19:08 vsa-00000003-vc-1 kernel: [91809.498539] [<ffffffff81076634>] ? wake_up_worker+0x24/0x30 Jun 26 14:19:08 vsa-00000003-vc-1 kernel: [91809.498553] [<ffffffffa07dbbd2>] xfs_iext_insert+0x52/0x100 [xfs] Jun 26 14:19:08 vsa-00000003-vc-1 kernel: [91809.498569] [<ffffffffa07b6ed3>] ? xfs_bmap_add_extent_hole_delay+0xd3/0x6a0 [xfs] Jun 26 14:19:08 vsa-00000003-vc-1 kernel: [91809.498583] [<ffffffffa07b6ed3>] xfs_bmap_add_extent_hole_delay+0xd3/0x6a0 [xfs] Jun 26 14:19:08 vsa-00000003-vc-1 kernel: [91809.498598] [<ffffffffa07da3fc>] ? xfs_iext_bno_to_ext+0x8c/0x160 [xfs] Jun 26 14:19:08 vsa-00000003-vc-1 kernel: [91809.498614] [<ffffffffa07b7719>] xfs_bmapi_reserve_delalloc+0x279/0x2a0 [xfs] Jun 26 14:19:08 vsa-00000003-vc-1 kernel: [91809.498628] [<ffffffffa07be742>] xfs_bmapi_delay+0x122/0x270 [xfs] Jun 26 14:19:08 vsa-00000003-vc-1 kernel: [91809.498642] [<ffffffffa079d703>] xfs_iomap_write_delay+0x173/0x320 [xfs] Jun 26 14:19:08 vsa-00000003-vc-1 kernel: [91809.498656] [<ffffffffa07be42c>] ? xfs_bmapi_read+0xfc/0x2f0 [xfs] Jun 26 14:19:08 vsa-00000003-vc-1 kernel: [91809.498660] [<ffffffff8135d8f3>] ? call_rwsem_down_write_failed+0x13/0x20 Jun 26 14:19:08 vsa-00000003-vc-1 kernel: [91809.498665] [<ffffffff81195bbc>] ? lookup_page_cgroup+0x4c/0x50 Jun 26 14:19:08 vsa-00000003-vc-1 kernel: [91809.498677] [<ffffffffa078ab40>] __xfs_get_blocks+0x280/0x550 [xfs] Jun 26 14:19:08 vsa-00000003-vc-1 kernel: [91809.498688] [<ffffffffa078ae41>] xfs_get_blocks+0x11/0x20 [xfs] Jun 26 14:19:08 vsa-00000003-vc-1 kernel: [91809.498695] [<ffffffff811cf77e>] __block_write_begin+0x1ae/0x4e0 Jun 26 14:19:08 vsa-00000003-vc-1 kernel: [91809.498705] [<ffffffffa078ae30>] ? xfs_get_blocks_direct+0x20/0x20 [xfs] Jun 26 14:19:08 vsa-00000003-vc-1 kernel: [91809.498708] [<ffffffff81135fff>] ? grab_cache_page_write_begin+0x8f/0xf0 Jun 26 14:19:08 vsa-00000003-vc-1 kernel: [91809.498720] [<ffffffffa078a09f>] xfs_vm_write_begin+0x5f/0xe0 [xfs] Jun 26 14:19:08 vsa-00000003-vc-1 kernel: [91809.498725] [<ffffffff8113552a>] generic_perform_write+0xca/0x210 Jun 26 14:19:08 vsa-00000003-vc-1 kernel: [91809.498729] [<ffffffff811356cd>] generic_file_buffered_write+0x5d/0x90 Jun 26 14:19:08 vsa-00000003-vc-1 kernel: [91809.498740] [<ffffffffa07952d5>] xfs_file_buffered_aio_write+0x115/0x1c0 [xfs] Jun 26 14:19:08 vsa-00000003-vc-1 kernel: [91809.498744] [<ffffffff816159f4>] ? ip_finish_output+0x224/0x3b0 Jun 26 14:19:08 vsa-00000003-vc-1 kernel: [91809.498758] [<ffffffffa079547c>] xfs_file_aio_write+0xfc/0x1b0 [xfs] Jun 26 14:19:08 vsa-00000003-vc-1 kernel: [91809.498769] [<ffffffffa0795380>] ? xfs_file_buffered_aio_write+0x1c0/0x1c0 [xfs] Jun 26 14:19:08 vsa-00000003-vc-1 kernel: [91809.498773] [<ffffffff8119b8c3>] do_sync_readv_writev+0xa3/0xe0 Jun 26 14:19:08 vsa-00000003-vc-1 kernel: [91809.498776] [<ffffffff8119bb8d>] do_readv_writev+0xcd/0x1d0 Jun 26 14:19:08 vsa-00000003-vc-1 kernel: [91809.498780] [<ffffffff810877e0>] ? set_groups+0x40/0x60 Jun 26 14:19:08 vsa-00000003-vc-1 kernel: [91809.498805] [<ffffffffa033a6b0>] ? nfsd_setuser+0x120/0x2b0 [nfsd] Jun 26 14:19:08 vsa-00000003-vc-1 kernel: [91809.498808] [<ffffffff8119bccc>] vfs_writev+0x3c/0x50 Jun 26 14:19:08 vsa-00000003-vc-1 kernel: [91809.498815] [<ffffffffa0333dd2>] nfsd_vfs_write.isra.12+0x92/0x350 [nfsd] Jun 26 14:19:08 vsa-00000003-vc-1 kernel: [91809.498818] [<ffffffff8119a6cb>] ? dentry_open+0x6b/0xd0 Jun 26 14:19:08 vsa-00000003-vc-1 kernel: [91809.498825] [<ffffffffa0336679>] nfsd_write+0xf9/0x110 [nfsd] Jun 26 14:19:08 vsa-00000003-vc-1 kernel: [91809.498834] [<ffffffffa0340dd1>] nfsd3_proc_write+0xb1/0x140 [nfsd] Jun 26 14:19:08 vsa-00000003-vc-1 kernel: [91809.498841] [<ffffffffa032fd62>] nfsd_dispatch+0x102/0x270 [nfsd] Jun 26 14:19:08 vsa-00000003-vc-1 kernel: [91809.498865] [<ffffffffa0103b48>] svc_process_common+0x328/0x5e0 [sunrpc] Jun 26 14:19:08 vsa-00000003-vc-1 kernel: [91809.498876] [<ffffffffa0104153>] svc_process+0x103/0x160 [sunrpc] Jun 26 14:19:08 vsa-00000003-vc-1 kernel: [91809.498882] [<ffffffffa032f72f>] nfsd+0xbf/0x130 [nfsd] Jun 26 14:19:08 vsa-00000003-vc-1 kernel: [91809.498888] [<ffffffffa032f670>] ? nfsd_destroy+0x80/0x80 [nfsd] Jun 26 14:19:08 vsa-00000003-vc-1 kernel: [91809.498892] [<ffffffff8107f050>] kthread+0xc0/0xd0 Jun 26 14:19:08 vsa-00000003-vc-1 kernel: [91809.498895] [<ffffffff8107ef90>] ? flush_kthread_worker+0xb0/0xb0 Jun 26 14:19:08 vsa-00000003-vc-1 kernel: [91809.498900] [<ffffffff816f61ec>] ret_from_fork+0x7c/0xb0 Jun 26 14:19:08 vsa-00000003-vc-1 kernel: [91809.498903] [<ffffffff8107ef90>] ? flush_kthread_worker+0xb0/0xb0
Jun 26 14:19:08 vsa-00000003-vc-1 kernel: [91809.498905] SysRq : Show Memory
Jun 26 14:19:08 vsa-00000003-vc-1 kernel: [91809.499385] Mem-Info:
Jun 26 14:19:08 vsa-00000003-vc-1 kernel: [91809.499388] Node 0 DMA per-cpu:
Jun 26 14:19:08 vsa-00000003-vc-1 kernel: [91809.499391] CPU 0: hi: 0, btch: 1 usd: 0 Jun 26 14:19:08 vsa-00000003-vc-1 kernel: [91809.499392] CPU 1: hi: 0, btch: 1 usd: 0 Jun 26 14:19:08 vsa-00000003-vc-1 kernel: [91809.499394] CPU 2: hi: 0, btch: 1 usd: 0 Jun 26 14:19:08 vsa-00000003-vc-1 kernel: [91809.499395] CPU 3: hi: 0, btch: 1 usd: 0 Jun 26 14:19:08 vsa-00000003-vc-1 kernel: [91809.499396] Node 0 DMA32 per-cpu: Jun 26 14:19:08 vsa-00000003-vc-1 kernel: [91809.499399] CPU 0: hi: 186, btch: 31 usd: 0 Jun 26 14:19:08 vsa-00000003-vc-1 kernel: [91809.499400] CPU 1: hi: 186, btch: 31 usd: 0 Jun 26 14:19:08 vsa-00000003-vc-1 kernel: [91809.499402] CPU 2: hi: 186, btch: 31 usd: 0 Jun 26 14:19:08 vsa-00000003-vc-1 kernel: [91809.499403] CPU 3: hi: 186, btch: 31 usd: 0 Jun 26 14:19:08 vsa-00000003-vc-1 kernel: [91809.499404] Node 0 Normal per-cpu: Jun 26 14:19:08 vsa-00000003-vc-1 kernel: [91809.499406] CPU 0: hi: 186, btch: 31 usd: 23 Jun 26 14:19:08 vsa-00000003-vc-1 kernel: [91809.499407] CPU 1: hi: 186, btch: 31 usd: 25 Jun 26 14:19:08 vsa-00000003-vc-1 kernel: [91809.499409] CPU 2: hi: 186, btch: 31 usd: 0 Jun 26 14:19:08 vsa-00000003-vc-1 kernel: [91809.499410] CPU 3: hi: 186, btch: 31 usd: 7 Jun 26 14:19:08 vsa-00000003-vc-1 kernel: [91809.499415] active_anon:2143 inactive_anon:44181 isolated_anon:0 Jun 26 14:19:08 vsa-00000003-vc-1 kernel: [91809.499415] active_file:913373 inactive_file:1464930 isolated_file:0 Jun 26 14:19:08 vsa-00000003-vc-1 kernel: [91809.499415] unevictable:8449 dirty:6742 writeback:115 unstable:0 Jun 26 14:19:08 vsa-00000003-vc-1 kernel: [91809.499415] free:159418 slab_reclaimable:146857 slab_unreclaimable:66681 Jun 26 14:19:08 vsa-00000003-vc-1 kernel: [91809.499415] mapped:5561 shmem:383 pagetables:2195 bounce:0
Jun 26 14:19:08 vsa-00000003-vc-1 kernel: [91809.499415]  free_cma:0
Jun 26 14:19:08 vsa-00000003-vc-1 kernel: [91809.499418] Node 0 DMA free:15884kB min:84kB low:104kB high:124kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15628kB managed:15884kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB slab_unreclaimable:0kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes Jun 26 14:19:08 vsa-00000003-vc-1 kernel: [91809.499423] lowmem_reserve[]: 0 3512 12080 12080 Jun 26 14:19:08 vsa-00000003-vc-1 kernel: [91809.499427] Node 0 DMA32 free:240040kB min:19624kB low:24528kB high:29436kB active_anon:8kB inactive_anon:15196kB active_file:1503256kB inactive_file:1624328kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:3596532kB managed:3546428kB mlocked:0kB dirty:14652kB writeback:0kB mapped:988kB shmem:0kB slab_reclaimable:173044kB slab_unreclaimable:15988kB kernel_stack:248kB pagetables:28kB unstable:0kB bounce:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no Jun 26 14:19:08 vsa-00000003-vc-1 kernel: [91809.499432] lowmem_reserve[]: 0 0 8568 8568 Jun 26 14:19:08 vsa-00000003-vc-1 kernel: [91809.499435] Node 0 Normal free:381748kB min:47872kB low:59840kB high:71808kB active_anon:8564kB inactive_anon:161528kB active_file:2150236kB inactive_file:4235392kB unevictable:33796kB isolated(anon):0kB isolated(file):0kB present:8773632kB managed:8715068kB mlocked:33796kB dirty:12316kB writeback:460kB mapped:21256kB shmem:1532kB slab_reclaimable:414384kB slab_unreclaimable:250736kB kernel_stack:8424kB pagetables:8752kB unstable:0kB bounce:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0
all_unreclaimable? no
Jun 26 14:19:08 vsa-00000003-vc-1 kernel: [91809.499443] lowmem_reserve[]: 0 0 0 0 Jun 26 14:19:08 vsa-00000003-vc-1 kernel: [91809.499446] Node 0 DMA: 1*4kB (U) 1*8kB (U) 0*16kB 0*32kB 2*64kB (U) 1*128kB (U) 1*256kB (U) 0*512kB 1*1024kB (U) 1*2048kB (R) 3*4096kB (M) = 15884kB Jun 26 14:19:08 vsa-00000003-vc-1 kernel: [91809.499458] Node 0 DMA32: 27954*4kB (UEM) 12312*8kB (UEM) 1713*16kB (UEM) 17*32kB (UMR) 2*64kB (R) 2*128kB (R) 1*256kB (R) 1*512kB (R) 1*1024kB (R) 0*2048kB 0*4096kB = 240440kB Jun 26 14:19:08 vsa-00000003-vc-1 kernel: [91809.499469] Node 0 Normal: 31971*4kB (UEM) 31254*8kB (UEM) 2*16kB (EM) 5*32kB (R) 4*64kB (R) 2*128kB (R) 0*256kB 0*512kB 1*1024kB (R) 1*2048kB (R) 0*4096kB = 381692kB Jun 26 14:19:08 vsa-00000003-vc-1 kernel: [91809.499480] 2380174 total pagecache pages Jun 26 14:19:08 vsa-00000003-vc-1 kernel: [91809.499482] 0 pages in swap cache Jun 26 14:19:08 vsa-00000003-vc-1 kernel: [91809.499483] Swap cache stats: add 0, delete 0, find 0/0 Jun 26 14:19:08 vsa-00000003-vc-1 kernel: [91809.499485] Free swap = 522236kB Jun 26 14:19:08 vsa-00000003-vc-1 kernel: [91809.499486] Total swap =522236kB
Jun 26 14:19:08 vsa-00000003-vc-1 kernel: [91809.528135] 3145712 pages RAM
Jun 26 14:19:08 vsa-00000003-vc-1 kernel: [91809.528135] 69093 pages reserved Jun 26 14:19:08 vsa-00000003-vc-1 kernel: [91809.528135] 3401898 pages shared Jun 26 14:19:08 vsa-00000003-vc-1 kernel: [91809.528135] 1067069 pages non-shared
...
34501 Jun 26 14:20:03 vsa-00000003-vc-1 kernel: [91864.156051] XFS: pid=12642 kmem_alloc success after 13443 retries inum=35646862875 size=35584 flags=0x4 lflags=0x250

-----Original Message----- From: Dave Chinner
Sent: 23 June, 2015 10:18 PM
To: Alex Lyakas
Cc: xfs@xxxxxxxxxxx ; hch@xxxxxx
Subject: Re: xfs_iext_realloc_indirect and "XFS: possible memory allocation
deadlock"

On Tue, Jun 23, 2015 at 10:18:21AM +0200, Alex Lyakas wrote:
Greetings,

We are hitting an issue with XFS printing messages like “XFS:
possible memory allocation deadlock in kmem_alloc
(mode:0x250)” and stack trace like in [1]. Eventually,
hung-task panic kicks in with stack traces like [2].

We are running kernel 3.8.13. I see that in
http://oss.sgi.com/archives/xfs/2012-01/msg00341.html a similar
issue has been discussed, but no code changes followed comparing
to what we have in 3.8.13.

Any suggestion on how to move forward with this problem? For
example, does this memory has to be really allocated with kmalloc
(i.e., physically continuous) or vmalloc can be used?

We left it alone because it is relatively rare for people to hit it,
and generally it indicates a severe fragmentation problem when they
do hit it (i.e. a file with millions of extents in it). Can you
track down the file that this is occurring against and see how many
extents it has?

i.e. you may be much better off by taking measures to avoid excessive
fragmentation than removing the canary from the mine...

[109626.075483] nfsd            D 0000000000000002     0 20042      2
0x00000000

Hmmm - it's also a file written by the NFS server - this is is on an
a dedicated NFS server?

[109626.075483]  [<ffffffffa01be6b0>] ? nfsd_setuser+0x120/0x2b0 [nfsd]
[109626.075483]  [<ffffffff8119bccc>] ? vfs_writev+0x3c/0x50
[109626.075483]  [<ffffffffa01b7dd2>] ? nfsd_vfs_write.isra.12+0x92/0x350
[nfsd]
[109626.075483]  [<ffffffff8119a6cb>] ? dentry_open+0x6b/0xd0
[109626.075483]  [<ffffffffa01ba679>] ? nfsd_write+0xf9/0x110 [nfsd]
[109626.075483]  [<ffffffffa01c4dd1>] ? nfsd3_proc_write+0xb1/0x140 [nfsd]

Interesting that this is an NFSv3 write...

[87303.976119] INFO: task nfsd:5684 blocked for more than 180 seconds.
[87303.976976] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables
this message.
[87303.978012] nfsd            D 0000000000000003     0  5684      2
0x00000000
....
[87303.978174]  [<ffffffffa0269623>] nfsd_write+0xa3/0x110 [nfsd]
[87303.978182]  [<ffffffffa027794c>] nfsd4_write+0x1cc/0x250 [nfsd]
[87303.978189]  [<ffffffffa027746c>] nfsd4_proc_compound+0x5ac/0x7a0
[nfsd]

And that is a NFsv4 write. You have multiple clients writing to the
same file using different versions of the NFS protocol?

Cheers,

Dave.
--
Dave Chinner
david@xxxxxxxxxxxxx
_______________________________________________
xfs mailing list
xfs@xxxxxxxxxxx
http://oss.sgi.com/mailman/listinfo/xfs




[Index of Archives]     [Linux XFS Devel]     [Linux Filesystem Development]     [Filesystem Testing]     [Linux USB Devel]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]

  Powered by Linux