Re: xfs_iext_realloc_indirect and "XFS: possible memory allocation deadlock"

Hi Dave,
I have some more observations from further debugging.

1) I tracked down the problematic inodes using xfs_db (I still don't know their pathnames, though). One of them is not problematic anymore (it seems it got truncated, or perhaps deleted and its inode number got reused). But the two other inodes are huge files with hundreds of thousands of extents [1][2].

2) I am doing some experiments comparing writing local files vs. the same workload through NFS, trying the same 4KB-data/4KB-hole workload on small files of 2MB. When writing the file locally, I see that xfs_file_buffered_aio_write is always called with a single 4KB buffer:
xfs_file_buffered_aio_write: inum=100663559 nr_segs=1
seg #0: {.iov_base=0x18db8f0, .iov_len=4096}

But when doing the same workload through NFS:
xfs_file_buffered_aio_write: inum=167772423 nr_segs=2
seg #0: {.iov_base=0xffff88006c1100a8, .iov_len=3928}
seg #1: {.iov_base=0xffff88005556e000, .iov_len=168}
There are always two such buffers in the IOV.

Debugging further into nfsd4_decode_write, it appears that some NFS headers come first in the page and the data starts after them, so an additional page is required to fit the rest of the 4KB of data.
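
For reference, the dumps above come from instrumentation at the entry of
xfs_file_buffered_aio_write; an equivalent debug helper would look roughly
like the sketch below (the name xfs_dump_write_iovec is mine, and the exact
tracing mechanism I used differs slightly):

/*
 * Debug helper (sketch, not upstream code), meant to sit in
 * fs/xfs/xfs_file.c: dump the iovec that xfs_file_buffered_aio_write()
 * receives, so local vs NFS writes can be compared.
 */
static void
xfs_dump_write_iovec(struct xfs_inode *ip, const struct iovec *iovp,
                     unsigned long nr_segs)
{
    unsigned long seg;

    xfs_warn(ip->i_mount, "xfs_file_buffered_aio_write: inum=%llu nr_segs=%lu",
             (unsigned long long)ip->i_ino, nr_segs);
    for (seg = 0; seg < nr_segs; seg++)
        xfs_warn(ip->i_mount, "seg #%lu: {.iov_base=%p, .iov_len=%zu}",
                 seg, iovp[seg].iov_base, iovp[seg].iov_len);
}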

I am still trying to debug why this results in XFS requiring many more extents to fit such a workload. I tapped into some functions and am seeing the following:

Local workload:
6 xfs_iext_add: ifp=0xffff8800096de6b8 idx=0x0 ext_diff=0x1, nextents=0 new_size=16 if_bytes=0 if_real_bytes=0
25 xfs_iext_add: ifp=0xffff8800096de6b8 idx=0x1 ext_diff=0x1, nextents=1 new_size=32 if_bytes=16 if_real_bytes=0
44 xfs_iext_add: ifp=0xffff8800096de6b8 idx=0x1 ext_diff=0x1, nextents=1 new_size=32 if_bytes=16 if_real_bytes=0
63 xfs_iext_add: ifp=0xffff8800096de6b8 idx=0x1 ext_diff=0x1, nextents=1 new_size=32 if_bytes=16 if_real_bytes=0
82 xfs_iext_add: ifp=0xffff8800096de6b8 idx=0x1 ext_diff=0x1, nextents=1 new_size=32 if_bytes=16 if_real_bytes=0
101 xfs_iext_add: ifp=0xffff8800096de6b8 idx=0x1 ext_diff=0x1, nextents=1 new_size=32 if_bytes=16 if_real_bytes=0
120 xfs_iext_add: ifp=0xffff8800096de6b8 idx=0x1 ext_diff=0x1, nextents=1 new_size=32 if_bytes=16 if_real_bytes=0
139 xfs_iext_add: ifp=0xffff8800096de6b8 idx=0x1 ext_diff=0x1, nextents=1 new_size=32 if_bytes=16 if_real_bytes=0
158 xfs_iext_add: ifp=0xffff8800096de6b8 idx=0x1 ext_diff=0x1, nextents=1 new_size=32 if_bytes=16 if_real_bytes=0
177 xfs_iext_add: ifp=0xffff8800096de6b8 idx=0x1 ext_diff=0x1, nextents=1 new_size=32 if_bytes=16 if_real_bytes=0
196 xfs_iext_add: ifp=0xffff8800096de6b8 idx=0x1 ext_diff=0x1, nextents=1 new_size=32 if_bytes=16 if_real_bytes=0
215 xfs_iext_add: ifp=0xffff8800096de6b8 idx=0x1 ext_diff=0x1, nextents=1 new_size=32 if_bytes=16 if_real_bytes=0
234 xfs_iext_add: ifp=0xffff8800096de6b8 idx=0x1 ext_diff=0x1, nextents=1 new_size=32 if_bytes=16 if_real_bytes=0
253 xfs_iext_add: ifp=0xffff8800096de6b8 idx=0x1 ext_diff=0x1, nextents=1 new_size=32 if_bytes=16 if_real_bytes=0
272 xfs_iext_add: ifp=0xffff8800096de6b8 idx=0x1 ext_diff=0x1, nextents=1 new_size=32 if_bytes=16 if_real_bytes=0
291 xfs_iext_add: ifp=0xffff8800096de6b8 idx=0x1 ext_diff=0x1, nextents=1 new_size=32 if_bytes=16 if_real_bytes=0
So we always stay within inline extents.

NFS workload:
7 xfs_iext_add: ifp=0xffff8800096df4b8 idx=0x0 ext_diff=0x1, nextents=0 new_size=16 if_bytes=0 if_real_bytes=0
31 xfs_iext_add: ifp=0xffff8800096df4b8 idx=0x1 ext_diff=0x1, nextents=1 new_size=32 if_bytes=16 if_real_bytes=0
61 xfs_iext_add: ifp=0xffff8800096df4b8 idx=0x1 ext_diff=0x1, nextents=1 new_size=32 if_bytes=16 if_real_bytes=0
88 xfs_iext_add: ifp=0xffff8800096df4b8 idx=0x1 ext_diff=0x1, nextents=1 new_size=32 if_bytes=16 if_real_bytes=0
115 xfs_iext_add: ifp=0xffff8800096df4b8 idx=0x1 ext_diff=0x1, nextents=1 new_size=32 if_bytes=16 if_real_bytes=0
124 xfs_iext_add: ifp=0xffff8800096df4b8 idx=0x1 ext_diff=0x1, nextents=2 new_size=48 if_bytes=32 if_real_bytes=0
130 xfs_iext_add: ifp=0xffff8800096df4b8 idx=0x1 ext_diff=0x1, nextents=2 new_size=48 if_bytes=32 if_real_bytes=0
136 xfs_iext_add: ifp=0xffff8800096df4b8 idx=0x2 ext_diff=0x1, nextents=3 new_size=64 if_bytes=48 if_real_bytes=128
142 xfs_iext_add: ifp=0xffff8800096df4b8 idx=0x3 ext_diff=0x1, nextents=4 new_size=80 if_bytes=64 if_real_bytes=64
154 xfs_iext_add: ifp=0xffff8800096df4b8 idx=0x5 ext_diff=0x1, nextents=5 new_size=96 if_bytes=80 if_real_bytes=128
160 xfs_iext_add: ifp=0xffff8800096df4b8 idx=0x5 ext_diff=0x1, nextents=6 new_size=112 if_bytes=96 if_real_bytes=128
166 xfs_iext_add: ifp=0xffff8800096df4b8 idx=0x6 ext_diff=0x1, nextents=6 new_size=112 if_bytes=96 if_real_bytes=128
178 xfs_iext_add: ifp=0xffff8800096df4b8 idx=0x5 ext_diff=0x1, nextents=7 new_size=128 if_bytes=112 if_real_bytes=128
193 xfs_iext_add: ifp=0xffff8800096df4b8 idx=0x6 ext_diff=0x1, nextents=8 new_size=144 if_bytes=128 if_real_bytes=128
199 xfs_iext_add: ifp=0xffff8800096df4b8 idx=0x7 ext_diff=0x1, nextents=9 new_size=160 if_bytes=144 if_real_bytes=256
205 xfs_iext_add: ifp=0xffff8800096df4b8 idx=0x8 ext_diff=0x1, nextents=10 new_size=176 if_bytes=160 if_real_bytes=256
217 xfs_iext_add: ifp=0xffff8800096df4b8 idx=0xa ext_diff=0x1, nextents=11 new_size=192 if_bytes=176 if_real_bytes=256
226 xfs_iext_add: ifp=0xffff8800096df4b8 idx=0xb ext_diff=0x1, nextents=11 new_size=192 if_bytes=176 if_real_bytes=256
244 xfs_iext_add: ifp=0xffff8800096df4b8 idx=0xb ext_diff=0x1, nextents=11 new_size=192 if_bytes=176 if_real_bytes=256
262 xfs_iext_add: ifp=0xffff8800096df4b8 idx=0xc ext_diff=0x1, nextents=12 new_size=208 if_bytes=192 if_real_bytes=256
271 xfs_iext_add: ifp=0xffff8800096df4b8 idx=0xb ext_diff=0x1, nextents=13 new_size=224 if_bytes=208 if_real_bytes=256
277 xfs_iext_add: ifp=0xffff8800096df4b8 idx=0xb ext_diff=0x1, nextents=13 new_size=224 if_bytes=208 if_real_bytes=256
295 xfs_iext_add: ifp=0xffff8800096df4b8 idx=0xd ext_diff=0x1, nextents=14 new_size=240 if_bytes=224 if_real_bytes=256
316 xfs_iext_add: ifp=0xffff8800096df4b8 idx=0xe ext_diff=0x1, nextents=14 new_size=240 if_bytes=224 if_real_bytes=256
343 xfs_iext_add: ifp=0xffff8800096df4b8 idx=0xe ext_diff=0x1, nextents=14 new_size=240 if_bytes=224 if_real_bytes=256
370 xfs_iext_add: ifp=0xffff8800096df4b8 idx=0xe ext_diff=0x1, nextents=14 new_size=240 if_bytes=224 if_real_bytes=256
382 xfs_iext_add: ifp=0xffff8800096df4b8 idx=0xe ext_diff=0x1, nextents=14 new_size=240 if_bytes=224 if_real_bytes=256
406 xfs_iext_add: ifp=0xffff8800096df4b8 idx=0xe ext_diff=0x1, nextents=15 new_size=256 if_bytes=240 if_real_bytes=256
412 xfs_iext_add: ifp=0xffff8800096df4b8 idx=0xe ext_diff=0x1, nextents=15 new_size=256 if_bytes=240 if_real_bytes=256
418 xfs_iext_add: ifp=0xffff8800096df4b8 idx=0xf ext_diff=0x1, nextents=16 new_size=272 if_bytes=256 if_real_bytes=256
424 xfs_iext_add: ifp=0xffff8800096df4b8 idx=0x10 ext_diff=0x1, nextents=17 new_size=288 if_bytes=272 if_real_bytes=512
430 xfs_iext_add: ifp=0xffff8800096df4b8 idx=0x11 ext_diff=0x1, nextents=18 new_size=304 if_bytes=288 if_real_bytes=512
439 xfs_iext_add: ifp=0xffff8800096df4b8 idx=0x13 ext_diff=0x1, nextents=19 new_size=320 if_bytes=304 if_real_bytes=512

The number of extents keeps growing, but I still could not see why this is happening. Can you please give a hint why?

3) I tried to see what is the largest file XFS can maintain with this 4KB-data/4KB-hole workload on a VM with 5GB RAM. I was able to reach 146GB and almost 9M extents. There were a lot of "memory allocation deadlock" messages popping up, but eventually the allocation would succeed. Finally, an allocation could not succeed for 3 minutes and a hung-task panic occurred. With a no-hole 32KB workload, XFS successfully created a 400GB file with 92789 extents.

Thanks,
Alex.





[1]
xfs_db> inode 35741520897
xfs_db> p
core.magic = 0x494e
core.mode = 0100600
core.version = 2
core.format = 3 (btree)
core.nlinkv2 = 1
core.onlink = 0
core.projid_lo = 0
core.projid_hi = 0
core.uid = 0
core.gid = 0
core.flushiter = 454
core.atime.sec = Sat Jun 13 00:02:19 2015
core.atime.nsec = 698625000
core.mtime.sec = Tue Jun 16 02:36:42 2015
core.mtime.nsec = 786938999
core.ctime.sec = Sat Jun 27 03:21:14 2015
core.ctime.nsec = 899709747
core.size = 180676487417
core.nblocks = 44114144
core.extsize = 0
core.nextents = 928797
core.naextents = 0
core.forkoff = 0
core.aformat = 2 (extents)
core.dmevmask = 0
core.dmstate = 0
core.newrtbm = 0
core.prealloc = 0
core.realtime = 0
core.immutable = 0
core.append = 0
core.sync = 0
core.noatime = 0
core.nodump = 0
core.rtinherit = 0
core.projinherit = 0
core.nosymlinks = 0
core.extsz = 0
core.extszinherit = 0
core.nodefrag = 0
core.filestream = 0
core.gen = 1092231143
next_unlinked = null
u.bmbt.level = 3
u.bmbt.numrecs = 1
u.bmbt.keys[1] = [startoff] 1:[0]
u.bmbt.ptrs[1] = 1:2541992517

[2]
xfs_db> inode 35646862875
xfs_db> p
core.magic = 0x494e
core.mode = 0100600
core.version = 2
core.format = 3 (btree)
core.nlinkv2 = 1
core.onlink = 0
core.projid_lo = 0
core.projid_hi = 0
core.uid = 0
core.gid = 0
core.flushiter = 1321
core.atime.sec = Fri Jun 26 11:07:27 2015
core.atime.nsec = 642685048
core.mtime.sec = Fri Jun 26 14:20:57 2015
core.mtime.nsec = 840214048
core.ctime.sec = Fri Jun 26 14:21:07 2015
core.ctime.nsec = 616214048
core.size = 149676922744
core.nblocks = 36544482
core.extsize = 0
core.nextents = 572425
core.naextents = 0
core.forkoff = 0
core.aformat = 2 (extents)
core.dmevmask = 0
core.dmstate = 0
core.newrtbm = 0
core.prealloc = 0
core.realtime = 0
core.immutable = 0
core.append = 0
core.sync = 0
core.noatime = 0
core.nodump = 0
core.rtinherit = 0
core.projinherit = 0
core.nosymlinks = 0
core.extsz = 0
core.extszinherit = 0
core.nodefrag = 0
core.filestream = 0
core.gen = 3873914382
next_unlinked = null
u.bmbt.level = 2
u.bmbt.numrecs = 9
u.bmbt.keys[1-9] = [startoff] 1:[0] 2:[3764720] 3:[7291520] 4:[11373840] 5:[16320112] 6:[20521147] 7:[23938933] 8:[28154576] 9:[32607104]
u.bmbt.ptrs[1-9] = 1:9885338478 2:9892028257 3:5911725342 4:10259977998 5:9907385089 6:10056188649 7:10058512073 8:9917653582 9:9922453713






-----Original Message-----
From: Alex Lyakas
Sent: 27 June, 2015 11:01 PM
To: Dave Chinner
Cc: xfs@xxxxxxxxxxx ; hch@xxxxxx ; Yair Hershko ; Shyam Kaushik ; Danny Shavit
Subject: Re: xfs_iext_realloc_indirect and "XFS: possible memory allocation deadlock"

Hi Dave,

First, to answer your questions:
- this filesystem is accessed only through NFS, so I guess we can say this
is a "dedicated" NFS server
- yes, both NFSv3 and NFSv4 are used (by different servers), and definitely
they may attempt to access the same file

I did some further debugging by patching XFS to get more information on how
much memory is requested and who requests it. I added a kmem_realloc
variant that is called only by xfs_iext_realloc_indirect. This new function
(kmem_realloc_xfs_iext_realloc_indirect) also prints the inode number that
needs the memory. This is after some fiddling to check that "ifp" is the
one embedded in the xfs_inode and not the one allocated through kmem_cache
(attr fork). After a memory allocation fails for the first time, I also
call handle_sysrq('m'). Finally, I print after how many retries the
allocation succeeded. I also added similar information to the normal
kmem_alloc function.
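
To illustrate, the wrapper is along the following lines. This is a
simplified sketch rather than the exact patch: the retry/congestion_wait
loop mirrors the stock kmem_alloc(), and the extra "inum" argument is just
how I threaded the inode number through for logging:

/*
 * Simplified sketch of the debug allocator (not the exact patch): same
 * retry loop as the stock kmem_alloc()/kmem_realloc(), but it also logs
 * the inode number that needs the memory, dumps memory state (sysrq-m)
 * on the first failure, and reports how many retries the allocation
 * finally took.
 */
#include <linux/slab.h>
#include <linux/sched.h>
#include <linux/sysrq.h>
#include <linux/backing-dev.h>
#include "xfs.h"    /* kmem_flags_convert(), xfs_warn(), xfs_ino_t */

void *
kmem_realloc_xfs_iext_realloc_indirect(const void *ptr, size_t newsize,
                                       size_t oldsize, xfs_km_flags_t flags,
                                       xfs_ino_t inum)
{
    gfp_t lflags = kmem_flags_convert(flags);
    int   retries = 0;
    void  *new;

    do {
        new = kmalloc(newsize, lflags);
        if (new) {
            if (retries)
                xfs_warn(NULL,
    "pid=%d kmem_alloc success after %d retries inum=%llu size=%zu flags=0x%x lflags=0x%x",
                         current->pid, retries, (unsigned long long)inum,
                         newsize, (unsigned)flags, (unsigned)lflags);
            if (ptr) {
                memcpy(new, ptr, min(oldsize, newsize));
                kfree(ptr);
            }
            return new;
        }
        if (!retries++) {
            xfs_warn(NULL,
    "pid=%d kmem_alloc failure inum=%llu size=%zu flags=0x%x lflags=0x%x",
                     current->pid, (unsigned long long)inum,
                     newsize, (unsigned)flags, (unsigned)lflags);
            handle_sysrq('m');    /* one-shot memory dump */
        }
        congestion_wait(BLK_RW_ASYNC, HZ/50);
    } while (1);
}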

The results so far are the following:
- memory allocation failures happened only on the
kmem_realloc_xfs_iext_realloc_indirect path for now
- XFS hits memory re-allocation failures when it needs to allocate about
35KB. Sometimes the allocation succeeds after a few retries, but sometimes
it takes several thousand retries.
- All allocation failures happened on NFSv3 paths
- Three inode numbers were reported as failing memory allocations. After
several hours, "find -inum" is still searching for these inodes... this is
a huge filesystem... Is there a quicker (XFS-specific?) way to find the
file based on its inode number?

Please see a typical allocation failure in [1]

Any recommendation how to move forward with this issue?

An additional observation that I saw on my local system: writing files to
XFS locally vs. writing the same files via NFS (both v3 and v4), the number
of extents reported by "xfs_bmap" is much higher in the NFS case. For
example,
creating a new file and writing into it as follows:
- write 4KB
- skip 4KB (i.e., lseek to 4KB + 4KB)
- write 4KB
- skip 4KB
...
Create a file of say 50MB this way.
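
(For completeness, the generator is essentially the trivial program below;
the file name, fill byte and exact 50MB size are arbitrary.)

/*
 * Sketch of the 4KB-data/4KB-hole workload generator: write 4KB, lseek
 * forward 4KB to leave a hole, repeat until the file is ~50MB. Compare
 * "xfs_bmap -v <file>" for a local file vs the same file written through
 * an NFS mount of the same filesystem.
 */
#include <sys/types.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

int main(int argc, char **argv)
{
    const char *path = argc > 1 ? argv[1] : "testfile";
    char buf[4096];
    off_t off = 0, target = 50 * 1024 * 1024;
    int fd;

    memset(buf, 0xab, sizeof(buf));
    fd = open(path, O_CREAT | O_TRUNC | O_WRONLY, 0644);
    if (fd < 0) {
        perror("open");
        return 1;
    }
    while (off < target) {
        if (write(fd, buf, sizeof(buf)) != (ssize_t)sizeof(buf)) {
            perror("write");
            return 1;
        }
        off = lseek(fd, 4096, SEEK_CUR);    /* skip 4KB -> hole */
    }
    close(fd);
    return 0;
}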

Locally it ends up with very few (1-5) extents, but the exact same workload
through NFS results in several thousand extents. The filesystem is mounted
with "sync" in both cases. This is currently just an observation, which I
will try to debug further. Are you perhaps familiar with such behavior?
Could you perhaps try to check whether it also happens in your
environment?

Thanks for your help,
Alex.



[1]
Jun 26 14:19:08 vsa-00000003-vc-1 kernel: [91809.497033] XFS: pid=12642
kmem_alloc failure inum=35646862875 size=35584 flags=0x4 lflags=0x250
Jun 26 14:19:08 vsa-00000003-vc-1 kernel: [91809.498407] Pid: 12642, comm:
nfsd Tainted: GF          O 3.8.13-030813-generic #201305111843
Jun 26 14:19:08 vsa-00000003-vc-1 kernel: [91809.498410] Call Trace:
Jun 26 14:19:08 vsa-00000003-vc-1 kernel: [91809.498450]
[<ffffffffa07a832e>] kmem_alloc_xfs_iext_realloc_indirect+0x16e/0x1f0 [xfs]
Jun 26 14:19:08 vsa-00000003-vc-1 kernel: [91809.498465]
[<ffffffffa07a86b3>] kmem_realloc_xfs_iext_realloc_indirect+0x33/0x80 [xfs]
Jun 26 14:19:08 vsa-00000003-vc-1 kernel: [91809.498482]
[<ffffffffa07db2a0>] xfs_iext_realloc_indirect+0x40/0x60 [xfs]
Jun 26 14:19:08 vsa-00000003-vc-1 kernel: [91809.498498]
[<ffffffffa07db2ff>] xfs_iext_irec_new+0x3f/0x180 [xfs]
Jun 26 14:19:08 vsa-00000003-vc-1 kernel: [91809.498516]
[<ffffffffa07db5b9>] xfs_iext_add_indirect_multi+0x179/0x2b0 [xfs]
Jun 26 14:19:08 vsa-00000003-vc-1 kernel: [91809.498532]
[<ffffffffa07db9be>] xfs_iext_add+0xce/0x290 [xfs]
Jun 26 14:19:08 vsa-00000003-vc-1 kernel: [91809.498539]
[<ffffffff81076634>] ? wake_up_worker+0x24/0x30
Jun 26 14:19:08 vsa-00000003-vc-1 kernel: [91809.498553]
[<ffffffffa07dbbd2>] xfs_iext_insert+0x52/0x100 [xfs]
Jun 26 14:19:08 vsa-00000003-vc-1 kernel: [91809.498569]
[<ffffffffa07b6ed3>] ? xfs_bmap_add_extent_hole_delay+0xd3/0x6a0 [xfs]
Jun 26 14:19:08 vsa-00000003-vc-1 kernel: [91809.498583]
[<ffffffffa07b6ed3>] xfs_bmap_add_extent_hole_delay+0xd3/0x6a0 [xfs]
Jun 26 14:19:08 vsa-00000003-vc-1 kernel: [91809.498598]
[<ffffffffa07da3fc>] ? xfs_iext_bno_to_ext+0x8c/0x160 [xfs]
Jun 26 14:19:08 vsa-00000003-vc-1 kernel: [91809.498614]
[<ffffffffa07b7719>] xfs_bmapi_reserve_delalloc+0x279/0x2a0 [xfs]
Jun 26 14:19:08 vsa-00000003-vc-1 kernel: [91809.498628]
[<ffffffffa07be742>] xfs_bmapi_delay+0x122/0x270 [xfs]
Jun 26 14:19:08 vsa-00000003-vc-1 kernel: [91809.498642]
[<ffffffffa079d703>] xfs_iomap_write_delay+0x173/0x320 [xfs]
Jun 26 14:19:08 vsa-00000003-vc-1 kernel: [91809.498656]
[<ffffffffa07be42c>] ? xfs_bmapi_read+0xfc/0x2f0 [xfs]
Jun 26 14:19:08 vsa-00000003-vc-1 kernel: [91809.498660]
[<ffffffff8135d8f3>] ? call_rwsem_down_write_failed+0x13/0x20
Jun 26 14:19:08 vsa-00000003-vc-1 kernel: [91809.498665]
[<ffffffff81195bbc>] ? lookup_page_cgroup+0x4c/0x50
Jun 26 14:19:08 vsa-00000003-vc-1 kernel: [91809.498677]
[<ffffffffa078ab40>] __xfs_get_blocks+0x280/0x550 [xfs]
Jun 26 14:19:08 vsa-00000003-vc-1 kernel: [91809.498688]
[<ffffffffa078ae41>] xfs_get_blocks+0x11/0x20 [xfs]
Jun 26 14:19:08 vsa-00000003-vc-1 kernel: [91809.498695]
[<ffffffff811cf77e>] __block_write_begin+0x1ae/0x4e0
Jun 26 14:19:08 vsa-00000003-vc-1 kernel: [91809.498705]
[<ffffffffa078ae30>] ? xfs_get_blocks_direct+0x20/0x20 [xfs]
Jun 26 14:19:08 vsa-00000003-vc-1 kernel: [91809.498708]
[<ffffffff81135fff>] ? grab_cache_page_write_begin+0x8f/0xf0
Jun 26 14:19:08 vsa-00000003-vc-1 kernel: [91809.498720]
[<ffffffffa078a09f>] xfs_vm_write_begin+0x5f/0xe0 [xfs]
Jun 26 14:19:08 vsa-00000003-vc-1 kernel: [91809.498725]
[<ffffffff8113552a>] generic_perform_write+0xca/0x210
Jun 26 14:19:08 vsa-00000003-vc-1 kernel: [91809.498729]
[<ffffffff811356cd>] generic_file_buffered_write+0x5d/0x90
Jun 26 14:19:08 vsa-00000003-vc-1 kernel: [91809.498740]
[<ffffffffa07952d5>] xfs_file_buffered_aio_write+0x115/0x1c0 [xfs]
Jun 26 14:19:08 vsa-00000003-vc-1 kernel: [91809.498744]
[<ffffffff816159f4>] ? ip_finish_output+0x224/0x3b0
Jun 26 14:19:08 vsa-00000003-vc-1 kernel: [91809.498758]
[<ffffffffa079547c>] xfs_file_aio_write+0xfc/0x1b0 [xfs]
Jun 26 14:19:08 vsa-00000003-vc-1 kernel: [91809.498769]
[<ffffffffa0795380>] ? xfs_file_buffered_aio_write+0x1c0/0x1c0 [xfs]
Jun 26 14:19:08 vsa-00000003-vc-1 kernel: [91809.498773]
[<ffffffff8119b8c3>] do_sync_readv_writev+0xa3/0xe0
Jun 26 14:19:08 vsa-00000003-vc-1 kernel: [91809.498776]
[<ffffffff8119bb8d>] do_readv_writev+0xcd/0x1d0
Jun 26 14:19:08 vsa-00000003-vc-1 kernel: [91809.498780]
[<ffffffff810877e0>] ? set_groups+0x40/0x60
Jun 26 14:19:08 vsa-00000003-vc-1 kernel: [91809.498805]
[<ffffffffa033a6b0>] ? nfsd_setuser+0x120/0x2b0 [nfsd]
Jun 26 14:19:08 vsa-00000003-vc-1 kernel: [91809.498808]
[<ffffffff8119bccc>] vfs_writev+0x3c/0x50
Jun 26 14:19:08 vsa-00000003-vc-1 kernel: [91809.498815]
[<ffffffffa0333dd2>] nfsd_vfs_write.isra.12+0x92/0x350 [nfsd]
Jun 26 14:19:08 vsa-00000003-vc-1 kernel: [91809.498818]
[<ffffffff8119a6cb>] ? dentry_open+0x6b/0xd0
Jun 26 14:19:08 vsa-00000003-vc-1 kernel: [91809.498825]
[<ffffffffa0336679>] nfsd_write+0xf9/0x110 [nfsd]
Jun 26 14:19:08 vsa-00000003-vc-1 kernel: [91809.498834]
[<ffffffffa0340dd1>] nfsd3_proc_write+0xb1/0x140 [nfsd]
Jun 26 14:19:08 vsa-00000003-vc-1 kernel: [91809.498841]
[<ffffffffa032fd62>] nfsd_dispatch+0x102/0x270 [nfsd]
Jun 26 14:19:08 vsa-00000003-vc-1 kernel: [91809.498865]
[<ffffffffa0103b48>] svc_process_common+0x328/0x5e0 [sunrpc]
Jun 26 14:19:08 vsa-00000003-vc-1 kernel: [91809.498876]
[<ffffffffa0104153>] svc_process+0x103/0x160 [sunrpc]
Jun 26 14:19:08 vsa-00000003-vc-1 kernel: [91809.498882]
[<ffffffffa032f72f>] nfsd+0xbf/0x130 [nfsd]
Jun 26 14:19:08 vsa-00000003-vc-1 kernel: [91809.498888]
[<ffffffffa032f670>] ? nfsd_destroy+0x80/0x80 [nfsd]
Jun 26 14:19:08 vsa-00000003-vc-1 kernel: [91809.498892]
[<ffffffff8107f050>] kthread+0xc0/0xd0
Jun 26 14:19:08 vsa-00000003-vc-1 kernel: [91809.498895]
[<ffffffff8107ef90>] ? flush_kthread_worker+0xb0/0xb0
Jun 26 14:19:08 vsa-00000003-vc-1 kernel: [91809.498900]
[<ffffffff816f61ec>] ret_from_fork+0x7c/0xb0
Jun 26 14:19:08 vsa-00000003-vc-1 kernel: [91809.498903]
[<ffffffff8107ef90>] ? flush_kthread_worker+0xb0/0xb0
Jun 26 14:19:08 vsa-00000003-vc-1 kernel: [91809.498905] SysRq : Show Memory
Jun 26 14:19:08 vsa-00000003-vc-1 kernel: [91809.499385] Mem-Info:
Jun 26 14:19:08 vsa-00000003-vc-1 kernel: [91809.499388] Node 0 DMA per-cpu:
Jun 26 14:19:08 vsa-00000003-vc-1 kernel: [91809.499391] CPU    0: hi:    0,
btch:   1 usd:   0
Jun 26 14:19:08 vsa-00000003-vc-1 kernel: [91809.499392] CPU    1: hi:    0,
btch:   1 usd:   0
Jun 26 14:19:08 vsa-00000003-vc-1 kernel: [91809.499394] CPU    2: hi:    0,
btch:   1 usd:   0
Jun 26 14:19:08 vsa-00000003-vc-1 kernel: [91809.499395] CPU    3: hi:    0,
btch:   1 usd:   0
Jun 26 14:19:08 vsa-00000003-vc-1 kernel: [91809.499396] Node 0 DMA32
per-cpu:
Jun 26 14:19:08 vsa-00000003-vc-1 kernel: [91809.499399] CPU    0: hi:  186,
btch:  31 usd:   0
Jun 26 14:19:08 vsa-00000003-vc-1 kernel: [91809.499400] CPU    1: hi:  186,
btch:  31 usd:   0
Jun 26 14:19:08 vsa-00000003-vc-1 kernel: [91809.499402] CPU    2: hi:  186,
btch:  31 usd:   0
Jun 26 14:19:08 vsa-00000003-vc-1 kernel: [91809.499403] CPU    3: hi:  186,
btch:  31 usd:   0
Jun 26 14:19:08 vsa-00000003-vc-1 kernel: [91809.499404] Node 0 Normal
per-cpu:
Jun 26 14:19:08 vsa-00000003-vc-1 kernel: [91809.499406] CPU    0: hi:  186,
btch:  31 usd:  23
Jun 26 14:19:08 vsa-00000003-vc-1 kernel: [91809.499407] CPU    1: hi:  186,
btch:  31 usd:  25
Jun 26 14:19:08 vsa-00000003-vc-1 kernel: [91809.499409] CPU    2: hi:  186,
btch:  31 usd:   0
Jun 26 14:19:08 vsa-00000003-vc-1 kernel: [91809.499410] CPU    3: hi:  186,
btch:  31 usd:   7
Jun 26 14:19:08 vsa-00000003-vc-1 kernel: [91809.499415] active_anon:2143
inactive_anon:44181 isolated_anon:0
Jun 26 14:19:08 vsa-00000003-vc-1 kernel: [91809.499415]  active_file:913373
inactive_file:1464930 isolated_file:0
Jun 26 14:19:08 vsa-00000003-vc-1 kernel: [91809.499415]  unevictable:8449
dirty:6742 writeback:115 unstable:0
Jun 26 14:19:08 vsa-00000003-vc-1 kernel: [91809.499415]  free:159418
slab_reclaimable:146857 slab_unreclaimable:66681
Jun 26 14:19:08 vsa-00000003-vc-1 kernel: [91809.499415]  mapped:5561
shmem:383 pagetables:2195 bounce:0
Jun 26 14:19:08 vsa-00000003-vc-1 kernel: [91809.499415]  free_cma:0
Jun 26 14:19:08 vsa-00000003-vc-1 kernel: [91809.499418] Node 0 DMA
free:15884kB min:84kB low:104kB high:124kB active_anon:0kB inactive_anon:0kB
active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB
isolated(file):0kB present:15628kB managed:15884kB mlocked:0kB dirty:0kB
writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB
slab_unreclaimable:0kB kernel_stack:0kB pagetables:0kB unstable:0kB
bounce:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable?
yes
Jun 26 14:19:08 vsa-00000003-vc-1 kernel: [91809.499423] lowmem_reserve[]: 0
3512 12080 12080
Jun 26 14:19:08 vsa-00000003-vc-1 kernel: [91809.499427] Node 0 DMA32
free:240040kB min:19624kB low:24528kB high:29436kB active_anon:8kB
inactive_anon:15196kB active_file:1503256kB inactive_file:1624328kB
unevictable:0kB isolated(anon):0kB isolated(file):0kB present:3596532kB
managed:3546428kB mlocked:0kB dirty:14652kB writeback:0kB mapped:988kB
shmem:0kB slab_reclaimable:173044kB slab_unreclaimable:15988kB
kernel_stack:248kB pagetables:28kB unstable:0kB bounce:0kB free_cma:0kB
writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
Jun 26 14:19:08 vsa-00000003-vc-1 kernel: [91809.499432] lowmem_reserve[]: 0
0 8568 8568
Jun 26 14:19:08 vsa-00000003-vc-1 kernel: [91809.499435] Node 0 Normal
free:381748kB min:47872kB low:59840kB high:71808kB active_anon:8564kB
inactive_anon:161528kB active_file:2150236kB inactive_file:4235392kB
unevictable:33796kB isolated(anon):0kB isolated(file):0kB present:8773632kB
managed:8715068kB mlocked:33796kB dirty:12316kB writeback:460kB
mapped:21256kB shmem:1532kB slab_reclaimable:414384kB
slab_unreclaimable:250736kB kernel_stack:8424kB pagetables:8752kB
unstable:0kB bounce:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0
all_unreclaimable? no
Jun 26 14:19:08 vsa-00000003-vc-1 kernel: [91809.499443] lowmem_reserve[]: 0
0 0 0
Jun 26 14:19:08 vsa-00000003-vc-1 kernel: [91809.499446] Node 0 DMA: 1*4kB
(U) 1*8kB (U) 0*16kB 0*32kB 2*64kB (U) 1*128kB (U) 1*256kB (U) 0*512kB
1*1024kB (U) 1*2048kB (R) 3*4096kB (M) = 15884kB
Jun 26 14:19:08 vsa-00000003-vc-1 kernel: [91809.499458] Node 0 DMA32:
27954*4kB (UEM) 12312*8kB (UEM) 1713*16kB (UEM) 17*32kB (UMR) 2*64kB (R)
2*128kB (R) 1*256kB (R) 1*512kB (R) 1*1024kB (R) 0*2048kB 0*4096kB =
240440kB
Jun 26 14:19:08 vsa-00000003-vc-1 kernel: [91809.499469] Node 0 Normal:
31971*4kB (UEM) 31254*8kB (UEM) 2*16kB (EM) 5*32kB (R) 4*64kB (R) 2*128kB
(R) 0*256kB 0*512kB 1*1024kB (R) 1*2048kB (R) 0*4096kB = 381692kB
Jun 26 14:19:08 vsa-00000003-vc-1 kernel: [91809.499480] 2380174 total
pagecache pages
Jun 26 14:19:08 vsa-00000003-vc-1 kernel: [91809.499482] 0 pages in swap
cache
Jun 26 14:19:08 vsa-00000003-vc-1 kernel: [91809.499483] Swap cache stats:
add 0, delete 0, find 0/0
Jun 26 14:19:08 vsa-00000003-vc-1 kernel: [91809.499485] Free swap  =
522236kB
Jun 26 14:19:08 vsa-00000003-vc-1 kernel: [91809.499486] Total swap
=522236kB
Jun 26 14:19:08 vsa-00000003-vc-1 kernel: [91809.528135] 3145712 pages RAM
Jun 26 14:19:08 vsa-00000003-vc-1 kernel: [91809.528135] 69093 pages
reserved
Jun 26 14:19:08 vsa-00000003-vc-1 kernel: [91809.528135] 3401898 pages
shared
Jun 26 14:19:08 vsa-00000003-vc-1 kernel: [91809.528135] 1067069 pages
non-shared
...
34501    Jun 26 14:20:03 vsa-00000003-vc-1 kernel: [91864.156051] XFS:
pid=12642 kmem_alloc success after 13443 retries inum=35646862875 size=35584
flags=0x4 lflags=0x250

-----Original Message-----
From: Dave Chinner
Sent: 23 June, 2015 10:18 PM
To: Alex Lyakas
Cc: xfs@xxxxxxxxxxx ; hch@xxxxxx
Subject: Re: xfs_iext_realloc_indirect and "XFS: possible memory allocation deadlock"

On Tue, Jun 23, 2015 at 10:18:21AM +0200, Alex Lyakas wrote:
> Greetings,
>
> We are hitting an issue with XFS printing messages like "XFS:
> possible memory allocation deadlock in kmem_alloc
> (mode:0x250)" and a stack trace like in [1]. Eventually, a
> hung-task panic kicks in with stack traces like [2].
>
> We are running kernel 3.8.13. I see that in
> http://oss.sgi.com/archives/xfs/2012-01/msg00341.html a similar
> issue has been discussed, but no code changes followed compared
> to what we have in 3.8.13.
>
> Any suggestion on how to move forward with this problem? For
> example, does this memory really have to be allocated with kmalloc
> (i.e., physically contiguous) or can vmalloc be used?

We left it alone because it is relatively rare for people to hit it,
and generally it indicates a severe fragmentation problem when they
do hit it (i.e. a file with millions of extents in it). Can you
track down the file that this is occurring against and see how many
extents it has?

i.e. you may be much better off by taking measures to avoid excessive
fragmentation than removing the canary from the mine...

> [109626.075483] nfsd            D 0000000000000002     0 20042      2
> 0x00000000

Hmmm - it's also a file written by the NFS server - is this on a
dedicated NFS server?

> [109626.075483]  [<ffffffffa01be6b0>] ? nfsd_setuser+0x120/0x2b0 [nfsd]
> [109626.075483]  [<ffffffff8119bccc>] ? vfs_writev+0x3c/0x50
> [109626.075483]  [<ffffffffa01b7dd2>] ? nfsd_vfs_write.isra.12+0x92/0x350
> [nfsd]
> [109626.075483]  [<ffffffff8119a6cb>] ? dentry_open+0x6b/0xd0
> [109626.075483]  [<ffffffffa01ba679>] ? nfsd_write+0xf9/0x110 [nfsd]
> [109626.075483]  [<ffffffffa01c4dd1>] ? nfsd3_proc_write+0xb1/0x140 [nfsd]

Interesting that this is an NFSv3 write...

> [87303.976119] INFO: task nfsd:5684 blocked for more than 180 seconds.
> [87303.976976] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables
> this message.
> [87303.978012] nfsd            D 0000000000000003     0  5684      2
> 0x00000000
> ....
> [87303.978174]  [<ffffffffa0269623>] nfsd_write+0xa3/0x110 [nfsd]
> [87303.978182]  [<ffffffffa027794c>] nfsd4_write+0x1cc/0x250 [nfsd]
> [87303.978189]  [<ffffffffa027746c>] nfsd4_proc_compound+0x5ac/0x7a0
> [nfsd]

And that is an NFSv4 write. You have multiple clients writing to the
same file using different versions of the NFS protocol?

Cheers,

Dave.
--
Dave Chinner
david@xxxxxxxxxxxxx

_______________________________________________
xfs mailing list
xfs@xxxxxxxxxxx
http://oss.sgi.com/mailman/listinfo/xfs



