On 2 July 2016 at 22:00, Nick Fisk <friskyfisk10@xxxxxxxxxxxxxx> wrote:
> On 2 July 2016 at 21:12, Darrick J. Wong <darrick.wong@xxxxxxxxxx> wrote:
>> On Sat, Jul 02, 2016 at 09:52:40AM +0100, Nick Fisk wrote:
>>> Hi, hope someone can help me here.
>>>
>>> I'm exporting some XFS filesystems to ESX via NFS with the sync option
>>> enabled. I'm seeing really heavy fragmentation when multiple VMs are
>>> copied onto the share at the same time. I'm also seeing kmem_alloc
>>> failures, which is probably the biggest problem as this effectively
>>> takes everything down.
>>
>> (Probably a result of loading the millions of bmbt extents into memory?)
>
> Yes, I thought that was the case.
>
>>
>>> Underlying storage is a Ceph RBD; the server the FS is running on is
>>> running kernel 4.5.7, and the mount options are currently the defaults.
>>> Running xfs_db, I'm seeing millions of extents where the ideal is
>>> listed as a couple of thousand, yet there are only a couple of hundred
>>> files on the FS. The extent sizes roughly match the IO size the VMs
>>> were written to XFS with, so it looks like each parallel IO thread is
>>> being allocated next to the others rather than at spaced-out regions
>>> of the disk.
>>>
>>> From what I understand, this is because each NFS write opens and closes
>>> the file, which defeats any chance of XFS using its allocation features
>>> to stop parallel write streams from interleaving with each other.
>>>
>>> Is there anything I can tune to give each write to each file a little
>>> bit of space, so that readahead at least has a chance of hitting a few
>>> MB of sequential data when reading?
>>
>> /me wonders if setting an extent size hint on the rootdir before copying
>> the files over would help here...
>
> I've set a 16M hint and will copy a new VM over; interested to see
> what happens. Thanks for the suggestion.

Well, I set the 16M hint at the root of the FS and proceeded to copy two
VMs in parallel, and got this after ~30-60s. But after rebooting, it
looks like the extents were being allocated in larger blocks, so I guess
you can call it progress. Any ideas?

Jul 2 22:11:56 Proxy3 kernel: [48777.591415] XFS (rbd8): Access to block zero in inode 7054483473 start_block: 0 start_off: 0 blkcnt: 0 extent-state: 0 lastx: 30d18
Jul 2 22:11:56 Proxy3 kernel: [48777.602725] XFS (rbd8): Internal error XFS_WANT_CORRUPTED_GOTO at line 1947 of file /home/kernel/COD/linux/fs/xfs/libxfs/xfs_bmap.c. Caller xfs_bmapi_write+0x749/0xa00 [xfs]
Jul 2 22:11:56 Proxy3 kernel: [48777.608381] CPU: 3 PID: 1463 Comm: nfsd Tainted: G OE 4.5.7-040507-generic #201606100436
Jul 2 22:11:56 Proxy3 kernel: [48777.608385] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 09/17/2015
Jul 2 22:11:56 Proxy3 kernel: [48777.608389]  0000000000000286 000000008a69edda ffff8800b76fb548 ffffffff813e1173
Jul 2 22:11:56 Proxy3 kernel: [48777.608395]  00000000000000cc ffff8800b76fb6e0 ffff8800b76fb560 ffffffffc07ad60c
Jul 2 22:11:56 Proxy3 kernel: [48777.608399]  ffffffffc077c959 ffff8800b76fb658 ffffffffc0777a13 ffff8802116c9000
Jul 2 22:11:56 Proxy3 kernel: [48777.608403] Call Trace:
Jul 2 22:11:56 Proxy3 kernel: [48777.608471]  [<ffffffff813e1173>] dump_stack+0x63/0x90
Jul 2 22:11:56 Proxy3 kernel: [48777.608566]  [<ffffffffc07ad60c>] xfs_error_report+0x3c/0x40 [xfs]
Jul 2 22:11:56 Proxy3 kernel: [48777.608620]  [<ffffffffc077c959>] ? xfs_bmapi_write+0x749/0xa00 [xfs]
Jul 2 22:11:56 Proxy3 kernel: [48777.608669]  [<ffffffffc0777a13>] xfs_bmap_add_extent_delay_real+0x883/0x1ce0 [xfs]
Jul 2 22:11:56 Proxy3 kernel: [48777.608721]  [<ffffffffc077c959>] xfs_bmapi_write+0x749/0xa00 [xfs]
Jul 2 22:11:56 Proxy3 kernel: [48777.608777]  [<ffffffffc07b8a8d>] xfs_iomap_write_allocate+0x16d/0x380 [xfs]
Jul 2 22:11:56 Proxy3 kernel: [48777.608822]  [<ffffffffc07a25d3>] xfs_map_blocks+0x173/0x240 [xfs]
Jul 2 22:11:56 Proxy3 kernel: [48777.608871]  [<ffffffffc07a33a8>] xfs_vm_writepage+0x198/0x660 [xfs]
Jul 2 22:11:56 Proxy3 kernel: [48777.608902]  [<ffffffff811993a3>] __writepage+0x13/0x30
Jul 2 22:11:56 Proxy3 kernel: [48777.608908]  [<ffffffff81199efe>] write_cache_pages+0x1fe/0x530
Jul 2 22:11:56 Proxy3 kernel: [48777.608912]  [<ffffffff81199390>] ? wb_position_ratio+0x1f0/0x1f0
Jul 2 22:11:56 Proxy3 kernel: [48777.608961]  [<ffffffffc07bbbaa>] ? xfs_iunlock+0xea/0x120 [xfs]
Jul 2 22:11:56 Proxy3 kernel: [48777.608970]  [<ffffffff8119a281>] generic_writepages+0x51/0x80
Jul 2 22:11:56 Proxy3 kernel: [48777.609019]  [<ffffffffc07a31c3>] xfs_vm_writepages+0x53/0xa0 [xfs]
Jul 2 22:11:56 Proxy3 kernel: [48777.609028]  [<ffffffff8119c73e>] do_writepages+0x1e/0x30
Jul 2 22:11:56 Proxy3 kernel: [48777.609048]  [<ffffffff8118f516>] __filemap_fdatawrite_range+0xc6/0x100
Jul 2 22:11:56 Proxy3 kernel: [48777.609053]  [<ffffffff8118f691>] filemap_write_and_wait_range+0x41/0x90
Jul 2 22:11:56 Proxy3 kernel: [48777.609103]  [<ffffffffc07af7d3>] xfs_file_fsync+0x63/0x210 [xfs]
Jul 2 22:11:56 Proxy3 kernel: [48777.609130]  [<ffffffff81249e9b>] vfs_fsync_range+0x4b/0xb0
Jul 2 22:11:56 Proxy3 kernel: [48777.609156]  [<ffffffffc04b856d>] nfsd_vfs_write+0x14d/0x380 [nfsd]
Jul 2 22:11:56 Proxy3 kernel: [48777.609176]  [<ffffffffc04babf0>] nfsd_write+0x120/0x2f0 [nfsd]
Jul 2 22:11:56 Proxy3 kernel: [48777.609190]  [<ffffffffc04c105c>] nfsd3_proc_write+0xbc/0x150 [nfsd]
Jul 2 22:11:56 Proxy3 kernel: [48777.609207]  [<ffffffffc04b3348>] nfsd_dispatch+0xb8/0x200 [nfsd]
Jul 2 22:11:56 Proxy3 kernel: [48777.609243]  [<ffffffffc03cd21c>] svc_process_common+0x40c/0x650 [sunrpc]
Jul 2 22:11:56 Proxy3 kernel: [48777.609267]  [<ffffffffc03ce5c3>] svc_process+0x103/0x1b0 [sunrpc]
Jul 2 22:11:56 Proxy3 kernel: [48777.609286]  [<ffffffffc04b2d8f>] nfsd+0xef/0x160 [nfsd]
Jul 2 22:11:56 Proxy3 kernel: [48777.609297]  [<ffffffffc04b2ca0>] ? nfsd_destroy+0x60/0x60 [nfsd]
Jul 2 22:11:56 Proxy3 kernel: [48777.609322]  [<ffffffff810a06a8>] kthread+0xd8/0xf0
Jul 2 22:11:56 Proxy3 kernel: [48777.609329]  [<ffffffff810a05d0>] ? kthread_create_on_node+0x1a0/0x1a0
Jul 2 22:11:56 Proxy3 kernel: [48777.609369]  [<ffffffff81825bcf>] ret_from_fork+0x3f/0x70
Jul 2 22:11:56 Proxy3 kernel: [48777.609374]  [<ffffffff810a05d0>] ? kthread_create_on_node+0x1a0/0x1a0
Jul 2 22:11:56 Proxy3 kernel: [48777.609444] XFS (rbd8): Internal error xfs_trans_cancel at line 990 of file /home/kernel/COD/linux/fs/xfs/xfs_trans.c. Caller xfs_iomap_write_allocate+0x270/0x380 [xfs]
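For reference, this is roughly how the hint was set and how the
fragmentation numbers above were being measured; /mnt/rbd8 is an
illustrative stand-in for the actual mount point:

    # set a 16M extent size hint on the FS root; files created
    # underneath inherit it (mount point is illustrative)
    xfs_io -c "extsize 16m" /mnt/rbd8

    # read the hint back (value is reported in bytes)
    xfs_io -c "extsize" /mnt/rbd8

    # report actual vs. ideal extent counts for the whole FS,
    # opening the block device read-only while it is mounted
    xfs_db -r -c frag /dev/rbd8

xfs_bmap -v <file> will also show the per-file extent layout, if you
want to see how a single VM image ended up.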
>
>>
>> --D
>>
>>>
>>> I have read that inode32 allocates more randomly compared to inode64,
>>> so I'm not sure whether it's worth trying, as there will likely be
>>> fewer than 1000 files per FS.
>>>
>>> Or am I best just to run fsr after everything has been copied on?
>>>
>>> Thanks for any advice
>>> Nick
>>

_______________________________________________
xfs mailing list
xfs@xxxxxxxxxxx
http://oss.sgi.com/mailman/listinfo/xfs