On 2 July 2016 at 22:00, Nick Fisk <friskyfisk10@xxxxxxxxxxxxxx> wrote:
> On 2 July 2016 at 21:12, Darrick J. Wong <darrick.wong@xxxxxxxxxx> wrote:
>> On Sat, Jul 02, 2016 at 09:52:40AM +0100, Nick Fisk wrote:
>>> Hi, hope someone can help me here.
>>>
>>> I'm exporting some XFS filesystems to ESX via NFS with the sync option
>>> enabled. I'm seeing really heavy fragmentation when multiple VMs are
>>> copied onto the share at the same time. I'm also seeing kmem_alloc
>>> failures, which is probably the biggest problem as this effectively
>>> takes everything down.
>>
>> (Probably a result of loading the millions of bmbt extents into memory?)
>
> Yes, I thought that was the case.
>
>>
>>> Underlying storage is a Ceph RBD; the server the FS is running on is
>>> running kernel 4.5.7, and the mount options are currently the defaults.
>>> Running xfs_db, I'm seeing millions of extents where the ideal is
>>> listed as a couple of thousand, yet there are only a couple of hundred
>>> files on the FS. The extent sizes roughly match the IO size the VMs
>>> were written to XFS with, so it looks like each parallel IO thread is
>>> being allocated next to the others rather than at spaced-out regions
>>> of the disk.
>>>
>>> From what I understand, this is because each NFS write opens and closes
>>> the file, which defeats any chance of XFS using its allocation features
>>> to stop parallel write streams from interleaving with each other.
>>>
>>> Is there anything I can tune to give each write to each file a little
>>> bit of space, so that readahead at least has a chance of hitting a few
>>> MB of sequential data when reading?
>>
>> /me wonders if setting an extent size hint on the rootdir before copying
>> the files over would help here...
>
> I've set a 16M hint and will copy a new VM over; interested to see
> what happens. Thanks for the suggestion.

Well, I set the 16M hint at the root of the FS and proceeded to copy two
VMs in parallel, and got this after ~30-60s. But after rebooting, it
looks like the extents were being allocated in larger blocks, so I guess
you can call it progress. Any ideas?

Jul 2 22:11:56 Proxy3 kernel: [48777.591415] XFS (rbd8): Access to block zero in inode 7054483473 start_block: 0 start_off: 0 blkcnt: 0 extent-state: 0 lastx: 30d18
Jul 2 22:11:56 Proxy3 kernel: [48777.602725] XFS (rbd8): Internal error XFS_WANT_CORRUPTED_GOTO at line 1947 of file /home/kernel/COD/linux/fs/xfs/libxfs/xfs_bmap.c. Caller xfs_bmapi_write+0x749/0xa00 [xfs]
Jul 2 22:11:56 Proxy3 kernel: [48777.608381] CPU: 3 PID: 1463 Comm: nfsd Tainted: G OE 4.5.7-040507-generic #201606100436
Jul 2 22:11:56 Proxy3 kernel: [48777.608385] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 09/17/2015
Jul 2 22:11:56 Proxy3 kernel: [48777.608389]  0000000000000286 000000008a69edda ffff8800b76fb548 ffffffff813e1173
Jul 2 22:11:56 Proxy3 kernel: [48777.608395]  00000000000000cc ffff8800b76fb6e0 ffff8800b76fb560 ffffffffc07ad60c
Jul 2 22:11:56 Proxy3 kernel: [48777.608399]  ffffffffc077c959 ffff8800b76fb658 ffffffffc0777a13 ffff8802116c9000
Jul 2 22:11:56 Proxy3 kernel: [48777.608403] Call Trace:
Jul 2 22:11:56 Proxy3 kernel: [48777.608471]  [<ffffffff813e1173>] dump_stack+0x63/0x90
Jul 2 22:11:56 Proxy3 kernel: [48777.608566]  [<ffffffffc07ad60c>] xfs_error_report+0x3c/0x40 [xfs]
Jul 2 22:11:56 Proxy3 kernel: [48777.608620]  [<ffffffffc077c959>] ? xfs_bmapi_write+0x749/0xa00 [xfs]
Jul 2 22:11:56 Proxy3 kernel: [48777.608669]  [<ffffffffc0777a13>] xfs_bmap_add_extent_delay_real+0x883/0x1ce0 [xfs]
Jul 2 22:11:56 Proxy3 kernel: [48777.608721]  [<ffffffffc077c959>] xfs_bmapi_write+0x749/0xa00 [xfs]
Jul 2 22:11:56 Proxy3 kernel: [48777.608777]  [<ffffffffc07b8a8d>] xfs_iomap_write_allocate+0x16d/0x380 [xfs]
Jul 2 22:11:56 Proxy3 kernel: [48777.608822]  [<ffffffffc07a25d3>] xfs_map_blocks+0x173/0x240 [xfs]
Jul 2 22:11:56 Proxy3 kernel: [48777.608871]  [<ffffffffc07a33a8>] xfs_vm_writepage+0x198/0x660 [xfs]
Jul 2 22:11:56 Proxy3 kernel: [48777.608902]  [<ffffffff811993a3>] __writepage+0x13/0x30
Jul 2 22:11:56 Proxy3 kernel: [48777.608908]  [<ffffffff81199efe>] write_cache_pages+0x1fe/0x530
Jul 2 22:11:56 Proxy3 kernel: [48777.608912]  [<ffffffff81199390>] ? wb_position_ratio+0x1f0/0x1f0
Jul 2 22:11:56 Proxy3 kernel: [48777.608961]  [<ffffffffc07bbbaa>] ? xfs_iunlock+0xea/0x120 [xfs]
Jul 2 22:11:56 Proxy3 kernel: [48777.608970]  [<ffffffff8119a281>] generic_writepages+0x51/0x80
Jul 2 22:11:56 Proxy3 kernel: [48777.609019]  [<ffffffffc07a31c3>] xfs_vm_writepages+0x53/0xa0 [xfs]
Jul 2 22:11:56 Proxy3 kernel: [48777.609028]  [<ffffffff8119c73e>] do_writepages+0x1e/0x30
Jul 2 22:11:56 Proxy3 kernel: [48777.609048]  [<ffffffff8118f516>] __filemap_fdatawrite_range+0xc6/0x100
Jul 2 22:11:56 Proxy3 kernel: [48777.609053]  [<ffffffff8118f691>] filemap_write_and_wait_range+0x41/0x90
Jul 2 22:11:56 Proxy3 kernel: [48777.609103]  [<ffffffffc07af7d3>] xfs_file_fsync+0x63/0x210 [xfs]
Jul 2 22:11:56 Proxy3 kernel: [48777.609130]  [<ffffffff81249e9b>] vfs_fsync_range+0x4b/0xb0
Jul 2 22:11:56 Proxy3 kernel: [48777.609156]  [<ffffffffc04b856d>] nfsd_vfs_write+0x14d/0x380 [nfsd]
Jul 2 22:11:56 Proxy3 kernel: [48777.609176]  [<ffffffffc04babf0>] nfsd_write+0x120/0x2f0 [nfsd]
Jul 2 22:11:56 Proxy3 kernel: [48777.609190]  [<ffffffffc04c105c>] nfsd3_proc_write+0xbc/0x150 [nfsd]
Jul 2 22:11:56 Proxy3 kernel: [48777.609207]  [<ffffffffc04b3348>] nfsd_dispatch+0xb8/0x200 [nfsd]
Jul 2 22:11:56 Proxy3 kernel: [48777.609243]  [<ffffffffc03cd21c>] svc_process_common+0x40c/0x650 [sunrpc]
Jul 2 22:11:56 Proxy3 kernel: [48777.609267]  [<ffffffffc03ce5c3>] svc_process+0x103/0x1b0 [sunrpc]
Jul 2 22:11:56 Proxy3 kernel: [48777.609286]  [<ffffffffc04b2d8f>] nfsd+0xef/0x160 [nfsd]
Jul 2 22:11:56 Proxy3 kernel: [48777.609297]  [<ffffffffc04b2ca0>] ? nfsd_destroy+0x60/0x60 [nfsd]
Jul 2 22:11:56 Proxy3 kernel: [48777.609322]  [<ffffffff810a06a8>] kthread+0xd8/0xf0
Jul 2 22:11:56 Proxy3 kernel: [48777.609329]  [<ffffffff810a05d0>] ? kthread_create_on_node+0x1a0/0x1a0
Jul 2 22:11:56 Proxy3 kernel: [48777.609369]  [<ffffffff81825bcf>] ret_from_fork+0x3f/0x70
Jul 2 22:11:56 Proxy3 kernel: [48777.609374]  [<ffffffff810a05d0>] ? kthread_create_on_node+0x1a0/0x1a0
Jul 2 22:11:56 Proxy3 kernel: [48777.609444] XFS (rbd8): Internal error xfs_trans_cancel at line 990 of file /home/kernel/COD/linux/fs/xfs/xfs_trans.c. Caller xfs_iomap_write_allocate+0x270/0x380 [xfs]
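For reference, this is roughly how the hint was set and how the
fragmentation numbers above were being measured; /mnt/rbd8 is an
illustrative stand-in for the actual mount point:

    # set a 16M extent size hint on the FS root; files created
    # underneath inherit it (mount point is illustrative)
    xfs_io -c "extsize 16m" /mnt/rbd8

    # read the hint back (value is reported in bytes)
    xfs_io -c "extsize" /mnt/rbd8

    # report actual vs. ideal extent counts for the whole FS,
    # opening the block device read-only while it is mounted
    xfs_db -r -c frag /dev/rbd8

xfs_bmap -v <file> will also show the per-file extent layout, if you
want to see how a single VM image ended up.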
>
>>
>> --D
>>
>>>
>>> I have read that inode32 allocates more randomly compared to inode64,
>>> so I'm not sure whether it's worth trying, as there will likely be
>>> fewer than 1000 files per FS.
>>>
>>> Or am I best just to run fsr after everything has been copied on?
>>>
>>> Thanks for any advice
>>> Nick
>>

_______________________________________________
xfs mailing list
xfs@xxxxxxxxxxx
http://oss.sgi.com/mailman/listinfo/xfs