On Sun, Jun 28, 2020 at 01:06:08PM +0100, Edwin Török wrote:
> On Wed, 2020-06-24 at 18:17 -0700, Darrick J. Wong wrote:
> > Mr. Torok: Could you try applying these patches to a recent kernel to
> > see if they fix the fs crash problems you were seeing with
> > duperemove, please?
>
> Hi,
>
> Thanks for the fix.
>
> I've tested commit e812e6bd89dc6489ca924756635fb81855091700 (reflink-
> cleanups_2020-06-24) with my original test, and duperemove has now
> finished successfully:
>
> [...]
> Comparison of extent info shows a net change in shared extents of:
> 237116676
>
> There were some hung tasks reported in dmesg, but they recovered
> eventually:
> [32142.324709] INFO: task pool:5190 blocked for more than 120 seconds.
> [32504.821571] INFO: task pool:5191 blocked for more than 120 seconds.
> [32625.653640] INFO: task pool:5191 blocked for more than 241 seconds.
> [32746.485671] INFO: task pool:5191 blocked for more than 362 seconds.
> [32867.318072] INFO: task pool:5191 blocked for more than 483 seconds.
> [34196.472677] INFO: task pool:5180 blocked for more than 120 seconds.
> [34317.304542] INFO: task pool:5180 blocked for more than 241 seconds.
> [34317.304627] INFO: task pool:5195 blocked for more than 120 seconds.
> [34438.136740] INFO: task pool:5180 blocked for more than 362 seconds.
> [34438.136816] INFO: task pool:5195 blocked for more than 241 seconds.
>
> The blocked tasks were alternating between these 2 stacktraces:
> [32142.324715] Call Trace:
> [32142.324721]  __schedule+0x2d3/0x770
> [32142.324722]  schedule+0x55/0xc0
> [32142.324724]  rwsem_down_read_slowpath+0x16c/0x4a0
> [32142.324726]  ? __wake_up_common_lock+0x8a/0xc0
> [32142.324750]  ? xfs_vn_fiemap+0x32/0x80 [xfs]
> [32142.324752]  down_read+0x85/0xa0
> [32142.324769]  xfs_ilock+0x8a/0x100 [xfs]
> [32142.324784]  xfs_vn_fiemap+0x32/0x80 [xfs]
> [32142.324785]  do_vfs_ioctl+0xef/0x680
> [32142.324787]  ksys_ioctl+0x73/0xd0
> [32142.324788]  __x64_sys_ioctl+0x1a/0x20
> [32142.324789]  do_syscall_64+0x49/0xc0
> [32142.324790]  entry_SYSCALL_64_after_hwframe+0x44/0xa
>
> [32504.821577] Call Trace:
> [32504.821583]  __schedule+0x2d3/0x770
> [32504.821584]  schedule+0x55/0xc0
> [32504.821587]  rwsem_down_write_slowpath+0x244/0x4d0
> [32504.821588]  down_write+0x41/0x50
> [32504.821610]  xfs_ilock2_io_mmap+0xc8/0x230 [xfs]
> [32504.821628]  ? xfs_reflink_remap_blocks+0x11f/0x2a0 [xfs]
> [32504.821643]  xfs_reflink_remap_prep+0x51/0x1f0 [xfs]
> [32504.821660]  xfs_file_remap_range+0xbe/0x2f0 [xfs]
> [32504.821662]  ? security_capable+0x3d/0x60
> [32504.821664]  vfs_dedupe_file_range_one+0x12d/0x150
> [32504.821665]  vfs_dedupe_file_range+0x156/0x1e0
> [32504.821666]  do_vfs_ioctl+0x4a6/0x680
> [32504.821667]  ksys_ioctl+0x73/0xd0
> [32504.821668]  __x64_sys_ioctl+0x1a/0x20
> [32504.821669]  do_syscall_64+0x49/0xc0
> [32504.821670]  entry_SYSCALL_64_after_hwframe+0x44/0xa
>
> Looking at /proc/$(pidof duperemove)/fd it had ~80 files open at one
> point, which causes a lot of seeking on a HDD if it was trying to
> dedupe them all at once, so I'm not too worried about these hung tasks.
>
> Running an xfs_repair after the duperemove finished showed no errors.
>
> > v2: various cleanups suggested by Brian Foster
> >
> > If you're going to start using this mess, you probably ought to just
> > pull from my git trees, which are linked below.
> >
> > This is an extraordinary way to destroy everything. Enjoy!
>
> To be safe I've created an LVM snapshot before trying it.

Yay! Thank you for testing this out!

Sorry it broke in the first place, though. :/
--D

> Best regards,
> --Edwin
>
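[For reference: the dedupe path in the second trace (vfs_dedupe_file_range ->
xfs_file_remap_range) is driven from userspace by the FIDEDUPERANGE ioctl,
which is the request duperemove issues for each candidate extent pair.  Below
is a minimal, illustrative sketch of that call, not code from this thread;
the file paths, offsets, and 128 KiB length are placeholders.]

/*
 * Illustrative sketch of the FIDEDUPERANGE ioctl used by duperemove-style
 * tools; this is the userspace entry point for the vfs_dedupe_file_range()
 * path seen in the second trace.  Paths, offsets, and length are placeholders.
 */
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/ioctl.h>
#include <linux/fs.h>

int main(void)
{
	int src = open("/mnt/scratch/a", O_RDONLY);
	int dst = open("/mnt/scratch/b", O_RDWR);

	if (src < 0 || dst < 0) {
		perror("open");
		return 1;
	}

	/* Request header plus one destination extent descriptor. */
	struct file_dedupe_range *r =
		calloc(1, sizeof(*r) + sizeof(struct file_dedupe_range_info));
	r->src_offset = 0;
	r->src_length = 128 * 1024;	/* bytes to compare and share */
	r->dest_count = 1;
	r->info[0].dest_fd = dst;
	r->info[0].dest_offset = 0;

	if (ioctl(src, FIDEDUPERANGE, r) < 0) {
		perror("FIDEDUPERANGE");
		return 1;
	}

	/* The kernel only shares blocks when the two ranges are identical. */
	if (r->info[0].status == FILE_DEDUPE_RANGE_SAME)
		printf("deduped %llu bytes\n",
		       (unsigned long long)r->info[0].bytes_deduped);
	else
		printf("not deduped (status %d)\n", r->info[0].status);

	free(r);
	return 0;
}

[dest_count can be raised to dedupe the same source extent against several
destination files in a single call.]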