On Sun, Mar 02, 2025 at 09:47:10AM +0100, Steinar H. Gunderson wrote:
> This ioctl call successfully deduplicated the data, but it took 71.52 _seconds_.
> Deduplicating the entire set is on the order of days. I don't understand why
> this would take so much time; I understand that it needs to make a read to
> verify that the file ranges are indeed the same (this is the only sane API
> design!), but it comes out to something like 2800 kB/sec from an array that
> can deliver almost 400 times that. There is no other activity on the file
> system in question, so it should not conflict with other activity (locks
> etc.), and the process does not appear to be taking significant amounts of
> CPU time. iostat shows read activity varying from maybe 300 kB/sec to
> 12000 kB/sec or so; /proc/<pid>/stack says:
>
> [<0>] folio_wait_bit_common+0x174/0x220
> [<0>] filemap_read_folio+0x64/0x8b
> [<0>] do_read_cache_folio+0x119/0x164
> [<0>] __generic_remap_file_range_prep+0x372/0x568
> [<0>] generic_remap_file_range_prep+0x7/0xd

This does comparison one folio at a time and does no readahead. Hence if
the data isn't already in cache, it is doing synchronous small reads and
waiting for every single one of them. This really should use an internal
interface that is capable of issuing readahead...

-Dave.
--
Dave Chinner
david@xxxxxxxxxxxxx
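The fix suggested here is kernel-side (an internal interface that issues readahead). Until something like that exists, one possible user-space mitigation, which is not proposed in this thread, is to ask the kernel to start reading both ranges into the page cache before issuing the dedupe ioctl, so the folio-by-folio comparison is more likely to hit cached data. The following is a minimal sketch, assuming Linux with FIDEDUPERANGE available from <linux/fs.h> (Linux 4.5 and later); `prefetch_range` and `dedupe_with_prefetch` are hypothetical helper names. POSIX_FADV_WILLNEED is only a hint: it starts readahead but does not wait for it or guarantee the pages are still cached when the comparison reaches them, so for ranges larger than available memory it may help little.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <linux/fs.h>
#include <stdint.h>
#include <stdlib.h>
#include <sys/ioctl.h>

/* Hint the kernel to start reading [off, off+len) into the page cache.
 * Returns 0 on success or an error number (posix_fadvise does not set errno). */
static int prefetch_range(int fd, off_t off, off_t len)
{
    return posix_fadvise(fd, off, len, POSIX_FADV_WILLNEED);
}

/* Hypothetical helper: warm the cache for both ranges, then ask the kernel
 * to dedupe len bytes of dst at dst_off against src at src_off.
 * Returns -errno if the ioctl itself fails. On success returns 0 and stores
 * the per-destination status (FILE_DEDUPE_RANGE_SAME,
 * FILE_DEDUPE_RANGE_DIFFERS, or a negative errno) and the number of bytes
 * actually deduplicated in *status and *deduped. */
static int dedupe_with_prefetch(int src_fd, off_t src_off,
                                int dst_fd, off_t dst_off, off_t len,
                                int32_t *status, uint64_t *deduped)
{
    struct file_dedupe_range *req;
    int ret = 0;

    /* Readahead failures are not fatal: the ioctl still works, just slower. */
    (void)prefetch_range(src_fd, src_off, len);
    (void)prefetch_range(dst_fd, dst_off, len);

    /* One destination: header plus a single info entry, zeroed so the
     * reserved fields are 0 as the ioctl requires. */
    req = calloc(1, sizeof(*req) + sizeof(struct file_dedupe_range_info));
    if (!req)
        return -ENOMEM;

    req->src_offset = (uint64_t)src_off;
    req->src_length = (uint64_t)len;
    req->dest_count = 1;
    req->info[0].dest_fd = dst_fd;
    req->info[0].dest_offset = (uint64_t)dst_off;

    /* FIDEDUPERANGE is issued on the source file descriptor. */
    if (ioctl(src_fd, FIDEDUPERANGE, req) < 0) {
        ret = -errno;
    } else {
        *status = req->info[0].status;
        *deduped = req->info[0].bytes_deduped;
    }
    free(req);
    return ret;
}
```

Both descriptors would normally be opened read-write. A filesystem may deduplicate fewer bytes than requested, so a caller working through a large file would loop, advancing the offsets by `*deduped`. If the hint is ignored, behaviour falls back to the synchronous folio-at-a-time reads described in the reply.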