Re: Slow deduplication


 



On Sun, Mar 02, 2025 at 09:47:10AM +0100, Steinar H. Gunderson wrote:
> This ioctl call successfully deduplicated the data, but it took 71.52 _seconds_.
> Deduplicating the entire set is on the order of days. I don't understand why
> this would take so much time; I understand that it needs to make a read to
> verify that the file ranges are indeed the same (this is the only sane API
> design!), but it comes out to something like 2800 kB/sec from an array that
> can deliver almost 400 times that. There is no other activity on the file
> system in question, so it should not conflict with other activity (locks
> etc.), and the process does not appear to be taking significant amounts of
> CPU time. iostat shows read activity varying from maybe 300 kB/sec to
> 12000 kB/sec or so; /proc/<pid>/stack says:
> 
>   [<0>] folio_wait_bit_common+0x174/0x220
>   [<0>] filemap_read_folio+0x64/0x8b
>   [<0>] do_read_cache_folio+0x119/0x164
>   [<0>] __generic_remap_file_range_prep+0x372/0x568
>   [<0>] generic_remap_file_range_prep+0x7/0xd

This code does the comparison one folio at a time and issues no
readahead. Hence, if the data isn't already in the page cache, it does
small synchronous reads and waits for every single one of them. This
really should use an internal interface that is capable of issuing
readahead...
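
Until the kernel side issues readahead itself, one possible userspace
workaround is to ask for the ranges to be pulled into the page cache
before calling the ioctl, so that the folio-at-a-time comparison finds
the data already cached. The sketch below is untested against the
workload in this thread and is an assumption, not something proposed
above: warm_range() and dedupe_range_warm() are hypothetical helper
names, and the only real interfaces used are posix_fadvise() with
POSIX_FADV_WILLNEED and the FIDEDUPERANGE ioctl. Whether it helps
depends on the ranges fitting in memory and not being evicted before
the comparison runs.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <linux/fs.h>
#include <stdlib.h>
#include <sys/ioctl.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: hint that [off, off + len) will be needed soon,
 * which lets the kernel start reading it into the page cache.
 * Returns 0 on success or an errno value (posix_fadvise convention).
 */
int warm_range(int fd, off_t off, off_t len)
{
	return posix_fadvise(fd, off, len, POSIX_FADV_WILLNEED);
}

/*
 * Hypothetical helper: warm both ranges, then dedupe dst against src.
 * Returns the number of bytes deduplicated, or -1 if the ioctl failed
 * or the ranges were not reported as identical.
 */
long long dedupe_range_warm(int src_fd, off_t src_off,
			    int dst_fd, off_t dst_off, off_t len)
{
	struct file_dedupe_range *arg;
	long long ret = -1;

	/* Best effort only: a failed hint does not prevent the dedupe. */
	(void)warm_range(src_fd, src_off, len);
	(void)warm_range(dst_fd, dst_off, len);

	/* One destination, so room for exactly one info entry. */
	arg = calloc(1, sizeof(*arg) + sizeof(struct file_dedupe_range_info));
	if (!arg)
		return -1;

	arg->src_offset = src_off;
	arg->src_length = len;
	arg->dest_count = 1;
	arg->info[0].dest_fd = dst_fd;
	arg->info[0].dest_offset = dst_off;

	if (ioctl(src_fd, FIDEDUPERANGE, arg) == 0 &&
	    arg->info[0].status == FILE_DEDUPE_RANGE_SAME)
		ret = (long long)arg->info[0].bytes_deduped;

	free(arg);
	return ret;
}
```

The hint is asynchronous and advisory, so the kernel may read less
than was asked for. Working through large files in smaller chunks
(warm a chunk, dedupe it, move on) is probably safer than warming a
whole multi-gigabyte range at once.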

-Dave.
-- 
Dave Chinner
david@xxxxxxxxxxxxx



