Hi Willy, linux-btrfs@vger,

Matthew Wilcox (Oracle) wrote on Sun, Jun 19, 2022 at 04:11:41PM +0100:
> We had an off-by-one error which meant that we never marked the first page
> in a read as accessed.  This was visible as a slowdown when re-reading
> a file as pages were being evicted from cache too soon.  In reviewing
> this code, we noticed a second bug where a multi-page folio would be
> marked as accessed multiple times when doing reads that were less than
> the size of the folio.

While debugging an unrelated issue (short reads on btrfs with io_uring
and O_DIRECT[1]), I noticed that copy speeds for my horrible big file
fell from ~2GB/s (there's compression and lots of zeroes) to ~100MB/s
the second time I copied it with cp.
I took a moment to bisect this and it came down to this patch.

[1] https://lore.kernel.org/all/YrrFGO4A1jS0GI0G@xxxxxxxxxxxxxxxxx/T/#u

Dropping caches (echo 3 > /proc/sys/vm/drop_caches) restores the speed,
so there appears to be some bad effect of having the file in cache for
fiemap?

To be fair, that file is pretty horrible:
---
# compsize bigfile
Processed 1 file, 194955 regular extents (199583 refs), 0 inline.
Type       Perc     Disk Usage   Uncompressed Referenced
TOTAL       15%      3.7G          23G          23G
none       100%      477M          477M         514M
zstd        14%      3.2G          23G          23G
---

Here's what perf has to say about it on top of this patch when running
`cp bigfile /dev/null` the first time:

  98.97%     0.00%  cp  [kernel.kallsyms]  [k] entry_SYSCALL_64_after_hwframe
          entry_SYSCALL_64_after_hwframe
          do_syscall_64
        - 93.40% ksys_read
           - 93.36% vfs_read
              - 93.25% new_sync_read
                 - 93.20% filemap_read
                    - 83.38% filemap_get_pages
                       - 82.76% page_cache_ra_unbounded
                          + 59.72% folio_alloc
                          + 13.43% read_pages
                          + 8.75% filemap_add_folio
                            0.64% xa_load
                         0.52% filemap_get_read_batch
                    + 8.75% copy_page_to_iter
        - 4.73% __x64_sys_ioctl
           - 4.72% do_vfs_ioctl
              - btrfs_fiemap
                 - 4.70% extent_fiemap
                    + 3.95% btrfs_check_shared
                    + 0.70% get_extent_skip_holes

and the second time:

  99.90%     0.00%  cp  [kernel.kallsyms]  [k] entry_SYSCALL_64_after_hwframe
          entry_SYSCALL_64_after_hwframe
          do_syscall_64
        - 94.62% __x64_sys_ioctl
             do_vfs_ioctl
             btrfs_fiemap
           - extent_fiemap
              - 50.01% get_extent_skip_holes
                 - 50.00% btrfs_get_extent_fiemap
                    - 49.97% count_range_bits
                         rb_next
              + 28.72% lock_extent_bits
              + 15.55% __clear_extent_bit
        - 5.21% ksys_read
           + 5.21% vfs_read

(if this isn't readable: ~95% of the time is spent on fiemap the second
time around)

I've also been observing RCU stalls on my laptop with the same workload
(cp to /dev/null), but unfortunately I could not reproduce that in qemu,
so I could not take traces to confirm they are caused by the same
commit; given the workload, though, I'd say it is?
I can rebuild a kernel for my laptop and confirm if you think it should
be something else.

I haven't looked at the patch itself (yet), so I have no suggestion at
this point; it's plausible the patch fixed something and just exposed
slow code that had been there all along, so it might be better to look
at the btrfs side first, I don't know.
If you don't manage to reproduce this, I'll be happy to test anything
thrown at me at the very least.

Thanks,
--
Dominique Martinet | Asmadeus
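
P.S.: in case it helps with reproducing, here is a rough sketch of the
workload as I run it locally; bigfile is the ~23G compressed file from
the compsize output above, so the exact speeds will obviously differ on
other setups:
---
# start from a cold cache
echo 3 > /proc/sys/vm/drop_caches

# first copy: ~2GB/s here (the file is mostly compressed zeroes)
cp bigfile /dev/null

# second copy, file now in page cache: drops to ~100MB/s on this patch
cp bigfile /dev/null

# dropping caches again restores the fast behaviour
echo 3 > /proc/sys/vm/drop_caches
cp bigfile /dev/null
---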