On Mon, Jun 26, 2023 at 03:04:53PM -0300, Marcelo Tosatti wrote:
> Upon closer investigation, it was found that in current codebase, lookup_bh_lru
> is slower than __find_get_block_slow:
>
> 114 ns per __find_get_block
> 68 ns per __find_get_block_slow
>
> So remove the per-CPU buffer_head caching.

LOL.  That's amazing.  I can't even see why it's so expensive.  The
local_irq_disable(), perhaps?  Your test case is the best possible
one for lookup_bh_lru() where you're not even doing the copy.

Reviewed-by: Matthew Wilcox (oracle) <willy@xxxxxxxxxxxxx>