On Wed, Oct 11, 2017 at 11:06:13PM +0200, Jan Kara wrote:
> On Wed 11-10-17 10:34:47, Dave Hansen wrote:
> > On 10/11/2017 01:06 AM, Jan Kara wrote:
> > >>> when rebasing our enterprise distro to a newer kernel (from 4.4 to 4.12) we
> > >>> have noticed a regression in bonnie++ benchmark when deleting files.
> > >>> Eventually we have tracked this down to a fact that page cache truncation got
> > >>> slower by about 10%. There were both gains and losses in the above interval of
> > >>> kernels but we have been able to identify that commit 83929372f629 "filemap:
> > >>> prepare find and delete operations for huge pages" caused about 10% regression
> > >>> on its own.
> > >> It's odd that just checking if some pages are huge should be that
> > >> expensive, but ok ..
> > > Yeah, I was surprised as well but profiles were pretty clear on this - part
> > > of the slowdown was caused by loads of page->_compound_head (PageTail()
> > > and page_compound() use that) which we previously didn't have to load at
> > > all, part was in hpage_nr_pages() function and its use.
> >
> > Well, page->_compound_head is part of the same cacheline as the rest of
> > the page, and the page is surely getting touched during truncation at
> > _some_ point. The hpage_nr_pages() might cause the cacheline to get
> > loaded earlier than before, but I can't imagine that it's that expensive.
> Then my intuition matches yours ;) but profiles disagree.

Do you get the same benefit across different filesystems?

> That being said
> I'm not really expert in CPU microoptimizations and profiling so feel free
> to gather perf profiles yourself before and after commit 83929372f629 and
> get better explanation of where the cost is - I would be really curious
> what you come up with because the explanation I have disagrees with my
> intuition as well...

When I see this sort of stuff my immediate thought is "what is the
change in the icache footprint of the hot codepath"? There's a few
IO benchmarks (e.g. IOZone) that are l1/l2 cache footprint sensitive
on XFS, and can see up to 10% differences in performance from kernel
build to kernel build that have no code changes in the IO paths or
l1/l2 dcache footprint.

Cheers,

Dave.
-- 
Dave Chinner
david@xxxxxxxxxxxxx
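
A minimal userspace sketch of the kind of check behind the page->_compound_head loads discussed in the quoted text, assuming the encoding where bit 0 of the field marks a tail page and the remaining bits point at the head page. The names struct page_model, link_tail(), page_tail() and compound_head_of() are hypothetical stand-ins for illustration, not the kernel's definitions, and the struct is not the real struct page layout:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for struct page; NOT the kernel's layout. */
struct page_model {
    unsigned long flags;
    uintptr_t compound_head;   /* bit 0 set: tail page, other bits: head */
};

/* Mark 'tail' as a tail page of 'head' (assumed encoding: pointer | 1). */
static void link_tail(struct page_model *tail, struct page_model *head)
{
    uintptr_t addr = (uintptr_t)head;

    assert((addr & 1) == 0);   /* alignment leaves bit 0 free for the tag */
    tail->compound_head = addr | 1;
}

/* Models a PageTail()-style test: one load of the field, one bit test. */
static int page_tail(const struct page_model *page)
{
    return (int)(page->compound_head & 1);
}

/* Models the head lookup: strip the tag bit if set, else the page itself. */
static struct page_model *compound_head_of(struct page_model *page)
{
    uintptr_t head = page->compound_head;

    if (head & 1)
        return (struct page_model *)(head - 1);
    return page;
}
```

With two zero-initialised entries in an array, calling link_tail(&pages[1], &pages[0]) makes page_tail(&pages[1]) return 1 and compound_head_of(&pages[1]) return &pages[0], while pages[0] resolves to itself.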