On 6/28/22 12:32 PM, Kent Overstreet wrote:
> On Tue, Jun 28, 2022 at 12:13:06PM -0600, Jens Axboe wrote:
>> It's much less about using whatever amount of memory for inflight IO,
>> and much more about not bloating fast path structures (of which the
>> bio is certainly one). All of this gunk has to be initialized for
>> each IO, and that's the real issue.
>>
>> Just look at the recent work for iov_iter and why ITER_UBUF makes
>> sense to do.
>>
>> This is not a commentary on this patchset, just a general
>> observation. Sizes of hot data structures DO matter, and quite a bit
>> so.
>
> Younger me would have definitely been in agreement; initializing these
> structs definitely tends to show up in profiles.

Older me still greatly cares :-)

> These days I'm somewhat less inclined towards that view - profiles
> naturally highlight where your cache misses are happening, and
> initializing a freshly allocated data structure is always going to be
> a cache miss. But the difference between touching 3 and 6 contiguous
> cachelines is practically nil... assuming we aren't doing anything
> stupid like using memset (despite what Linus wants from the CPU
> vendors, rep stos _still_ sucks) and perhaps inserting prefetches
> where appropriate.
>
> And I see work going by that makes me really wonder if it was
> justified - in particular I _really_ want to know if Christoph's bio
> initialization change was justified by actual benchmarks and what
> those numbers were, vs. looking at profiles. Wasn't anything in the
> commit log...

Not sure what Christoph change you are referring to, but all the ones
that I did to improve the init side were all backed by numbers I ran at
that time (and most/all of the commit messages will have that data). So
yes, it is indeed still very noticeable. Maybe not at 100K IOPS, but at
10M on a core it most certainly is.

I'm all for having solid and maintainable code, obviously, but frivolous
bloating of structures and more expensive setup cannot be hand waved
away with "it doesn't matter if we touch 3 or 6 cachelines" because we
obviously have a disagreement on that.

-- 
Jens Axboe