On Tue, Jun 28, 2022 at 12:13:06PM -0600, Jens Axboe wrote:
> It's much less about using whatever amount of memory for inflight IO,
> and much more about not bloating fast path structures (of which the bio
> is certainly one). All of this gunk has to be initialized for each IO,
> and that's the real issue.
>
> Just look at the recent work for iov_iter and why ITER_UBUF makes sense
> to do.
>
> This is not a commentary on this patchset, just a general observation.
> Sizes of hot data structures DO matter, and quite a bit so.

Younger me would have definitely been in agreement; initializing these
structs definitely tends to show up in profiles.

These days I'm somewhat less inclined towards that view - profiles
naturally highlight where your cache misses are happening, and
initializing a freshly allocated data structure is always going to be a
cache miss. But the difference between touching 3 and 6 contiguous
cachelines is practically nil... assuming we aren't doing anything stupid
like using memset (despite what Linus wants from the CPU vendors, rep
stos _still_ sucks) and perhaps inserting prefetches where appropriate.

And I see work going by that makes me really wonder if it was justified -
in particular I _really_ want to know if Christoph's bio initialization
change was justified by actual benchmarks and what those numbers were,
vs. looking at profiles. Wasn't anything in the commit log...
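
For illustration only, here is a rough sketch of the two initialization
styles contrasted above: a blanket memset versus prefetching the
cachelines and storing field by field. Everything in it is an
assumption made for the example: struct fake_io is a made-up 6-cacheline
struct (not struct bio), cachelines are taken to be 64 bytes, and
__builtin_prefetch is the GCC/Clang builtin. It only shows that both
paths leave the struct in the same state; it measures nothing, and an
optimizing compiler may still lower the padding loop to a memset call.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define CACHELINE 64 /* assumption: 64-byte cachelines */

/* Hypothetical hot-path struct spanning 6 cachelines; NOT struct bio. */
struct fake_io {
	uint64_t sector;
	uint32_t flags;
	uint32_t size;
	uint64_t cookie;
	/* stands in for the rest of the fields a real struct would have */
	unsigned char pad[6 * CACHELINE - 24];
};

_Static_assert(sizeof(struct fake_io) == 6 * CACHELINE,
	       "fake_io is expected to cover exactly 6 cachelines");

/* Style 1: clear the whole struct, then fill in the live fields. */
static inline void fake_io_init_memset(struct fake_io *io,
				       uint64_t sector, uint32_t size)
{
	memset(io, 0, sizeof(*io));
	io->sector = sector;
	io->size = size;
}

/*
 * Style 2: ask for every cacheline up front (rw = 1 means "for write"),
 * then store each field explicitly.
 */
static inline void fake_io_init_fields(struct fake_io *io,
				       uint64_t sector, uint32_t size)
{
	size_t off, i;

	for (off = 0; off < sizeof(*io); off += CACHELINE)
		__builtin_prefetch((const char *)io + off, 1, 3);

	io->sector = sector;
	io->flags = 0;
	io->size = size;
	io->cookie = 0;
	/* placeholder for per-field stores; may be compiled to memset */
	for (i = 0; i < sizeof(io->pad); i++)
		io->pad[i] = 0;
}
```

Whether style 2 is actually faster than style 1 for a given struct is
exactly the kind of thing that would need a benchmark on real hardware
rather than a reading of the profile.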