On 8/12/21 12:51 AM, Christoph Hellwig wrote:
> On Wed, Aug 11, 2021 at 01:35:28PM -0600, Jens Axboe wrote:
>> The memset() used is measurably slower in targeted benchmarks. Get rid
>> of it and fill in the bio manually, in a separate helper.
>
> If you have some numbers if would be great to throw them in here.

It's about 1% of the overhead of the alloc after the cache, which
comes later in the series.

Percent│    return __underlying_memset(p, c, size);
       │      lea    0x8(%r8),%rdi
       │    bio_alloc_kiocb():
  2.18 │      cmove  %rax,%r9
       │    memset():
       │      mov    %r8,%rcx
       │      and    $0xfffffffffffffff8,%rdi
       │      movq   $0x0,(%r8)
       │      sub    %rdi,%rcx
       │      add    $0x60,%ecx
       │      shr    $0x3,%ecx
 55.02 │      rep    stos %rax,%es:(%rdi)

This is on AMD, might look different on Intel, the manual clear seems
like a nice win on both.

As a minor detail, avoids things like re-setting bio->bi_pool for cached
entries, as it never changes.

>> +static inline void __bio_init(struct bio *bio)
>
> Why is this split from bio_init and what are the criteria where an
> initialization goes?

Got rid of the helper.

>> +	bio->bi_flags = bio->bi_ioprio = bio->bi_write_hint = 0;
>
> Please keep each initialization on a separate line.

Done

-- 
Jens Axboe
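
For anyone following along, here is a rough userspace sketch of the two
initialization styles being compared: a whole-object memset() versus one
store per field, with the pool pointer left alone for cached entries.
Everything in it is an assumption made for illustration: struct demo_bio
and the demo_* functions are hypothetical, the field set is invented, and
none of this is the code from the patch or the kernel's struct bio.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a bio-like object; NOT the kernel's struct bio. */
struct demo_bio {
	void *pool;			/* plays the role of bi_pool */
	unsigned int flags;
	unsigned short ioprio;
	unsigned short write_hint;
	unsigned long sector;
	unsigned int size;
};

/* Baseline: clear the whole object, then set the pool pointer again. */
void demo_init_memset(struct demo_bio *bio, void *pool)
{
	memset(bio, 0, sizeof(*bio));
	bio->pool = pool;
}

/*
 * Manual variant: one store per field, each on its own line.
 * The pool pointer is deliberately not written, since a cached
 * entry already has it set and it never changes.
 */
void demo_init_manual(struct demo_bio *bio)
{
	bio->flags = 0;
	bio->ioprio = 0;
	bio->write_hint = 0;
	bio->sector = 0;
	bio->size = 0;
}

/* Returns 1 if both paths leave every field in the same state. */
int demo_inits_agree(void)
{
	static int pool_token;	/* dummy object to take the address of */
	struct demo_bio a, b;

	/* Start from garbage so a missed field would show up. */
	memset(&a, 0xff, sizeof(a));
	memset(&b, 0xff, sizeof(b));
	b.pool = &pool_token;	/* pretend b came out of a cache */

	demo_init_memset(&a, &pool_token);
	demo_init_manual(&b);

	/* The manual path must not have touched the pool pointer. */
	assert(b.pool == (void *)&pool_token);

	return a.pool == b.pool &&
	       a.flags == b.flags &&
	       a.ioprio == b.ioprio &&
	       a.write_hint == b.write_hint &&
	       a.sector == b.sector &&
	       a.size == b.size;
}
```

The fields are compared one by one rather than with memcmp() because the
manual path does not clear struct padding. Whether the per-field stores
actually beat memset() depends on object size, compiler and CPU, so this
only shows that the two paths are equivalent, not which one is faster.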