On 7/13/20 3:18 PM, Pavel Begunkov wrote:
> On 14/07/2020 00:16, Jens Axboe wrote:
>> On 7/13/20 3:12 PM, Pavel Begunkov wrote:
>>> On 14/07/2020 00:09, Jens Axboe wrote:
>>>> On 7/13/20 1:59 PM, Pavel Begunkov wrote:
>>>>> @@ -3040,8 +3040,7 @@ static int io_read(struct io_kiocb *req, bool force_nonblock,
>>>>>  		}
>>>>>  	}
>>>>>  out_free:
>>>>> -	if (!(req->flags & REQ_F_NEED_CLEANUP))
>>>>> -		kfree(iovec);
>>>>> +	kfree(iovec);
>>>>>  	return ret;
>>>>>  }
>>>>
>>>> Faster to do:
>>>>
>>>> 	if (iovec)
>>>> 		kfree(iovec);
>>>>
>>>> to avoid a stupid call. Kind of crazy, but I just verified with this
>>>> one as well that it's worth about 1.3% CPU in my stress test.
>>>
>>> That looks crazy indeed
>>
>> I suspect what needs to happen is that kfree should be something ala:
>>
>> static inline void kfree(void *ptr)
>> {
>> 	if (ptr)
>> 		__kfree(ptr);
>> }
>>
>> to avoid silly games like this. Needs to touch all three slab
>> allocators, though definitely in the trivial category.
>
> Just thought the same, but not sure it's too common to have kfree(NULL).

Right, except in the io_read/io_write path it'll be 100% common unless
you have more than the inline number of segments.

I see the same thing for e.g. the slab should_failslab() call, which
isn't inlined even if the kconfig option isn't enabled. And for
should_fail_bio() as well. Those two add up to another ~1% or so of
pointless overhead.

> The drop is probably because of the extra call + cold jumps with
> unlikely().
>
> void kfree()
> {
> 	trace_kfree(_RET_IP_, objp);
>
> 	if (unlikely(ZERO_OR_NULL_PTR(objp)))
> 		return;
> }

Must be, since the kfree() one adds up to more than the two above ones
that are just the call itself.

-- 
Jens Axboe