On 12/11/19 9:56 AM, Kees Cook wrote:
> On Wed, Dec 11, 2019 at 10:20:13AM +0000, Will Deacon wrote:
>> On Tue, Dec 10, 2019 at 03:55:05PM -0700, Jens Axboe wrote:
>>> On 12/10/19 3:46 PM, Kees Cook wrote:
>>>> On Tue, Dec 10, 2019 at 03:21:04PM -0700, Jens Axboe wrote:
>>>>> On 12/10/19 3:04 PM, Jann Horn wrote:
>>>>>> [context preserved for additional CCs]
>>>>>>
>>>>>> On Tue, Dec 10, 2019 at 4:57 PM Jens Axboe <axboe@xxxxxxxxx> wrote:
>>>>>>> Recently had a regression that turned out to be because
>>>>>>> CONFIG_REFCOUNT_FULL was set.
>>>>>>
>>>>>> I assume "regression" here refers to a performance regression? Do you
>>>>>> have more concrete numbers on this? Is one of the refcounting calls
>>>>>> particularly problematic compared to the others?
>>>>>
>>>>> Yes, a performance regression. io_uring is using io-wq now, which does
>>>>> an extra get/put on the work item to make it safe against async cancel.
>>>>> That get/put translates into a refcount_inc and refcount_dec per work
>>>>> item, and meant that we went from 0.5% refcount CPU in the test case to
>>>>> 1.5%. That's a pretty substantial increase.
>>>>>
>>>>>> I really don't like it when raw atomic_t is used for refcounting
>>>>>> purposes - not only because that gets rid of the overflow checks, but
>>>>>> also because it is less clear semantically.
>>>>>
>>>>> Not a huge fan either, but... It's hard to give up 1% of extra CPU. You
>>>>> could argue I could just turn off REFCOUNT_FULL, and I could. Maybe
>>>>> that's what I should do. But I'd prefer to just drop the refcount on the
>>>>> io_uring side and keep it on for other potential useful cases.
>>>>
>>>> There is no CONFIG_REFCOUNT_FULL any more. Will Deacon's version came
>>>> out as nearly identical to the x86 asm version. Can you share the
>>>> workload where you saw this? We really don't want to regress refcount
>>>> protections, especially in the face of new APIs.
>>>>
>>>> Will, do you have a moment to dig into this?
>>>
>>> Ah, hopefully it'll work out ok, then. The patch came from testing the
>>> full backport on 5.2.
>
> Oh good! I thought we had some kind of impossible workload. :)
>
>>> Do you have a link to the "nearly identical"? I can backport that
>>> patch and try on 5.2.
>>
>> You could try my refcount/full branch, which is what ended up getting merged
>> during the merge window:
>>
>> https://git.kernel.org/pub/scm/linux/kernel/git/will/linux.git/log/?h=refcount/full
>
> Yeah, as you can see in the measured tight-loop timings in
> https://git.kernel.org/linus/dcb786493f3e48da3272b710028d42ec608cfda1
> there was 0.1% difference for Will's series compared to the x86 assembly
> version, whereas the old FULL was almost 70%.

That looks very promising! Hopefully the patch is moot at that point; I
dropped it from the series yesterday in any case. I'll revisit as soon as
I can and holler if there's an issue.

--
Jens Axboe