On Thu, Aug 19, 2021 at 12:09:37PM -0700, Linus Torvalds wrote: > On Thu, Aug 19, 2021 at 8:21 AM Peter Zijlstra <peterz@xxxxxxxxxxxxx> wrote: > > > > If we can skip the OF... we can do something like this: > > Honestly, I think a lot of the refcount code is questionable. It was > absolutely written with no care for performance AT ALL. That's a bit unfair I feel. Will's last rewrite of the stuff was specifically to address performance issues. > I'm not sure it helps to then add arch-specific code for it without > thinking it through a _lot_ first. > > It might be better to just have a "atomic_t with overflow handling" in > general, exactly because the refcount_t was designed and written > without any regard for code that cares about performance. The primary concern was to use a single unconditional atomic op where possible (mostly fetch_add), as the atomic op dominates whatever else it does. The rest is just because C absolutely sucks at conditions. Doing atomic_t with overflow handling would require doing the whole thing in arch asm. > > static inline bool refcount_dec_and_test(refcount_t *r) > > { > > asm_volatile_goto (LOCK_PREFIX "decl %[var]\n\t" > > "jz %l[cc_zero]\n\t" > > "jns 1f\n\t" > > I think you can use "jl" for the bad case. Duh yes. I clearly didn't have my head on straight. > I think it's better to handle that case out-of-line than play games > with UD, though - this is going to be the rare case, the likelihood > that we get the fixup wrong is just too big. Once it's out-of-line > it's not as critical any more, even if it does add to the size of the > code. Fine with me; although the immediate complaint from Andrew was about size, hence my UD1 hackery. > So if we do this, I think it should be something like > > static inline __must_check bool refcount_dec_and_test(refcount_t *r) > { > asm_volatile_goto (LOCK_PREFIX "decl %[var]\n\t" > "jz %l[cc_zero]\n\t" > "jl %l[cc_error]" > : : [var] "m" (r->refs.counter) > : "memory" : cc_zero, cc_error); > > return false; > > cc_zero: > return true; > cc_error: > refcount_warn_saturate(r, REFCOUNT_SUB_UAF); > return false; > } > > and we can discuss whether we could improve on the > refcount_warn_saturate() separately. I can do the refcount_warn_saturate() change separately. Let me go check how small I can get it... > But see above: maybe just make this a separate "careful atomic_t", > with the option to panic-on-overflow. So then we could get rid of > refcount_warn_saturate() enmtirely above, and instead just have a > (compile-time option) BUG() case, with the non-careful version just > being our existing atomic_dec_and_test. We used to have that option; the argument was made that everybody cares about security and as long as this doesn't show up on benchmarks we good. Also, I don't think most people want the overflow to go BUG, WARN is mostly the right thing and only the super paranoid use panic-on-warn or something.