On Tue, 3 Mar 2020 at 11:54, Jann Horn <jannh@xxxxxxxxxx> wrote:
>
> Document the circumstances under which refcount_t's saturation mechanism
> works deterministically.
>
> Signed-off-by: Jann Horn <jannh@xxxxxxxxxx>

I /think/ the main point of Kees's suggestion was that FUTEX_TID_MASK
is UAPI, so unlikely to change.

> ---
>
> Notes:
>     v2:
>      - write down the math (Kees)
>
>  include/linux/refcount.h | 23 ++++++++++++++++++-----
>  1 file changed, 18 insertions(+), 5 deletions(-)
>
> diff --git a/include/linux/refcount.h b/include/linux/refcount.h
> index 0ac50cf62d062..0e3ee25eb156a 100644
> --- a/include/linux/refcount.h
> +++ b/include/linux/refcount.h
> @@ -38,11 +38,24 @@
>   * atomic operations, then the count will continue to edge closer to 0. If it
>   * reaches a value of 1 before /any/ of the threads reset it to the saturated
>   * value, then a concurrent refcount_dec_and_test() may erroneously free the
> - * underlying object. Given the precise timing details involved with the
> - * round-robin scheduling of each thread manipulating the refcount and the need
> - * to hit the race multiple times in succession, there doesn't appear to be a
> - * practical avenue of attack even if using refcount_add() operations with
> - * larger increments.
> + * underlying object.
> + * Linux limits the maximum number of tasks to PID_MAX_LIMIT, which is currently
> + * 0x400000 (and can't easily be raised in the future beyond FUTEX_TID_MASK).
> + * With the current PID limit, if no batched refcounting operations are used and
> + * the attacker can't repeatedly trigger kernel oopses in the middle of refcount
> + * operations, this makes it impossible for a saturated refcount to leave the
> + * saturation range, even if it is possible for multiple uses of the same
> + * refcount to nest in the context of a single task:
> + *
> + *     (UINT_MAX+1-REFCOUNT_SATURATED) / PID_MAX_LIMIT =
> + *     0x40000000 / 0x400000 = 0x100 = 256
> + *
> + * If hundreds of references are added/removed with a single refcounting
> + * operation, it may potentially be possible to leave the saturation range; but
> + * given the precise timing details involved with the round-robin scheduling of
> + * each thread manipulating the refcount and the need to hit the race multiple
> + * times in succession, there doesn't appear to be a practical avenue of attack
> + * even if using refcount_add() operations with larger increments.
>   *
>   * Memory ordering
>   * ===============
>
> base-commit: 98d54f81e36ba3bf92172791eba5ca5bd813989b
> --
> 2.25.0.265.gbab2e86ba0-goog
>
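
For anyone who wants to double-check the arithmetic in the new comment, here is a minimal userspace sketch. It is not kernel code: the helper names are made up for illustration, and the value 0xC0000000 for the saturation point is an assumption inferred from the quoted math (UINT_MAX+1 minus it must equal 0x40000000). The PID limit is the 0x400000 quoted above.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

/*
 * Assumption: the saturation value as an unsigned 32-bit number, inferred
 * from the quoted math (UINT_MAX + 1 - REFCOUNT_SATURATED == 0x40000000).
 */
#define SKETCH_REFCOUNT_SATURATED UINT32_C(0xC0000000)

/* From the quoted comment: the current PID_MAX_LIMIT. */
#define SKETCH_PID_MAX_LIMIT UINT32_C(0x400000)

/*
 * Hypothetical helper: number of counter values between the saturation
 * point and the wrap to 0, computed in 64 bits to avoid overflow.
 */
static uint64_t sketch_saturation_headroom(void)
{
	return (uint64_t)UINT32_MAX + 1 - SKETCH_REFCOUNT_SATURATED;
}

/*
 * Hypothetical helper: how many in-flight increments each task could
 * contribute if every possible task raced on the same refcount at once.
 */
static uint64_t sketch_per_task_budget(void)
{
	return sketch_saturation_headroom() / SKETCH_PID_MAX_LIMIT;
}
```

With these inputs, sketch_per_task_budget() matches the 256 in the comment, i.e. each of the PID_MAX_LIMIT tasks would need more than a couple of hundred nested single-step operations in flight before the count could leave the saturation range.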