Re: [PATCH] secretmem: Prevent secretmem_users from wrapping to zero

Kees Cook <keescook@xxxxxxxxxxxx> · Mon, 25 Oct 2021 17:17:57 -0700

On Mon, Oct 25, 2021 at 04:37:01PM -0700, Linus Torvalds wrote:
> If we want other semantics, it should be a new type.

Okay, that's reasonable.

> > I see that is more like a "shared resource usage count" where the shared
> > resource doesn't necessarily disappear when we reach "no users"?
> 
> So I think that's really "atomic_t".
> 
> And instead of saturating, people should always check such shared
> resources for limits.

Right, but people make mistakes, etc. I agree about the limit being much
more sane than saturating (though in the cases of "missed decrement"),
we get to the same place: an open-coded check for the limit that never
goes down doesn't matter if it's refcount_t nor atomic_t. :)

> > i.e. there is some resource, and it starts its life with no one using it
> > (count = 1).
> 
> You are already going off into the weeds.
> 
> That's not a natural thing to do. It's already confusing. Really. Read
> that sentence yourself, and read it like an outsider.
> 
> "No one is using it, so count == 1" is a nonsensican statement on the
> face of it.
> 
> You are thinking of a refcount_t trick, not some sane semantics.
> 
> Yes, we have played off-by-one games in the kernel before. We've done
> it for various subtle reasons.

Right, sure, but it's not a rare pattern. Given that it exists, and that
it _does_ get used for allocation management (e.g. module loader), it
seems worth constructing a proper type for it so that all the open coded
stuff around these instances can be consolidated, and the API can be
defined in a way that will behave sanely.

> I really don't see what's wrong with 'atomic_t', and just checking for limits.

It's that last part. :) If we go through atomic_dec() see a zero and do
something, okay, fine. But these places need to check for insane
conditions too ("we got a -1 back -- this means there's a bug but what
do we do?"). Same for atomic_inc(): "oh, we're at our limit, do
something", but what above discovering ourselves above the limit?
There's nothing about using the atomic_t primitives that enforces these
kinds of checks. (And there likely shouldn't be for atomic_t -- it's a
plain type.) But we likely need something that fills in this API gap
between atomic_t and refcount_t.

> So if a user can ever trigger a saturating counter, that's a big big
> problem in itself.

Yes! It is. :) But they don't get to gain control over a Use-after-Free.
The risk to the system is DoS instead of loss of execution control.
That's a meaningful risk downgrade.

So, what's the right semantics for an atomic type that could be used in
the module loader, that would catch kernel counting bugs in a safe manner?

The "refcount_t but 1-based" is close, but clearly not the right name. :)

-- 
Kees Cook