Re: [PATCH] secretmem: Prevent secretmem_users from wrapping to zero

Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx> · Mon, 25 Oct 2021 16:37:01 -0700

On Mon, Oct 25, 2021 at 3:30 PM Kees Cook <keescook@xxxxxxxxxxxx> wrote:
>
> > A refcount being zero means that the data it referenced no longer exists.
>
> I don't disagree with this definition, but I would like to understand how
> some other use-cases fit into this.

I certainly hope that there are no other use-cases for 'recount_t',
because that "zero is invalid" is very much part of the semantics.

If we want other semantics, it should be a new type.

>      What about the case of what
> I see that is more like a "shared resource usage count" where the shared
> resource doesn't necessarily disappear when we reach "no users"?

So I think that's really "atomic_t".

And instead of saturating, people should always check such shared
resources for limits.

> i.e. there is some resource, and it starts its life with no one using it
> (count = 1).

You are already going off into the weeds.

That's not a natural thing to do. It's already confusing. Really. Read
that sentence yourself, and read it like an outsider.

"No one is using it, so count == 1" is a nonsensican statement on the
face of it.

You are thinking of a refcount_t trick, not some sane semantics.

Yes, we have played off-by-one games in the kernel before. We've done
it for various subtle reasons.

For example, traditionally, on x86, with atomic counting there are
three special situations: negative, 0 and positive. So if you use the
traditional x86 counting atomics (just add/sub/inc/dec, no xadd) then
there are situations where you can get more information about the
result in %eflags if you don't use zero as the initial value, but -1.

Because then you can do "inc", and if ZF is set, you know you were the
_first_ person to increment it. And when you use "dec", and SF is set
afterwards, you know you are the _last_ person to decrement it.

That was useful when things like "xadd" weren't available, and cmpxchg
loops are expensive. So we used to have counters where -1 was that
"zero point". Very similar to your "1 is the zero point".

But was it _logical_? No. It was an implementation trick. I think
we've removed all those cases because it was so subtle and confusing
(but maybe we still have it somewhere - I did not check).

So we've certainly played those kinds of games. But it had better be
for a really good reason.

> I don't see as clear a distinction between secretmem and the above
> examples.

I really don't see what's wrong with 'atomic_t', and just checking for limits.

Saturating counters are EVIL AND BAD. They are a DoS waiting to
happen. Once you saturate, the machine is basically dead. You may have
"protected" against some attack, but you did so by killing the machine
and making the resource accounting no longer work.

So if a user can ever trigger a saturating counter, that's a big big
problem in itself.

In contrast, an 'atomic_t' with a simple limit? It just works.

And it doesn't need illogical tricks to work.

Stop thinking that refcount_t is a good type. Start realizing the
downsides. Start understanding that saturation is a HORRENDOUSLY BAD
solution, and horrible QoI.

              Linus