Re: [PATCH v2] creds: Convert cred.usage to refcount_t

Kees Cook <keescook@xxxxxxxxxxxx> · Fri, 18 Aug 2023 11:46:20 -0700

On Fri, Aug 18, 2023 at 10:55:42AM -0700, Andrew Morton wrote:
> On Thu, 17 Aug 2023 21:17:41 -0700 Kees Cook <keescook@xxxxxxxxxxxx> wrote:
> 
> > From: Elena Reshetova <elena.reshetova@xxxxxxxxx>
> > 
> > atomic_t variables are currently used to implement reference counters
> > with the following properties:
> >  - counter is initialized to 1 using atomic_set()
> >  - a resource is freed upon counter reaching zero
> >  - once counter reaches zero, its further
> >    increments aren't allowed
> >  - counter schema uses basic atomic operations
> >    (set, inc, inc_not_zero, dec_and_test, etc.)
> > 
> > Such atomic variables should be converted to a newly provided
> > refcount_t type and API that prevents accidental counter overflows and
> > underflows. This is important since overflows and underflows can lead
> > to use-after-free situation and be exploitable.
> 
> ie, if we have bugs which we have no reason to believe presently exist,
> let's bloat and slow down the kernel just in case we add some in the
> future?  Or something like that.  dangnabbit, that refcount_t.
> 
> x86_64 defconfig:
> 
> before:
>    text	   data	    bss	    dec	    hex	filename
>    3869	    552	      8	   4429	   114d	kernel/cred.o
>    6140	    724	     16	   6880	   1ae0	net/sunrpc/auth.o
> 
> after:
>    text	   data	    bss	    dec	    hex	filename
>    4573	    552	      8	   5133	   140d	kernel/cred.o
>    6236	    724	     16	   6976	   1b40	net/sunrpc/auth.o
> 
> 
> Please explain, in a non handwavy and non cargoculty fashion why this
> speed and space cost is justified.

Since it's introduction, refcount_t has found a lot of bugs. This is easy
to see even with a simplistic review of commits:

$ git log --date=short --pretty='format:%ad %C(auto)%h ("%s")' \
          --grep 'refcount_t:' | \
  cut -d- -f1 | sort | uniq -c
      1 2016
     15 2017
      9 2018
     23 2019
     24 2020
     18 2021
     24 2022
     10 2023

It's not really tapering off, either. All of these would have been silent
memory corruptions, etc. In the face of _what_ is being protected,
"cred" is pretty important for enforcing security boundaries, etc,
so having it still not protected is a weird choice we've implicitly
made. Given cred code is still seeing changes and novel uses (e.g.
io_uring), it's not unreasonable to protect it from detectable (and
_exploitable_) problems.

While the size differences look large in cred.o, it's basically limited
only to cred.o:

   text    data     bss     dec     hex filename
30515570        12255806        17190916        59962292        392f3b4 vmlinux.before
30517500        12255838        17190916        59964254        392fb5e vmlinux.after

And we've even seen performance _improvements_ in some conditions:
https://lore.kernel.org/lkml/20200615005732.GV12456@shao2-debian/

Looking at confirmed security flaws, exploitable reference counting
related bugs have dropped significantly. (The CVE database is irritating
to search, but most recent refcount-related CVEs are due to counts that
are _not_ using refcount_t.)

I'd rather ask the question, "Why should we _not_ protect cred lifetime
management?"

-Kees

-- 
Kees Cook