Re: [PATCH v2 bpf-next 0/9] add new acquire/release BPF kfuncs

Alexei Starovoitov <alexei.starovoitov@xxxxxxxxx> · Wed, 27 Mar 2024 14:41:36 -0700

On Mon, Mar 18, 2024 at 6:14 AM Christian Brauner <brauner@xxxxxxxxxx> wrote:
>
> On Wed, Mar 13, 2024 at 02:05:13PM -0700, Alexei Starovoitov wrote:
>
> But this whole polemic just illustrates that you simply didn't bother to
> understand how the code works. The way you talk about UAF together with
> SLAB_TYPESAFE_BY_RCU is telling. Please read the code instead of
> guessing.

Ok. Fair enough. I've read the code and old threads from Sep-Nov.
I think the concerns about typesafe_by_rcu made folks believe in
races that don't exist.

           if (unlikely(!atomic_long_inc_not_zero(&file->f_count)))
..
            *  (a) the file ref already went down to zero and the
            *      file hasn't been reused yet or the file count
            *      isn't zero but the file has already been reused.

..
            if (unlikely(file != rcu_dereference_raw(*fdentry)) ||

The first part of the comment is partially incorrect.
(it's in the wrong place).

The file ref could have been down to zero and not reused yet,
but it's before atomic_long_inc_not_zero.
Once the code reaches 2nd check the file guaranteed to be reused
and it went through init_file(), because that's the only code
that brings it back from zero.
This race is ok:

cpu0                                    cpu1
file = rcu_dereference_raw(*fdentry);
// file->f_count == 1
                                        rcu_assign_pointer(fdt->fd[fd], NULL);
                                        fput() // reaches zero

atomic_long_inc_not_zero()
// will not succeed.

This race is ok too:
cpu0                                    cpu1
file = rcu_dereference_raw(*fdentry);
// file->f_count == 1
                                        rcu_assign_pointer(fdt->fd[fd], NULL);

atomic_long_inc_not_zero()
// succeeds. f_count == 2
                                        fput() // f_count == 1

file != rcu_dereference_raw(*fdentry)
// will fail
but it doesn't have to.
This is a safe race.
It's no different if cpu0 went through
these steps including successful last check
                                        and then cpu1 did close(fd)

The file held by cpu0 is not on the
path to zero.

Similarly, back then, there was a concern about two parallel __fget_files_rcu()
where one cpu incremented refcnt, failed some check and didn't do fput yet.
In this case the file is not on the path to zero either.
Both cpu-s saw non-zero f_count when they went through atomic_long_inc_not_zero.
The file is not on the path to be reused.

Now the second part of the comment
"the file count isn't zero but the file has already been reused"
is misleading.

The (file != rcu_dereference_raw(*fdentry)) check is racy.
Another cpu could have replaced that slot right after that check.
Example:
cpu0 doing lockless __fget_files_rcu() while cpu1 doing sys_close.
__fget_files_rcu() will be returning a file that doesn't exist in fdt.
And it's safe.

This race is possible:
cpu0                                    cpu1
file = rcu_dereference_raw(*fdentry);
                                        fput() reaches zero
                                        file_free
                                        alloc_empty_file // got same file
                                        init_file // f_count = 1
atomic_long_inc_not_zero()
// succeeds. f_count == 2
file != rcu_dereference_raw(*fdentry))
// preventing a reuse of a file that
was never in this fdt.

The only value of this check is to make sure that this file
either _is_ or _was_ at some point in this fdt.
It's not preventing reuse per-se.

This race is possible:
cpu0                                    cpu1
file = rcu_dereference_raw(*fdentry);
                                        fput() reaches zero
                                        file_free
                                        alloc_empty_file // got same file
                                        init_file // f_count = 1
                                        fd_install
atomic_long_inc_not_zero()
// succeeds. f_count == 2
file == rcu_dereference_raw(*fdentry))
// success, though the file _was reused_.

I suggest to revise the comment.

>
> You've been provided:
>
> a) good reasons why the patchset in it's current form isn't acceptable
>    repeated multiple times

We will improve commit logs in the next revision.

> b) support for exporting a variant of bpf_d_path() that is safe to use

good, but bpf_d_path kfunc alone is not useful.
As Jann noted back in September:
https://lore.kernel.org/all/CAG48ez2d5CW=CDi+fBOU1YqtwHfubN3q6w=1LfD+ss+Q1PWHgQ@xxxxxxxxxxxxxx/

The conversion of files to typesafe_by_rcu broke that verifier
assumption about mm->exe_file and we need kfuncs to safely
acquire/release file reference to fix that breakage.

> c) a request that all kfunc exports for the vfs will have to be located
>    under fs/, not in kernel/bpf/

we've added kfuncs to
net/netfilter/, net/xfrm/, net/ipv4/, kernel/cgroup/, drivers/hid/
because maintainers of those subsystems demonstrated understanding
of what bpf is doing and what these kfuncs are for.

We can put them in fs/, but you need to demonstrate willingness to
understand the problem we're solving instead of arguing
about how hard file typesafe_by_rcu is to understand.

> d) a path on how to move forward with additional kfunc requests:
>    Clear and documented rules when it's ok for someone to come along and
>    request access to bpf kfuncs when it's to be rejected and when it's
>    ok to be supported.

There are ~36500 EXPORT_SYMBOL in the kernel.
Are there "clear documented rules when it's ok for someone to"
add or remove them?
There is a gentleman's agreement that maintainers of subsystems need to
be cc-ed when new EXPORT_SYMBOL-s are added.
In this case no new EXPORT_SYMBOLs are requested.

Compare that to 221 bpf helpers (which are uapi, and for the last
2 years we didn't add a single one) and 151 bpf kfuncs which are
not uapi as clearly documented in Documentation/bpf/kfuncs.rst
When developers want to add them they cc bpf@vger and relevant
subsystems just like we did with netfilter, xfrm, cgroup, hid.
kfunc deprecation rules are also documented in kfunc.rst

> You repeatedly threatening to go over the heads of people will not make
> them more amenable to happily integrate with your subsystem.

This is not it. We made our own mistakes with bpf_d_path safety, and now
file typesafe_by_rcu broke bpf safety assumptions.
We have to fix it one way or the other.
It's bpf that got affected by your changes.
But we don't demand that you fix bpf bits. We're fixing them.
But you have to provide technical reasons why file acquire/release
kfuncs are not suitable.
"Only 3 people understand typesafe_by_rcu" is not it.