On Mon, Mar 18, 2024 at 6:14 AM Christian Brauner <brauner@xxxxxxxxxx> wrote: > > On Wed, Mar 13, 2024 at 02:05:13PM -0700, Alexei Starovoitov wrote: > > But this whole polemic just illustrates that you simply didn't bother to > understand how the code works. The way you talk about UAF together with > SLAB_TYPESAFE_BY_RCU is telling. Please read the code instead of > guessing. Ok. Fair enough. I've read the code and old threads from Sep-Nov. I think the concerns about typesafe_by_rcu made folks believe in races that don't exist. if (unlikely(!atomic_long_inc_not_zero(&file->f_count))) .. * (a) the file ref already went down to zero and the * file hasn't been reused yet or the file count * isn't zero but the file has already been reused. .. if (unlikely(file != rcu_dereference_raw(*fdentry)) || The first part of the comment is partially incorrect. (it's in the wrong place). The file ref could have been down to zero and not reused yet, but it's before atomic_long_inc_not_zero. Once the code reaches 2nd check the file guaranteed to be reused and it went through init_file(), because that's the only code that brings it back from zero. This race is ok: cpu0 cpu1 file = rcu_dereference_raw(*fdentry); // file->f_count == 1 rcu_assign_pointer(fdt->fd[fd], NULL); fput() // reaches zero atomic_long_inc_not_zero() // will not succeed. This race is ok too: cpu0 cpu1 file = rcu_dereference_raw(*fdentry); // file->f_count == 1 rcu_assign_pointer(fdt->fd[fd], NULL); atomic_long_inc_not_zero() // succeeds. f_count == 2 fput() // f_count == 1 file != rcu_dereference_raw(*fdentry) // will fail but it doesn't have to. This is a safe race. It's no different if cpu0 went through these steps including successful last check and then cpu1 did close(fd) The file held by cpu0 is not on the path to zero. Similarly, back then, there was a concern about two parallel __fget_files_rcu() where one cpu incremented refcnt, failed some check and didn't do fput yet. In this case the file is not on the path to zero either. Both cpu-s saw non-zero f_count when they went through atomic_long_inc_not_zero. The file is not on the path to be reused. Now the second part of the comment "the file count isn't zero but the file has already been reused" is misleading. The (file != rcu_dereference_raw(*fdentry)) check is racy. Another cpu could have replaced that slot right after that check. Example: cpu0 doing lockless __fget_files_rcu() while cpu1 doing sys_close. __fget_files_rcu() will be returning a file that doesn't exist in fdt. And it's safe. This race is possible: cpu0 cpu1 file = rcu_dereference_raw(*fdentry); fput() reaches zero file_free alloc_empty_file // got same file init_file // f_count = 1 atomic_long_inc_not_zero() // succeeds. f_count == 2 file != rcu_dereference_raw(*fdentry)) // preventing a reuse of a file that was never in this fdt. The only value of this check is to make sure that this file either _is_ or _was_ at some point in this fdt. It's not preventing reuse per-se. This race is possible: cpu0 cpu1 file = rcu_dereference_raw(*fdentry); fput() reaches zero file_free alloc_empty_file // got same file init_file // f_count = 1 fd_install atomic_long_inc_not_zero() // succeeds. f_count == 2 file == rcu_dereference_raw(*fdentry)) // success, though the file _was reused_. I suggest to revise the comment. > > You've been provided: > > a) good reasons why the patchset in it's current form isn't acceptable > repeated multiple times We will improve commit logs in the next revision. > b) support for exporting a variant of bpf_d_path() that is safe to use good, but bpf_d_path kfunc alone is not useful. As Jann noted back in September: https://lore.kernel.org/all/CAG48ez2d5CW=CDi+fBOU1YqtwHfubN3q6w=1LfD+ss+Q1PWHgQ@xxxxxxxxxxxxxx/ The conversion of files to typesafe_by_rcu broke that verifier assumption about mm->exe_file and we need kfuncs to safely acquire/release file reference to fix that breakage. > c) a request that all kfunc exports for the vfs will have to be located > under fs/, not in kernel/bpf/ we've added kfuncs to net/netfilter/, net/xfrm/, net/ipv4/, kernel/cgroup/, drivers/hid/ because maintainers of those subsystems demonstrated understanding of what bpf is doing and what these kfuncs are for. We can put them in fs/, but you need to demonstrate willingness to understand the problem we're solving instead of arguing about how hard file typesafe_by_rcu is to understand. > d) a path on how to move forward with additional kfunc requests: > Clear and documented rules when it's ok for someone to come along and > request access to bpf kfuncs when it's to be rejected and when it's > ok to be supported. There are ~36500 EXPORT_SYMBOL in the kernel. Are there "clear documented rules when it's ok for someone to" add or remove them? There is a gentleman's agreement that maintainers of subsystems need to be cc-ed when new EXPORT_SYMBOL-s are added. In this case no new EXPORT_SYMBOLs are requested. Compare that to 221 bpf helpers (which are uapi, and for the last 2 years we didn't add a single one) and 151 bpf kfuncs which are not uapi as clearly documented in Documentation/bpf/kfuncs.rst When developers want to add them they cc bpf@vger and relevant subsystems just like we did with netfilter, xfrm, cgroup, hid. kfunc deprecation rules are also documented in kfunc.rst > You repeatedly threatening to go over the heads of people will not make > them more amenable to happily integrate with your subsystem. This is not it. We made our own mistakes with bpf_d_path safety, and now file typesafe_by_rcu broke bpf safety assumptions. We have to fix it one way or the other. It's bpf that got affected by your changes. But we don't demand that you fix bpf bits. We're fixing them. But you have to provide technical reasons why file acquire/release kfuncs are not suitable. "Only 3 people understand typesafe_by_rcu" is not it.