On 9/29/23, Christian Brauner <brauner@xxxxxxxxxx> wrote: > On Fri, Sep 29, 2023 at 03:31:29PM +0200, Jann Horn wrote: >> On Fri, Sep 29, 2023 at 11:20 AM Christian Brauner <brauner@xxxxxxxxxx> >> wrote: >> > > But yes, that protection would be broken by SLAB_TYPESAFE_BY_RCU, >> > > since then the "f_count is zero" is no longer a final thing. >> > >> > I've tried coming up with a patch that is simple enough so the pattern >> > is easy to follow and then converting all places to rely on a pattern >> > that combine lookup_fd_rcu() or similar with get_file_rcu(). The >> > obvious >> > thing is that we'll force a few places to now always acquire a >> > reference >> > when they don't really need one right now and that already may cause >> > performance issues. >> >> (Those places are probably used way less often than the hot >> open/fget/close paths though.) >> >> > We also can't fully get rid of plain get_file_rcu() uses itself because >> > of users such as mm->exe_file. They don't go from one of the rcu >> > fdtable >> > lookup helpers to the struct file obviously. They rcu replace the file >> > pointer in their struct ofc so we could change get_file_rcu() to take a >> > struct file __rcu **f and then comparing that the passed in pointer >> > hasn't changed before we managed to do atomic_long_inc_not_zero(). >> > Which >> > afaict should work for such cases. >> > >> > But overall we would introduce a fairly big and at the same time subtle >> > semantic change. The idea is pretty neat and it was fun to do but I'm >> > just not convinced we should do it given how ubiquitous struct file is >> > used and now to make the semanics even more special by allowing >> > refcounts. >> > >> > I've kept your original release_empty_file() proposal in vfs.misc which >> > I think is a really nice change. >> > >> > Let me know if you all passionately disagree. ;) > > So I'm appending the patch I had played with and a fix from Jann on top. > @Linus, if you have an opinion, let me know what you think. > > Also available here: > https://gitlab.com/brauner/linux/-/commits/vfs.file.rcu > > Might be interesting if this could be perfed to see if there is any real > gain for workloads with massive numbers of fds. > I would feel safer with a guaranteed way to tell that the file was reallocated. I think this could track allocs/frees with a sequence counter embedded into the object, say odd means deallocated and even means allocated. Then you would know for a fact whether you raced with the file getting whacked and would never have to wonder if you double-checked everything you needed (like that f_mode) thing. This would also mean that consumers which get away with poking around the file without getting a ref could still do it, this is at least true for tid_fd_mode. All of them would need patching though. Extending struct file is not ideal by any means, but the good news is that: 1. there is a 4 byte hole in there, if one is fine with an int-sized counter 2. if one insists on 8 bytes, the struct is 232 bytes on my kernel (debian). still some room up to 256, so it may be tolerable? -- Mateusz Guzik <mjguzik gmail.com>