On Tue, Aug 06, 2024 at 02:02:17AM GMT, Al Viro wrote: > On Mon, Aug 05, 2024 at 05:04:05PM -0700, Linus Torvalds wrote: > > On Mon, 5 Aug 2024 at 16:45, Al Viro <viro@xxxxxxxxxxxxxxxxxx> wrote: > > > > > > So... do we really need that indirect? The problem would be > > > seeing ->max_fds update before that of the array pointers. > > > > The reason is simply so that we can have one single allocation. > > > > In fact, quite often, it's zero allocations when you can use the > > 'struct fdtable fdtab' that is embedded in 'struct files_struct'. > > More to the point, we use arrays embedded into files_struct. > > >But > > the 'struct fdtable' was a convenient way to allocate all those > > bitmaps _and_ the 'struct file *' array all together. > > I don't think so - IIRC, it was introduced when we added RCU'd > file lookup. Let me check... Yep; badf16621c1f "[PATCH] files: > break up files struct", with RCU support as rationale. Followed > by ab2af1f50050 "[PATCH] files: files struct with RCU". > > Before those commits ->max_fds and ->fd used to live in > files_struct and fget() (OK, fcheck_files()) had been taking > ->files_lock, so that concurrent expand_files() would not > screw us over. > > The problem was not just the need to delay freeing old ->fd > array; that could be dealt with easily enough. Think what > would've happened if fcheck_files() ended up fetching > new value of ->max_fds and old value of ->fd, which pointed > to pre-expansion array. Indirection allowed to update > both in one store. > > The thing is, ->max_fds for successive ->fdt is monotonously > increasing. So a lockless reader seeing the old size is > fine with the new table - we just need to prevent the opposite. > > Would rcu_assign_pointer of pointers + smp_store_release of max_fds on expand > (all under ->files_lock, etc.) paired with > smp_load_acquire of max_fds + rcu_dereference of ->fd on file lookup side > be enough, or do we need an explicit smp_wmb/smp_rmb in there? Afair, smp_load_acquire() would be a barrier for both later loads and stores and smp_store_release() would be a barrier for both earlier loads and stores. Iiuc, here we only care about ordering stores to ->fd and max_fds on the write side and about ordering loads of max_fds and ->fd on the reader side. The reader doesn't actually write anything. In other words, we want to make ->fd visible before max_fds on the write side and we want to load max_fds after ->fd. So I think smp_wmb() and smp_rmb() would be sufficient. I also find it clearer in this case.