On Mon, Aug 05, 2024 at 05:04:05PM -0700, Linus Torvalds wrote: > On Mon, 5 Aug 2024 at 16:45, Al Viro <viro@xxxxxxxxxxxxxxxxxx> wrote: > > > > So... do we really need that indirect? The problem would be > > seeing ->max_fds update before that of the array pointers. > > The reason is simply so that we can have one single allocation. > > In fact, quite often, it's zero allocations when you can use the > 'struct fdtable fdtab' that is embedded in 'struct files_struct'. More to the point, we use arrays embedded into files_struct. >But > the 'struct fdtable' was a convenient way to allocate all those > bitmaps _and_ the 'struct file *' array all together. I don't think so - IIRC, it was introduced when we added RCU'd file lookup. Let me check... Yep; badf16621c1f "[PATCH] files: break up files struct", with RCU support as rationale. Followed by ab2af1f50050 "[PATCH] files: files struct with RCU". Before those commits ->max_fds and ->fd used to live in files_struct and fget() (OK, fcheck_files()) had been taking ->files_lock, so that concurrent expand_files() would not screw us over. The problem was not just the need to delay freeing old ->fd array; that could be dealt with easily enough. Think what would've happened if fcheck_files() ended up fetching new value of ->max_fds and old value of ->fd, which pointed to pre-expansion array. Indirection allowed to update both in one store. The thing is, ->max_fds for successive ->fdt is monotonously increasing. So a lockless reader seeing the old size is fine with the new table - we just need to prevent the opposite. Would rcu_assign_pointer of pointers + smp_store_release of max_fds on expand (all under ->files_lock, etc.) paired with smp_load_acquire of max_fds + rcu_dereference of ->fd on file lookup side be enough, or do we need an explicit smp_wmb/smp_rmb in there? > And yes, I think it's entirely a historical artifact of how that thing > grew to be. Long long long ago there was no secondary allocation at > all, and MAX_OPEN was fixed at 20. Yes - but that had changed way before 2005 when those patches went in. Separate allocation of bitmaps is 2.3.12pre7 and separate allocation of ->fd is even older - 2.1.90pre1. NR_OPEN had been 1024 at that point; earlier history is 0.97: 20 -> 32 0.98.4: 32 -> 256 2.1.26: 256 -> 1024