How about you post a working version of the patchset, even if it only looks at 0-63 and we are going to massage it afterwards. On Sat, Jun 29, 2024 at 5:41 PM Ma, Yu <yu.ma@xxxxxxxxx> wrote: > > > On 6/28/2024 5:12 PM, Jan Kara wrote: > > On Thu 27-06-24 21:59:12, Mateusz Guzik wrote: > >> On Thu, Jun 27, 2024 at 8:27 PM Ma, Yu <yu.ma@xxxxxxxxx> wrote: > >>> 2. For fast path implementation, the essential and simple point is to > >>> directly return an available bit if there is free bit in [0-63]. I'd > >>> emphasize that it does not only improve low number of open fds (even it > >>> is the majority case on system as Honza agreed), but also improve the > >>> cases that lots of fds open/close frequently with short task (as per the > >>> algorithm, lower bits will be prioritized to allocate after being > >>> recycled). Not only blogbench, a synthetic benchmark, but also the > >>> realistic scenario as claimed in f3f86e33dc3d("vfs: Fix pathological > >>> performance case for __alloc_fd()"), which literally introduced this > >>> 2-levels bitmap searching algorithm to vfs as we see now. > >> I don't understand how using next_fd instead is supposed to be inferior. > >> > >> Maybe I should clarify that by API contract the kernel must return the > >> lowest free fd it can find. To that end it maintains the next_fd field > >> as a hint to hopefully avoid some of the search work. > >> > >> In the stock kernel the first thing done in alloc_fd is setting it as > >> a starting point: > >> fdt = files_fdtable(files); > >> fd = start; > >> if (fd < files->next_fd) > >> fd = files->next_fd; > >> > >> that is all the calls which come here with 0 start their search from > >> next_fd position. > > Yup. > > > >> Suppose you implemented the patch as suggested by me and next_fd fits > >> the range of 0-63. Then you get the benefit of lower level bitmap > >> check just like in the patch you submitted, but without having to > >> first branch on whether you happen to be in that range. > >> > >> Suppose next_fd is somewhere higher up, say 80. With your general > >> approach the optimization wont be done whatsoever or it will be > >> attempted at the 0-63 range when it is an invariant it finds no free > >> fds. > >> > >> With what I'm suggesting the general idea of taking a peek at the > >> lower level bitmap can be applied across the entire fd space. Some > >> manual mucking will be needed to make sure this never pulls more than > >> one cacheline, easiest way out I see would be to align next_fd to > >> BITS_PER_LONG for the bitmap search purposes. > > Well, all you need to do is to call: > > > > bit = find_next_zero_bit(fdt->open_fds[start / BITS_PER_LONG], > > BITS_PER_LONG, start & (BITS_PER_LONG-1)); > > if (bit < BITS_PER_LONG) > > return bit + (start & ~(BITS_PER_LONG - 1)); > > > > > > in find_next_fd(). Not sure if this is what you meant by aligning next_fd > > to BITS_PER_LONG... > > > > Honza > > So the idea here is to take a peek at the word contains next_fd, return > directly if get free bit there. Not sure about the efficiency here if > open/close fd frequently and next_fd is distributed very randomly. Just > give a quick try for this part of code, seems kernel failed to boot > up😳, kind of out of expectation ... > -- Mateusz Guzik <mjguzik gmail.com>