On Tue 25-06-24 23:33:34, Ma, Yu wrote: > On 6/25/2024 8:53 PM, Jan Kara wrote: > > On Tue 25-06-24 13:52:57, Jan Kara wrote: > > > On Sat 22-06-24 11:49:02, Yu Ma wrote: > > > > There is available fd in the lower 64 bits of open_fds bitmap for most cases > > > > when we look for an available fd slot. Skip 2-levels searching via > > > > find_next_zero_bit() for this common fast path. > > > > > > > > Look directly for an open bit in the lower 64 bits of open_fds bitmap when a > > > > free slot is available there, as: > > > > (1) The fd allocation algorithm would always allocate fd from small to large. > > > > Lower bits in open_fds bitmap would be used much more frequently than higher > > > > bits. > > > > (2) After fdt is expanded (the bitmap size doubled for each time of expansion), > > > > it would never be shrunk. The search size increases but there are few open fds > > > > available here. > > > > (3) find_next_zero_bit() itself has a fast path inside to speed up searching > > > > when size<=64. > > > > > > > > Besides, "!start" is added to fast path condition to ensure the allocated fd is > > > > greater than start (i.e. >=0), given alloc_fd() is only called in two scenarios: > > > > (1) Allocating a new fd (the most common usage scenario) via > > > > get_unused_fd_flags() to find fd start from bit 0 in fdt (i.e. start==0). > > > > (2) Duplicating a fd (less common usage) via dup_fd() to find a fd start from > > > > old_fd's index in fdt, which is only called by syscall fcntl. > > > > > > > > With the fast path added in alloc_fd(), pts/blogbench-1.1.0 read is improved > > > > by 17% and write by 9% on Intel ICX 160 cores configuration with v6.10-rc4. > > > > > > > > Reviewed-by: Tim Chen <tim.c.chen@xxxxxxxxxxxxxxx> > > > > Signed-off-by: Yu Ma <yu.ma@xxxxxxxxx> > > > > --- > > > > fs/file.c | 35 +++++++++++++++++++++-------------- > > > > 1 file changed, 21 insertions(+), 14 deletions(-) > > > > > > > > diff --git a/fs/file.c b/fs/file.c > > > > index a3b72aa64f11..50e900a47107 100644 > > > > --- a/fs/file.c > > > > +++ b/fs/file.c > > > > @@ -515,28 +515,35 @@ static int alloc_fd(unsigned start, unsigned end, unsigned flags) > > > > if (fd < files->next_fd) > > > > fd = files->next_fd; > > > > - if (fd < fdt->max_fds) > > > > + error = -EMFILE; > > > > + if (likely(fd < fdt->max_fds)) { > > > > + if (~fdt->open_fds[0] && !start) { > > > > + fd = find_next_zero_bit(fdt->open_fds, BITS_PER_LONG, fd); > > > So I don't think this is quite correct. If files->next_fd is set, we could > > > end up calling find_next_zero_bit() starting from quite high offset causing > > > a regression? Also because we don't expand in this case, we could cause access > > > beyond end of fdtable? > > OK, I've misunderstood the next_fd logic. next_fd is the lowest 0-bit in > > the open_fds bitmap so if next_fd is big, the ~fdt->open_fds[0] should > > be false. As such the above condition could be rewritten as: > > > > if (!start && files->next_fd < BITS_PER_LONG) > > > > to avoid loading the first bitmap long if we know it is full? Or we could > > maybe go as far as: > > > > if (!start && fd < BITS_PER_LONG && !test_bit(fd, fdt->open_fds)) > > goto fastreturn; > > > > because AFAIU this should work in exactly the same cases as your code? > > > > Honza > > Thanks Honza for the good concern and suggestions here, while both above > conditions are not enough to ensure that there is available fd in the first > 64 bits of open_fds. As next_fd just means there is no available fd before > next_fd, just imagine that fd from 0 to 66 are already occupied, now fd=3 is > returned back, then next_fd would be set as 3 per fd recycling logic (i.e. > in __put_unused_fd()), next time when alloc_fd() being called, it would > return fd=3 to the caller and set next_fd=4. Then next time when alloc_fd() > being called again, next_fd==4, but actually it's already been occupied. So > find_next_zero_bit() is needed to find the real 0 bit anyway. Indeed, thanks for correcting me! next_fd is just a lower bound for the first free fd. > The conditions > should either be like it is in patch or if (!start && !test_bit(0, > fdt->full_fds_bits)), the latter should also have the bitmap loading cost, > but another point is that a bit in full_fds_bits represents 64 bits in > open_fds, no matter fd >64 or not, full_fds_bits should be loaded any way, > maybe we can modify the condition to use full_fds_bits ? So maybe I'm wrong but I think the biggest benefit of your code compared to plain find_next_fd() is exactly in that we don't have to load full_fds_bits into cache. So I'm afraid that using full_fds_bits in the condition would destroy your performance gains. Thinking about this with a fresh head how about putting implementing your optimization like: --- a/fs/file.c +++ b/fs/file.c @@ -490,6 +490,20 @@ static unsigned int find_next_fd(struct fdtable *fdt, unsigned int start) unsigned int maxbit = maxfd / BITS_PER_LONG; unsigned int bitbit = start / BITS_PER_LONG; + /* + * Optimistically search the first long of the open_fds bitmap. It + * saves us from loading full_fds_bits into cache in the common case + * and because BITS_PER_LONG > start >= files->next_fd, we have quite + * a good chance there's a bit free in there. + */ + if (start < BITS_PER_LONG) { + unsigned int bit; + + bit = find_next_zero_bit(fdt->open_fds, BITS_PER_LONG, start); + if (bit < BITS_PER_LONG) + return bit; + } + bitbit = find_next_zero_bit(fdt->full_fds_bits, maxbit, bitbit) * BITS_PER_LONG; if (bitbit >= maxfd) return maxfd; Plus your optimizations with likely / unlikely. This way the code flow in alloc_fd() stays more readable, we avoid loading the first open_fds long into cache if it is full, and we should get all the performance benefits? Honza > > > > + goto fastreturn; > > > > + } > > > > fd = find_next_fd(fdt, fd); > > > > + } > > > > + > > > > + if (unlikely(fd >= fdt->max_fds)) { > > > > + error = expand_files(files, fd); > > > > + if (error < 0) > > > > + goto out; > > > > + /* > > > > + * If we needed to expand the fs array we > > > > + * might have blocked - try again. > > > > + */ > > > > + if (error) > > > > + goto repeat; > > > > + } > > > > +fastreturn: > > > > /* > > > > * N.B. For clone tasks sharing a files structure, this test > > > > * will limit the total number of files that can be opened. > > > > */ > > > > - error = -EMFILE; > > > > - if (fd >= end) > > > > + if (unlikely(fd >= end)) > > > > goto out; > > > > - error = expand_files(files, fd); > > > > - if (error < 0) > > > > - goto out; > > > > - > > > > - /* > > > > - * If we needed to expand the fs array we > > > > - * might have blocked - try again. > > > > - */ > > > > - if (error) > > > > - goto repeat; > > > > - > > > > if (start <= files->next_fd) > > > > files->next_fd = fd + 1; > > > > -- > > > > 2.43.0 > > > > > > > -- > > > Jan Kara <jack@xxxxxxxx> > > > SUSE Labs, CR > > > > -- Jan Kara <jack@xxxxxxxx> SUSE Labs, CR