On Sun, 4 Apr 2021 at 00:42, Clay Harris <bugs@xxxxxxxxxxx> wrote:
> On Tue, Mar 30 2021 at 14:17:21 +0300, Lennert Buytenhek quoth thus:
>
> > ...
> >
> > - Make IORING_OP_GETDENTS read from the directory's current position
> >   if the specified offset value is -1 (IORING_FEAT_RW_CUR_POS).
> >   (Requested / pointed out by Tavian Barnes.)
>
> This seems like a good feature.  As I understand it, this would
> allow submitting pairs of IORING_OP_GETDENTS with IOSQE_IO_HARDLINK
> wherein the first specifies the current offset and the second specifies
> offset -1, thereby halving the number of kernel round trips for N getdents64.

Yep, that was my main motivation for this suggestion.

> If the entire directory fits into the first buffer, the second would
> indicate EOF.  This would certainly seem like a win, but note there
> are diminishing returns as the directory size increases, versus just
> doubling the buffer size.

True, but most directories are small, so I expect it would be a
benefit most of the time.  Even for big directories you still get two
buffers filled with one syscall, same as if you did a conventional
getdents64() with twice as big a buffer.

> An alternate / additional idea you may wish to consider is changing
> getdents64 itself.
>
> Ordinary read functions must return 0 length to indicate EOF, because
> they can return arbitrary data.  This is not the case for getdents64.
>
> 1) Define a struct linux_dirent of minimum size containing an abnormal
> value as a sentinel.  d_off = 0 or -1 should work.
>
> 2) Implement a flag for getdents64.

Sadly getdents64() doesn't take a flags argument.  We'd probably need
a new syscall.

> IF
>         the flag is set AND
>         we are returning a non-zero length buffer AND
>         there is room in the buffer for the sentinel structure AND
>         a getdents64 call using the d_off of the last struct in the
>         buffer would return EOF
> THEN
>         append the sentinel struct to the buffer.
>
> Using this arrangement, we would still handle a 0 length return as an
> EOF, but if we see the sentinel struct, we can skip the additional call
> altogether.  This saves all of the pairing of buffers and extra logic,
> and unless we're unlucky and the sentinel structure did not fit in
> the buffer at EOF, would always reduce the number of getdents64
> calls by one.
>
> Moreover, if the flag was available outside of io_uring, for smaller
> directories, this feature would cut the number of directory reads
> of readdir(3) by up to half.

If we need a new syscall anyway, the calling convention could be
adjusted to indicate EOF more easily than that, e.g.

    int getdents2(int fd, void *buf, size_t *size, unsigned long flags);

With 0 being EOF, 1 being not-EOF, and -1 for error, or something.

-- 
Tavian Barnes
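
For reference, the extra EOF call discussed in this thread can be
observed with plain getdents64() today.  The following is only a
minimal sketch and not part of any proposed patch:
count_getdents_calls() and struct dir_stats are hypothetical names
made up for illustration, and the 32 KiB buffer size is an arbitrary
assumption.  It drains a directory with raw getdents64() syscalls and
counts how many calls that took, including the final one that returns
0 to signal EOF.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <stdint.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical helper type: what it cost to drain one directory. */
struct dir_stats {
    long calls;    /* getdents64() syscalls issued, including the EOF one */
    long entries;  /* directory records seen */
};

/* Record layout written by the kernel, as documented in getdents(2). */
struct linux_dirent64 {
    uint64_t       d_ino;
    int64_t        d_off;
    unsigned short d_reclen;
    unsigned char  d_type;
    char           d_name[];
};

/* Returns 0 on success and fills *st, or -1 on error (errno is set). */
static int count_getdents_calls(const char *path, struct dir_stats *st)
{
    /* Arbitrary buffer size, aligned so the records can be read in place. */
    char buf[32768] __attribute__((aligned(8)));
    int fd = open(path, O_RDONLY | O_DIRECTORY | O_CLOEXEC);

    if (fd < 0)
        return -1;

    st->calls = 0;
    st->entries = 0;

    for (;;) {
        long n = syscall(SYS_getdents64, fd, buf, sizeof(buf));

        st->calls++;
        if (n < 0) {
            close(fd);
            return -1;
        }
        if (n == 0)   /* a zero-length return is the only EOF signal */
            break;

        /* Walk the variable-length records returned by this call. */
        for (long pos = 0; pos < n; ) {
            struct linux_dirent64 *d = (struct linux_dirent64 *)(buf + pos);

            assert(d->d_reclen > 0);  /* guard against looping forever */
            st->entries++;
            pos += d->d_reclen;
        }
    }

    close(fd);
    return 0;
}
```

Calling count_getdents_calls("/", &st) on a small directory would
typically leave st.calls at 2: one call that returns every entry and
one more that returns 0.  That second call is the one an EOF
indication (sentinel record or a new calling convention) would make
unnecessary.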