On 19/02/2021 12:05, Pavel Begunkov wrote: > On 18/02/2021 12:27, Lennert Buytenhek wrote: >> IORING_OP_GETDENTS behaves much like getdents64(2) and takes the same >> arguments, but with a small twist: it takes an additional offset >> argument, and reading from the specified directory starts at the given >> offset. >> >> For the first IORING_OP_GETDENTS call on a directory, the offset >> parameter can be set to zero, and for subsequent calls, it can be >> set to the ->d_off field of the last struct linux_dirent64 returned >> by the previous IORING_OP_GETDENTS call. >> >> Internally, if necessary, IORING_OP_GETDENTS will vfs_llseek() to >> the right directory position before calling vfs_getdents(). >> >> IORING_OP_GETDENTS may or may not update the specified directory's >> file offset, and the file offset should not be relied upon having >> any particular value during or after an IORING_OP_GETDENTS call. >> >> Signed-off-by: Lennert Buytenhek <buytenh@xxxxxxxxxxxxxx> >> --- >> fs/io_uring.c | 73 +++++++++++++++++++++++++++++++++++ >> include/uapi/linux/io_uring.h | 1 + >> 2 files changed, 74 insertions(+) >> >> diff --git a/fs/io_uring.c b/fs/io_uring.c >> index 056bd4c90ade..6853bf48369a 100644 >> --- a/fs/io_uring.c >> +++ b/fs/io_uring.c >> @@ -635,6 +635,13 @@ struct io_mkdir { >> struct filename *filename; >> }; >> > [...] >> +static int io_getdents(struct io_kiocb *req, unsigned int issue_flags) >> +{ >> + struct io_getdents *getdents = &req->getdents; >> + bool pos_unlock = false; >> + int ret = 0; >> + >> + /* getdents always requires a blocking context */ >> + if (issue_flags & IO_URING_F_NONBLOCK) >> + return -EAGAIN; >> + >> + /* for vfs_llseek and to serialize ->iterate_shared() on this file */ >> + if (file_count(req->file) > 1) { > > Looks racy, is it safe? E.g. can be concurrently dupped and used, or just > several similar IORING_OP_GETDENTS requests. > >> + pos_unlock = true; >> + mutex_lock(&req->file->f_pos_lock); >> + } >> + >> + if (req->file->f_pos != getdents->pos) { >> + loff_t res = vfs_llseek(req->file, getdents->pos, SEEK_SET); > > I may be missing the previous discussions, but can this ever become > stateless, like passing an offset? Including readdir.c and beyond. I mean without those seeks. An emulation would look like rewinding pos back after vfs_getdents, though might be awful on performance. > >> + if (res < 0) >> + ret = res; >> + } >> + >> + if (ret == 0) { >> + ret = vfs_getdents(req->file, getdents->dirent, >> + getdents->count); >> + } >> + >> + if (pos_unlock) >> + mutex_unlock(&req->file->f_pos_lock); >> + >> + if (ret < 0) { >> + if (ret == -ERESTARTSYS) >> + ret = -EINTR; >> + req_set_fail_links(req); >> + } >> + io_req_complete(req, ret); >> + return 0; >> +} > [...] > -- Pavel Begunkov