On Mon, Apr 24 2023 at 17:41:18 +0900, Dominique Martinet quoth thus:

> Thanks!
>
> Clay Harris wrote on Mon, Apr 24, 2023 at 02:29:46AM -0500:
> > This also seems like a good place to bring up a point I made with
> > the last attempt at this code. You're missing an optimization here.
> > getdents knows whether it is returning a buffer because the next entry
> > won't fit versus because there are no more entries. As it doesn't
> > return that information, callers must always keep calling it back
> > until EOF. This means a completely unnecessary call is made for
> > every open directory. In other words, for a directory scan where
> > the buffers are large enough to not overflow, literally twice
> > as many calls are made to getdents as necessary. As io_uring is
> > in-kernel, it could use an internal interface to getdents which would
> > return an EOF indicator along with the (probably non-empty) buffer.
> > io_uring would then return that flag with the CQE.
>
> Sorry I didn't spot that comment in the last iteration of the patch,
> that sounds interesting.
>
> This isn't straightforward even in-kernel though: the ctx.actor callback
> (filldir64) isn't called when we're done, so we only know we couldn't
> fill in the buffer.
> We could have the callback record 'buffer full' and consider we're done
> if the buffer is full, or just single-handedly declare we are if we have
> more than `MAXNAMLEN + sizeof(struct linux_dirent64)` left over, but I
> assume a filesystem is allowed to return what it has readily available
> and expect the user to come back later?
> In which case we cannot use this as a heuristic...
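To make the quoted point concrete, here is a minimal userspace sketch, assuming Linux and raw `syscall(SYS_getdents64, ...)`, of how callers have to read a directory today: keep calling until getdents64 returns 0, because a 0 return is the only EOF signal. The helper name `count_getdents_calls` and the 32 KiB buffer size are illustrative choices, not anything from the patch.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <stdint.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Same layout as the kernel's struct linux_dirent64 (see getdents(2)). */
struct linux_dirent64 {
    uint64_t       d_ino;
    int64_t        d_off;
    unsigned short d_reclen;
    unsigned char  d_type;
    char           d_name[];
};

/* Hypothetical helper: reads all of `path` with raw getdents64 calls,
 * counting the entries returned and the syscalls made.
 * Returns the number of calls, or -1 on error. */
static int count_getdents_calls(const char *path, int *entries)
{
    _Alignas(8) char buf[32768];        /* illustrative buffer size */
    int fd = open(path, O_RDONLY | O_DIRECTORY | O_CLOEXEC);
    if (fd < 0)
        return -1;

    int calls = 0;
    *entries = 0;
    for (;;) {
        long nread = syscall(SYS_getdents64, fd, buf, sizeof(buf));
        calls++;
        if (nread < 0) {
            close(fd);
            return -1;
        }
        if (nread == 0)                 /* a 0 return is the only EOF signal */
            break;
        for (long off = 0; off < nread;) {
            const struct linux_dirent64 *d =
                (const struct linux_dirent64 *)(buf + off);
            assert(d->d_reclen > 0);    /* guard against looping forever */
            (*entries)++;
            off += d->d_reclen;
        }
    }
    close(fd);
    return calls;
}
```

For a directory small enough to fit in one buffer, this would typically report two calls: one that returns every entry and a second that returns 0 purely to confirm EOF. That second call is the one an EOF flag in the CQE would let callers skip.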
>
> So if we do this, it'll require a way for filesystems to say they're
> filling in as much as they can, or go the sledgehammer way of adding an
> extra dir_context callback, either way I'm not sure I want
> to deal with all that immediately unless I'm told all filesystems will
> fill as much as possible without ever failing for any temporary reason
> in the middle of iterate/iterate_shared().

I don't have a complete understanding of this area, but my thought
was not that we would look for any buffer-full condition, but rather
that an iterator could be tested for next_entry == EOF.

> Call me greedy but I believe such a flag in the CQE could also be added
> later on without any bad side effects (as it's optional to check on it
> to stop calling early and there's no harm in not setting it)?

Certainly it could be added later, but I wanted to make sure some
thought was put into it now. It would be nice to have it sooner
rather than later though...

>
> > (* As an aside, the only place I've ever seen a non-zero lseek on a
> > directory, is in a very resource limited environment, e.g. too small
> > open files limit. In the case of a depth-first directory scan, it
> > must close directories before completely reading them, and reopen /
> > lseek to their previous position in order to continue. This scenario
> > is certainly not worth bothering with for io_uring.)
>
> (I also thought of userspace NFS/9P servers, as these two at least get
> requests from clients with an arbitrary offset, but I'll be glad to
> forget about them for now...)
>
> --
> Dominique Martinet | Asmadeus
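The "next_entry == EOF" idea can be illustrated with a toy userspace model. This is a hypothetical sketch, not kernel code: `toy_ctx`, `fill_actor`, `toy_iterate` and `toy_getdents` are invented names loosely modelled on dir_context and its actor callback. It reflects Dominique's constraint that the caller cannot infer EOF from a short buffer, so the iterator (standing in for the filesystem) sets the flag itself when it runs past its last entry.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical toy model; none of these names are real kernel API. */
struct toy_ctx {
    bool (*actor)(struct toy_ctx *ctx, const char *name); /* false = stop */
    size_t pos;       /* index of the next entry to emit */
    bool eof;         /* set by the iterator, never guessed by the caller */
    char *buf;        /* caller's buffer for the current round */
    size_t buf_used;
    size_t buf_size;
};

/* Copies one name (plus its NUL) into the buffer; refuses when it won't fit. */
static bool fill_actor(struct toy_ctx *ctx, const char *name)
{
    size_t len = strlen(name) + 1;
    if (ctx->buf_used + len > ctx->buf_size)
        return false;                     /* buffer full: stop iterating */
    memcpy(ctx->buf + ctx->buf_used, name, len);
    ctx->buf_used += len;
    return true;
}

/* "Filesystem" side: emits entries from ctx->pos onwards and, because it
 * knows its own layout, reports EOF as soon as it runs out of entries. */
static void toy_iterate(const char *const *names, size_t n, struct toy_ctx *ctx)
{
    while (ctx->pos < n) {
        if (!ctx->actor(ctx, names[ctx->pos]))
            return;                       /* stopped early: more to come */
        ctx->pos++;
    }
    ctx->eof = true;                      /* ran off the end: nothing left */
}

/* One "getdents" round: returns bytes filled and reports EOF via *eof. */
static size_t toy_getdents(const char *const *names, size_t n,
                           struct toy_ctx *ctx, char *buf, size_t size,
                           bool *eof)
{
    assert(size > 0);
    ctx->buf = buf;
    ctx->buf_used = 0;
    ctx->buf_size = size;
    toy_iterate(names, n, ctx);
    *eof = ctx->eof;
    return ctx->buf_used;
}
```

With a buffer large enough for the whole directory, a single round returns every entry with eof already set, which is the information a CQE flag could carry. With a smaller buffer, eof stays clear until the final batch, so callers that ignore the flag behave exactly as they do today.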