On 7/27/23 16:52, Christian Brauner wrote:
On Thu, Jul 27, 2023 at 04:12:12PM +0100, Pavel Begunkov wrote:
On 7/27/23 15:27, Christian Brauner wrote:
On Thu, Jul 27, 2023 at 07:51:19PM +0800, Hao Xu wrote:
On 7/26/23 23:00, Christian Brauner wrote:
On Tue, Jul 18, 2023 at 09:21:10PM +0800, Hao Xu wrote:
From: Hao Xu <howeyxu@xxxxxxxxxxx>
This add support for getdents64 to io_uring, acting exactly like the
syscall: the directory is iterated from it's current's position as
stored in the file struct, and the file's position is updated exactly as
if getdents64 had been called.
For filesystems that support NOWAIT in iterate_shared(), try to use it
first; if a user already knows the filesystem they use do not support
nowait they can force async through IOSQE_ASYNC in the sqe flags,
avoiding the need to bounce back through a useless EAGAIN return.
Co-developed-by: Dominique Martinet <asmadeus@xxxxxxxxxxxxx>
Signed-off-by: Dominique Martinet <asmadeus@xxxxxxxxxxxxx>
Signed-off-by: Hao Xu <howeyxu@xxxxxxxxxxx>
---
[...]
I actually saw this semaphore, and there is another xfs lock in
file_accessed
--> touch_atime
--> inode_update_time
--> inode->i_op->update_time == xfs_vn_update_time
Forgot to point them out in the cover-letter..., I didn't modify them
since I'm not very sure about if we should do so, and I saw Stefan's
patchset didn't modify them too.
My personnal thinking is we should apply trylock logic for this
inode->i_rwsem. For xfs lock in touch_atime, we should do that since it
doesn't make sense to rollback all the stuff while we are almost at the
end of getdents because of a lock.
That manoeuvres around the problem. Which I'm slightly more sensitive
too as this review is a rather expensive one.
Plus, it seems fixable in at least two ways:
For both we need to be able to tell the filesystem that a nowait atime
update is requested. Simple thing seems to me to add a S_NOWAIT flag to
file_time_flags and passing that via i_op->update_time() which already
has a flag argument. That would likely also help kiocb_modified().
fwiw, we've just recently had similar problems with io_uring read/write
and atime/mtime in prod environment, so we're interested in solving that
regardless of this patchset. I.e. io_uring issues rw with NOWAIT, {a,m}time
touch ignores that, that stalls other submissions and completely screws
latency.
file_accessed()
-> touch_atime()
-> inode_update_time()
-> i_op->update_time == xfs_vn_update_time()
Then we have two options afaict:
(1) best-effort atime update
file_accessed() already has the builtin assumption that updating atime
might fail for other reasons - see the comment in there. So it is
somewhat best-effort already.
(2) move atime update before calling into filesystem
If we want to be sure that access time is updated when a readdir request
is issued through io_uring then we need to have file_accessed() give a
return value and expose a new helper for io_uring or modify
vfs_getdents() to do something like:
vfs_getdents()
{
if (nowait)
down_read_trylock()
if (!IS_DEADDIR(inode)) {
ret = file_accessed(file);
if (ret == -EAGAIN)
goto out_unlock;
f_op->iterate_shared()
}
}
It's not unprecedented to do update atime before the actual operation
has been done afaict. That's already the case in xfs_file_write_checks()
which is called before anything is written. So that seems ok.
Does any of these two options work for the xfs maintainers and Jens?
It doesn't look (2) will solve it for reads/writes, at least without
It would also solve it for writes which is what my kiocb_modified()
comment was about. So right now you have:
Great, I assumed there are stricter requirements for mtime not
transiently failing.
kiocb_modified(IOCB_NOWAI)
-> file_modified_flags(IOCB_NOWAI)
-> file_remove_privs(IOCB_NOWAIT) // already fully non-blocking
-> file_accessed(IOCB_NOWAIT)
-> i_op->update_time(S_ATIME | S_NOWAIT)
and since xfs_file_write_iter() calls xfs_file_write_checks() before
doing any actual work you'd now be fine.
For reads xfs_file_read_iter() would need to be changed to a similar
logic but that's for xfs to decide ultimately.
the pain of changing the {write,read}_iter callbacks. 1) sounds good
to me from the io_uring perspective, but I guess it won't work
for mtime?
I would prefer 2) which seems cleaner to me. But I might miss why this
won't work. So input needed/wanted.
Maybe I didn't fully grasp the (2) idea
2.1: all read_iter, write_iter, etc. callbacks should do file_accessed()
before doing IO, which sounds like a good option if everyone agrees with
that. Taking a look at direct block io, it's already like this.
2.2: Having io_uring doing file_accessed(). Since it's all currently
hidden behind {read,write}_iter() callbacks and not easily extractable,
it doesn't like a good option, unless I missed sth.
E.g. this ugliness comes to mind.
io_uring_read() {
file_accessed();
file->f_op->read_iter(DONT_TOUCH_ATIME);
...
}
read_iter_impl() {
// some pre processing
if (!(flags & DONT_TOUCH_ATIME))
file_accessed();
}
--
Pavel Begunkov