Re: [PATCH 3/5] io_uring: add support for getdents

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 7/27/23 15:27, Christian Brauner wrote:
On Thu, Jul 27, 2023 at 07:51:19PM +0800, Hao Xu wrote:
On 7/26/23 23:00, Christian Brauner wrote:
On Tue, Jul 18, 2023 at 09:21:10PM +0800, Hao Xu wrote:
From: Hao Xu <howeyxu@xxxxxxxxxxx>

This add support for getdents64 to io_uring, acting exactly like the
syscall: the directory is iterated from it's current's position as
stored in the file struct, and the file's position is updated exactly as
if getdents64 had been called.

For filesystems that support NOWAIT in iterate_shared(), try to use it
first; if a user already knows the filesystem they use do not support
nowait they can force async through IOSQE_ASYNC in the sqe flags,
avoiding the need to bounce back through a useless EAGAIN return.

Co-developed-by: Dominique Martinet <asmadeus@xxxxxxxxxxxxx>
Signed-off-by: Dominique Martinet <asmadeus@xxxxxxxxxxxxx>
Signed-off-by: Hao Xu <howeyxu@xxxxxxxxxxx>
---
[...]
I actually saw this semaphore, and there is another xfs lock in
file_accessed
   --> touch_atime
     --> inode_update_time
       --> inode->i_op->update_time == xfs_vn_update_time

Forgot to point them out in the cover-letter..., I didn't modify them
since I'm not very sure about if we should do so, and I saw Stefan's
patchset didn't modify them too.

My personnal thinking is we should apply trylock logic for this
inode->i_rwsem. For xfs lock in touch_atime, we should do that since it
doesn't make sense to rollback all the stuff while we are almost at the
end of getdents because of a lock.

That manoeuvres around the problem. Which I'm slightly more sensitive
too as this review is a rather expensive one.

Plus, it seems fixable in at least two ways:

For both we need to be able to tell the filesystem that a nowait atime
update is requested. Simple thing seems to me to add a S_NOWAIT flag to
file_time_flags and passing that via i_op->update_time() which already
has a flag argument. That would likely also help kiocb_modified().

fwiw, we've just recently had similar problems with io_uring read/write
and atime/mtime in prod environment, so we're interested in solving that
regardless of this patchset. I.e. io_uring issues rw with NOWAIT, {a,m}time
touch ignores that, that stalls other submissions and completely screws
latency.

file_accessed()
-> touch_atime()
    -> inode_update_time()
       -> i_op->update_time == xfs_vn_update_time()

Then we have two options afaict:

(1) best-effort atime update

file_accessed() already has the builtin assumption that updating atime
might fail for other reasons - see the comment in there. So it is
somewhat best-effort already.

(2) move atime update before calling into filesystem

If we want to be sure that access time is updated when a readdir request
is issued through io_uring then we need to have file_accessed() give a
return value and expose a new helper for io_uring or modify
vfs_getdents() to do something like:

vfs_getdents()
{
	if (nowait)
		down_read_trylock()

	if (!IS_DEADDIR(inode)) {
		ret = file_accessed(file);
		if (ret == -EAGAIN)
			goto out_unlock;

		f_op->iterate_shared()
	}
}

It's not unprecedented to do update atime before the actual operation
has been done afaict. That's already the case in xfs_file_write_checks()
which is called before anything is written. So that seems ok.

Does any of these two options work for the xfs maintainers and Jens?

It doesn't look (2) will solve it for reads/writes, at least without
the pain of changing the {write,read}_iter callbacks. 1) sounds good
to me from the io_uring perspective, but I guess it won't work
for mtime?

--
Pavel Begunkov



[Index of Archives]     [Linux Ext4 Filesystem]     [Union Filesystem]     [Filesystem Testing]     [Ceph Users]     [Ecryptfs]     [NTFS 3]     [AutoFS]     [Kernel Newbies]     [Share Photos]     [Security]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux Cachefs]     [Reiser Filesystem]     [Linux RAID]     [NTFS 3]     [Samba]     [Device Mapper]     [CEPH Development]

  Powered by Linux