Re: Possible io_uring related race leads to btrfs data csum mismatch

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 8/16/23 8:33 AM, Jens Axboe wrote:
>> However I didn't see any io_uring related callback inside btrfs code,
>> any advice on the io_uring part would be appreciated.
> 
> io_uring doesn't do anything special here, it uses the normal page cache
> read/write parts for buffered IO. But you may get extra parallellism
> with io_uring here. For example, with the buffered write that this most
> likely is, libaio would be exactly the same as a pwrite(2) on the file.
> If this would've blocked, io_uring would offload this to a helper
> thread. Depending on the workload, you could have multiple of those in
> progress at the same time.

I poked a bit at fsstress, and it's a bit odd imho. For example, any aio
read/write seems to hardcode O_DIRECT. The io_uring side will be
buffered. Not sure why there are those differences and why buffered/dio
isn't a variable. But this does mean that these are certainly buffered
writes with io_uring.

Are any of the writes overlapping? You could have a situation where
writeA and writeB overlap, and writeA will get punted to io-wq for
execution and writeB will complete inline. In other words, writeA is
issued, writeB is issued. writeA goes to io-wq, writeB now completes
inline, and now writeA is done and completed. It may be exposing issues
in btrfs. You can try the below patch, which should serialize all the
writes to a given file. If this makes a difference for you, then I'd
strongly suspect that the issue is deeper than the delivery mechanism of
the write.

diff --git a/ltp/fsstress.c b/ltp/fsstress.c
index 6641a525fe5d..034cbba27c6e 100644
--- a/ltp/fsstress.c
+++ b/ltp/fsstress.c
@@ -2317,6 +2317,7 @@ do_uring_rw(opnum_t opno, long r, int flags)
 		off %= maxfsize;
 		memset(buf, nameseq & 0xff, len);
 		io_uring_prep_writev(sqe, fd, &iovec, 1, off);
+		sqe->flags |= IOSQE_ASYNC;
 	} else {
 		off = (off64_t)(lr % stb.st_size);
 		io_uring_prep_readv(sqe, fd, &iovec, 1, off);

-- 
Jens Axboe




[Index of Archives]     [Linux Ext4 Filesystem]     [Union Filesystem]     [Filesystem Testing]     [Ceph Users]     [Ecryptfs]     [NTFS 3]     [AutoFS]     [Kernel Newbies]     [Share Photos]     [Security]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux Cachefs]     [Reiser Filesystem]     [Linux RAID]     [NTFS 3]     [Samba]     [Device Mapper]     [CEPH Development]

  Powered by Linux