Hi Jens, I applied your patch (with a minor conflict in xfs_file_open() since FMODE_BUF_WASYNC isn't in v5.15) and did the same series of tests on the v5.15 kernel. All the io_uring benchmarks regressed 20-45% after it. I haven't tested on v6.1 yet. Thanks, Saeed ________________________________________ From: Jens Axboe <axboe@xxxxxxxxx> Sent: Thursday, January 19, 2023 8:12 PM To: Saeed Mirzamohammadi; io-uring@xxxxxxxxxxxxxxx Cc: asml.silence@xxxxxxxxx; linux-kernel@xxxxxxxxxxxxxxx Subject: Re: Phoronix pts fio io_uring test regression report on upstream v6.1 and v5.15 On 1/19/23 2:36?PM, Saeed Mirzamohammadi wrote: > Hello, > > I'm reporting a performance regression after the commit below on > phoronix pts/fio test and with the config that is added in the end of > this email: > > Link: https://urldefense.com/v3/__https://lore.kernel.org/all/20210913131123.597544850@xxxxxxxxxxxxxxxxxxx/__;!!ACWV5N9M2RV99hQ!KM7c30r4OiqvkxVW44cyWb3rZr85i28yKto8WAAcj8OgWAOp-ebzcrHuggGw96ivMFCARikAEWwjhUBYFqujONc$ > > commit 7b3188e7ed54102a5dcc73d07727f41fb528f7c8 > Author: Jens Axboe axboe@xxxxxxxxx > Date: Mon Aug 30 19:37:41 2021 -0600 > > io_uring: IORING_OP_WRITE needs hash_reg_file set > > We observed regression on the latest v6.1.y and v5.15.y upstream > kernels (Haven't tested other stable kernels). We noticed that > performance regression improved 45% after the revert of the commit > above. > > All of the benchmarks below have experienced around ~45% regression. > phoronix-pts-fio-1.15.0-RandomWrite-EngineIO_uring-BufferedNo-DirectYes-BlockSize4KB-MB-s_xfs > phoronix-pts-fio-1.15.0-SequentialWrite-EngineIO_uring-BufferedNo-DirectYes-BlockSize4KB-MB-s_xfs > phoronix-pts-fio-1.15.0-SequentialWrite-EngineIO_uring-BufferedYes-DirectNo-BlockSize4KB-MB-s_xfs > > We tend to see this regression on 4KB BlockSize tests. > > We tried out changing force_async but that has no effect on the > result. Also, backported a modified version of the patch mentioned > here (https://urldefense.com/v3/__https://lkml.org/lkml/2022/7/20/854__;!!ACWV5N9M2RV99hQ!KM7c30r4OiqvkxVW44cyWb3rZr85i28yKto8WAAcj8OgWAOp-ebzcrHuggGw96ivMFCARikAEWwjhUBYbOxQftI$ ) but that didn't affect > performance. > > Do you have any suggestions on any fixes or what else we can try to > narrow down the issue? This is really mostly by design - the previous approach of not hashing buffered writes on regular files would cause a lot of inode lock contention due to lots of threads hammering on that. That said, for XFS, we don't need to serialize on O_DIRECT writes. Don't think we currently have a way to detect this as it isn't really advertised. Something like the below might work, with the caveat that this is totally untested. diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c index 595a5bcf46b9..85fdc6f2efa4 100644 --- a/fs/xfs/xfs_file.c +++ b/fs/xfs/xfs_file.c @@ -1171,7 +1171,8 @@ xfs_file_open( { if (xfs_is_shutdown(XFS_M(inode->i_sb))) return -EIO; - file->f_mode |= FMODE_NOWAIT | FMODE_BUF_RASYNC | FMODE_BUF_WASYNC; + file->f_mode |= FMODE_NOWAIT | FMODE_BUF_RASYNC | FMODE_BUF_WASYNC | + FMODE_ODIRECT_PARALLEL; return generic_file_open(inode, file); } diff --git a/include/linux/fs.h b/include/linux/fs.h index c1769a2c5d70..8541b9e53c2d 100644 --- a/include/linux/fs.h +++ b/include/linux/fs.h @@ -166,6 +166,9 @@ typedef int (dio_iodone_t)(struct kiocb *iocb, loff_t offset, /* File supports DIRECT IO */ #define FMODE_CAN_ODIRECT ((__force fmode_t)0x400000) +/* File supports parallel O_DIRECT writes */ +#define FMODE_ODIRECT_PARALLEL ((__force fmode_t)0x800000) + /* File was opened by fanotify and shouldn't generate fanotify events */ #define FMODE_NONOTIFY ((__force fmode_t)0x4000000) diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c index e680685e8a00..1409f6f69b13 100644 --- a/io_uring/io_uring.c +++ b/io_uring/io_uring.c @@ -424,7 +424,12 @@ static void io_prep_async_work(struct io_kiocb *req) req->flags |= io_file_get_flags(req->file) << REQ_F_SUPPORT_NOWAIT_BIT; if (req->flags & REQ_F_ISREG) { - if (def->hash_reg_file || (ctx->flags & IORING_SETUP_IOPOLL)) + bool should_hash = def->hash_reg_file; + + if (should_hash && (req->file->f_flags & O_DIRECT) && + (req->file->f_mode & FMODE_ODIRECT_PARALLEL)) + should_hash = false; + if (should_hash || (ctx->flags & IORING_SETUP_IOPOLL)) io_wq_hash_work(&req->work, file_inode(req->file)); } else if (!req->file || !S_ISBLK(file_inode(req->file)->i_mode)) { if (def->unbound_nonreg_file) -- Jens Axboe