Re: [PATCH 3/3] io_uring: add support for zone-append

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 19.06.2020 09:02, Jens Axboe wrote:
On 6/19/20 8:59 AM, Pavel Begunkov wrote:
On 19/06/2020 17:15, Jens Axboe wrote:
On 6/19/20 3:41 AM, javier.gonz@xxxxxxxxxxx wrote:
Jens,

Would you have time to answer a question below in this thread?

On 18.06.2020 11:11, javier.gonz@xxxxxxxxxxx wrote:
On 18.06.2020 08:47, Damien Le Moal wrote:
On 2020/06/18 17:35, javier.gonz@xxxxxxxxxxx wrote:
On 18.06.2020 07:39, Damien Le Moal wrote:
On 2020/06/18 2:27, Kanchan Joshi wrote:
From: Selvakumar S <selvakuma.s1@xxxxxxxxxxx>

Introduce three new opcodes for zone-append -

  IORING_OP_ZONE_APPEND     : non-vectord, similiar to IORING_OP_WRITE
  IORING_OP_ZONE_APPENDV    : vectored, similar to IORING_OP_WRITEV
  IORING_OP_ZONE_APPEND_FIXED : append using fixed-buffers

Repurpose cqe->flags to return zone-relative offset.

Signed-off-by: SelvaKumar S <selvakuma.s1@xxxxxxxxxxx>
Signed-off-by: Kanchan Joshi <joshi.k@xxxxxxxxxxx>
Signed-off-by: Nitesh Shetty <nj.shetty@xxxxxxxxxxx>
Signed-off-by: Javier Gonzalez <javier.gonz@xxxxxxxxxxx>
---
fs/io_uring.c                 | 72 +++++++++++++++++++++++++++++++++++++++++--
include/uapi/linux/io_uring.h |  8 ++++-
2 files changed, 77 insertions(+), 3 deletions(-)

diff --git a/fs/io_uring.c b/fs/io_uring.c
index 155f3d8..c14c873 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -649,6 +649,10 @@ struct io_kiocb {
	unsigned long		fsize;
	u64			user_data;
	u32			result;
+#ifdef CONFIG_BLK_DEV_ZONED
+	/* zone-relative offset for append, in bytes */
+	u32			append_offset;

this can overflow. u64 is needed.

We chose to do it this way to start with because struct io_uring_cqe
only has space for u32 when we reuse the flags.

We can of course create a new cqe structure, but that will come with
larger changes to io_uring for supporting append.

Do you believe this is a better approach?

The problem is that zone size are 32 bits in the kernel, as a number
of sectors.  So any device that has a zone size smaller or equal to
2^31 512B sectors can be accepted. Using a zone relative offset in
bytes for returning zone append result is OK-ish, but to match the
kernel supported range of possible zone size, you need 31+9 bits...
32 does not cut it.

Agree. Our initial assumption was that u32 would cover current zone size
requirements, but if this is a no-go, we will take the longer path.

Converting to u64 will require a new version of io_uring_cqe, where we
extend at least 32 bits. I believe this will need a whole new allocation
and probably ioctl().

Is this an acceptable change for you? We will of course add support for
liburing when we agree on the right way to do this.

If you need 64-bit of return value, then it's not going to work. Even
with the existing patches, reusing cqe->flags isn't going to fly, as
it would conflict with eg doing zone append writes with automatic
buffer selection.

Buffer selection is for reads/recv kind of requests, but appends
are writes. In theory they can co-exist using cqe->flags.

Yeah good point, since it's just writes, doesn't matter. But the other
point still stands, it could potentially conflict with other flags, but
I guess only to the extent where both flags would need extra storage in
->flags. So not a huge concern imho.

Very good point Pavel!

If co-existing with the current flags is an option, I'll explore this
for the next version.

Thanks Jens and Pavel for the time and ideas!

Javier



[Index of Archives]     [Linux Ext4 Filesystem]     [Union Filesystem]     [Filesystem Testing]     [Ceph Users]     [Ecryptfs]     [AutoFS]     [Kernel Newbies]     [Share Photos]     [Security]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux Cachefs]     [Reiser Filesystem]     [Linux RAID]     [Samba]     [Device Mapper]     [CEPH Development]

  Powered by Linux