Re: [PATCH v2 5/7] block: implement async discard as io_uring cmd

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 8/22/24 07:46, Christoph Hellwig wrote:
On Thu, Aug 22, 2024 at 04:35:55AM +0100, Pavel Begunkov wrote:
io_uring allows to implement custom file specific operations via
fops->uring_cmd callback. Use it to wire up asynchronous discard
commands. Normally, first it tries to do a non-blocking issue, and if
fails we'd retry from a blocking context by returning -EAGAIN to
core io_uring.

Note, unlike ioctl(BLKDISCARD) with stronger guarantees against races,
we only do a best effort attempt to invalidate page cache, and it can
race with any writes and reads and leave page cache stale. It's the
same kind of races we allow to direct writes.

Can you please write up a man page for this that clear documents the
expecvted semantics?

Do we have it documented anywhere how O_DIRECT writes interact
with page cache, so I can refer to it?

+static void bio_cmd_end(struct bio *bio)

This is really weird function name.  blk_cmd_end_io or
blk_cmd_bio_end_io would be the usual choices.

Will change with other cosmetics.


+	while ((bio = blk_alloc_discard_bio(bdev, &sector, &nr_sects,
+					    GFP_KERNEL))) {

GFP_KERNEL can often will block.  You'll probably want a GFP_NOWAIT
allocation here for the nowait case.

I can change it for clarity, but I don't think it's much of a concern
since the read/write path and pretty sure a bunch of other places never
cared about it. It does the main thing, propagating it down e.g. for
tag allocation.


+		if (nowait) {
+			/*
+			 * Don't allow multi-bio non-blocking submissions as
+			 * subsequent bios may fail but we won't get direct
+			 * feedback about that. Normally, the caller should
+			 * retry from a blocking context.
+			 */
+			if (unlikely(nr_sects)) {
+				bio_put(bio);
+				return -EAGAIN;
+			}
+			bio->bi_opf |= REQ_NOWAIT;
+		}

And this really looks weird.  It first allocates a bio, potentially

That's what the write path does.

blocking, and then gives up?  I think you're much better off with
something like:

I'd rather avoid calling bio_discard_limit() an extra time, it does
too much stuff inside, when the expected case is a single bio and
for multi-bio that overhead would really matter.

Maybe I should uniline blk_alloc_discard_bio() and dedup it with
write zeroes, I'll see if can be done after other write zeroes
changes.


	if (nowait) {
		if (nr_sects > bio_discard_limit(bdev, sector))
			return -EAGAIN;
		bio = blk_alloc_discard_bio(bdev, &sector, &nr_sects,
						    GFP_NOWAIT);
		if (!bio)
			return -EAGAIN
		bio->bi_opf |= REQ_NOWAIT;
		goto submit;
	}

	/* submission loop here */

diff --git a/include/uapi/linux/fs.h b/include/uapi/linux/fs.h
index 753971770733..0016e38ed33c 100644
--- a/include/uapi/linux/fs.h
+++ b/include/uapi/linux/fs.h
@@ -208,6 +208,8 @@ struct fsxattr {
   * (see uapi/linux/blkzoned.h)
   */
+#define BLOCK_URING_CMD_DISCARD 0

Is fs.h the reight place for this?

Arguable, but I can move it to io_uring, makes things simpler
for me.

Curious:  how to we deal with conflicting uring cmds on different
device and how do we probe for them?  The NVMe uring_cmds
use the ioctl-style _IO* encoding which at least helps a bit with
that and which seem like a good idea.  Maybe someone needs to write
up a few lose rules on uring commands?

My concern is that we're sacrificing compiler optimisations
(well, jump tables are disabled IIRC) for something that doesn't even
guarantee uniqueness. I'd like to see some degree of reflection,
like user querying a file class in terms of what operations it
supports, but that's beyond the scope of the series.

--
Pavel Begunkov




[Index of Archives]     [Linux RAID]     [Linux SCSI]     [Linux ATA RAID]     [IDE]     [Linux Wireless]     [Linux Kernel]     [ATH6KL]     [Linux Bluetooth]     [Linux Netdev]     [Kernel Newbies]     [Security]     [Git]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Device Mapper]

  Powered by Linux