On Thu, Aug 22, 2024 at 04:35:55AM +0100, Pavel Begunkov wrote: > io_uring allows to implement custom file specific operations via > fops->uring_cmd callback. Use it to wire up asynchronous discard > commands. Normally, first it tries to do a non-blocking issue, and if > fails we'd retry from a blocking context by returning -EAGAIN to > core io_uring. > > Note, unlike ioctl(BLKDISCARD) with stronger guarantees against races, > we only do a best effort attempt to invalidate page cache, and it can > race with any writes and reads and leave page cache stale. It's the > same kind of races we allow to direct writes. Can you please write up a man page for this that clear documents the expecvted semantics? > +static void bio_cmd_end(struct bio *bio) This is really weird function name. blk_cmd_end_io or blk_cmd_bio_end_io would be the usual choices. > + while ((bio = blk_alloc_discard_bio(bdev, §or, &nr_sects, > + GFP_KERNEL))) { GFP_KERNEL can often will block. You'll probably want a GFP_NOWAIT allocation here for the nowait case. > + if (nowait) { > + /* > + * Don't allow multi-bio non-blocking submissions as > + * subsequent bios may fail but we won't get direct > + * feedback about that. Normally, the caller should > + * retry from a blocking context. > + */ > + if (unlikely(nr_sects)) { > + bio_put(bio); > + return -EAGAIN; > + } > + bio->bi_opf |= REQ_NOWAIT; > + } And this really looks weird. It first allocates a bio, potentially blocking, and then gives up? I think you're much better off with something like: if (nowait) { if (nr_sects > bio_discard_limit(bdev, sector)) return -EAGAIN; bio = blk_alloc_discard_bio(bdev, §or, &nr_sects, GFP_NOWAIT); if (!bio) return -EAGAIN bio->bi_opf |= REQ_NOWAIT; goto submit; } /* submission loop here */ > diff --git a/include/uapi/linux/fs.h b/include/uapi/linux/fs.h > index 753971770733..0016e38ed33c 100644 > --- a/include/uapi/linux/fs.h > +++ b/include/uapi/linux/fs.h > @@ -208,6 +208,8 @@ struct fsxattr { > * (see uapi/linux/blkzoned.h) > */ > > +#define BLOCK_URING_CMD_DISCARD 0 Is fs.h the reight place for this? Curious: how to we deal with conflicting uring cmds on different device and how do we probe for them? The NVMe uring_cmds use the ioctl-style _IO* encoding which at least helps a bit with that and which seem like a good idea. Maybe someone needs to write up a few lose rules on uring commands?