On Thu, Aug 22, 2024 at 02:07:16PM +0100, Pavel Begunkov wrote: > > > Note, unlike ioctl(BLKDISCARD) with stronger guarantees against races, > > > we only do a best effort attempt to invalidate page cache, and it can > > > race with any writes and reads and leave page cache stale. It's the > > > same kind of races we allow to direct writes. > > > > Can you please write up a man page for this that clear documents the > > expecvted semantics? > > Do we have it documented anywhere how O_DIRECT writes interact > with page cache, so I can refer to it? I can't find a good writeup. Adding Dave as he tends to do long emails on topic like this so he might have one hiding somewhere. > > GFP_KERNEL can often will block. You'll probably want a GFP_NOWAIT > > allocation here for the nowait case. > > I can change it for clarity, but I don't think it's much of a concern > since the read/write path and pretty sure a bunch of other places never > cared about it. It does the main thing, propagating it down e.g. for > tag allocation. True, we're only doing the nowait allocation for larger data structures. Which is a bit odd indeed. > I'd rather avoid calling bio_discard_limit() an extra time, it does > too much stuff inside, when the expected case is a single bio and > for multi-bio that overhead would really matter. Compared to a memory allocation it's not really doing all the much. In the long run we really should move splitting discard bios down the stack like we do for normal I/O anyway. > Maybe I should uniline blk_alloc_discard_bio() and dedup it with uniline? I read that as unіnline, but as it's not inline I don't understand what you mean either. > > > +#define BLOCK_URING_CMD_DISCARD 0 > > > > Is fs.h the reight place for this? > > Arguable, but I can move it to io_uring, makes things simpler > for me. I would have expected a uapi/linux/blkdev.h for it (and I'm kinda surprised we don't have that yet). > > > Curious: how to we deal with conflicting uring cmds on different > > device and how do we probe for them? The NVMe uring_cmds > > use the ioctl-style _IO* encoding which at least helps a bit with > > that and which seem like a good idea. Maybe someone needs to write > > up a few lose rules on uring commands? > > My concern is that we're sacrificing compiler optimisations > (well, jump tables are disabled IIRC) for something that doesn't even > guarantee uniqueness. I'd like to see some degree of reflection, > like user querying a file class in terms of what operations it > supports, but that's beyond the scope of the series. We can't guaranteed uniqueness, but between the class, the direction, and the argument size we get a pretty good one. There is a reason pretty much all ioctls added in the last 25 years are using this scheme.