Re: [PATCH v2 5/7] block: implement async discard as io_uring cmd

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 8/23/24 12:59, Christoph Hellwig wrote:
On Thu, Aug 22, 2024 at 02:07:16PM +0100, Pavel Begunkov wrote:
Note, unlike ioctl(BLKDISCARD) with stronger guarantees against races,
we only do a best effort attempt to invalidate page cache, and it can
race with any writes and reads and leave page cache stale. It's the
same kind of races we allow to direct writes.

Can you please write up a man page for this that clear documents the
expecvted semantics?

Do we have it documented anywhere how O_DIRECT writes interact
with page cache, so I can refer to it?

I can't find a good writeup.  Adding Dave as he tends to do long
emails on topic like this so he might have one hiding somewhere.

GFP_KERNEL can often will block.  You'll probably want a GFP_NOWAIT
allocation here for the nowait case.

I can change it for clarity, but I don't think it's much of a concern
since the read/write path and pretty sure a bunch of other places never
cared about it. It does the main thing, propagating it down e.g. for
tag allocation.

True, we're only doing the nowait allocation for larger data
structures.  Which is a bit odd indeed.

That's widespread, last time I looked into it no amount of patching
saved io_uring and tasks being killed by the oom reaper under memory
pressure.

I'd rather avoid calling bio_discard_limit() an extra time, it does
too much stuff inside, when the expected case is a single bio and
for multi-bio that overhead would really matter.

Compared to a memory allocation it's not really doing all the much.
In the long run we really should move splitting discard bios down
the stack like we do for normal I/O anyway.

Maybe I should uniline blk_alloc_discard_bio() and dedup it with

uniline?  I read that as unіnline, but as it's not inline I don't
understand what you mean either.

"Hand code" if you wish, but you can just ignore it


+#define BLOCK_URING_CMD_DISCARD			0

Is fs.h the reight place for this?

Arguable, but I can move it to io_uring, makes things simpler
for me.

I would have expected a uapi/linux/blkdev.h for it (and I'm kinda
surprised we don't have that yet).

I think that would be overkill, we don't need it for just these
commands, and it's only adds pain with probing the header with
autotools or so. If there is a future vision for it I'd say we
can drop a patch on top.

Curious:  how to we deal with conflicting uring cmds on different
device and how do we probe for them?  The NVMe uring_cmds
use the ioctl-style _IO* encoding which at least helps a bit with
that and which seem like a good idea.  Maybe someone needs to write
up a few lose rules on uring commands?

My concern is that we're sacrificing compiler optimisations
(well, jump tables are disabled IIRC) for something that doesn't even
guarantee uniqueness. I'd like to see some degree of reflection,
like user querying a file class in terms of what operations it
supports, but that's beyond the scope of the series.

We can't guaranteed uniqueness, but between the class, the direction,
and the argument size we get a pretty good one.  There is a reason
pretty much all ioctls added in the last 25 years are using this scheme.

which is likely because some people insisted on it and not because
the scheme is so great that everyone became acolytes. Not to mention
only 256 possible "types" and the endless mess of sharing them and
trying to find a range to use. I'll convert to have less headache,
but either way we're just propagating the problem into the future.

--
Pavel Begunkov




[Index of Archives]     [Linux RAID]     [Linux SCSI]     [Linux ATA RAID]     [IDE]     [Linux Wireless]     [Linux Kernel]     [ATH6KL]     [Linux Bluetooth]     [Linux Netdev]     [Kernel Newbies]     [Security]     [Git]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Device Mapper]

  Powered by Linux