Re: [RFC 5/5] block: implement io_uring discard cmd

Pavel Begunkov <asml.silence@xxxxxxxxx> · Tue, 20 Aug 2024 18:19:00 +0100

On 8/20/24 17:30, Jens Axboe wrote:
On 8/19/24 8:36 PM, Ming Lei wrote:
On Mon, Aug 19, 2024 at 02:01:21PM -0600, Jens Axboe wrote:
On 8/15/24 7:45 PM, Ming Lei wrote:
...
Meantime the handling has to move to io-wq for avoiding to block current
context, the interface becomes same with IORING_OP_FALLOCATE?

I think the current truncate is overkill, we should be able to get by
without. And no, I will not entertain an option that's "oh just punt it
to io-wq".

BTW, the truncate is added by 351499a172c0 ("block: Invalidate cache on discard v2"),
and block/009 serves as regression test for covering page cache
coherency and discard.

Here the issue is actually related with the exclusive lock of
filemap_invalidate_lock(). IMO, it is reasonable to prevent page read during
discard for not polluting page cache. block/009 may fail too without the lock.

It is just that concurrent discards can't be allowed any more by
down_write() of rw_semaphore, and block device is really capable of doing
that. It can be thought as one regression of 7607c44c157d ("block: Hold invalidate_lock in
BLKDISCARD ioctl").

Cc Jan Kara and Shin'ichiro Kawasaki.

Honestly I just think that's nonsense. It's like mixing direct and
buffered writes. Can you get corruption? Yes you most certainly can.
There should be no reason why we can't run discards without providing
page cache coherency. The sync interface attempts to do that, but that
doesn't mean that an async (or a different sync one, if that made sense)
should.

I don't see it as a problem either, it's a new interface, just need
to be upfront on what guarantees it provides (one more reason why
not fallocate), I'll elaborate on it in the commit message and so.

I think a reasonable thing to do is to have one rule for all write-like
operations starting from plain writes, which is currently allowing races
to happen and shift it to the user. Purely in theory we can get inventive
with likes of range lock trees, but that's unwarranted for all sorts of
reasons.

If you do discards to the same range as you're doing buffered IO, you
get to keep both potentially pieces. Fact is that most folks are doing
dio for performant IO exactly because buffered writes tend to be
horrible, and you could certainly use that with async discards and have
the application manage it just fine.

So I really think any attempts to provide page cache synchronization for
this is futile. And the existing sync one looks pretty abysmal, but it
doesn't really matter as it's a sync interfce. If one were to do

It should be a pain for sync as well, you can't even spin another process
and parallelise this way.

--
Pavel Begunkov