Re: [RFC 5/5] block: implement io_uring discard cmd

Ming Lei <ming.lei@xxxxxxxxxx> · Wed, 21 Aug 2024 10:55:06 +0800

On Tue, Aug 20, 2024 at 06:19:00PM +0100, Pavel Begunkov wrote:
> On 8/20/24 17:30, Jens Axboe wrote:
> > On 8/19/24 8:36 PM, Ming Lei wrote:
> > > On Mon, Aug 19, 2024 at 02:01:21PM -0600, Jens Axboe wrote:
> > > > On 8/15/24 7:45 PM, Ming Lei wrote:
> ...
> > > > > > Meanwhile, the handling has to move to io-wq to avoid blocking the
> > > > > > current context, so the interface becomes the same as
> > > > > > IORING_OP_FALLOCATE?
> > > > 
> > > > I think the current truncate is overkill, we should be able to get by
> > > > without. And no, I will not entertain an option that's "oh just punt it
> > > > to io-wq".
> > > 
> > > > BTW, the truncate was added by 351499a172c0 ("block: Invalidate cache on discard v2"),
> > > > and block/009 serves as the regression test covering page cache
> > > > coherency with discard.
> > > 
> > > > Here the issue is actually related to the exclusive lock taken by
> > > > filemap_invalidate_lock(). IMO, it is reasonable to prevent page reads during
> > > > discard so that the page cache is not polluted. block/009 may fail too without the lock.
> > > 
> > > > It is just that down_write() on the rw_semaphore no longer allows
> > > > concurrent discards, although the block device is really capable of
> > > > handling them. It can be thought of as a regression from 7607c44c157d
> > > > ("block: Hold invalidate_lock in BLKDISCARD ioctl").
> > > 
> > > Cc Jan Kara and Shin'ichiro Kawasaki.
> > 
> > Honestly I just think that's nonsense. It's like mixing direct and
> > buffered writes. Can you get corruption? Yes you most certainly can.
> > There should be no reason why we can't run discards without providing
> > page cache coherency. The sync interface attempts to do that, but that
> > doesn't mean that an async (or a different sync one, if that made sense)
> > should.
> 
> I don't see it as a problem either, it's a new interface, just need
> to be upfront on what guarantees it provides (one more reason why
> not fallocate), I'll elaborate on it in the commit message and so.

Fair enough.

> 
> I think a reasonable thing to do is to have one rule for all write-like
> operations starting from plain writes, which is currently allowing races
> to happen and shift it to the user. Purely in theory we can get inventive
> with likes of range lock trees, but that's unwarranted for all sorts of
> reasons.
> 
> > If you do discards to the same range as you're doing buffered IO, you
> > potentially get to keep both pieces. Fact is that most folks are doing
> > dio for performant IO exactly because buffered writes tend to be
> > horrible, and you could certainly use that with async discards and have
> > the application manage it just fine.
> > 
> > So I really think any attempts to provide page cache synchronization for
> > this is futile. And the existing sync one looks pretty abysmal, but it
> > doesn't really matter as it's a sync interface. If one were to do
> 
> It should be a pain for sync as well, you can't even spin another process
> and parallelise this way.

Yes, this approach has degraded the performance of some sync discard
workloads a lot.
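
To make the serialisation concrete, here is a minimal userspace sketch, not
kernel code. It assumes a pthread rwlock as a stand-in for the invalidate
lock, and fake_discard() and peak_overlap() are hypothetical names made up
for illustration. It only models how an exclusive (down_write()-style)
acquisition limits overlap compared with a shared one; it says nothing
about real device behaviour.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <time.h>

#define MAX_THREADS 64

/* Userspace stand-in for the invalidate lock; an assumption, not kernel code. */
static pthread_rwlock_t inv_lock = PTHREAD_RWLOCK_INITIALIZER;
static atomic_int in_flight;   /* fake discards currently holding the lock */
static atomic_int max_seen;    /* highest in_flight value observed so far */
static int take_exclusive;     /* 1: write lock, 0: read (shared) lock */

static void *fake_discard(void *arg)
{
    struct timespec busy = { .tv_sec = 0, .tv_nsec = 1000000 }; /* 1 ms */

    (void)arg;
    if (take_exclusive)
        pthread_rwlock_wrlock(&inv_lock);
    else
        pthread_rwlock_rdlock(&inv_lock);

    /* Record how many fake discards are inside the lock right now. */
    int now = atomic_fetch_add(&in_flight, 1) + 1;
    int prev = atomic_load(&max_seen);
    while (now > prev &&
           !atomic_compare_exchange_weak(&max_seen, &prev, now))
        ;

    nanosleep(&busy, NULL);    /* pretend the device is working */
    atomic_fetch_sub(&in_flight, 1);

    pthread_rwlock_unlock(&inv_lock);
    return NULL;
}

/* Hypothetical helper: run nthreads fake discards, return the peak overlap. */
int peak_overlap(int exclusive, int nthreads)
{
    pthread_t tid[MAX_THREADS];

    if (nthreads > MAX_THREADS)
        nthreads = MAX_THREADS;

    take_exclusive = exclusive;
    atomic_store(&in_flight, 0);
    atomic_store(&max_seen, 0);

    for (int i = 0; i < nthreads; i++) {
        int rc = pthread_create(&tid[i], NULL, fake_discard, NULL);
        assert(rc == 0);
        (void)rc;
    }
    for (int i = 0; i < nthreads; i++)
        pthread_join(tid[i], NULL);

    return atomic_load(&max_seen);
}
```

peak_overlap(1, 4) reports a peak of 1, since the write lock admits one
holder at a time. peak_overlap(0, 4) may report more than 1, depending on
scheduling, which is the kind of overlap the exclusive lock rules out.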

Thanks,
Ming