On Tue, Aug 20, 2024 at 06:19:00PM +0100, Pavel Begunkov wrote: > On 8/20/24 17:30, Jens Axboe wrote: > > On 8/19/24 8:36 PM, Ming Lei wrote: > > > On Mon, Aug 19, 2024 at 02:01:21PM -0600, Jens Axboe wrote: > > > > On 8/15/24 7:45 PM, Ming Lei wrote: > ... > > > > > Meantime the handling has to move to io-wq for avoiding to block current > > > > > context, the interface becomes same with IORING_OP_FALLOCATE? > > > > > > > > I think the current truncate is overkill, we should be able to get by > > > > without. And no, I will not entertain an option that's "oh just punt it > > > > to io-wq". > > > > > > BTW, the truncate is added by 351499a172c0 ("block: Invalidate cache on discard v2"), > > > and block/009 serves as regression test for covering page cache > > > coherency and discard. > > > > > > Here the issue is actually related with the exclusive lock of > > > filemap_invalidate_lock(). IMO, it is reasonable to prevent page read during > > > discard for not polluting page cache. block/009 may fail too without the lock. > > > > > > It is just that concurrent discards can't be allowed any more by > > > down_write() of rw_semaphore, and block device is really capable of doing > > > that. It can be thought as one regression of 7607c44c157d ("block: Hold invalidate_lock in > > > BLKDISCARD ioctl"). > > > > > > Cc Jan Kara and Shin'ichiro Kawasaki. > > > > Honestly I just think that's nonsense. It's like mixing direct and > > buffered writes. Can you get corruption? Yes you most certainly can. > > There should be no reason why we can't run discards without providing > > page cache coherency. The sync interface attempts to do that, but that > > doesn't mean that an async (or a different sync one, if that made sense) > > should. > > I don't see it as a problem either, it's a new interface, just need > to be upfront on what guarantees it provides (one more reason why > not fallocate), I'll elaborate on it in the commit message and so. Fair enough. > > I think a reasonable thing to do is to have one rule for all write-like > operations starting from plain writes, which is currently allowing races > to happen and shift it to the user. Purely in theory we can get inventive > with likes of range lock trees, but that's unwarranted for all sorts of > reasons. > > > If you do discards to the same range as you're doing buffered IO, you > > get to keep both potentially pieces. Fact is that most folks are doing > > dio for performant IO exactly because buffered writes tend to be > > horrible, and you could certainly use that with async discards and have > > the application manage it just fine. > > > > So I really think any attempts to provide page cache synchronization for > > this is futile. And the existing sync one looks pretty abysmal, but it > > doesn't really matter as it's a sync interfce. If one were to do > > It should be a pain for sync as well, you can't even spin another process > and parallelise this way. Yes, this way has degraded some sync discard workloads perf a lot. Thanks, Ming