On Tue, Aug 10, 2021 at 01:33:28PM +0100, Christoph Hellwig wrote:
> On Tue, Aug 10, 2021 at 01:09:45PM +0100, Matthew Wilcox wrote:
> > On Tue, Aug 10, 2021 at 09:15:28AM +0100, Christoph Hellwig wrote:
> > > Stupid question, but where do we ever do page cache interaction from
> > > soft irq context?
> >
> > test_clear_page_writeback() happens in _some_ interrupt context (ie
> > the io completion path).  We had been under the impression that it was
> > always actually softirq context, and so this patch was safe.  However,
> > it's now clear that some drivers are calling it from hardirq context.
> > Writeback completions are clearly not latency sensitive and so can
> > be delayed from hardirq to softirq context without any problem, so I
> > think fixing this is just going to be a matter of tagging requests as
> > "complete in softirq context" and ensuring that blk_mq_raise_softirq()
> > is called for them.
> >
> > Assuming that DIO write completions _are_ latency-sensitive, of course.
> > Maybe all write completions could be run in softirqs.
>
> I really don't really see any benefit in introducing softirqs into
> the game.

The benefit is not having to disable interrupts while manipulating
the page cache, eg delete_from_page_cache_batch().

> If we want to simplify the locking and do not care too much
> about latency, we should just defer to workqueue/thread context.

It's not a bad idea.  I thought BH would be the better place for it
because it wouldn't require scheduling in a task.  If we are going to
schedule in a task though, can we make it the task which submitted the
I/O (assuming it still exists), or do we not have the infrastructure
for that?

> For example XFS already does that for all writeback except for pure
> overwrites.  Those OTOH can be latency critical for O_SYNC writes, but
> you're apparently looking into that already.

To my mind if you've asked for O_SYNC, you've asked for bad performance.
The writethrough improvement that I'm working on skips dirtying the page,
but still marks the page as writeback so that we don't submit overlapping
writes to the device.  The O_SYNC write will wait for the writeback to
finish, so it'll still be delayed by one additional scheduling event ...
unless we run the write completion in the context of this task.
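
To make the "complete in softirq context" tagging concrete, here is a
rough userspace model of it.  This is only a sketch and not kernel code:
every name in it (model_page, model_req, hardirq_complete(),
softirq_run()) is made up for illustration, and it ignores locking and
per-CPU lists entirely.  The one thing it tries to show is that a tagged
request never touches page state from the "hardirq" side, while an
untagged one (standing in for DIO) still completes immediately.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace model; none of these names are kernel APIs. */
enum model_ctx { CTX_HARDIRQ, CTX_SOFTIRQ };

struct model_page {
	bool dirty;		/* stays false on the writethrough path */
	bool writeback;		/* set while the write is in flight */
};

struct model_req {
	struct model_page *page;	/* NULL models DIO: no page cache state */
	bool complete_in_softirq;	/* the proposed per-request tag */
	bool done;
	struct model_req *next;
};

/* FIFO of completions waiting for the "softirq" pass. */
struct model_req *deferred_head;
struct model_req **deferred_tail = &deferred_head;

/* Finish one request; page state may only change in softirq context. */
void model_finish(struct model_req *req, enum model_ctx ctx)
{
	if (req->page) {
		assert(ctx == CTX_SOFTIRQ);
		req->page->writeback = false;
	}
	req->done = true;
}

/* "hardirq" side: finish untagged requests now, queue tagged ones. */
void hardirq_complete(struct model_req *req)
{
	if (!req->complete_in_softirq) {
		model_finish(req, CTX_HARDIRQ);
		return;
	}
	req->next = NULL;
	*deferred_tail = req;
	deferred_tail = &req->next;
}

/* "softirq" side: drain the queue in order, return how many ran. */
int softirq_run(void)
{
	struct model_req *req = deferred_head;
	int n = 0;

	deferred_head = NULL;
	deferred_tail = &deferred_head;
	while (req) {
		struct model_req *next = req->next;

		model_finish(req, CTX_SOFTIRQ);
		req = next;
		n++;
	}
	return n;
}
```

In this model a writethrough page would start out with dirty == false
and writeback == true, and a waiter on writeback only gets released once
softirq_run() has processed the tagged request.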