On Mon, Feb 08, 2021 at 09:28:24AM -0800, Darrick J. Wong wrote:
> On Mon, Feb 08, 2021 at 09:11:40AM -0800, Paul E. McKenney wrote:
> > On Mon, Feb 08, 2021 at 10:44:58AM -0500, Brian Foster wrote:
> > > There was a v2 inline that incorporated some directed feedback.
> > > Otherwise there were questions and ideas about making the whole thing
> > > faster, but I've no idea if that addresses the problem or not (if so,
> > > that would be an entirely different set of patches). I'll wait and see
> > > what Darrick thinks about this and rebase/repost if the approach is
> > > agreeable..
> >
> > There is always the school of thought that says that the best way to
> > get people to focus on this is to rebase and repost. Otherwise, they
> > are all too likely to assume that you lost interest in this.
>
> I was hoping that a better solution would emerge for clearing
> PageWriteback on hundreds of thousands of pages, but nothing easy popped
> out.
>
> The hardcoded threshold in "[PATCH v2 2/2] xfs: kick extra large ioends
> to completion workqueue" gives me unease because who's to say if marking
> 262,144 pages on a particular CPU will actually stall it long enough to
> trip the hangcheck? Is the number lower on (say) some pokey NAS box
> with a lot of storage but a slow CPU?

It's also not the right thing to do given the IO completion
workqueue is a bound workqueue. Anything that is doing large amounts
of CPU intensive work should be on an unbound workqueue so that the
scheduler can bounce it around different CPUs as needed.

Quite frankly, the problem is a huge long ioend chain being built by
the submission code. We need to keep ioend completion overhead down.
It runs in either softirq or bound workqueue context and so
individual items of work that are performed in this context must not
be -unbounded- in size or time. Unbounded ioend chains are bad for
IO latency, they are bad for memory reclaim and they are bad for CPU
scheduling.
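
For illustration only, here is a minimal userspace sketch of what bounding
the size of each ioend at submission time could look like. Every name in it
(sketch_ioend, sketch_can_add_page, sketch_submit_pages) and the 4096-page
cap are hypothetical, picked for the example; none of it is taken from the
kernel or from the patches under discussion, and it assumes 4 KiB pages.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical cap, chosen only for illustration: 4096 pages of 4 KiB
 * is 16 MiB per ioend. Not a constant from the kernel or the patches.
 */
#define SKETCH_MAX_IOEND_PAGES 4096u

/* Stand-in for an ioend: only tracks how many pages it covers. */
struct sketch_ioend {
	size_t nr_pages;
};

/* Submission-side check: may one more page be appended to this ioend? */
static bool sketch_can_add_page(const struct sketch_ioend *ioend)
{
	return ioend->nr_pages < SKETCH_MAX_IOEND_PAGES;
}

/*
 * Walk a run of contiguous dirty pages, closing the current ioend and
 * starting a new one whenever the cap is reached. Returns the number of
 * ioends produced and stores the size of the largest one in *largest.
 */
static size_t sketch_submit_pages(size_t total_pages, size_t *largest)
{
	struct sketch_ioend cur = { .nr_pages = 0 };
	size_t nr_ioends = 0;

	*largest = 0;
	for (size_t i = 0; i < total_pages; i++) {
		if (!sketch_can_add_page(&cur)) {
			/* A real implementation would submit the ioend here. */
			if (cur.nr_pages > *largest)
				*largest = cur.nr_pages;
			nr_ioends++;
			cur.nr_pages = 0;
		}
		cur.nr_pages++;
	}

	/* Account for the final, possibly partial, ioend. */
	if (cur.nr_pages > 0) {
		if (cur.nr_pages > *largest)
			*largest = cur.nr_pages;
		nr_ioends++;
	}
	return nr_ioends;
}
```

With this cap, a 262,144-page run is split into 64 ioends of 4096 pages
each, so no single completion item has to process more than 16 MiB worth
of pages.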

As I've said previously, we gain nothing by aggregating ioends past
a few tens of megabytes of submitted IO. The batching gains are
completely diminished once we've got enough IO in flight to keep the
submission queue full. We're talking here about gigabytes of
sequential IOs in a single ioend chain which are 2-3 orders of
magnitude larger than needed for optimal background IO submission
and completion efficiency and throughput. IOWs, we really should be
limiting the ioend chain length at submission time, not trying to
patch over bad completion behaviour that results from sub-optimal IO
submission behaviour...

> That said, /some/ threshold is probably better than no threshold. Could
> someone try to confirm if that series of Brian's fixes this problem too?

262144 pages is still too much work to be doing in a single softirq
IO completion callback. It's likely to be too much work for a bound
workqueue, too, especially when you consider that the workqueue
completion code will merge sequential ioends into one ioend, hence
making the IO completion loop counts bigger and latency problems worse
rather than better...

Cheers,

Dave.
-- 
Dave Chinner
david@xxxxxxxxxxxxx