On Wed, Sep 25, 2019 at 08:08:59AM -0400, Brian Foster wrote:
> On Wed, Sep 25, 2019 at 08:29:01AM +1000, Dave Chinner wrote:
> > That's in commit 80168676ebfe ("xfs: force background CIL push under
> > sustained load") which went into 2.6.38 or so. The cause of the
> > problem in that case was concurrent transaction commit load causing
> > lock contention and preventing a background push from getting the
> > context lock to do the actual push.
> >
>
> More related to the next patch, but what prevents a similar but
> generally unbound concurrent workload from exceeding the new hard limit
> once transactions start to block post commit?

The new code, like the original code, is not actually a "hard"
limit. It essentially just throttles ongoing work until the CIL
push starts. In this case, it forces the current process to give up
the CPU immediately once over the CIL high limit, which allows the
workqueue to run the push work straight away.

I thought about making it a "hard limit" by blocking before the CIL
insert, but that's no guarantee that, by the time we get woken and
add the new commit to the CIL, the new context has not already gone
over the hard limit. i.e. we block the unbound concurrency before
commit, then let it all go in a thundering herd on the new context
and immediately punch that way over the hard threshold again.

To avoid this, we'd probably need a CIL ticket and grant mechanism
to make CIL insertion FIFO and wakeups limited by remaining space in
the CIL. I'm not sure we actually need such a complex solution,
especially considering the potential serialisation problems it
introduces in what is a highly concurrent fast path...

Cheers,

Dave.
-- 
Dave Chinner
david@xxxxxxxxxxxxx
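
To make the trade-off above concrete, here is a toy, single-threaded model of the two wakeup policies. It is only a sketch and not XFS code: struct cil_model, wake_all() and wake_by_grant() are hypothetical names, sizes are in arbitrary units, and every waiter is assumed to commit the same amount. It compares waking every blocked committer at once on a new context with FIFO grants limited by the space remaining under the threshold.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of one CIL context; not the kernel's data structure. */
struct cil_model {
	size_t used;        /* space consumed in the current context */
	size_t hard_limit;  /* threshold we would like never to exceed */
};

/*
 * "Block before insert" policy: when the new context starts, every waiter
 * is woken and inserts its commit. Returns how far the context ends up
 * over the hard limit (0 if it stays under).
 */
static size_t wake_all(struct cil_model *cil, size_t nwaiters,
		       size_t commit_size)
{
	assert(commit_size > 0);
	for (size_t i = 0; i < nwaiters; i++)
		cil->used += commit_size;
	return cil->used > cil->hard_limit ? cil->used - cil->hard_limit : 0;
}

/*
 * Ticket/grant policy: waiters are woken in FIFO order, and only while the
 * remaining space covers the next commit. Returns the number of waiters
 * granted; the rest stay blocked until a later context.
 */
static size_t wake_by_grant(struct cil_model *cil, size_t nwaiters,
			    size_t commit_size)
{
	size_t woken = 0;

	assert(commit_size > 0);
	assert(cil->used <= cil->hard_limit);
	while (woken < nwaiters &&
	       cil->hard_limit - cil->used >= commit_size) {
		cil->used += commit_size;
		woken++;
	}
	return woken;
}
```

With a limit of 1000 units and 64 waiters of 100 units each, wake_all() leaves the context 5400 units over the limit, while wake_by_grant() admits only 10 waiters and stops exactly at the limit. The model deliberately ignores the cost of that second policy, which is the serialisation it would add to the commit fast path.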