On Thu, 02 Apr 2020 15:57:56 +1100 NeilBrown wrote: > > On Thu, Apr 02 2020, Hillf Danton wrote: > > > On Thu, 02 Apr 2020 10:53:20 +1100 NeilBrown wrote: > >>=20 > >> PF_LESS_THROTTLE exists for loop-back nfsd, and a similar need in the > >> loop block driver, where a daemon needs to write to one bdi in > >> order to free up writes queued to another bdi. > >>=20 > >> The daemon sets PF_LESS_THROTTLE and gets a larger allowance of dirty > >> pages, so that it can still dirty pages after other processses have been > >> throttled. > >>=20 > >> This approach was designed when all threads were blocked equally, > >> independently on which device they were writing to, or how fast it was. > >> Since that time the writeback algorithm has changed substantially with > >> different threads getting different allowances based on non-trivial > >> heuristics. This means the simple "add 25%" heuristic is no longer > >> reliable. > >>=20 > >> This patch changes the heuristic to ignore the global limits and > >> consider only the limit relevant to the bdi being written to. This > >> approach is already available for BDI_CAP_STRICTLIMIT users (fuse) and > >> should not introduce surprises. This has the desired result of > >> protecting the task from the consequences of large amounts of dirty data > >> queued for other devices. > >>=20 > >> This approach of "only consider the target bdi" is consistent with the > >> other use of PF_LESS_THROTTLE in current_may_throttle(), were it causes > >> attention to be focussed only on the target bdi. > >>=20 > >> So this patch > >> - renames PF_LESS_THROTTLE to PF_LOCAL_THROTTLE, > >> - remove the 25% bonus that that flag gives, and > >> - imposes 'strictlimit' handling for any process with PF_LOCAL_THROTTLE > >> set. > > > > /* > > * The strictlimit feature is a tool preventing mistrusted filesystems > > * from growing a large number of dirty pages before throttling. For > > > > Based on the comment snippet, I suspect it is applicable to IO flushers > > unless they are likely generating tons of dirty pages. If they are, > > however, cutting their bonuses seem questionable. > > The purpose of the strictlimit feature was to isolate one filesystem > (bdi) from all others, so that the one cannot create dirty pages which > unfairly disadvantage the others - this is what that comment says. > But the implementation appears to focus on the isolation, not the > specific purpose, and isolation works both ways. It protects the others > from the one, and the one from the others. > > fuse needs to be isolated so it doesn't harm others. > nfsd and loop need to be isolate so they aren't harmed by others. For those working in emergency services, extra N95 face masks and Covid-19 testing kits, say 25%, would be preserved, too, if isolation doesn't help them. > I'm less familiar with IO flushers but I suspect that have exactly the > same need as nfsd and loop - they need to be isolated from dirty pages > other than on the device they are writing to. > The 25% bonus was never about giving them a bonus because they need it. > It was about protecting them from excess usage elsewhere. For example, > I strongly > suspect that my change will provide a conceptually better service for IO > flushers. (whether it is better in a practical measurable sense I cannot > say, but I'd be surprised if it was worse).