On Fri, Mar 08, 2019 at 12:22:20PM -0500, Josef Bacik wrote:
> On Thu, Mar 07, 2019 at 07:08:31PM +0100, Andrea Righi wrote:
> > = Problem =
> >
> > When sync() is executed from a high-priority cgroup, the process is forced to
> > wait the completion of the entire outstanding writeback I/O, even the I/O that
> > was originally generated by low-priority cgroups potentially.
> >
> > This may cause massive latencies to random processes (even those running in the
> > root cgroup) that shouldn't be I/O-throttled at all, similarly to a classic
> > priority inversion problem.
> >
> > This topic has been previously discussed here:
> > https://patchwork.kernel.org/patch/10804489/
> >
>
> Sorry to move the goal posts on you again Andrea, but Tejun and I talked about
> this some more offline.
>
> We don't want cgroup to become the arbiter of correctness/behavior here. We
> just want it to be isolating things.
>
> For you that means you can drop the per-cgroup flag stuff, and only do the
> priority boosting for multiple sync(2) waiters. That is a real priority
> inversion that needs to be fixed. io.latency and io.max are capable of noticing
> that a low priority group is going above their configured limits and putting
> pressure elsewhere accordingly.

Alright, so IIUC that means we just need patch 1/3 for now (with the
per-bdi lock instead of the global lock). If that's the case I'll focus
on that patch then.

>
> Tejun said he'd rather see the sync(2) isolation be done at the namespace level.
> That way if you have fs namespacing you are already isolated to your namespace.
> If you feel like tackling that then hooray, but that's a separate dragon to slay
> so don't feel like you have to right now.

Makes sense. I can take a look and see what I can do after posting the
new patch with the priority inversion fix only.

>
> This way we keep cgroup doing its job, controlling resources. Then we allow
> namespacing to do its thing, isolating resources. Thanks,
>
> Josef

Looks like a good plan to me. Thanks for the update.

-Andrea