On Mon, Feb 11, 2019 at 09:40:29PM +0100, Andrea Righi wrote: > On Mon, Feb 11, 2019 at 10:39:34AM -0500, Josef Bacik wrote: > > On Sat, Feb 09, 2019 at 03:07:49PM +0100, Andrea Righi wrote: > > > This is an attempt to mitigate the priority inversion problem of a > > > high-priority blkcg issuing a sync() and being forced to wait the > > > completion of all the writeback I/O generated by any other low-priority > > > blkcg, causing massive latencies to processes that shouldn't be > > > I/O-throttled at all. > > > > > > The idea is to save a list of blkcg's that are waiting for writeback: > > > every time a sync() is executed the current blkcg is added to the list. > > > > > > Then, when I/O is throttled, if there's a blkcg waiting for writeback > > > different than the current blkcg, no throttling is applied (we can > > > probably refine this logic later, i.e., a better policy could be to > > > adjust the throttling I/O rate using the blkcg with the highest speed > > > from the list of waiters - priority inheritance, kinda). > > > > > > This topic has been discussed here: > > > https://lwn.net/ml/cgroups/20190118103127.325-1-righi.andrea@xxxxxxxxx/ > > > > > > But we didn't come up with any definitive solution. > > > > > > This patch is not a definitive solution either, but it's an attempt to > > > continue addressing this issue and handling the priority inversion > > > problem with sync() in a better way. > > > > > > Signed-off-by: Andrea Righi <righi.andrea@xxxxxxxxx> > > > > Talked with Tejun about this some and we agreed the following is probably the > > best way forward > > First of all thanks for the update! > > > > > 1) Track the submitter of the wb work to the writeback code. > > Are we going to track the cgroup that originated the dirty pages (or > maybe dirty inodes) or do you have any idea in particular? > The guy doing the sync(), so that way we can accomplish #3. But really this is an implementation detail, however you want to accomplish it is fine by me. > > 2) Sync() defaults to the root cg, and and it writes all the things as the root > > cg. > > OK. > > > 3) Add a flag to the cgroups that would make sync()'ers in that group only be > > allowed to write out things that belong to its group. > > So, IIUC, when this flag is enabled a cgroup that is doing sync() would > trigger the writeback of the pages that belong to that cgroup only and > it waits only for these pages to be sync-ed, right? In this case > writeback can still go at cgroup's speed. > > Instead when the flag is disabled, sync() would trigger writeback I/O > globally, as usual, and it goes at full speed (root cgroup's speed). > > Am I understanding correctly? > Yup that's exactly it. Thanks, Josef