Re: [RFC PATCH 0/3] cgroup: fsio throttle controller

Andrea Righi <righi.andrea@xxxxxxxxx> · Mon, 28 Jan 2019 18:41:29 +0100

Hi Vivek,

sorry for the late reply.

On Mon, Jan 21, 2019 at 04:47:15PM -0500, Vivek Goyal wrote:
> On Sat, Jan 19, 2019 at 11:08:27AM +0100, Andrea Righi wrote:
> 
> [..]
> > Alright, let's skip the root cgroup for now. I think the point here is
> > if we want to provide sync() isolation among cgroups or not.
> > 
> > According to the manpage:
> > 
> >        sync()  causes  all  pending  modifications  to filesystem metadata and cached file data to be
> >        written to the underlying filesystems.
> > 
> > And:
> >        According to the standard specification (e.g., POSIX.1-2001), sync() schedules the writes, but
> >        may  return  before  the actual writing is done.  However Linux waits for I/O completions, and
> >        thus sync() or syncfs() provide the same guarantees as fsync called on every file in the  sys‐
> >        tem or filesystem respectively.
> > 
> > Excluding the root cgroup, do you think a sync() issued inside a
> > specific cgroup should wait for I/O completions only for the writes that
> > have been generated by that cgroup?
> 
> Can we account I/O towards the cgroup which issued "sync" only if write
> rate of sync cgroup is higher than cgroup to which page belongs to. Will
> that solve problem, assuming its doable?

Maybe this would mitigate the problem, in part, but it doesn't solve it.

The thing is, if a dirty page belongs to a slow cgroup and a fast cgroup
issues "sync", the fast cgroup needs to wait a lot, because writeback is
happening at the speed of the slow cgroup.

Ideally in this case we should bump up the writeback speed, maybe even
temporarily inherit the write rate of the sync cgroup, similarly to a
priority-inversion locking scenario, but I think it's not doable at the
moment without applying big changes.

Or we could isolate the sync domain, meaning that a cgroup issuing a
sync will only wait for the syncing of the pages that belong to that
sync cgroup. But probably also this method requires big changes...

-Andrea