On Fri, Jan 18, 2019 at 06:07:45PM +0100, Paolo Valente wrote:
>
>
> > On 18 Jan 2019, at 17:35, Josef Bacik <josef@xxxxxxxxxxxxxx> wrote:
> >
> > On Fri, Jan 18, 2019 at 11:31:24AM +0100, Andrea Righi wrote:
> >> This is a redesign of my old cgroup-io-throttle controller:
> >> https://lwn.net/Articles/330531/
> >>
> >> I'm resuming this old patch to point out a problem that I think is still
> >> not solved completely.
> >>
> >> = Problem =
> >>
> >> The io.max controller works really well at limiting synchronous I/O
> >> (READs), but a lot of I/O requests are initiated outside the context of
> >> the process that is ultimately responsible for its creation (e.g.,
> >> WRITEs).
> >>
> >> Throttling at the block layer in some cases is too late and we may end
> >> up slowing down processes that are not responsible for the I/O that
> >> is being processed at that level.
> >
> > How so?  The writeback threads are per-cgroup and have the cgroup stuff set
> > properly.  So if you dirty a bunch of pages, they are associated with your
> > cgroup, and then writeback happens and it's done in the writeback thread
> > associated with your cgroup and then that is throttled.  Then you are throttled
> > at balance_dirty_pages() because the writeout is taking longer.
> >
>
> IIUC, Andrea described this problem: certain processes in a certain group dirty a
> lot of pages, causing write back to start.  Then some other blameless
> process in the same group experiences very high latency, in spite of
> the fact that it has to do little I/O.
>

In that case the io controller isn't doing its job and needs to be fixed (or
reconfigured).  io.latency guards against this, and I assume io.max would keep
this from happening if it were configured properly.

> Does your blk_cgroup_congested() stuff solve this issue?
>
> Or simply I didn't get what Andrea meant at all :)
>

I _think_ Andrea is talking about the fact that we can generate IO indirectly
and never get throttled for it, which is what blk_cgroup_congested() is meant to
address.  I added it specifically because some low prio task was just allocating
all of the memory on the system and causing a lot of pressure because of
swapping, but there was no direct feedback loop there.  blk_cgroup_congested()
provides that feedback loop.

Course I could be wrong too and we're all just talking past each other ;).
Thanks,

Josef