On Tue, Sep 22, 2020 at 12:09 PM Michal Hocko <mhocko@xxxxxxxx> wrote:
>
> On Tue 22-09-20 11:10:17, Shakeel Butt wrote:
> > On Tue, Sep 22, 2020 at 9:55 AM Michal Hocko <mhocko@xxxxxxxx> wrote:
> [...]
> > > Last but not least the memcg
> > > background reclaim is something that should be possible without a new
> > > interface.
> >
> > So, it comes down to adding more functionality/semantics to
> > memory.high or introducing a new simple interface. I am fine with
> > either of one but IMO convoluted memory.high might have a higher
> > maintenance cost.
>
> One idea would be to schedule a background worker (which work on behalf
> on the memcg) to do the high limit reclaim with high limit target as
> soon as the high limit is reached. There would be one work item for each
> memcg. Userspace would recheck the high limit on return to the userspace
> and do the reclaim if the excess is larger than a threshold, and sleep
> as the fallback.
>
> Excessive consumers would get throttled if the background work cannot
> keep up with the charge pace and most of them would return without doing
> any reclaim because there is somebody working on their behalf - and is
> accounted for that.
>
> The semantic of high limit would be preserved IMHO because high limit is
> actively throttled. Where that work is done shouldn't matter as long as
> it is accounted properly and memcg cannot outsource all the work to the
> rest of the system.
>
> Would something like that (with many details to be sorted out of course)
> be feasible?

This is exactly how our "per-memcg kswapd" works. The missing piece is
how to properly account the background worker (it is a kernel worker
thread), as we discussed before. You mentioned in an earlier email in
this thread that such work is WIP; I think once it is done, the
per-memcg background worker could be supported easily.

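To make sure we are talking about the same scheme, here is a rough
user-space model of the flow described above. It is only a sketch under
assumptions, not kernel code: struct memcg_model, the model_* helpers,
the per-run budget and the "reclaim down to the threshold" behaviour of
the direct path are all made up for illustration, and the sleep
fallback and the accounting of the worker's CPU time are left out.

```c
#include <stdbool.h>

/* Hypothetical user-space model; none of these names are kernel APIs. */
struct memcg_model {
	long usage;            /* pages currently charged */
	long high;             /* high limit, in pages */
	bool work_pending;     /* at most one background work item per memcg */
	long bg_reclaimed;     /* pages reclaimed by the background worker */
	long direct_reclaimed; /* pages reclaimed by the charging task */
};

/* Charge path: account the pages, queue the work item once over high. */
static void model_charge(struct memcg_model *cg, long pages)
{
	cg->usage += pages;
	if (cg->usage > cg->high)
		cg->work_pending = true;
}

/*
 * Background worker: reclaim toward the high limit, at most "budget"
 * pages per run (assumed knob, standing in for how fast it can go).
 */
static void model_background_work(struct memcg_model *cg, long budget)
{
	long excess, n;

	if (!cg->work_pending)
		return;

	excess = cg->usage - cg->high;
	n = excess < budget ? excess : budget;
	if (n > 0) {
		cg->usage -= n;
		cg->bg_reclaimed += n;
	}
	/* Stay queued while the memcg is still above high. */
	cg->work_pending = cg->usage > cg->high;
}

/*
 * Return-to-userspace check: the task reclaims by itself only when the
 * excess is larger than "threshold". Returns the pages it reclaimed.
 */
static long model_return_to_user(struct memcg_model *cg, long threshold)
{
	long excess = cg->usage - cg->high;
	long n;

	if (excess <= threshold)
		return 0; /* somebody is working on our behalf */

	n = excess - threshold; /* assumption: reclaim down to the threshold */
	cg->usage -= n;
	cg->direct_reclaimed += n;
	return n;
}
```

In this model a task that stays within the threshold returns without
doing any reclaim, while a task that charges faster than the worker's
budget ends up paying in model_return_to_user(), which is the
throttling behaviour described above.
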
>
> If we do not want to change the existing semantic of high and want a new
> api then I think having another limit for the background reclaim then
> that would make more sense to me. It would resemble the global reclaim
> and kswapd model and something that would be easier to reason about.
> Comparing to echo $N > reclaim which might mean to reclaim any number
> pages around N.
> --
> Michal Hocko
> SUSE Labs