On Mon, May 11, 2020 at 8:57 AM Johannes Weiner <hannes@xxxxxxxxxxx> wrote:
>
> On Thu, May 07, 2020 at 10:00:07AM -0700, Shakeel Butt wrote:
> > On Thu, May 7, 2020 at 9:47 AM Michal Hocko <mhocko@xxxxxxxxxx> wrote:
> > >
> > > On Thu 07-05-20 09:33:01, Shakeel Butt wrote:
> > > [...]
> > > > @@ -2600,8 +2596,23 @@ static int try_charge(struct mem_cgroup *memcg, gfp_t gfp_mask,
> > > >                                 schedule_work(&memcg->high_work);
> > > >                                 break;
> > > >                         }
> > > > -                       current->memcg_nr_pages_over_high += batch;
> > > > -                       set_notify_resume(current);
> > > > +
> > > > +                       if (gfpflags_allow_blocking(gfp_mask))
> > > > +                               reclaim_over_high(memcg, gfp_mask, batch);
> > > > +
> > > > +                       if (page_counter_read(&memcg->memory) <=
> > > > +                           READ_ONCE(memcg->high))
> > > > +                               break;
> > >
> > > I am half way to a long weekend so bear with me. Shouldn't this be continue? The
> > > parent memcg might be still in excess even the child got reclaimed,
> > > right?
> > >
> >
> > The reclaim_high() actually already does this walk up to the root and
> > reclaim from ones who are still over their high limit. Though having
> > 'continue' here is correct too.
>
> If reclaim was weak and failed to bring the child back in line, we
> still do set_notify_resume(). We should do that for ancestors too.
>
> But it seems we keep adding hierarchy walks and it's getting somewhat
> convoluted: page_counter does it, then we check high overage
> recursively, and now we add the call to reclaim which itself is a walk
> up the ancestor line.
>
> Can we hitchhike on the page_counter_try_charge() walk, which already
> has the concept of identifying counters with overage? Rename the @fail
> to @limited and return the first counter that is in excess of its high
> as well, even when the function succeeds?
>
> Then we could ditch the entire high checking loop here and simply
> replace it with
>
> done_restock:
>         ...
>
>         if (*limited) {
>                 if (gfpflags_allow_blocking())
>                         reclaim_over_high(memcg_from_counter(limited));
>                 /* Reclaim may not be able to do much, ... */
>                 set_notify_resume(); // or schedule_work()
>         };
>

I will try to code the above and will give a shot to the following
long-term suggestion as well.

> In the long-term, the best thing might be to integrate memory.high
> reclaim with the regular reclaim that try_charge() is already
> doing. Especially the part where it retries several times - we
> currently give up on memory.high unnecessarily early. Make
> page_counter_try_charge() fail on high and max equally, and after
> several reclaim cycles, instead of invoking the OOM killer, inject the
> penalty sleep and force the charges. OOM killing and throttling is
> supposed to be the only difference between the two, anyway, and yet
> the code diverges far more than that for no apparent reason.
>
> But I also appreciate that this is a cleanup beyond the scope of this
> patch here, so it's up to you how far you want to take it.

Thanks,
Shakeel
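
For illustration only, a toy userspace model of the single-walk idea
quoted above: one pass up the hierarchy charges every counter and
reports through @limited either the counter that hit max (charge
fails) or the first counter found over its high (charge succeeds).
The struct layout and the name counter_try_charge() are made up for
this sketch. It is not the kernel's page_counter code, and it ignores
atomics and overflow.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct page_counter; not the real layout. */
struct counter {
	unsigned long usage;
	unsigned long high;	/* soft limit: overage means reclaim/throttle */
	unsigned long max;	/* hard limit: the charge fails */
	struct counter *parent;
};

/*
 * Charge nr_pages to @c and all of its ancestors in a single walk.
 *
 * On failure, *limited is the counter whose max blocked the charge and
 * every partial charge is rolled back.  On success, *limited is the
 * first counter (lowest in the hierarchy) that ended up over its high,
 * or NULL if none did.
 */
static bool counter_try_charge(struct counter *c, unsigned long nr_pages,
			       struct counter **limited)
{
	struct counter *iter, *undo;

	assert(c && limited);
	*limited = NULL;

	for (iter = c; iter; iter = iter->parent) {
		if (iter->usage + nr_pages > iter->max) {
			/* Hard limit hit: undo the levels already charged. */
			for (undo = c; undo != iter; undo = undo->parent)
				undo->usage -= nr_pages;
			*limited = iter;
			return false;
		}
		iter->usage += nr_pages;

		/* Remember only the first counter over high. */
		if (!*limited && iter->usage > iter->high)
			*limited = iter;
	}
	return true;
}
```

With something along these lines, a caller would only need to look at
*limited after a successful charge to decide whether to reclaim or
throttle, without walking the ancestors a second time.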