On Wed, Jan 16, 2019 at 6:41 PM Fam Zheng <zhengfeiran@xxxxxxxxxxxxx> wrote:
>
>
>
> > On Jan 17, 2019, at 05:06, Yang Shi <shy828301@xxxxxxxxx> wrote:
> >
> > On Tue, Jan 15, 2019 at 7:52 PM Fam Zheng <zhengfeiran@xxxxxxxxxxxxx> wrote:
> >>
> >>
> >>
> >>> On Jan 16, 2019, at 08:50, Yang Shi <shy828301@xxxxxxxxx> wrote:
> >>>
> >>> On Thu, Jan 10, 2019 at 12:30 AM Fam Zheng <zhengfeiran@xxxxxxxxxxxxx> wrote:
> >>>>
> >>>>
> >>>>
> >>>>> On Jan 10, 2019, at 13:36, Yang Shi <shy828301@xxxxxxxxx> wrote:
> >>>>>
> >>>>> On Sun, Jan 6, 2019 at 9:10 PM Fam Zheng <zhengfeiran@xxxxxxxxxxxxx> wrote:
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>>> On Jan 5, 2019, at 03:36, Yang Shi <shy828301@xxxxxxxxx> wrote:
> >>>>>>>
> >>>>>>>
> >>>>>>> drop_caches would drop all page caches globally. You may not want to
> >>>>>>> drop the page caches used by other memcgs.
> >>>>>>
> >>>>>> We've tried your async force_empty patch (with a modification to default it to true, to make it transparently enabled for the sake of testing), and for the past few days the stale mem cgroups still accumulate, up to 40k.
> >>>>>>
> >>>>>> We've double-checked that the force_empty routines are invoked when a mem cgroup is offlined. But this doesn't look very effective so far, because once we do `echo 1 > /proc/sys/vm/drop_caches`, all the groups immediately go away.
> >>>>>>
> >>>>>> This is a bit unexpected.
> >>>>>>
> >>>>>> Yang, could you hint at what is missing in the force_empty operation, compared to a blanket drop_caches?
> >>>>>
> >>>>> Drop caches invalidates pages inode by inode, but memcg
> >>>>> force_empty does memcg direct reclaim.
> >>>>
> >>>> But force_empty touches things that drop_caches doesn't? If so, then maybe combining both approaches is more reliable. Since, like you said,
> >>>
> >>> AFAICS, force_empty may unmap pages, but drop_caches doesn't.
> >>>
> >>>> dropping _all_ pages is usually too much and thus not desired, we may want to somehow limit the dropped caches to those in the memory cgroup in question. What do you think?
> >>>
> >>> This is what force_empty is supposed to do. But, as your test shows,
> >>> some page cache may still remain after force_empty, which then causes
> >>> offline memcgs to accumulate. I haven't figured out what happened. You
> >>> may try what Michal suggested.
> >>
> >> None of the existing patches has helped so far, but we suspect that the pages cannot be locked at the force_empty moment. We have been working on a "retry" patch which does solve the problem. We'll do more tracing (to get a better understanding of the issue) and post the findings and/or the patch later. Thanks.
> >
> > You mean it solves the problem by retrying more times? Actually, I'm
> > not sure whether you have swap set up in your test, but force_empty
> > does swap if swap is on. That may mean it can't reclaim all the page
> > cache within 5 retries. I have a patch within that series to skip swap.
>
> Basically yes, retrying solves the problem. But compared to immediate retries, a scheduled retry a few seconds later is much more effective. This may suggest that doing force_empty in a worker is in fact more effective. Not sure if this is good enough to convince Johannes.
>
> We don't have swap on.
>
> What do you mean by 5 retries? I'm still a bit lost in the LRU code and patches.

MEM_CGROUP_RECLAIM_RETRIES is 5.

Yang

> >
> > Yang
> >
> >>
> >> Fam
> >>
> >>>
> >>> Yang
> >>>
> >>>>
> >>>>
> >>>>> Offlined memcgs will not go away if there are still pages charged to
> >>>>> them. Maybe it relates to the per-cpu memcg stock. I recall there are
> >>>>> some commits which do solve the per-cpu page counter cache problem:
> >>>>>
> >>>>> 591edfb10a94 mm: drain memcg stocks on css offlining
> >>>>> d12c60f64cf8 mm: memcontrol: drain memcg stock on force_empty
> >>>>> bb4a7ea2b144 mm: memcontrol: drain stocks on resize limit
> >>>>>
> >>>>> Not sure if they would help out.
> >>>>
> >>>> These are all in 4.20, which is tested but not helpful.
> >>>>
> >>>> Fam
>
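P.S. To make the worker idea Fam describes concrete, below is a minimal sketch of what a scheduled force_empty retry could look like inside mm/memcontrol.c. It is illustrative only, not a posted patch: it assumes a new offline_work member (a struct delayed_work) on struct mem_cgroup, and memcg_offline_force_empty_fn(), memcg_schedule_force_empty() and FORCE_EMPTY_RETRY_DELAY are made-up names. mem_cgroup_force_empty(), page_counter_read(), the css refcount helpers and the workqueue calls are existing kernel code.

  /* Retry force_empty a few seconds later instead of inline. */
  #define FORCE_EMPTY_RETRY_DELAY	(5 * HZ)

  static void memcg_offline_force_empty_fn(struct work_struct *work)
  {
  	struct mem_cgroup *memcg = container_of(to_delayed_work(work),
  						struct mem_cgroup,
  						offline_work);

  	/*
  	 * Direct reclaim on this memcg; internally this gives up after
  	 * MEM_CGROUP_RECLAIM_RETRIES (5) passes that make no progress.
  	 */
  	mem_cgroup_force_empty(memcg);

  	/*
  	 * Pages that were transiently locked or pinned during this pass
  	 * may become reclaimable in a few seconds, so reschedule rather
  	 * than spinning; a real patch would also want a cap on the
  	 * number of reschedules.
  	 */
  	if (page_counter_read(&memcg->memory))
  		queue_delayed_work(system_unbound_wq, &memcg->offline_work,
  				   FORCE_EMPTY_RETRY_DELAY);
  	else
  		css_put(&memcg->css);	/* drop the ref taken at offline */
  }

  /* Called from css_offline: pin the css and kick off the first pass. */
  static void memcg_schedule_force_empty(struct mem_cgroup *memcg)
  {
  	css_get(&memcg->css);
  	INIT_DELAYED_WORK(&memcg->offline_work, memcg_offline_force_empty_fn);
  	queue_delayed_work(system_unbound_wq, &memcg->offline_work, 0);
  }

Compared to retrying inline, the delay gives writeback and whoever holds the page locks a chance to finish, which would be consistent with Fam's observation that a scheduled retry a few seconds later beats immediate retries.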