On Fri, Jul 12, 2019 at 1:29 PM Michal Hocko <mhocko@xxxxxxxxxx> wrote:
>
> On Fri 12-07-19 09:47:14, Yafang Shao wrote:
> > On Fri, Jul 12, 2019 at 7:42 AM Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx> wrote:
> > >
> > > On Thu, 11 Jul 2019 09:32:59 -0400 Yafang Shao <laoar.shao@xxxxxxxxx> wrote:
> > >
> > > > After commit 815744d75152 ("mm: memcontrol: don't batch updates of local VM stats and events"),
> > > > the local VM counters are not in sync with the hierarchical ones.
> > > >
> > > > Below is one example in a leaf memcg on my server (with 8 CPUs),
> > > >       inactive_file 3567570944
> > > >       total_inactive_file 3568029696
> > > > We can find that the deviation is very great, that is because the 'val' in
> > > > __mod_memcg_state() is in pages while the effective value in
> > > > memcg_stat_show() is in bytes.
> > > > So the maximum of this deviation between local VM stats and total VM
> > > > stats can be (32 * number_of_cpu * PAGE_SIZE), that may be an unacceptably
> > > > great value.
> > > >
> > > > We should keep the local VM stats in sync with the total stats.
> > > > In order to keep this behavior the same across counters, this patch updates
> > > > __mod_lruvec_state() and __count_memcg_events() as well.
> > >
> > > hm.
> > >
> > > So the local counters are presently more accurate than the hierarchical
> > > ones because the hierarchical counters use batching.  And the proposal
> > > is to make the local counters less accurate so that the inaccuracies
> > > will match.
> > >
> > > It is a bit counter intuitive to hear that worsened accuracy is a good
> > > thing!  We're told that the difference may be "unacceptably great" but
> > > we aren't told why.  Some additional information to support this
> > > surprising assertion would be useful, please.  What are the use-cases
> > > which are harmed by this difference and how are they harmed?
> > >
> >
> > Hi Andrew,
> >
> > Both the local counter and the hierarchical one are exposed to the user.
> > In a leaf memcg, the local counter should be equal to the hierarchical one,
> > if they are different, the user may wonder what's wrong in this memcg.
> > IOW, the difference makes these counters not reliable, if they are not
> > reliable we can't use them to help us analyze issues.
>
> But those numbers are in flight anyway. We do not stop updating them
> while they are read so there is no guarantee they will be consistent
> anyway, right?

Right.
They can't be guaranteed to be consistent.
When we read them, it may be that only the local counters have been
updated and the hierarchical ones have not been updated yet.
But the current deviation is too great to be ignored.

So the question is similar to asking: what about increasing
MEMCG_CHARGE_BATCH from 32 to 32 * 4096?

Thanks
Yafang
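
For reference, the arithmetic behind the bound (32 * number_of_cpu * PAGE_SIZE) and the example counters quoted above can be sketched as follows. This is only an illustration, not kernel code: the 4096-byte page size is an assumption (the thread does not state it), and the function names are hypothetical.

```python
# Batch size in pages, per CPU (the "32" in the thread, i.e. MEMCG_CHARGE_BATCH).
MEMCG_CHARGE_BATCH = 32

# ASSUMPTION: 4 KiB pages. The thread does not state the page size.
PAGE_SIZE = 4096


def max_deviation_bytes(num_cpus, batch=MEMCG_CHARGE_BATCH, page_size=PAGE_SIZE):
    """Upper bound on local-vs-hierarchical drift, in bytes.

    Each CPU may hold up to `batch` pages of updates that are not yet
    folded into the hierarchical counter, and the stats are shown in bytes.
    """
    return batch * num_cpus * page_size


def observed_deviation_bytes(local, total):
    """Difference between the hierarchical (total_*) and local counter values."""
    return total - local


if __name__ == "__main__":
    bound = max_deviation_bytes(num_cpus=8)
    seen = observed_deviation_bytes(local=3567570944, total=3568029696)
    print("bound:    %d bytes (%d pages)" % (bound, bound // PAGE_SIZE))
    print("observed: %d bytes (%d pages)" % (seen, seen // PAGE_SIZE))
```

Under the 4 KiB assumption, the bound on the 8-CPU server is 1048576 bytes (256 pages), and the quoted counters differ by 458752 bytes (112 pages), which is within that bound.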