On Fri, Jul 12, 2019 at 2:53 PM Michal Hocko <mhocko@xxxxxxxxxx> wrote:
>
> On Fri 12-07-19 14:12:30, Yafang Shao wrote:
> > On Fri, Jul 12, 2019 at 1:29 PM Michal Hocko <mhocko@xxxxxxxxxx> wrote:
> > >
> > > On Fri 12-07-19 09:47:14, Yafang Shao wrote:
> > > > On Fri, Jul 12, 2019 at 7:42 AM Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx> wrote:
> > > > >
> > > > > On Thu, 11 Jul 2019 09:32:59 -0400 Yafang Shao <laoar.shao@xxxxxxxxx> wrote:
> > > > >
> > > > > > After commit 815744d75152 ("mm: memcontrol: don't batch updates of local VM stats and events"),
> > > > > > the local VM counters are not in sync with the hierarchical ones.
> > > > > >
> > > > > > Below is one example in a leaf memcg on my server (with 8 CPUs):
> > > > > >       inactive_file 3567570944
> > > > > >       total_inactive_file 3568029696
> > > > > > The deviation is very great because the 'val' in
> > > > > > __mod_memcg_state() is in pages while the effective value in
> > > > > > memcg_stat_show() is in bytes.
> > > > > > So the maximum deviation between the local VM stats and the total VM
> > > > > > stats can be (32 * number_of_cpu * PAGE_SIZE), which may be an
> > > > > > unacceptably great value.
> > > > > >
> > > > > > We should keep the local VM stats in sync with the total stats.
> > > > > > In order to keep this behavior the same across counters, this patch updates
> > > > > > __mod_lruvec_state() and __count_memcg_events() as well.
> > > > >
> > > > > hm.
> > > > >
> > > > > So the local counters are presently more accurate than the hierarchical
> > > > > ones because the hierarchical counters use batching.  And the proposal
> > > > > is to make the local counters less accurate so that the inaccuracies
> > > > > will match.
> > > > >
> > > > > It is a bit counterintuitive to hear that worsened accuracy is a good
> > > > > thing!  We're told that the difference may be "unacceptably great" but
> > > > > we aren't told why.  Some additional information to support this
> > > > > surprising assertion would be useful, please.  What are the use-cases
> > > > > which are harmed by this difference and how are they harmed?
> > > > >
> > > >
> > > > Hi Andrew,
> > > >
> > > > Both the local counter and the hierarchical one are exposed to the user.
> > > > In a leaf memcg, the local counter should be equal to the hierarchical one;
> > > > if they differ, the user may wonder what is wrong in this memcg.
> > > > IOW, the difference makes these counters unreliable, and if they are not
> > > > reliable we can't use them to help us analyze issues.
> > >
> > > But those numbers are in flight anyway. We do not stop updating them
> > > while they are read so there is no guarantee they will be consistent
> > > anyway, right?
> >
> > Right.
> > They can't be guaranteed to be consistent.
> > When we read them, it may be that only the local counters have been
> > updated and the hierarchical ones have not been updated yet.
> > But the current deviation is so great that it can't be ignored.
>
> Is really 32 pages per cpu all that great?
>

As I pointed out in the commit log, the local inactive_file is
3567570944 while the total_inactive_file is 3568029696, and the
difference between these two values is 458752 bytes.

> Please note that I am not objecting to the patch (yet) because I didn't
> get to think about it thoroughly but I do agree with Andrew that the
> changelog should state the exact problem including why it matters.
> I do agree that inconsistencies are confusing but maybe we just need to
> document the existing behavior better.

I'm not sure whether documenting it is enough.
What about removing all the hierarchical counters if this is a leaf memcg?
That is, neither calculate nor display the hierarchical counters for a
leaf memcg. I don't know whether it is worth doing.

Thanks
Yafang
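
For reference, the arithmetic behind the figures quoted in this thread can be
checked with a minimal sketch like the one below. The counter values, the CPU
count and the batch of 32 pages per CPU are taken from the messages above;
PAGE_SIZE = 4096 is an assumption (the thread does not state the page size),
and the function names are hypothetical helpers, not kernel APIs.

```python
# Figures quoted in the thread (memory.stat values, in bytes).
LOCAL_INACTIVE_FILE = 3567570944   # inactive_file
TOTAL_INACTIVE_FILE = 3568029696   # total_inactive_file
NR_CPUS = 8                        # "my server (with 8 CPUs)"
BATCH_PAGES = 32                   # per-CPU batch, from the changelog formula

# Assumption: 4 KiB pages. The thread does not state PAGE_SIZE.
PAGE_SIZE = 4096


def observed_deviation_bytes():
    """Difference between the hierarchical and the local counter."""
    return TOTAL_INACTIVE_FILE - LOCAL_INACTIVE_FILE


def max_deviation_bytes(nr_cpus=NR_CPUS, batch=BATCH_PAGES, page_size=PAGE_SIZE):
    """Upper bound from the changelog: 32 * number_of_cpu * PAGE_SIZE."""
    return batch * nr_cpus * page_size


if __name__ == "__main__":
    dev = observed_deviation_bytes()
    # Observed deviation in bytes, the same in pages, and the stated bound.
    print(dev, dev // PAGE_SIZE, max_deviation_bytes())
```

Under the 4 KiB assumption, the observed difference of 458752 bytes
corresponds to 112 pages, which is below the stated bound of
32 * 8 * 4096 = 1048576 bytes (256 pages).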