On Mon, Feb 03, 2020 at 03:34:50PM -0500, Johannes Weiner wrote:
> On Mon, Feb 03, 2020 at 10:25:06AM -0800, Roman Gushchin wrote:
> > On Mon, Feb 03, 2020 at 12:58:18PM -0500, Johannes Weiner wrote:
> > > On Mon, Jan 27, 2020 at 09:34:37AM -0800, Roman Gushchin wrote:
> > > > Currently s8 type is used for per-cpu caching of per-node statistics.
> > > > It works fine because the overfill threshold can't exceed 125.
> > > >
> > > > But if some counters are in bytes (and the next commit in the series
> > > > will convert slab counters to bytes), it's not gonna work:
> > > > value in bytes can easily exceed s8 without exceeding the threshold
> > > > converted to bytes. So to avoid overfilling per-cpu caches and breaking
> > > > vmstats correctness, let's use s32 instead.
> > > >
> > > > This doesn't affect per-zone statistics. There are no plans to use
> > > > zone-level byte-sized counters, so no reasons to change anything.
> > >
> > > Wait, is this still necessary? AFAIU, the node counters will account
> > > full slab pages, including free space, and only the memcg counters
> > > that track actual objects will be in bytes.
> > >
> > > Can you please elaborate?
> >
> > It's weird to have a counter with the same name (e.g. NR_SLAB_RECLAIMABLE_B)
> > being in different units depending on the accounting scope.
> > So I do convert all slab counters: global, per-lruvec,
> > and per-memcg to bytes.
>
> Since the node counters tracks allocated slab pages and the memcg
> counter tracks allocated objects, arguably they shouldn't use the same
> name anyway.
>
> > Alternatively I can fork them, e.g. introduce per-memcg or per-lruvec
> > NR_SLAB_RECLAIMABLE_OBJ
> > NR_SLAB_UNRECLAIMABLE_OBJ
>
> Can we alias them and reuse their slots?
>
>         /* Reuse the node slab page counters item for charged objects */
>         MEMCG_SLAB_RECLAIMABLE = NR_SLAB_RECLAIMABLE,
>         MEMCG_SLAB_UNRECLAIMABLE = NR_SLAB_UNRECLAIMABLE,

Yeah, lgtm.
Isn't the MEMCG_ prefix bad, because it makes everybody think that it
belongs to the enum memcg_stat_item?

>
> > and keep global counters untouched. If going this way, I'd prefer to make
> > them per-memcg, because it will simplify things on charging paths:
> > now we do get task->mem_cgroup->obj_cgroup in the pre_alloc_hook(),
> > and then obj_cgroup->mem_cgroup in the post_alloc_hook() just to
> > bump per-lruvec counters.
>
> I don't quite follow. Don't you still have to update the global
> counters?

Global counters are updated only if an allocation requires a new slab page,
which isn't the most common path.
In the generic case post_hook is required because it's the only place where
we have both the page (to get the node) and the memcg pointer.

If NR_SLAB_RECLAIMABLE is tracked only per-memcg (as MEMCG_SOCK is),
then post_hook can handle only the rare "allocation failed" case.

I'm not sure which is better here.

>
> > Btw, I wonder if we really need per-lruvec counters at all (at least
> > being enabled by default). For the significant amount of users who
> > have a single-node machine it doesn't bring anything except performance
> > overhead.
>
> Yeah, for single-node systems we should be able to redirect everything
> to the memcg counters, without allocating and tracking lruvec copies.

Sounds good. It can lead to significant savings on single-node machines.

>
> > For those who have multiple nodes (and most likely many many
> > memory cgroups) it provides way too many data except for debugging
> > some weird mm issues.
> > I guess in the absolute majority of cases having global per-node + per-memcg
> > counters will be enough.
>
> Hm? Reclaim uses the lruvec counters.

Can you please provide some examples? It looks like it's mostly based on
per-zone lruvec size counters.

Anyway, this seems to be a bit off-topic for this patchset, so let's
discuss it separately.
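
To make the s8 vs. s32 point from the quoted patch description concrete,
here is a rough userspace sketch of a per-cpu cached counter. It is not the
kernel code: all sketch_* names are hypothetical, the 4 KiB page size is an
assumption, and the only number taken from the thread is the threshold
bound of 125. It shows that a single byte-sized delta (say, a 192-byte
object) no longer fits in s8, even though it is far below the threshold
once that threshold is converted to bytes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumption for illustration: 4 KiB pages (arch-dependent in reality). */
#define SKETCH_PAGE_SIZE 4096L
/* Upper bound on the per-cpu threshold, as stated in the patch description. */
#define SKETCH_MAX_THRESHOLD 125L

/* Can `delta` be stored in a signed 8-bit per-cpu slot without truncation? */
static bool sketch_fits_s8(long delta)
{
	return delta >= INT8_MIN && delta <= INT8_MAX;
}

/* Same check for a signed 32-bit slot. */
static bool sketch_fits_s32(long delta)
{
	return delta >= INT32_MIN && delta <= INT32_MAX;
}

/*
 * Hypothetical model of a per-cpu cached counter: deltas accumulate in a
 * local slot and are folded into the shared value once they exceed the
 * threshold.
 */
struct sketch_counter {
	long global;      /* shared value */
	int32_t cpu_diff; /* cached per-cpu delta, s32 as the patch proposes */
	long threshold;   /* fold when |cpu_diff| exceeds this */
};

static void sketch_mod(struct sketch_counter *c, long delta)
{
	long x = (long)c->cpu_diff + delta;

	/* The cached value must be representable in the slot type. */
	assert(sketch_fits_s32(x));

	if (x > c->threshold || x < -c->threshold) {
		c->global += x; /* flush the cached delta to the shared value */
		x = 0;
	}
	c->cpu_diff = (int32_t)x;
}
```

With a page-based counter every cached value stays within +/-125 and s8 is
enough. With a byte-based counter and the threshold scaled by the page
size, the cached value can legitimately reach hundreds of kilobytes before
a flush, which is what the wider slot is for.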