Re: [PATCH] mm/memcg: add allocstall to memory.stat

Yafang Shao <laoar.shao@xxxxxxxxx> · Fri, 12 Apr 2019 17:29:04 +0800

On Fri, Apr 12, 2019 at 5:09 PM Michal Hocko <mhocko@xxxxxxxxxx> wrote:
>
> On Fri 12-04-19 16:10:29, Yafang Shao wrote:
> > On Fri, Apr 12, 2019 at 2:34 PM Michal Hocko <mhocko@xxxxxxxxxx> wrote:
> > >
> > > On Fri 12-04-19 09:32:55, Yafang Shao wrote:
> > > > On Thu, Apr 11, 2019 at 11:10 PM Michal Hocko <mhocko@xxxxxxxxxx> wrote:
> > > > >
> > > > > On Thu 11-04-19 21:54:22, Yafang Shao wrote:
> > > > > > On Thu, Apr 11, 2019 at 9:39 PM Michal Hocko <mhocko@xxxxxxxxxx> wrote:
> > > > > > >
> > > > > > > On Thu 11-04-19 20:41:32, Yafang Shao wrote:
> > > > > > > > On Thu, Apr 11, 2019 at 8:27 PM Michal Hocko <mhocko@xxxxxxxxxx> wrote:
> > > > > > > > >
> > > > > > > > > On Thu 11-04-19 19:59:51, Yafang Shao wrote:
> > > > > > > > > > The current item 'pgscan' is for pages in the memcg,
> > > > > > > > > > which indicates how many pages owned by this memcg are scanned.
> > > > > > > > > > However, these pages may not be scanned by the tasks in this memcg, even for
> > > > > > > > > > PGSCAN_DIRECT.
> > > > > > > > > >
> > > > > > > > > > Sometimes we need an item to indicate whether the tasks in this memcg
> > > > > > > > > > are under memory pressure or not.
> > > > > > > > > > So this patch adds a new item, allocstall, to memory.stat.
> > > > > > > > >
> > > > > > > > > We do have memcg events for that purpose and those can even tell whether
> > > > > > > > > the pressure is a result of high or hard limit. Why is this not
> > > > > > > > > sufficient?
> > > > > > > > >
> > > > > > > >
> > > > > > > > MEMCG_HIGH and MEMCG_LOW may not be triggered by the tasks in this
> > > > > > > > memcg either.
> > > > > > > > They all reflect the memory status of a memcg, rather than the
> > > > > > > > activity of the tasks in this memcg.
> > > > > > >
> > > > > > > I do not follow. Can you give me an example of when this matters? I
> > > > > >
> > > > > > For example, the tasks in this memcg may encounter direct page reclaim
> > > > > > due to system memory pressure,
> > > > > > meaning they are stalling in the page allocator slow path.
> > > > > > At the same time, there may be no memory pressure in this memcg, I
> > > > > > mean, they could successfully charge the memcg.
> > > > >
> > > > > And that is exactly what those events aim for. They are measuring
> > > > > _where_ the memory pressure comes from.
> > > > >
> > > > > Can you please try to explain again what you want to achieve?
> > > >
> > > > To know the impact of this memory pressure.
> > > > The current events can tell us the source of this pressure, but can't
> > > > tell us the impact of this pressure.
> > >
> > > Can you give me a more specific example of how you are going to use this
> > > counter in real life, please?
> >
> > When we see this counter increasing, we know that the applications in
> > this memcg are suffering from memory pressure.
>
> We do have pgscan/pgsteal counters that tell you that the memcg is being
> reclaimed. If you see those numbers increasing then you know there is
> memory pressure. Along with reclaim events you can tell whether this is
> internal or external memory pressure. Sure, you cannot distinguish
> kswapd from direct reclaim, but is this really so important? You have
> other means to find out that direct reclaim is happening and, more
> importantly, higher latency might be a result of kswapd reclaiming
> memory as well (swap in or an expensive pagein from remote storage
> etc.).
>
> The reason why I do not really like the new counter as you implemented
> it is that it mixes task/memcg scopes. Say you are hitting the memcg
> direct reclaim in a memcg A but the task is in a memcg B deeper in A's
> hierarchy. Unless I have misread your patch, it will be B that accounts
> for allocstall while it is A's hierarchy that gets directly reclaimed.
> B doesn't even have to be reclaimed at all if we manage to reclaim
> others. So this is really confusing.
>

I have to admit that it really mixes task/memcg scopes,
so let's drop this part.
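For the record, the combination you describe (pgscan from memory.stat plus the high/max counts from memory.events) could be checked from userspace roughly as below. This is only a minimal sketch under my own assumptions: stat_get() and classify() are hypothetical helpers, both files are passed in as already-read text snapshots taken at two points in time, and the internal/external split is a simplification of the heuristic, not an exact rule.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: look up "key value" in a flat-keyed cgroup v2
 * file such as memory.stat or memory.events. Returns -1 if absent.
 */
static long long stat_get(const char *text, const char *key)
{
	size_t klen = strlen(key);
	const char *p = text;

	while (p && *p) {
		/* Match the key only at the start of a line, followed by a space,
		 * so "pgscan" does not match e.g. "pgscan_kswapd". */
		if (strncmp(p, key, klen) == 0 && p[klen] == ' ') {
			long long val;

			if (sscanf(p + klen + 1, "%lld", &val) == 1)
				return val;
			return -1;
		}
		p = strchr(p, '\n');
		if (p)
			p++;
	}
	return -1;
}

enum pressure { PRESSURE_NONE, PRESSURE_INTERNAL, PRESSURE_EXTERNAL };

/*
 * Hypothetical classifier (simplified assumption): reclaim activity
 * (a pgscan delta) together with new high/max events points at the
 * memcg's own limits; reclaim without those events points at external
 * (global) pressure.
 */
static enum pressure classify(const char *stat_before, const char *stat_after,
			      const char *ev_before, const char *ev_after)
{
	long long scanned = stat_get(stat_after, "pgscan") -
			    stat_get(stat_before, "pgscan");
	long long limit_events =
		(stat_get(ev_after, "high") - stat_get(ev_before, "high")) +
		(stat_get(ev_after, "max") - stat_get(ev_before, "max"));

	if (scanned <= 0)
		return PRESSURE_NONE;
	return limit_events > 0 ? PRESSURE_INTERNAL : PRESSURE_EXTERNAL;
}
```

E.g. with pgscan going from 100 to 300 while the high count in memory.events goes from 5 to 9, classify() reports PRESSURE_INTERNAL; with the same pgscan delta and unchanged events it reports PRESSURE_EXTERNAL.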

> > Then we can trace this memcg, e.g. measure how long the
> > applications stall via tracepoints.
> > (But the current tracepoints can't be restricted to a specified cgroup;
> > that's another point to be improved.)
>
> It is a task that is stalled, not a cgroup.
>

But these tracepoints can't filter on a specified task either.
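A possible interim workaround is to post-process the trace in userspace: pair each mm_vmscan_direct_reclaim_begin event with the next mm_vmscan_direct_reclaim_end event from the same pid and sum the difference. The sketch below assumes the events have already been parsed into the hypothetical struct reclaim_event (parsing the trace output itself is not shown), and it is not a substitute for filtering in the kernel.

```c
#include <stddef.h>

/* Hypothetical pre-parsed trace record; not a kernel or libtraceevent type. */
struct reclaim_event {
	int pid;
	unsigned long long ts_ns;	/* event timestamp in nanoseconds */
	int is_begin;			/* 1 = ..._begin, 0 = ..._end */
};

/*
 * Sum the direct reclaim stall time of one task: each begin event from
 * @pid is paired with the next end event from the same pid. Events from
 * other tasks are skipped, i.e. the per-task filtering is done here in
 * userspace. An unmatched trailing begin event is ignored.
 */
static unsigned long long stall_ns_for_pid(const struct reclaim_event *ev,
					   size_t n, int pid)
{
	unsigned long long total = 0, begin = 0;
	int in_reclaim = 0;

	for (size_t i = 0; i < n; i++) {
		if (ev[i].pid != pid)
			continue;
		if (ev[i].is_begin) {
			begin = ev[i].ts_ns;
			in_reclaim = 1;
		} else if (in_reclaim) {
			total += ev[i].ts_ns - begin;
			in_reclaim = 0;
		}
	}
	return total;
}
```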

Thanks
Yafang