On Fri, Apr 12, 2019 at 5:09 PM Michal Hocko <mhocko@xxxxxxxxxx> wrote:
>
> On Fri 12-04-19 16:10:29, Yafang Shao wrote:
> > On Fri, Apr 12, 2019 at 2:34 PM Michal Hocko <mhocko@xxxxxxxxxx> wrote:
> > >
> > > On Fri 12-04-19 09:32:55, Yafang Shao wrote:
> > > > On Thu, Apr 11, 2019 at 11:10 PM Michal Hocko <mhocko@xxxxxxxxxx> wrote:
> > > > >
> > > > > On Thu 11-04-19 21:54:22, Yafang Shao wrote:
> > > > > > On Thu, Apr 11, 2019 at 9:39 PM Michal Hocko <mhocko@xxxxxxxxxx> wrote:
> > > > > > >
> > > > > > > On Thu 11-04-19 20:41:32, Yafang Shao wrote:
> > > > > > > > On Thu, Apr 11, 2019 at 8:27 PM Michal Hocko <mhocko@xxxxxxxxxx> wrote:
> > > > > > > > >
> > > > > > > > > On Thu 11-04-19 19:59:51, Yafang Shao wrote:
> > > > > > > > > > The current item 'pgscan' is for pages in the memcg,
> > > > > > > > > > which indicates how many pages owned by this memcg are scanned.
> > > > > > > > > > However, these pages may not be scanned by the tasks in this memcg,
> > > > > > > > > > even for PGSCAN_DIRECT.
> > > > > > > > > >
> > > > > > > > > > Sometimes we need an item to indicate whether the tasks in this memcg
> > > > > > > > > > are under memory pressure or not.
> > > > > > > > > > So this new item, allocstall, is added to memory.stat.
> > > > > > > > >
> > > > > > > > > We do have memcg events for that purpose and those can even tell whether
> > > > > > > > > the pressure is a result of the high or the hard limit. Why is this not
> > > > > > > > > sufficient?
> > > > > > > > >
> > > > > > > >
> > > > > > > > The MEMCG_HIGH and MEMCG_LOW events may not be triggered by the tasks
> > > > > > > > in this memcg either.
> > > > > > > > They all reflect the memory status of a memcg, rather than the
> > > > > > > > activity of the tasks in this memcg.
> > > > > > >
> > > > > > > I do not follow. Can you give me an example of when this matters? I
> > > > > >
> > > > > > For example, the tasks in this memcg may encounter direct page reclaim
> > > > > > due to system memory pressure,
> > > > > > meaning they are stalling in the page allocation slow path.
> > > > > > At the same time, there may be no memory pressure in this memcg, I
> > > > > > mean, it could successfully charge the memcg.
> > > > >
> > > > > And that is exactly what those events aim for. They are measuring
> > > > > _where_ the memory pressure comes from.
> > > > >
> > > > > Can you please try to explain what you want to achieve again?
> > > >
> > > > To know the impact of this memory pressure.
> > > > The current events can tell us the source of this pressure, but they
> > > > can't tell us the impact of this pressure.
> > >
> > > Can you give me a more specific example of how you are going to use this
> > > counter in real life, please?
> >
> > When we find this counter is higher, we know that the applications in
> > this memcg are suffering from memory pressure.
>
> We do have pgscan/pgsteal counters that tell you that the memcg is being
> reclaimed. If you see those numbers increasing then you know there is
> memory pressure. Along with the reclaim events you can tell whether this
> is internal or external memory pressure. Sure, you cannot distinguish
> kswapd from direct reclaim, but is this really so important? You have
> other means to find out that direct reclaim is happening and, more
> importantly, a higher latency might be a result of kswapd reclaiming
> memory as well (swap-in or an expensive page-in from remote storage
> etc.).
>
> The reason why I do not really like the new counter as you implemented
> it is that it mixes task/memcg scopes. Say you are hitting the memcg
> direct reclaim in a memcg A but the task is in a memcg B deeper in A's
> hierarchy. Unless I have misread your patch, it will be B that accounts
> for allocstall while it is A's hierarchy that gets directly reclaimed.
> B doesn't even have to be reclaimed at all if we manage to reclaim
> others. So this is really confusing.
>

I have to admit that it really mixes task/memcg scopes, so let's drop
this part.

> > Then we can do some tracing for this memcg, i.e. trace how long the
> > applications may stall via tracepoints.
> > (But the current tracepoints can't trace only a specified cgroup; that's
> > another point to be improved.)
>
> It is a task that is stalled, not a cgroup.
>

But these tracepoints can't filter a specified task either.

Thanks
Yafang