On Fri, Jul 17, 2020 at 1:04 AM Shakeel Butt <shakeelb@xxxxxxxxxx> wrote:
>
> On Wed, Jul 15, 2020 at 8:19 PM Yafang Shao <laoar.shao@xxxxxxxxx> wrote:
> >
> > On Thu, Jul 16, 2020 at 12:36 AM Shakeel Butt <shakeelb@xxxxxxxxxx> wrote:
> > >
> > > Hi Yafang,
> > >
> > > On Tue, Mar 31, 2020 at 3:05 AM Yafang Shao <laoar.shao@xxxxxxxxx> wrote:
> > > >
> > > > PSI gives us a powerful way to analyze memory pressure issues, but we
> > > > can make it even more powerful with the help of tracepoints, kprobes,
> > > > ebpf, etc. Especially with ebpf we can flexibly get more details of
> > > > the memory pressure.
> > > >
> > > > In order to achieve this goal, a new parameter is added to
> > > > psi_memstall_{enter, leave}, which indicates the specific type of a
> > > > memstall. There are ten memstall types for now:
> > > >     MEMSTALL_KSWAPD
> > > >     MEMSTALL_RECLAIM_DIRECT
> > > >     MEMSTALL_RECLAIM_MEMCG
> > > >     MEMSTALL_RECLAIM_HIGH
> > > >     MEMSTALL_KCOMPACTD
> > > >     MEMSTALL_COMPACT
> > > >     MEMSTALL_WORKINGSET_REFAULT
> > > >     MEMSTALL_WORKINGSET_THRASH
> > > >     MEMSTALL_MEMDELAY
> > > >     MEMSTALL_SWAPIO
> > > > With the help of a kprobe or tracepoint tracing this newly added
> > > > argument, we can know which type of memstall it is and then make the
> > > > corresponding improvement. It can also help us analyze latency spikes
> > > > caused by memory pressure.
> > > >
> > > > But note that we can't use it to build memory pressure for a specific
> > > > type of memstall, e.g. memcg pressure, compaction pressure, etc.,
> > > > because it doesn't implement various types of task->in_memstall, e.g.
> > > > task->in_memcgstall, task->in_compactionstall, etc.
> > > >
> > > > Although there are already some tracepoints that can help us achieve
> > > > this goal, e.g.
> > > >     vmscan:mm_vmscan_kswapd_{wake, sleep}
> > > >     vmscan:mm_vmscan_direct_reclaim_{begin, end}
> > > >     vmscan:mm_vmscan_memcg_reclaim_{begin, end}
> > > >     /* no tracepoint for memcg high reclaim */
> > > >     compaction:mm_compaction_kcompactd_{wake, sleep}
> > > >     compaction:mm_compaction_{begin, end}
> > > >     /* no tracepoint for workingset refault */
> > > >     /* no tracepoint for workingset thrashing */
> > > >     /* no tracepoint for memdelay */
> > > >     /* no tracepoint for swapio */
> > > > psi_memstall_{enter, leave} gives us a unified entry point for all
> > > > types of memstall, and we don't need to add the many begin and end
> > > > tracepoints that haven't been implemented yet.
> > > >
> > > > Patch #2 gives an example of how to use it with ebpf. With the help of
> > > > ebpf we can trace a specific task, application, container, etc. It can
> > > > also help us analyze the spread of latencies and whether they were
> > > > clustered at a point in time or spread out over long periods of time.
> > > >
> > > > To summarize: with the pressure data in /proc/pressure/memory we know
> > > > that the system is under memory pressure, and then with the newly
> > > > added tracing facility in this patchset we can find the reason for
> > > > this memory pressure, and then think about how to make the change.
> > > > The workflow can be illustrated as below.
> > > >
> > > >                      REASON               ACTION
> > > >                   | compaction | improve compaction |
> > > >                   | vmscan     | improve vmscan     |
> > > >  Memory pressure -| workingset | improve workingset |
> > > >                   | etc        | ...                |
> > >
> > > I have not looked at the patch series in detail, but I wanted to get
> > > your thoughts on whether it is possible to achieve what I am trying to
> > > do with this patch series.
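[To make the quoted proposal concrete, here is a minimal sketch of the kind of interface change the cover letter describes. The enum values are taken from the list quoted above; the enum name, header placement, and exact signatures are assumptions for illustration, not the actual patch.]

    /*
     * Illustrative sketch only -- not the actual patch.  The existing
     * kernel API is psi_memstall_enter(unsigned long *flags); the series
     * adds a type argument so tracing tools can tell the stalls apart.
     */
    enum memstall_type {
            MEMSTALL_KSWAPD,
            MEMSTALL_RECLAIM_DIRECT,
            MEMSTALL_RECLAIM_MEMCG,
            MEMSTALL_RECLAIM_HIGH,
            MEMSTALL_KCOMPACTD,
            MEMSTALL_COMPACT,
            MEMSTALL_WORKINGSET_REFAULT,
            MEMSTALL_WORKINGSET_THRASH,
            MEMSTALL_MEMDELAY,
            MEMSTALL_SWAPIO,
    };

    void psi_memstall_enter(unsigned long *flags, enum memstall_type type);
    void psi_memstall_leave(unsigned long *flags, enum memstall_type type);

    /* A direct-reclaim call site would then pass its own type, e.g.: */
    psi_memstall_enter(&pflags, MEMSTALL_RECLAIM_DIRECT);
    /* ... reclaim ... */
    psi_memstall_leave(&pflags, MEMSTALL_RECLAIM_DIRECT);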
> > >
> > > At the moment I am only interested in global reclaim, and I wanted to
> > > enable alerts like "alert if there is a process stuck in global reclaim
> > > for x seconds in the last y-seconds window" or "alert if all the
> > > processes are stuck in global reclaim for some z seconds".
> > >
> > > I see that using this series I can identify global reclaim, but I am
> > > wondering if alerts or notifications are possible. Android is using psi
> > > monitors for such alerts, but it does not use cgroups, so most of the
> > > memstalls are related to global reclaim stalls. For a cgroup
> > > environment, do we need to add support to the psi monitor similar to
> > > this patch series?
> >
> > Hi Shakeel,
> >
> > We use the PSI tracepoints in our kernel to analyze the individual
> > latency caused by memory pressure, but the PSI tracepoints are
> > implemented in a new version as below:
> >     trace_psi_memstall_enter(_RET_IP_);
> >     trace_psi_memstall_leave(_RET_IP_);
> > We then use the _RET_IP_ to identify the specific PSI type.
> >
> > If the _RET_IP_ is in try_to_free_mem_cgroup_pages(), then the pressure
> > is caused by the memory cgroup, IOW, the limit of the memcg is reached
> > and it has to do memcg reclaim. Otherwise we can consider it global
> > memory pressure.
> >     try_to_free_mem_cgroup_pages
> >       psi_memstall_enter
> >           if (static_branch_likely(&psi_disabled))
> >               return;
> >           *flags = current->in_memstall;
> >           if (*flags)
> >               return;
> >           trace_psi_memstall_enter(_RET_IP_);   <<<<< memcg pressure
>
> Thanks for the response. I am looking for 'always on' monitoring, more
> specifically, defining system-level SLIs based on PSI. My concern with
> ftrace is its global shared state, and also that it is not really meant
> for 'always on' monitoring. You have mentioned ebpf. Is ebpf fine for
> 'always on' monitoring, and is it possible to notify user space from ebpf
> on specific conditions (e.g. a process stuck in global reclaim for 60
> seconds)?
>

ebpf is fine for 'always on' monitoring in my experience, but I'm not sure
whether it is possible to notify user space on specific conditions.
Notifying user space would be a useful feature, so I think we can give it
a try.

-- 
Thanks
Yafang
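[For reference, one way such a notification could be built is a BCC program that kprobes psi_memstall_{enter, leave} and pushes an event to user space through a perf buffer when a single stall exceeds a threshold. This is an untested sketch, not the series' Patch #2: it assumes psi_memstall_enter/leave are not inlined and are therefore kprobe-able on the target kernel, and the 60-second threshold and struct layout are illustrative only.]

    /* memstall_alert.c -- BCC kernel-side sketch */
    #include <uapi/linux/ptrace.h>

    #define THRESHOLD_NS (60ULL * 1000000000ULL)   /* 60 seconds */

    BPF_HASH(start, u32, u64);   /* tgid -> memstall entry timestamp */
    BPF_PERF_OUTPUT(events);     /* perf buffer to notify user space */

    struct alert_t {
            u32 tgid;
            u64 stall_ns;
    };

    int kprobe__psi_memstall_enter(struct pt_regs *ctx)
    {
            u32 tgid = bpf_get_current_pid_tgid() >> 32;
            u64 ts = bpf_ktime_get_ns();

            /*
             * With the type argument this series proposes, the stall type
             * would be the second parameter, so global direct reclaim
             * could be filtered here, e.g.:
             *   if (PT_REGS_PARM2(ctx) != MEMSTALL_RECLAIM_DIRECT)
             *           return 0;
             */
            start.update(&tgid, &ts);
            return 0;
    }

    int kprobe__psi_memstall_leave(struct pt_regs *ctx)
    {
            u32 tgid = bpf_get_current_pid_tgid() >> 32;
            u64 *tsp = start.lookup(&tgid);
            struct alert_t alert = {};

            if (!tsp)
                    return 0;

            alert.tgid = tgid;
            alert.stall_ns = bpf_ktime_get_ns() - *tsp;
            start.delete(&tgid);

            /* Notify user space only when the stall crossed the threshold. */
            if (alert.stall_ns > THRESHOLD_NS)
                    events.perf_submit(ctx, &alert, sizeof(alert));
            return 0;
    }

A small user-space loader (e.g. a BCC Python script polling the perf buffer) would turn these events into the actual alert; stalls that never reach psi_memstall_leave could additionally be caught by periodically scanning the start map for stale timestamps.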