On Thu, Feb 22, 2024 at 9:59 AM Carlos Galo <carlosgalo@xxxxxxxxxx> wrote:
>
> On Thu, Feb 22, 2024 at 6:16 AM Michal Hocko <mhocko@xxxxxxxx> wrote:
> >
> > On Wed 21-02-24 13:30:51, Carlos Galo wrote:
> > > On Tue, Feb 20, 2024 at 11:55 PM Michal Hocko <mhocko@xxxxxxxx> wrote:
> > > >
> > > > Hi,
> > > > sorry I have missed this before.
> > > >
> > > > On Thu 11-01-24 21:05:30, Carlos Galo wrote:
> > > > > The current implementation of the mark_victim tracepoint provides only
> > > > > the process ID (pid) of the victim process. This limitation poses
> > > > > challenges for userspace tools that need additional information
> > > > > about the OOM victim. The association between pid and the additional
> > > > > data may be lost after the kill, making it difficult for userspace to
> > > > > correlate the OOM event with the specific process.
> > > >
> > > > You are correct that post OOM all per-process information is lost. On
> > > > the other hand we do dump all this information to the kernel log. Could
> > > > you explain why that is not suitable for your purpose?
> > >
> > > Userspace tools often need real-time visibility into OOM situations
> > > for userspace intervention. Our use case involves utilizing BPF
> > > programs, along with BPF ring buffers, to provide OOM notification to
> > > userspace. Parsing kernel logs would be significant overhead as
> > > opposed to the event based BPF approach.
> >
> > Please put that into the changelog.
>
> Will do.
>
> > > > > In order to mitigate this limitation, add the following fields:
> > > > >
> > > > > - UID
> > > > >    In Android each installed application has a unique UID. Including
> > > > >    the `uid` assists in correlating OOM events with specific apps.
> > > > >
> > > > > - Process Name (comm)
> > > > >    Enables identification of the affected process.
> > > > >
> > > > > - OOM Score
> > > > >    Allows userspace to get additional insights of the relative kill
> > > > >    priority of the OOM victim.
> > > >
> > > > What is the oom score useful for?
> > > >
> > >
> > > The OOM score provides us a measure of the victim's importance. On the
> > > android side, it allows us to identify if top or foreground apps are
> > > killed, which have user perceptible impact.
> >
> > But the value on its own (without knowing scores of other tasks) doesn't
> > really tell you anything, does it?
>
> Android uses the OOM adj_score ranges to categorize app state
> (foreground, background, ...). I'll resend a v3 with the commit text
> updated to include details about this.
>
> > > > Is there any reason to provide a different information from the one
> > > > reported to the kernel log?
> > > > __oom_kill_process:
> > > > pr_err("%s: Killed process %d (%s) total-vm:%lukB, anon-rss:%lukB, file-rss:%lukB, shmem-rss:%lukB, UID:%u pgtables:%lukB oom_score_adj:%hd\n",
> > > >        message, task_pid_nr(victim), victim->comm, K(mm->total_vm),
> > > >        K(get_mm_counter(mm, MM_ANONPAGES)),
> > > >        K(get_mm_counter(mm, MM_FILEPAGES)),
> > > >        K(get_mm_counter(mm, MM_SHMEMPAGES)),
> > > >        from_kuid(&init_user_ns, task_uid(victim)),
> > > >        mm_pgtables_bytes(mm) >> 10, victim->signal->oom_score_adj);
> > > >
> > >
> > > We added these fields we need (UID, process name, and OOM score), but
> > > we're open to adding the others if you prefer that for consistency
> > > with the kernel log.
> >
> > yes, I think the consistency would be better here. For one it reports
> > numbers which can tell quite a lot about the killed victim. It is a
> > superset of what you are already asking for. With a notable exception of the
> > oom_score which is really dubious without a wider context.
>
> Sounds good, I'll resend a v3 that includes these fields as well.
>
> Thanks,
> Carlos
>

I posted V3 here:
https://lore.kernel.org/all/20240223173258.174828-1-carlosgalo@xxxxxxxxxx/

Thanks,
Carlos

> > --
> > Michal Hocko
> > SUSE Labs
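
For readers following the thread: a minimal userspace sketch, not part of the patch, of pulling the fields out of the "Killed process" line that __oom_kill_process prints with the pr_err format string quoted above. The regex, the function name parse_killed_line, and the sample line are assumptions made for illustration, and the sample values are invented. This is the log-parsing route that the thread contrasts with the event-based BPF approach.

```python
import re

# Mirrors the pr_err format string quoted in the thread:
# "%s: Killed process %d (%s) total-vm:%lukB, anon-rss:%lukB, file-rss:%lukB,
#  shmem-rss:%lukB, UID:%u pgtables:%lukB oom_score_adj:%hd"
KILLED_RE = re.compile(
    r"(?P<message>.+?): Killed process (?P<pid>\d+) \((?P<comm>.*)\) "
    r"total-vm:(?P<total_vm_kb>\d+)kB, anon-rss:(?P<anon_rss_kb>\d+)kB, "
    r"file-rss:(?P<file_rss_kb>\d+)kB, shmem-rss:(?P<shmem_rss_kb>\d+)kB, "
    r"UID:(?P<uid>\d+) pgtables:(?P<pgtables_kb>\d+)kB "
    r"oom_score_adj:(?P<oom_score_adj>-?\d+)"
)

TEXT_FIELDS = ("message", "comm")


def parse_killed_line(line):
    """Return the victim fields as a dict, or None if the line does not match."""
    match = KILLED_RE.search(line)
    if match is None:
        return None
    # Keep the text fields as strings and convert every numeric field to int.
    return {
        key: value if key in TEXT_FIELDS else int(value)
        for key, value in match.groupdict().items()
    }


# Hypothetical log line; all values are made up for the example.
sample = (
    "Out of memory: Killed process 1234 (com.example.app) "
    "total-vm:2048000kB, anon-rss:512000kB, file-rss:1024kB, "
    "shmem-rss:0kB, UID:10123 pgtables:2048kB oom_score_adj:900"
)

if __name__ == "__main__":
    print(parse_killed_line(sample))
```

Any prefix the log source adds before the message (for example a timestamp) ends up in the message field, so a real consumer would likely strip it first.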