On 9/23/21 4:31 PM, Kees Cook wrote:
> The /proc/$pid/wchan file has been broken by default on x86_64 for 4
> years now[1]. As this remains a potential leak of either kernel
> addresses (when symbolization fails) or limited observation of kernel
> function progress, just remove the contents for good.
>
> Unconditionally set the contents to "0" and also mark the wchan
> field in /proc/$pid/stat with 0.

Hi all,

It looks like there's already been pushback on this idea, but I wanted
to add another voice from a frequent user of /proc/$pid/wchan (via ps).

Much of my job involves diagnosing kernel and performance issues on
stable kernels, frequently on production systems where I can't do
anything too invasive. wchan is incredibly useful in these situations,
so much so that we store regular snapshots of ps output, and we widen
the WCHAN column to fit more data
(e.g. ps -e -o pid,wchan=WCHAN-WIDE-COLUMN). Disabling wchan would
remove a critical tool for me and my team.

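The symbol names in that WCHAN column can also be read directly from /proc/$pid/wchan. A rough Python sketch of such a snapshot, assuming the usual Linux /proc layout; read_wchan, snapshot_wchan and the proc_root parameter are hypothetical names, added only so the sketch can be pointed at a fake tree. It treats an empty file or "0" as "no information", which is what every read would return under the proposed patch.

```python
import os


def read_wchan(pid_dir):
    """Return the symbol in <pid_dir>/wchan, or None if unavailable.

    An empty file or "0" is treated as "no information", which is what
    every read would return if the contents were unconditionally "0".
    """
    try:
        with open(os.path.join(pid_dir, "wchan")) as f:
            value = f.read().strip()
    except OSError:
        # The task may have exited, or we may lack permission to read it.
        return None
    return None if value in ("", "0") else value


def snapshot_wchan(proc_root="/proc"):
    """Hypothetical helper: map pid -> wchan symbol for every task under proc_root."""
    snap = {}
    for name in os.listdir(proc_root):
        if not name.isdigit():
            continue  # skip non-pid entries such as "self" or "meminfo"
        snap[int(name)] = read_wchan(os.path.join(proc_root, name))
    return snap
```

Calling snapshot_wchan() periodically (for instance from a cron job) and saving the result with a timestamp gives roughly the same history as the stored ps snapshots described above.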
From our team's feedback:

1. It's fine if reading wchan requires CAP_SYS_ADMIN for tasks not
owned by the calling user, and, for non-admin users, to return 0 when
symbolization fails, just as kallsyms does for unprivileged users.
2. We don't care about the stack of an actively running process
(/proc/$pid/stack is there for that). We only need WCHAN for
understanding why a task is blocked.
3. Keeping the function/symbol name in wchan is ideal, so we can
pinpoint exactly where a task is blocked.

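As a sketch of point 2 in practice: a task's scheduler state is field 3 of /proc/$pid/stat, and the comm field (field 2) is wrapped in parentheses and may itself contain spaces or ')', so a parser should split on the last ')'. The helpers below are hypothetical and assume the proc(5) field layout; treating 'D' (uninterruptible sleep) and 'S' (interruptible sleep) as "blocked" is an assumption about what matters for triage.

```python
def parse_stat(line):
    """Split a /proc/<pid>/stat line into (pid, comm, state).

    comm is wrapped in parentheses and may contain spaces or ')',
    so split on the *last* ')' rather than on whitespace alone.
    """
    lpar = line.index("(")
    rpar = line.rindex(")")
    pid = int(line[:lpar].strip())
    comm = line[lpar + 1:rpar]
    state = line[rpar + 1:].split()[0]  # field 3 per proc(5)
    return pid, comm, state


# Assumption for this sketch: only sleeping tasks have an interesting wchan.
BLOCKED_STATES = {"D", "S"}


def blocked_tasks(stat_lines, wchans):
    """Yield (pid, comm, state, wchan) for blocked tasks only.

    stat_lines: iterable of raw /proc/<pid>/stat contents.
    wchans: dict pid -> wchan symbol (e.g. read from /proc/<pid>/wchan).
    """
    for line in stat_lines:
        pid, comm, state = parse_stat(line)
        if state in BLOCKED_STATES:
            yield pid, comm, state, wchans.get(pid)
```

Running tasks (state 'R') are skipped entirely, matching the point that /proc/$pid/stack covers them.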
> This leaves kernel/sched/fair.c as the only user of get_wchan(). But
> again, since this was broken for 4 years, was this profiling logic
> actually doing anything useful?

This was only broken with CONFIG_UNWINDER_ORC. You may say that is the
default, but Ubuntu's latest kernel (5.11 in Hirsute) still ships with
CONFIG_UNWINDER_FRAME_POINTER, and many other distributions do the
same. Stable distributions lag in picking up new code, and lag even
longer in picking up new configurations, even new defaults
(especially when frame pointers are so useful for debugging). So
saying that this was broken for 4 years is at best misleading. Plenty of
users have been happily using recent kernels, on valid configurations,
during the time this was supposedly "broken", without any issues.

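Since whether wchan worked depended on the unwinder selected at build time, a quick way to check a given kernel is to look at its build config. A minimal sketch, assuming the distribution installs the config at /boot/config-<release> (many do, but not all; some expose /proc/config.gz instead); configured_unwinder is a hypothetical name.

```python
import os


def configured_unwinder(config_path=None):
    """Return the x86 unwinder selected in a kernel config file, or None.

    Defaults to /boot/config-<release>, an assumption about where the
    distribution installs the config for the running kernel.
    """
    if config_path is None:
        config_path = "/boot/config-" + os.uname().release
    unwinders = ("CONFIG_UNWINDER_ORC", "CONFIG_UNWINDER_FRAME_POINTER",
                 "CONFIG_UNWINDER_GUESS")
    with open(config_path) as f:
        for line in f:
            # Disabled options appear as "# CONFIG_FOO is not set" and have no "=".
            name, sep, value = line.strip().partition("=")
            if sep and value == "y" and name in unwinders:
                return name
    return None
```

Per the point above, only CONFIG_UNWINDER_ORC kernels were affected; a result of CONFIG_UNWINDER_FRAME_POINTER indicates a configuration on which wchan kept working.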
It looks like we've backed off of the decision to rip out
/proc/$pid/wchan, but I just wanted to chime in, since it feels like the
discussion is happening without much input from users.
Thanks,
Stephen