On Fri, Sep 24, 2021 at 02:08:45AM +0200, Jann Horn wrote:
> On Fri, Sep 24, 2021 at 1:59 AM Vito Caputo <vcaputo@xxxxxxxxxxx> wrote:
> > On Thu, Sep 23, 2021 at 04:31:05PM -0700, Kees Cook wrote:
> > > The /proc/$pid/wchan file has been broken by default on x86_64 for 4
> > > years now[1]. As this remains a potential leak of either kernel
> > > addresses (when symbolization fails) or limited observation of kernel
> > > function progress, just remove the contents for good.
> > >
> > > Unconditionally set the contents to "0" and also mark the wchan
> > > field in /proc/$pid/stat with 0.
> > >
> > > This leaves kernel/sched/fair.c as the only user of get_wchan(). But
> > > again, since this was broken for 4 years, was this profiling logic
> > > actually doing anything useful?
> > >
> > > [1] https://lore.kernel.org/lkml/20210922001537.4ktg3r2ky3b3r6yp@treble/
> > >
> > > Cc: Josh Poimboeuf <jpoimboe@xxxxxxxxxx>
> > > Cc: Vito Caputo <vcaputo@xxxxxxxxxxx>
> > > Signed-off-by: Kees Cook <keescook@xxxxxxxxxxxx>
> > <snip>
> >
> >
> > Please don't deliberately break WCHANs wholesale. This is a very
> > useful tool for sysadmins to get a vague sense of where processes are
> > spending time in the kernel on production systems without affecting
> > performance or having to restart things under instrumentation.
>
> Wouldn't /proc/$pid/stack be more useful for that anyway? As long as
> you have root privileges, you can read that to get the entire stack,
> not just a single method name.
>
> (By the way, I guess that might be an alternative to ripping wchan out
> completely - require CAP_SYS_ADMIN like for /proc/$pid/stack?)

WCHAN is a first-class concept of the OS. As a result we have
long-standing useful tools exposing them in far more organized,
documented, and discoverable ways than poking around linux-specific
/proc files at the shell.
Even `top` can show WCHAN in a column alongside everything else it
exposes, complete with sorting, etc., and I've already demonstrated the
support in `ps`.

I also think it's worth preserving the ability for regular users to
observe the WCHAN of their own processes. It's unclear to me why this
is such a worry.

If the WCHAN as-implemented is granular enough to expose too much
kernel inner workings, then it should be watered down to be more
vague. Even if it just said "ioctl" when a process was blocked in D
state through making an ioctl() it would still be much more useful
than saying nothing at all. Can't regular users see this much about
their own processes via strace/gdb anyways?

Instead of unwinding stacks maybe the kernel should be sticking an
entrypoint address in the current task struct for get_wchan() to
access, whenever userspace enters the kernel?

Regards,
Vito Caputo
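
For reference, here is a minimal sketch of how userspace can read the two
interfaces discussed in this thread: the /proc/$pid/wchan file and the
wchan field of /proc/$pid/stat. The function names are hypothetical, and
the field position (35) is taken from proc(5) rather than from anything
stated above. With the quoted patch applied, both values would read as 0.

```python
from pathlib import Path

# Position of the wchan field in /proc/<pid>/stat, 1-based, per proc(5).
# This is an assumption drawn from the man page, not from the thread.
WCHAN_FIELD = 35


def wchan_from_stat(stat_line: str) -> int:
    """Extract the wchan field from one line of /proc/<pid>/stat."""
    # Field 2 (comm) is wrapped in parentheses and may itself contain
    # spaces or ')', so split only what follows the last ')'.
    rest = stat_line[stat_line.rindex(")") + 1:].split()
    # rest[0] is field 3 (state), so field N lives at index N - 3.
    return int(rest[WCHAN_FIELD - 3])


def read_wchan(pid="self"):
    """Return (symbol, address) for a process, as the kernel reports them."""
    proc = Path("/proc") / str(pid)
    symbol = (proc / "wchan").read_text().strip()
    address = wchan_from_stat((proc / "stat").read_text())
    return symbol, address
```

Calling read_wchan() with no argument inspects the calling process
itself, which is the unprivileged "own processes" case argued for above.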