On Tue, Nov 19, 2024 at 2:36 PM Yosry Ahmed <yosryahmed@xxxxxxxxxx> wrote:
>
> On Tue, Nov 19, 2024 at 11:30 AM Pasha Tatashin
> <pasha.tatashin@xxxxxxxxxx> wrote:
> >
> > On Tue, Nov 19, 2024 at 1:23 PM Roman Gushchin <roman.gushchin@xxxxxxxxx> wrote:
> > >
> > > On Tue, Nov 19, 2024 at 10:08:36AM -0500, Pasha Tatashin wrote:
> > > > On Mon, Nov 18, 2024 at 8:09 PM Greg KH <gregkh@xxxxxxxxxxxxxxxxxxx> wrote:
> > > > >
> > > > > On Mon, Nov 18, 2024 at 05:08:42PM -0500, Pasha Tatashin wrote:
> > > > > > Additionally, using crash/drgn is not feasible for us at this time, it
> > > > > > requires keeping external tools on our hosts, also it requires
> > > > > > approval and a security review for each script before deployment in
> > > > > > our fleet.
> > > > >
> > > > > So it's ok to add a totally insecure kernel feature to your fleet
> > > > > instead? You might want to reconsider that policy decision :)
> > > >
> > > > Hi Greg,
> > > >
> > > > While some risk is inherent, we believe the potential for abuse here
> > > > is limited, especially given the existing CAP_SYS_ADMIN requirement.
> > > > But, even with root access compromised, this tool presents a smaller
> > > > attack surface than alternatives like crash/drgn. It exposes less
> > > > sensitive information, unlike crash/drgn, which could potentially
> > > > allow reading all of kernel memory.
> > >
> > > The problem here is with using dmesg for output. No security-sensitive
> > > information should go there. Even exposing raw kernel pointers is not
> > > considered safe.
> >
> > I am OK in writing the output to a debugfs file in the next version,
> > the only concern I have is that implies that dump_page() would need to
> > be basically duplicated, as it now outputs everything via printk's.
>
> Perhaps you can refactor the code in dump_page() to use a seq_buf,
> then have dump_page() printk that seq_buf using seq_buf_do_printk(),
> and have page detective output that seq_buf to the debugfs file?

Good idea, I will look into modifying it this way.

> We do something very similar with memory_stat_format(). We use the

void mem_cgroup_print_oom_meminfo(struct mem_cgroup *memcg)
{
        /* Use static buffer, for the caller is holding oom_lock. */
        static char buf[PAGE_SIZE];
        ....
        seq_buf_init(&s, buf, sizeof(buf));
        memory_stat_format(memcg, &s);
        seq_buf_do_printk(&s, KERN_INFO);
}

This is a colossal stack allocation, given that our fleet only has 8K
stacks. :-)

> same function to generate the memcg stats in a seq_buf, then we use
> that seq_buf to output the stats to memory.stat as well as the OOM
> log.
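
For illustration only, here is a rough userspace sketch of the pattern
discussed above: format once into a buffer, then hand that same buffer to
either a log sink or a file sink. It is not kernel code and not the proposed
dump_page() change. struct text_buf, struct page_info and every function name
below are hypothetical stand-ins (text_buf loosely mimics seq_buf), and the
storage is supplied by the caller.

```c
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical userspace stand-in for the kernel's struct seq_buf. */
struct text_buf {
	char *data;      /* caller-provided storage */
	size_t size;     /* capacity, including the trailing NUL */
	size_t len;      /* bytes used, excluding the trailing NUL */
	int overflowed;  /* set once output has been truncated */
};

static void text_buf_init(struct text_buf *b, char *storage, size_t size)
{
	b->data = storage;
	b->size = size;
	b->len = 0;
	b->overflowed = 0;
	if (size)
		storage[0] = '\0';
}

/* Append formatted text; on truncation keep what fits and set the flag. */
static void text_buf_printf(struct text_buf *b, const char *fmt, ...)
{
	va_list ap;
	size_t room;
	int n;

	if (b->overflowed || b->len >= b->size) {
		b->overflowed = 1;
		return;
	}
	room = b->size - b->len;
	va_start(ap, fmt);
	n = vsnprintf(b->data + b->len, room, fmt, ap);
	va_end(ap);
	if (n < 0) {
		b->data[b->len] = '\0';
		b->overflowed = 1;
	} else if ((size_t)n >= room) {
		b->len = b->size - 1;
		b->overflowed = 1;
	} else {
		b->len += (size_t)n;
	}
}

/* Hypothetical record; a real page dump carries far more state. */
struct page_info {
	unsigned long pfn;
	int refcount;
	int mapcount;
};

/* The single formatter shared by both sinks. */
static void format_page_info(const struct page_info *p, struct text_buf *b)
{
	text_buf_printf(b, "pfn:%#lx\n", p->pfn);
	text_buf_printf(b, "refcount:%d mapcount:%d\n", p->refcount, p->mapcount);
}

/* Sink 1: emit line by line with a prefix, in the spirit of a log. */
static void text_buf_to_log(const struct text_buf *b, const char *prefix,
			    FILE *log)
{
	const char *line = b->data;
	const char *end = b->data + b->len;

	while (line < end) {
		size_t n = strcspn(line, "\n");

		fprintf(log, "%s%.*s\n", prefix, (int)n, line);
		line += n;
		if (line < end && *line == '\n')
			line++;
	}
}

/* Sink 2: write the buffer verbatim, as a file read handler might. */
static size_t text_buf_to_file(const struct text_buf *b, FILE *out)
{
	return fwrite(b->data, 1, b->len, out);
}
```

A caller would run text_buf_init() on its own storage, call
format_page_info() once, and then pass the result to text_buf_to_log() or
text_buf_to_file(). Where that storage lives (static, heap, or stack) is left
to the caller in this sketch.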