[ Cc'ing the proper folks ]

-- Steve


On Fri, 17 Jan 2025 11:36:05 +0100
Alexandre Ferrieux <alexandre.ferrieux@xxxxxxxxx> wrote:

> Hi,
>
> Somewhere in the 6.13 branch (not bisected yet, sorry), it stopped being
> possible to disassemble the running kernel from gdb through /proc/kcore.
>
> More precisely:
>
>  - look up a function in /proc/kallsyms => 0xADDRESS
>  - tell gdb to "core /proc/kcore"
>  - tell gdb to "disass 0xADDRESS,+LENGTH" (no need for a symbol table)
>
>    * if the function is within the main kernel text, it is okay
>    * if the function is within a module's text, an infinite loop happens:
>
>
> Example:
>
>  # egrep -w ice_process_skb_fields\|ksys_write /proc/kallsyms
>  ffffffffaf296c80 T ksys_write
>  ffffffffc0b67180 t ice_process_skb_fields       [ice]
>
>  # gdb -ex "core /proc/kcore" -ex "disass 0xffffffffaf296c80,+256" -ex quit
>  ...
>  Dump of assembler code from 0xffffffffaf296c80 to 0xffffffffaf296d80:
>  ...
>  End of assembler dump.
>
>  # gdb -ex "core /proc/kcore" -ex "disass 0xffffffffc0b67180,+256" -ex quit
>  ...
>  Dump of assembler code from 0xffffffffc0b67180 to 0xffffffffc0b67280:
>  (***NOTHING***)
>  ^C <= inefficient, need kill -9
>
>
> Ftrace (see below) shows in this case read_kcore_iter() calls vread_iter() in an
> infinite loop:
>
>         while (true) {
>                 read += vread_iter(iter, src, left);
>                 if (read == tsz)
>                         break;
>
>                 src += read;
>                 left -= read;
>
>                 if (fault_in_iov_iter_writeable(iter, left)) {
>                         ret = -EFAULT;
>                         goto out;
>                 }
>         }
>
> As it turns out, in the offending situation, vread_iter() keeps returning 0,
> with "read" staying at its initial value of 0, and "tsz" nonzero. As a
> consequence, "src" stays stuck in a place where vread_iter() fails.
>
> A cursory "git blame" shows that this interplay (vread_iter() legitimately
> returning zero, and read_kcore_iter() *not* testing it) has been there from
> quite some time. So, while this is arguably fragile, possibly the new situation
> lies in the actual memory layout that triggers the failing path.
>
> Thanks for any insight, as this completely breaks debugging the running kernel
> in 6.13.
>
> -Alex
>
>
> ------------
> # tracer: nop
> #
> # entries-in-buffer/entries-written: 0/0   #P:48
> #
> #           TASK-PID     CPU#     TIMESTAMP  FUNCTION
> #              | |         |         |         |
>            <...>-3304    [045]   487.295283: kprobe_read_kcore_iter:
> (read_kcore_iter+0x4/0xae0) pos=0x7fffc0b6b000
>            <...>-3304    [045]   487.295298: kprobe_vread_iter:
> (vread_iter+0x4/0x4e0) addr=0xffffffffc0b67000 len=384
>            <...>-3304    [045]   487.295326: kretprobe_vread_iter:
> (read_kcore_iter+0x3e6/0xae0 <- vread_iter) arg1=0
>            <...>-3304    [045]   487.295329: kprobe_vread_iter:
> (vread_iter+0x4/0x4e0) addr=0xffffffffc0b67000 len=384
>            <...>-3304    [045]   487.295338: kretprobe_vread_iter:
> (read_kcore_iter+0x3e6/0xae0 <- vread_iter) arg1=0
>            <...>-3304    [045]   487.295339: kprobe_vread_iter:
> (vread_iter+0x4/0x4e0) addr=0xffffffffc0b67000 len=384
>            <...>-3304    [045]   487.295345: kretprobe_vread_iter:
> (read_kcore_iter+0x3e6/0xae0 <- vread_iter) arg1=0
>            <...>-3304    [045]   487.295347: kprobe_vread_iter:
> (vread_iter+0x4/0x4e0) addr=0xffffffffc0b67000 len=384
>            <...>-3304    [045]   487.295352: kretprobe_vread_iter:
> (read_kcore_iter+0x3e6/0xae0 <- vread_iter) arg1=0
>            <...>-3304    [045]   487.295353: kprobe_vread_iter:
> (vread_iter+0x4/0x4e0) addr=0xffffffffc0b67000 len=384
> ...
>
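
To make the stall described in the report concrete, here is a small user-space
sketch of a read loop that checks for zero progress. It is not kernel code and
not a proposed patch: reader_fn, reader_always_zero(), reader_chunk_128() and
copy_until_done() are hypothetical names invented for this illustration, with
reader_always_zero() standing in for a vread_iter() that keeps returning 0 as
in the trace (arg1=0). The sketch also ignores the fault-in-and-retry step of
the real loop, so treating the first zero-byte read as a stall is a
simplification.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical reader: given an offset and a length, returns bytes copied. */
typedef size_t (*reader_fn)(size_t off, size_t len);

/* Stand-in for a reader that never makes progress (as in the trace). */
static size_t reader_always_zero(size_t off, size_t len)
{
	(void)off;
	(void)len;
	return 0;
}

/* Stand-in for a healthy reader that copies at most 128 bytes per call. */
static size_t reader_chunk_128(size_t off, size_t len)
{
	(void)off;
	return len < 128 ? len : 128;
}

/*
 * Read tsz bytes through reader, counting calls in *calls.
 * Returns 0 when everything was read, -1 when a call reads nothing.
 * Without the n == 0 check, a reader that always returns 0 would be
 * called forever with the same offset and length.
 */
static int copy_until_done(reader_fn reader, size_t tsz, unsigned *calls)
{
	size_t off = 0;
	size_t left = tsz;

	assert(reader != NULL && calls != NULL);
	*calls = 0;

	while (left > 0) {
		size_t n = reader(off, left);

		(*calls)++;
		if (n == 0)
			return -1;	/* no progress: give up instead of spinning */

		off += n;		/* advance by this call's bytes only */
		left -= n;
	}
	return 0;
}
```

With len=384 as in the trace, copy_until_done(reader_always_zero, 384, &calls)
gives up after a single call, while reader_chunk_128 completes in three calls.
Whether bailing out, skipping ahead, or zero-filling is the right reaction in
read_kcore_iter() is a separate question that this sketch does not answer.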