On Sun, Aug 27, 2023 at 09:20:44PM +0200, Björn Töpel wrote: > Nam Cao <namcaov@xxxxxxxxx> writes: > > > On Sun, Aug 27, 2023 at 11:04:34AM +0200, Björn Töpel wrote: > >> Nam Cao <namcaov@xxxxxxxxx> writes: > >> > >> > On Sun, Aug 27, 2023 at 10:11:25AM +0200, Björn Töpel wrote: > >> >> The default implementation of is_trap_insn() which RISC-V is using calls > >> >> is_swbp_insn(), which is doing what your patch does. Your patch does not > >> >> address the issue. > >> > > >> > is_swbp_insn() does this: > >> > > >> > #ifdef CONFIG_RISCV_ISA_C > >> > return (*insn & 0xffff) == UPROBE_SWBP_INSN; > >> > #else > >> > return *insn == UPROBE_SWBP_INSN; > >> > #endif > >> > > >> > ...so it doesn't even check for 32-bit ebreak if C extension is on. My patch > >> > is not the same. > >> > >> Ah, was too quick. > >> > >> AFAIU uprobes *always* uses c.ebreak when CONFIG_RISCV_ISA_C is set, and > >> ebreak otherwise. That's the reason is_swbp_insn() is implemented like > >> that. > > > > That's what I understand too. > > > >> If that's not the case, there's a another bug, that your patches > >> addresses. > > > > I think it's a bug regardless. is_trap_insn() is used by uprobes to figure out > > if there is an instruction that generates trap exception, not just instructions > > that are "SWBP". The reason is because when there is a trap, but uprobes doesn't > > see a probe installed here, it needs is_trap_insn() to figure out if the trap > > is generated by ebreak from something else, or because the probe is just removed. > > In the latter case, uprobes will return back, because probe has already been removed, > > so it should be safe to do so. That's why I think the incorrect is_swbp_insn() > > would cause a hang, because uprobes incorrectly thinks there is no ebreak there, > > so it should be okay to go back, but there actually is. > > > > So, from my understanding, if uprobes encounter a 32-bit ebreak for any reason, > > the kernel would hang. I think your patch is a great addition nonetheless, but I > > am guessing that it only masks the problem by preventing uprobes from seeing the > > 32-bit ebreak in the specific test, not really solve it. So, if there is a 32-bit > > ebreak in userspace, the bug still causes the kernel to hang. > > > > I am still quite confident of my logic, so I would be very suprised if my fix > > doesn't solve the reported hang. Do you mind testing my patch? My potato of a > > laptop unfortunately cannot run the test :( > > Maybe I wasn't clear, sorry for that! I did take the patch for a spin, > and it did not solve this particular problem. Okay, thanks for the comfirmation! > When we're taking a trap from *kernel*mode, we should never deal with > uprobes at all. Have a look at uprobe_pre_sstep_notifier(), this > function returns 1, which then means that the trap handler exit > premature. > > The code you're referring to (called from uprobe_notify_resume()), and > will never be entered, because we're not exiting the trap to > userland. Have a look in kernel/entry/common.c (search for > e.g. TIF_UPROBE). I will think about this a bit and answer later. I will answer the below part first. > Now, for your concern, which I see as a potential different bug. Not at > all related to my issue "trap from kernelmode touches uprobe > incorrectly"; A "random" ebreak from *userland* is trapped, when uprobes > is enabled will set the kernel in a hang. I suggest you construct try to > write a simple program to reproduce this! > > I had a quick look in the uprobe handling code, and AFAIU the was used > when installing the uprobe as an additional check, and when searching > for an active uprobe. I'm still a bit puzzled how the issue you're > describing could trigger. A reproducer would help! I have just produced the problem, using this small program: .global _start _start: addi x0, x1, 0 addi x0, x1, 1 addi x0, x1, 2 .option push .option arch, -c ebreak .option pop ecall Compile that with riscv64-linux-gnu-gcc test.s -nostdlib -static -o ebreak And setup uprobes by: mount -t tracefs nodev /sys/kernel/tracing/ echo "p /ebreak:0x0000010c" > /sys/kernel/tracing/uprobe_events echo 1 > /sys/kernel/tracing/events/uprobes/enable (obviously you would have to edit the offset value to be _start symbol of your binary) Then I execute the program, and the kernel loop indefinitely (it keeps going in and out of exception handler). Then I apply my patch, then the kernel doesn't loop anymore. So I think it is a valid issue, and I will send a proper patch to fix this. Best regards, Nam