On Wed, 24 Apr 2019 08:20:12 +0200 Nicolai Stange <nstange@xxxxxxx> wrote: > Alright, if there's a use case beyond live patching, I can try to handle > 32 bits alongside, of course. > > However, some care will be needed when determining the actual context > from ftrace_int3_handler(): tracing anything before the addition or > after the subtraction of HARDIRQ_OFFSET to/from preempt_count in > irq_enter() resp. irq_exit() could otherwise clobber the "normal" state > in thread_info, correct? > > How about having a fixed size stack with three entries shared between > "normal", "irq" and "softirq" in thread_info, as well as a dedicated > slot for nmi context? (The stack would be modified/consumed only with > irqs disabled). > > Which makes me wonder: > - if we disabled preemption from ftrace_int3_handler() and reenabled it > again from the *_int3_tramp*s, these stacks could be made per-CPU, > AFAICS, Heh, here you go again, having some of the ideas I came up with as well. I thought about this use per-cpu variables and this preempt disable hack as well. But thinking more on it, it goes into the "too crazy, and too fragile" category. It breaks a lot of norms (but I do that a lot anyway ;) and decided that it would probably not be accepted by others. I much rather use the above thread info solution. As for the irq_enter/exit() issue, instead of using the preempt_count context, we could just have a counter in the thread info as well: depth = local_add_return(thread_info->ftrace_int_depth); thread_info->slot[depth] = *(pt_regs->sp); Where thread_info->slot is an array of 4 long words. And then in the trampoline: (pseudo code) push %rax // just a place holder push %rax push %rdx mov thread->depth, %rax mov thread->slot(%rax), %rdx mov %rdx, 16(%rsp) add -1,%rax mov %rax, thread->depth pop %rdx pop %rax jmp ftrace_caller If an interrupt were to come in at any time, it would just increment the depth, use the next slot, and then decrement it back to what the depth was when it interrupted the code. -- Steve > - and why not use this strategy on x86_64, too? The advantages would be > a common implemention between 32 and 64 bit and more importantly, not > relying on that %r11 hack. When doing the initial RFC patch, it > wasn't clear to me that at most one stack slot per context would be > needed, i.e. only four in total...