On 14/01/25 13:13, Sean Christopherson wrote:
> On Tue, Jan 14, 2025, Valentin Schneider wrote:
>> text_poke_bp_batch() sends IPIs to all online CPUs to synchronize
>> them vs the newly patched instruction. CPUs that are executing in userspace
>> do not need this synchronization to happen immediately, and this is
>> actually harmful interference for NOHZ_FULL CPUs.
>
> ...
>
>> This leaves us with static keys and static calls.
>
> ...
>
>> @@ -2317,11 +2334,20 @@ static void text_poke_bp_batch(struct text_poke_loc *tp, unsigned int nr_entries
>>           * First step: add a int3 trap to the address that will be patched.
>>           */
>>          for (i = 0; i < nr_entries; i++) {
>> -                tp[i].old = *(u8 *)text_poke_addr(&tp[i]);
>> -                text_poke(text_poke_addr(&tp[i]), &int3, INT3_INSN_SIZE);
>> +                void *addr = text_poke_addr(&tp[i]);
>> +
>> +                /*
>> +                 * There's no safe way to defer IPIs for patching text in
>> +                 * .noinstr, record whether there is at least one such poke.
>> +                 */
>> +                if (is_kernel_noinstr_text((unsigned long)addr))
>> +                        cond = NULL;
>
> Maybe pre-check "cond", especially if multiple ranges need to be checked?  I.e.
>
>         if (cond && is_kernel_noinstr_text(...))
>
>> +
>> +                tp[i].old = *((u8 *)addr);
>> +                text_poke(addr, &int3, INT3_INSN_SIZE);
>>          }
>>
>> -        text_poke_sync();
>> +        __text_poke_sync(cond);
>>
>>          /*
>>           * Second step: update all but the first byte of the patched range.
>
> ...
>
>> +/**
>> + * is_kernel_noinstr_text - checks if the pointer address is located in the
>> + *                          .noinstr section
>> + *
>> + * @addr: address to check
>> + *
>> + * Returns: true if the address is located in .noinstr, false otherwise.
>> + */
>> +static inline bool is_kernel_noinstr_text(unsigned long addr)
>> +{
>> +        return addr >= (unsigned long)__noinstr_text_start &&
>> +               addr < (unsigned long)__noinstr_text_end;
>> +}
>
> This doesn't do the right thing for modules, which matters because KVM can be
> built as a module on x86, and because context tracking understands transitions
> to GUEST mode, i.e. CPUs that are running in a KVM guest will be treated as not
> being in the kernel, and thus will have IPIs deferred. If KVM uses a static key
> or branch between guest_state_enter_irqoff() and guest_state_exit_irqoff(), the
> patching code won't wait for CPUs to exit guest mode, i.e. KVM could theoretically
> use the wrong static path.
>

AFAICT guest_state_{enter,exit}_irqoff() are only used in noinstr functions, and
thus such a static key usage should at the very least be caught and warned about
by objtool - when this isn't built as a module.

I never really thought about noinstr sections for modules; I can get objtool to
warn about a non-noinstr-allowed key being used in e.g. vmx_vcpu_enter_exit()
just by feeding it the vmx.o:

  arch/x86/kvm/vmx/vmx.o: warning: objtool: vmx_vcpu_enter_exit.isra.0+0x0: dummykey: non-RO static key usage in noinstr

...but that requires removing a lot of code first, because objtool stops earlier
in its noinstr checks as it hits functions it doesn't have full information on,
e.g.:

  arch/x86/kvm/vmx/vmx.o: warning: objtool: vmx_vcpu_enter_exit+0x21c: call to __ct_user_enter() leaves .noinstr.text section

__ct_user_enter() *is* noinstr, but you don't get that from just the header
prototype.
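FWIW the test itself was nothing fancy - something along these lines, with
dummykey being a throwaway key defined next to vmx_vcpu_enter_exit() purely to
trip the check (it doesn't exist anywhere in the tree):

  /* Throwaway key, just to get objtool to complain */
  DEFINE_STATIC_KEY_FALSE(dummykey);

  /* ...and somewhere in the body of vmx_vcpu_enter_exit(): */
  /* non-noinstr-allowed key used in a noinstr function -> objtool warns */
  if (static_branch_unlikely(&dummykey))
          barrier();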
> I don't expect this to ever cause problems in practice, because patching code in
> KVM's VM-Enter/VM-Exit path that has *functional* implications, while CPUs are
> actively running guest code, would be all kinds of crazy. But I do think we
> should plug the hole.
>
> If this issue is unique to KVM, i.e. is not a generic problem for all modules (I
> assume module code generally isn't allowed in the entry path, even via NMI?), one
> idea would be to let KVM register its noinstr section for text poking.
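Having KVM register a range would indeed plug that hole. Rough sketch of what I
have in mind - all of the names below are made up for the example, and a real
version would need to serialize registration against text_mutex:

  /* In the text patching code; KVM registers its noinstr bounds at load time */
  static unsigned long extra_noinstr_start;
  static unsigned long extra_noinstr_end;

  void text_poke_register_noinstr_range(unsigned long start, unsigned long end)
  {
          WRITE_ONCE(extra_noinstr_start, start);
          WRITE_ONCE(extra_noinstr_end, end);
  }

  static bool is_kernel_noinstr_text(unsigned long addr)
  {
          if (addr >= (unsigned long)__noinstr_text_start &&
              addr <  (unsigned long)__noinstr_text_end)
                  return true;

          /* Range registered by KVM (or any other module that needs it) */
          return addr >= READ_ONCE(extra_noinstr_start) &&
                 addr <  READ_ONCE(extra_noinstr_end);
  }

with the range cleared again on module unload, so the extra check degrades to a
no-op when nothing is registered.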