On 27/09/17 23:08, Josh Poimboeuf wrote: > On Tue, Aug 08, 2017 at 01:09:08PM -0700, Andy Lutomirski wrote: >> On Tue, Aug 8, 2017 at 12:13 PM, Josh Poimboeuf <jpoimboe@xxxxxxxxxx> wrote: >>> On Tue, Aug 08, 2017 at 12:03:51PM -0700, Linus Torvalds wrote: >>>> On Tue, Aug 8, 2017 at 11:58 AM, Josh Poimboeuf <jpoimboe@xxxxxxxxxx> wrote: >>>>> >>>>> Take for example the lock_is_held_type() function. In vmlinux, it has >>>>> the following instruction: >>>>> >>>>> callq *0xffffffff85a94880 (pv_irq_ops.save_fl) >>>>> >>>>> At runtime, that instruction is patched and replaced with a fast inline >>>>> version of arch_local_save_flags() which eliminates the call: >>>>> >>>>> pushfq >>>>> pop %rax >>>>> >>>>> The problem is when an interrupt hits after the push: >>>>> >>>>> pushfq >>>>> --- irq --- >>>>> pop %rax >>>> >>>> That should actually be something easily fixable, for an odd reason: >>>> the instruction boundaries are different. >>>> >>>>> I'm not sure what the solution should be. It will probably need to be >>>>> one of the following: >>>>> >>>>> a) either don't allow runtime "alternative" patches to mess with the >>>>> stack pointer (objtool could enforce this); or >>>>> >>>>> b) come up with some way to register such patches with the ORC >>>>> unwinder at runtime. >>>> >>>> c) just add ORC data for the alternative statically and _unconditionally_. >>>> >>>> No runtime registration. Just an unconditional entry for the >>>> particular IP that comes after the "pushfq". It cannot match the >>>> "callq" instruction, since it would be in the middle of that >>>> instruction. >>>> >>>> Basically, just do a "union" of the ORC data for all the alternatives. >>>> >>>> Now, objtool should still verify that the instruction pointers for >>>> alternatives are unique - or that they share the same ORC unwinder >>>> information if they are not. >>>> >>>> But in cases like this, when the instruction boundaires are different, >>>> things should "just work", with no need for any special cases. >>>> >>>> Hmm? >>> >>> Yeah, that might work. Objtool already knows about alternatives, so it >>> might not be too hard. I'll try it. >> >> But this one's not an actual alternative, right? It's a pv op. >> >> I would advocate that we make it an alternative after all. I frickin' >> hate the PV irq ops. It would like roughly like this: >> >> ALTERNATIVE "pushfq; popq %rax", "callq *pv_irq_ops.save_fl", >> X86_FEATURE_GODDAMN_PV_IRQ_OPS >> >> (The obvious syntax error and the naming should probably be fixed. >> Also, this needs to live in an #ifdef because it needs to build on >> kernels with pv support. It should also properly register itself as a >> pv patch site.) > > I've got a prototype of the above working, where vmlinux shows: > > pushfq > pop %rax > nop > nop > nop > nop > nop > > instead of: > > callq *0xffffffff81e3a400 (pv_irq_ops.save_fl) > > Which is nice because the vmlinux disassembly now matches the most > common runtime cases (everything except Xen and vsmp). And it also > fixes the upthread objtool issue. > > The only slight issue with the patches is that hypervisors need access > to the pv ops much earlier than when alternatives are applied. So I had > to add a new .pv_alternatives section for these pv ops alternatives, so > they can be patched very early, if running in a hypervisor. > > Will clean up the code and post something relatively soon. > Are you combining alternatives and pvops then? I'm asking because in an up and running system under Xen the callq *... will be replaced with a much faster "call xen_save_fl". This should still be the case after your patch. Juergen -- To unsubscribe from this list: send the line "unsubscribe live-patching" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html