On 28/09/17 15:55, Josh Poimboeuf wrote: > On Thu, Sep 28, 2017 at 08:03:26AM +0200, Juergen Gross wrote: >> On 27/09/17 23:08, Josh Poimboeuf wrote: >>> On Tue, Aug 08, 2017 at 01:09:08PM -0700, Andy Lutomirski wrote: >>>> On Tue, Aug 8, 2017 at 12:13 PM, Josh Poimboeuf <jpoimboe@xxxxxxxxxx> wrote: >>>>> On Tue, Aug 08, 2017 at 12:03:51PM -0700, Linus Torvalds wrote: >>>>>> On Tue, Aug 8, 2017 at 11:58 AM, Josh Poimboeuf <jpoimboe@xxxxxxxxxx> wrote: >>>>>>> >>>>>>> Take for example the lock_is_held_type() function. In vmlinux, it has >>>>>>> the following instruction: >>>>>>> >>>>>>> callq *0xffffffff85a94880 (pv_irq_ops.save_fl) >>>>>>> >>>>>>> At runtime, that instruction is patched and replaced with a fast inline >>>>>>> version of arch_local_save_flags() which eliminates the call: >>>>>>> >>>>>>> pushfq >>>>>>> pop %rax >>>>>>> >>>>>>> The problem is when an interrupt hits after the push: >>>>>>> >>>>>>> pushfq >>>>>>> --- irq --- >>>>>>> pop %rax >>>>>> >>>>>> That should actually be something easily fixable, for an odd reason: >>>>>> the instruction boundaries are different. >>>>>> >>>>>>> I'm not sure what the solution should be. It will probably need to be >>>>>>> one of the following: >>>>>>> >>>>>>> a) either don't allow runtime "alternative" patches to mess with the >>>>>>> stack pointer (objtool could enforce this); or >>>>>>> >>>>>>> b) come up with some way to register such patches with the ORC >>>>>>> unwinder at runtime. >>>>>> >>>>>> c) just add ORC data for the alternative statically and _unconditionally_. >>>>>> >>>>>> No runtime registration. Just an unconditional entry for the >>>>>> particular IP that comes after the "pushfq". It cannot match the >>>>>> "callq" instruction, since it would be in the middle of that >>>>>> instruction. >>>>>> >>>>>> Basically, just do a "union" of the ORC data for all the alternatives. >>>>>> >>>>>> Now, objtool should still verify that the instruction pointers for >>>>>> alternatives are unique - or that they share the same ORC unwinder >>>>>> information if they are not. >>>>>> >>>>>> But in cases like this, when the instruction boundaires are different, >>>>>> things should "just work", with no need for any special cases. >>>>>> >>>>>> Hmm? >>>>> >>>>> Yeah, that might work. Objtool already knows about alternatives, so it >>>>> might not be too hard. I'll try it. >>>> >>>> But this one's not an actual alternative, right? It's a pv op. >>>> >>>> I would advocate that we make it an alternative after all. I frickin' >>>> hate the PV irq ops. It would like roughly like this: >>>> >>>> ALTERNATIVE "pushfq; popq %rax", "callq *pv_irq_ops.save_fl", >>>> X86_FEATURE_GODDAMN_PV_IRQ_OPS >>>> >>>> (The obvious syntax error and the naming should probably be fixed. >>>> Also, this needs to live in an #ifdef because it needs to build on >>>> kernels with pv support. It should also properly register itself as a >>>> pv patch site.) >>> >>> I've got a prototype of the above working, where vmlinux shows: >>> >>> pushfq >>> pop %rax >>> nop >>> nop >>> nop >>> nop >>> nop >>> >>> instead of: >>> >>> callq *0xffffffff81e3a400 (pv_irq_ops.save_fl) >>> >>> Which is nice because the vmlinux disassembly now matches the most >>> common runtime cases (everything except Xen and vsmp). And it also >>> fixes the upthread objtool issue. >>> >>> The only slight issue with the patches is that hypervisors need access >>> to the pv ops much earlier than when alternatives are applied. So I had >>> to add a new .pv_alternatives section for these pv ops alternatives, so >>> they can be patched very early, if running in a hypervisor. >>> >>> Will clean up the code and post something relatively soon. >>> >> >> Are you combining alternatives and pvops then? I'm asking because in an >> up and running system under Xen the callq *... will be replaced with a >> much faster "call xen_save_fl". This should still be the case after >> your patch. > > Right, it's not combining alternatives and pv ops, it's just adding > another step. So first the > > pushfq; pop %rax > > is replaced with > > callq *pv_irq_ops.save_fl > > and then later, after the xen ops structs are finalized, it's replaced > with > > callq xen_save_fl > Okay, thanks for confirming. Juergen -- To unsubscribe from this list: send the line "unsubscribe live-patching" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html