On 08/08/17 22:09, Andy Lutomirski wrote: > On Tue, Aug 8, 2017 at 12:13 PM, Josh Poimboeuf <jpoimboe@xxxxxxxxxx> wrote: >> On Tue, Aug 08, 2017 at 12:03:51PM -0700, Linus Torvalds wrote: >>> On Tue, Aug 8, 2017 at 11:58 AM, Josh Poimboeuf <jpoimboe@xxxxxxxxxx> wrote: >>>> >>>> Take for example the lock_is_held_type() function. In vmlinux, it has >>>> the following instruction: >>>> >>>> callq *0xffffffff85a94880 (pv_irq_ops.save_fl) >>>> >>>> At runtime, that instruction is patched and replaced with a fast inline >>>> version of arch_local_save_flags() which eliminates the call: >>>> >>>> pushfq >>>> pop %rax >>>> >>>> The problem is when an interrupt hits after the push: >>>> >>>> pushfq >>>> --- irq --- >>>> pop %rax >>> >>> That should actually be something easily fixable, for an odd reason: >>> the instruction boundaries are different. >>> >>>> I'm not sure what the solution should be. It will probably need to be >>>> one of the following: >>>> >>>> a) either don't allow runtime "alternative" patches to mess with the >>>> stack pointer (objtool could enforce this); or >>>> >>>> b) come up with some way to register such patches with the ORC >>>> unwinder at runtime. >>> >>> c) just add ORC data for the alternative statically and _unconditionally_. >>> >>> No runtime registration. Just an unconditional entry for the >>> particular IP that comes after the "pushfq". It cannot match the >>> "callq" instruction, since it would be in the middle of that >>> instruction. >>> >>> Basically, just do a "union" of the ORC data for all the alternatives. >>> >>> Now, objtool should still verify that the instruction pointers for >>> alternatives are unique - or that they share the same ORC unwinder >>> information if they are not. >>> >>> But in cases like this, when the instruction boundaires are different, >>> things should "just work", with no need for any special cases. >>> >>> Hmm? >> >> Yeah, that might work. Objtool already knows about alternatives, so it >> might not be too hard. I'll try it. > > But this one's not an actual alternative, right? It's a pv op. > > I would advocate that we make it an alternative after all. I frickin' > hate the PV irq ops. It would like roughly like this: > > ALTERNATIVE "pushfq; popq %rax", "callq *pv_irq_ops.save_fl", > X86_FEATURE_GODDAMN_PV_IRQ_OPS You are aware that at least some of the Xen irq pvops functionality is patched inline? Your modification would slow down pv guests quite a bit, I guess. > (The obvious syntax error and the naming should probably be fixed. > Also, this needs to live in an #ifdef because it needs to build on > kernels with pv support. It should also properly register itself as a > pv patch site.) > > Semi-serious question: can we maybe delete lguest and 32-bit Xen PV > support some time soon? As far as I know, 32-bit Xen PV *hosts* are > all EOL and have no security support, 32-bit Xen PV guest dom0 may not > work (I've never tried, but it would certainly be nutty on a 64-bit > hypervisor), and lguest is, um, not seriously maintained any more. [1] Hmm, I suggested drop of lguest support about 3 months ago and got no response. OTOH there was no objection either. :-) Regarding 32 bit Xen PV guests: even 32 bit dom0 is supposed to work. In case you want to drop support in Linux you might ask that question on xen-devel@xxxxxxxxxxxxxxxxxxxx (with PVH support for guests nearly complete your chances might be >0 to succeed). Juergen -- To unsubscribe from this list: send the line "unsubscribe live-patching" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html