On Tue, Aug 08, 2017 at 01:09:08PM -0700, Andy Lutomirski wrote: > On Tue, Aug 8, 2017 at 12:13 PM, Josh Poimboeuf <jpoimboe@xxxxxxxxxx> wrote: > > On Tue, Aug 08, 2017 at 12:03:51PM -0700, Linus Torvalds wrote: > >> On Tue, Aug 8, 2017 at 11:58 AM, Josh Poimboeuf <jpoimboe@xxxxxxxxxx> wrote: > >> > > >> > Take for example the lock_is_held_type() function. In vmlinux, it has > >> > the following instruction: > >> > > >> > callq *0xffffffff85a94880 (pv_irq_ops.save_fl) > >> > > >> > At runtime, that instruction is patched and replaced with a fast inline > >> > version of arch_local_save_flags() which eliminates the call: > >> > > >> > pushfq > >> > pop %rax > >> > > >> > The problem is when an interrupt hits after the push: > >> > > >> > pushfq > >> > --- irq --- > >> > pop %rax > >> > >> That should actually be something easily fixable, for an odd reason: > >> the instruction boundaries are different. > >> > >> > I'm not sure what the solution should be. It will probably need to be > >> > one of the following: > >> > > >> > a) either don't allow runtime "alternative" patches to mess with the > >> > stack pointer (objtool could enforce this); or > >> > > >> > b) come up with some way to register such patches with the ORC > >> > unwinder at runtime. > >> > >> c) just add ORC data for the alternative statically and _unconditionally_. > >> > >> No runtime registration. Just an unconditional entry for the > >> particular IP that comes after the "pushfq". It cannot match the > >> "callq" instruction, since it would be in the middle of that > >> instruction. > >> > >> Basically, just do a "union" of the ORC data for all the alternatives. > >> > >> Now, objtool should still verify that the instruction pointers for > >> alternatives are unique - or that they share the same ORC unwinder > >> information if they are not. > >> > >> But in cases like this, when the instruction boundaires are different, > >> things should "just work", with no need for any special cases. > >> > >> Hmm? > > > > Yeah, that might work. Objtool already knows about alternatives, so it > > might not be too hard. I'll try it. > > But this one's not an actual alternative, right? It's a pv op. > > I would advocate that we make it an alternative after all. I frickin' > hate the PV irq ops. It would like roughly like this: > > ALTERNATIVE "pushfq; popq %rax", "callq *pv_irq_ops.save_fl", > X86_FEATURE_GODDAMN_PV_IRQ_OPS > > (The obvious syntax error and the naming should probably be fixed. > Also, this needs to live in an #ifdef because it needs to build on > kernels with pv support. It should also properly register itself as a > pv patch site.) I've got a prototype of the above working, where vmlinux shows: pushfq pop %rax nop nop nop nop nop instead of: callq *0xffffffff81e3a400 (pv_irq_ops.save_fl) Which is nice because the vmlinux disassembly now matches the most common runtime cases (everything except Xen and vsmp). And it also fixes the upthread objtool issue. The only slight issue with the patches is that hypervisors need access to the pv ops much earlier than when alternatives are applied. So I had to add a new .pv_alternatives section for these pv ops alternatives, so they can be patched very early, if running in a hypervisor. Will clean up the code and post something relatively soon. -- Josh -- To unsubscribe from this list: send the line "unsubscribe live-patching" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html