On Tue, 7 May 2019 14:50:26 +0000
David Laight <David.Laight@xxxxxxxxxx> wrote:

> From: Steven Rostedt
> > Sent: 07 May 2019 14:14
> > On Tue, 7 May 2019 12:57:15 +0000
> > David Laight <David.Laight@xxxxxxxxxx> wrote:

> The 'user' (ie the kernel code that needs to emulate the call) doesn't
> write the data to the stack, just to some per-cpu location.
> (Actually it could be on the stack at the other end of pt-regs.)
> So you get to the 'register restore and iret' code with the stack unaltered.
> It is then a SMOP to replace the %flags saved by the int3 with the %ip
> saved by the int3, the %ip with the address of the function to call,
> restore the flags (push and popf) and issue a ret.f to remove the %ip and %cs.

How would you handle NMIs doing the same thing? Yes, the NMI handlers
have breakpoints that will need to emulate calls as well.

>
> (Actually you need to add 4 to the callers %ip address to allow for the
> difference between the size of int3 (hopefully 0xcc, not 0xcd 0x3).)
>
> > > > For 32bit 'the gap' happens naturally when building a 5 entry frame. Yes
> > > > it is possible to build a 5 entry frame on top of the old 3 entry one,
> > > > but why bother...
> > >
> > > Presumably there is 'horrid' code to generate the gap in 64bit mode?
> > > (less horrid than 32bit, but still horrid?)
> > > Or does it copy the entire pt_regs into a local stack frame and use
> > > that for the iret?
> >
> > On x86_64, the gap is only done for int3 and nothing else, thus it is
> > much less horrid. That's because x86_64 has a sane pt_regs storage for
> > all exceptions.
>
> Well, in particular, it always loads %sp as part of the iret.
> So you can create a gap and the cpu will remove it for you.
>
> In 64bit mode you could overwrite the %ss with the return address
> to the caller restore %eax and %flags, push the function address
> and use ret.n to jump to the function subtracting the right amount
> from %esp.
>
> Actually that means you can do the following in both modes:
>     if not emulated_call_address then pop %ax; iret else
>     # assume kernel<->kernel return
>     push emulated_call_address;
>     push flags_saved_by_int3
>     load %ax, return_address_from_iret
>     add %ax,#4
>     store %ax, first_stack_location_written_by_int3
>     load %ax, value_saved_by_int3_entry
>     popf
>     ret.n
>
> The ret.n discards everything from the %ax to the required return address.
> So 'n' is the size of the int3 frame, so 12 for i386 and 40 for amd64.
>
> If the register restore (done just before this code) finished with
> 'add %sp, sizeof *pt_regs' then the emulated_call_address can be
> loaded in %ax from the other end of pt_regs.
>
> This all reminds me of fixing up the in-kernel faults that happen
> when loading the user segment registers during 'return to user'
> fault in kernel space.

This all sounds much more complex and fragile than the proposed
solution. Why would we do this over what is being proposed?

-- Steve
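
For reference, the numbers in the quoted text (the "add 4", and 'n' being
12 for i386 and 40 for amd64) can be checked with a small user-space C
sketch. This is not kernel code: the struct and function names are
hypothetical, and it assumes the instruction being emulated is a 5-byte
near call that was patched over with the 1-byte int3 (0xcc).

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: a 1-byte int3 (0xcc) replaces the first byte of a 5-byte call. */
enum { INT3_INSN_SIZE = 1, CALL_INSN_SIZE = 5 };

/* Hypothetical layout of what the CPU pushes for a kernel->kernel int3
 * on i386: no stack switch, so only three 4-byte entries. */
struct int3_frame_i386 {
	uint32_t ip;
	uint32_t cs;
	uint32_t flags;
};

/* Hypothetical layout for amd64: the CPU always pushes five 8-byte
 * entries, including %sp and %ss. */
struct int3_frame_amd64 {
	uint64_t ip;
	uint64_t cs;
	uint64_t flags;
	uint64_t sp;
	uint64_t ss;
};

/* The 'n' in ret.n from the quoted pseudocode. */
static_assert(sizeof(struct int3_frame_i386) == 12, "3 entries * 4 bytes");
static_assert(sizeof(struct int3_frame_amd64) == 40, "5 entries * 8 bytes");

/* The %ip saved by int3 points just past the 0xcc byte; the return
 * address an actual call would have pushed is 4 bytes further on. */
static inline uint64_t emulated_return_address(uint64_t saved_ip)
{
	return saved_ip + (CALL_INSN_SIZE - INT3_INSN_SIZE);
}
```

For example, with the int3 at 0x1000 the saved %ip is 0x1001, and
emulated_return_address(0x1001) gives 0x1005, the address just past the
original call.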