
The patch titled
     update x86_64-mm-xen-use-iret-directly-where-possible
has been added to the -mm tree.  Its filename is
     update-x86_64-mm-xen-use-iret-directly-where-possible.patch

*** Remember to use Documentation/SubmitChecklist when testing your code ***

See http://www.zip.com.au/~akpm/linux/patches/stuff/added-to-mm.txt to find
out what to do about this

------------------------------------------------------
Subject: update x86_64-mm-xen-use-iret-directly-where-possible
From: Jeremy Fitzhardinge <jeremy@xxxxxxxx>

There's only a minor code change from the version you've got, but the
comments are more accurate.

Signed-off-by: Jeremy Fitzhardinge <jeremy@xxxxxxxxxxxxx>
Cc: Andi Kleen <ak@xxxxxxx>
Signed-off-by: Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx>
---

 arch/i386/xen/xen-asm.S |   56 +++++++++++++++++++++++++-------------
 1 files changed, 37 insertions(+), 19 deletions(-)

diff -puN arch/i386/xen/xen-asm.S~update-x86_64-mm-xen-use-iret-directly-where-possible arch/i386/xen/xen-asm.S
--- a/arch/i386/xen/xen-asm.S~update-x86_64-mm-xen-use-iret-directly-where-possible
+++ a/arch/i386/xen/xen-asm.S
@@ -108,14 +108,28 @@ ENDPATCH(xen_restore_fl_direct)
 		 4: cs
 	esp->	 0: eip
 
-	This attempts to make sure that any pending events are dealt with
-	on return to usermode, but there is a small window in which an event
-	can happen just before entering usermode.  This has three effects:
-	 - There can be interrupt recursion on the stack, which is
-	   unbounded in theory (but very unlikely in practice)
-	 - New softirq events can be queued up, but they won't get
-	   processed until the cpu next enters and leaves the kernel.
-	 - Signals likewise.
+	This attempts to make sure that any pending events are dealt
+	with on return to usermode, but there is a small window in
+	which an event can happen just before entering usermode.  If
+	the nested interrupt ends up setting one of the TIF_WORK_MASK
+	pending work flags, they will not be tested again before
+	returning to usermode.  This means that a process can end up
+	with pending work, which will be unprocessed until the process
+	enters and leaves the kernel again, which could be an
+	unbounded amount of time.  This means that a pending signal or
+	reschedule event could be indefinitely delayed.
+
+	The fix is to notice a nested interrupt in the critical
+	window, and if one occurs, then fold the nested interrupt into
+	the current interrupt stack frame, and re-process it
+	iteratively rather than recursively.  This means that it will
+	exit via the normal path, and all pending work will be dealt
+	with appropriately.
+
+	Because the nested interrupt handler needs to deal with the
+	current stack state in whatever form its in, we keep things
+	simple by only using a single register which is pushed/popped
+	on the stack.
 
 	Non-direct iret could be done in the same way, but it would
 	require an annoying amount of code duplication.  We'll assume
@@ -127,9 +141,6 @@ ENTRY(xen_iret_direct)
 	testl $(X86_EFLAGS_VM | XEN_EFLAGS_NMI), 8(%esp)
 	jnz hyper_iret
 
-	/* check IF state we're restoring */
-	testb $X86_EFLAGS_IF>>8, 8+1(%esp)
-
 	push %eax
 	ESP_OFFSET=4	# bytes pushed onto stack
 
@@ -144,6 +155,9 @@ ENTRY(xen_iret_direct)
 	movl $per_cpu__xen_vcpu_info, %eax
 #endif
 
+	/* check IF state we're restoring */
+	testb $X86_EFLAGS_IF>>8, 8+1+ESP_OFFSET(%esp)
+
 	/* Maybe enable events.  Once this happens we could get a
 	   recursive event, so the critical region starts immediately
 	   afterwards.  However, if that happens we don't end up
@@ -187,7 +201,7 @@ hyper_iret:
 
 	The stack format at this point is:
 	----------------
-	 ss		:
+	 ss		: (ss/esp may be present if we came from usermode)
 	 esp		:
 	 eflags		}  outer exception info
 	 cs		}
@@ -219,17 +233,21 @@ hyper_iret:
 	The only caveat is that if the outer eax hasn't been
 	restored yet (ie, it's still on stack), we need to insert
 	its value into the SAVE_ALL state before going on, since
-	its usermode state which we eventually need to restore.
+	it's usermode state which we eventually need to restore.
  */
 ENTRY(xen_iret_crit_fixup)
 	/* offsets +4 for return address */
 
-	/* Paranoia: make sure we're really coming from userspace.
-	   Once could imagine a case where userspace jumps into
-	   the critical range address, but just before the CPU
-	   delivers a GP, it decides to deliver an interrupt
-	   instead.  Unlikely?  Definitely.  Easy to avoid?
-	   Yes.  (Some virtual environments get this wrong.) */
+	/*
+	  Paranoia: Make sure we're really coming from userspace.
+	  One could imagine a case where userspace jumps into the
+	  critical range address, but just before the CPU delivers a GP,
+	  it decides to deliver an interrupt instead.  Unlikely?
+	  Definitely.  Easy to avoid?  Yes.  The Intel documents
+	  explicitly say that the reported EIP for a bad jump is the
+	  jump instruction itself, not the destination, but some virtual
+	  environments get this wrong.
+	 */
 	movl PT_CS+4(%esp), %ecx
 	andl $SEGMENT_RPL_MASK, %ecx
 	cmpl $USER_RPL, %ecx
_

Patches currently in -mm which might be from jeremy@xxxxxxxx are

git-kbuild.patch
add-kstrndup-fix.patch
xen-build-fix.patch
fix-x86_64-mm-xen-xen-smp-guest-support.patch
more-fix-x86_64-mm-xen-xen-smp-guest-support.patch
fix-x86_64-mm-xen-add-xen-virtual-block-device-driver.patch
fix-x86_64-mm-add-common-orderly_poweroff.patch
tidy-up-usermode-helper-waiting-a-bit-fix.patch
update-x86_64-mm-xen-use-iret-directly-where-possible.patch
x86-use-elfnoteh-to-generate-vsyscall-notes-fix.patch
paravirt-helper-to-disable-all-io-space-fix-2.patch
paravirt-helper-to-disable-all-io-space-fix-3.patch
maps2-uninline-some-functions-in-the-page-walker.patch
maps2-eliminate-the-pmd_walker-struct-in-the-page-walker.patch
maps2-remove-vma-from-args-in-the-page-walker.patch
maps2-propagate-errors-from-callback-in-page-walker.patch
maps2-add-callbacks-for-each-level-to-page-walker.patch
maps2-move-the-page-walker-code-to-lib.patch
maps2-simplify-interdependence-of-proc-pid-maps-and-smaps.patch
maps2-move-clear_refs-code-to-task_mmuc.patch
maps2-regroup-task_mmu-by-interface.patch
maps2-make-proc-pid-smaps-optional-under-config_embedded.patch
maps2-make-proc-pid-clear_refs-option-under-config_embedded.patch
maps2-add-proc-pid-pagemap-interface.patch
maps2-add-proc-kpagemap-interface.patch
add-argv_split-fix.patch
add-common-orderly_poweroff-fix.patch
lguest-the-guest-code.patch