On Thu, Jan 16, 2020 at 07:38:54AM -0800, Sean Christopherson wrote:
> On Wed, Jan 15, 2020 at 08:08:32PM -0500, Derek Yerger wrote:
> > On 1/15/20 4:52 PM, Sean Christopherson wrote:
> > >+cc Derek, who is hitting the same thing.
> > >
> > >On Wed, Jan 15, 2020 at 09:18:56PM +0000, bugzilla-daemon@xxxxxxxxxxxxxxxxxxx wrote:
> > >>https://bugzilla.kernel.org/show_bug.cgi?id=206215
> > >*snip*
> > >that's a big smoking gun pointing at commit ca7e6b286333 ("KVM: X86: Fix
> > >fpu state crash in kvm guest"), which is commit e751732486eb upstream.
> > >
> > >1. Can you verify reverting ca7e6b286333 (or e751732486eb in upstream)
> > >   solves the issue?
> > >
> > >2. Assuming the answer is yes, on a buggy kernel, can you run with the
> > >   attached patch to try get debug info?
> > I did these out of order since I had 5.3.11 built with the patch, ready to
> > go for weeks now, waiting for an opportunity to test.
> >
> > Win10 guest immediately BSOD'ed with:
> >
> > WARNING: CPU: 2 PID: 9296 at include/linux/thread_info.h:55
> > kernel_fpu_begin+0x6b/0xc0
>
> Can you provide the full stack trace of the WARN?  I'm hoping that will
> provide a hint as to what's going wrong.

Aha!  I found at least two cases where TIF_NEED_FPU_LOAD could be set
without the vCPU being preempted.

The comment on fpregs_lock() states that softirq can set TIF_NEED_FPU_LOAD,
which would not be handled by the preempt notifier.

  /*
   * Use fpregs_lock() while editing CPU's FPU registers or fpu->state.
   * A context switch will (and softirq might) save CPU's FPU registers to
   *                       ^^^^^^^^^^^^^^^^^^^
   * fpu->state and set TIF_NEED_FPU_LOAD leaving CPU's FPU registers in
   * a random state.
   */
  static inline void fpregs_lock(void)

The other scenario is from a stack trace from commit f775b13eedee ("x86,kvm:
move qemu/guest FPU switching out to vcpu_run"), which clearly shows that
kernel_fpu_begin() can be invoked without KVM being preempted.
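To make the two cases concrete, here is a minimal, hypothetical user-space model in C; it is not kernel code, and race_cpu, race_kernel_fpu_begin(), race_sched_in() and race_guest_fpu_live() are invented stand-ins. It assumes the guest FPU state is reloaded only from a preempt-notifier-style hook on sched-in, and shows that an FPU user which sets the flag without a context switch (softirq, or the direct-reclaim path in the trace) leaves nothing to reload the guest state.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical user-space model, not kernel code: one CPU running one
 * vCPU task.  All names are illustrative stand-ins.
 */
enum race_regs { REGS_GUEST, REGS_CLOBBERED };

struct race_cpu {
	enum race_regs regs;  /* what is live in the FPU registers */
	bool need_fpu_load;   /* stands in for TIF_NEED_FPU_LOAD */
};

/*
 * Stand-in for kernel_fpu_begin() from softirq or direct reclaim: the
 * live state is saved, the flag is set and the registers are reused.
 * No context switch is involved.
 */
static void race_kernel_fpu_begin(struct race_cpu *cpu)
{
	assert(cpu->regs != REGS_CLOBBERED); /* no nesting in this model */
	cpu->need_fpu_load = true;
	cpu->regs = REGS_CLOBBERED;
}

/* Stand-in for a preempt-notifier reload: runs only on sched-in. */
static void race_sched_in(struct race_cpu *cpu)
{
	if (cpu->need_fpu_load) {
		cpu->regs = REGS_GUEST;
		cpu->need_fpu_load = false;
	}
}

/*
 * Returns true if the guest's FPU state is live at the next VM-entry
 * when reloads happen only from the sched-in hook.
 */
static bool race_guest_fpu_live(bool fpu_user_runs, bool preempted)
{
	struct race_cpu cpu = { .regs = REGS_GUEST, .need_fpu_load = false };

	if (fpu_user_runs)
		race_kernel_fpu_begin(&cpu);
	if (preempted)
		race_sched_in(&cpu);
	return cpu.regs == REGS_GUEST;
}
```

In this model, race_guest_fpu_live(true, false) returns false: the flag was set and the registers were reused, but no sched-in ever ran to reload them.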

  __warn+0xcb/0xf0
  warn_slowpath_null+0x1d/0x20
  kernel_fpu_disable+0x3f/0x50
  __kernel_fpu_begin+0x49/0x100
  kernel_fpu_begin+0xe/0x10
  crc32c_pcl_intel_update+0x84/0xb0
  crypto_shash_update+0x3f/0x110
  crc32c+0x63/0x8a [libcrc32c]
  dm_bm_checksum+0x1b/0x20 [dm_persistent_data]
  node_prepare_for_write+0x44/0x70 [dm_persistent_data]
  dm_block_manager_write_callback+0x41/0x50 [dm_persistent_data]
  submit_io+0x170/0x1b0 [dm_bufio]
  __write_dirty_buffer+0x89/0x90 [dm_bufio]
  __make_buffer_clean+0x4f/0x80 [dm_bufio]
  __try_evict_buffer+0x42/0x60 [dm_bufio]
  dm_bufio_shrink_scan+0xc0/0x130 [dm_bufio]
  shrink_slab.part.40+0x1f5/0x420
  shrink_node+0x22c/0x320
  do_try_to_free_pages+0xf5/0x330
  try_to_free_pages+0xe9/0x190
  __alloc_pages_slowpath+0x40f/0xba0
  __alloc_pages_nodemask+0x209/0x260
  alloc_pages_vma+0x1f1/0x250
  do_huge_pmd_anonymous_page+0x123/0x660
  handle_mm_fault+0xfd3/0x1330
  __get_user_pages+0x113/0x640
  get_user_pages+0x4f/0x60
  __gfn_to_pfn_memslot+0x120/0x3f0 [kvm]
  try_async_pf+0x66/0x230 [kvm]
  tdp_page_fault+0x130/0x280 [kvm]
  kvm_mmu_page_fault+0x60/0x120 [kvm]
  handle_ept_violation+0x91/0x170 [kvm_intel]
  vmx_handle_exit+0x1ca/0x1400 [kvm_intel]

Either of the above explains why pre-e751732486eb code waited until IRQs
were disabled by vcpu_enter_guest() to do switch_fpu_return().

Properly fixing this solely within KVM is going to be somewhat painful.  The
most common case, vcpu_enter_guest(), which is being hit here, is easy to
handle by restoring the switch_fpu_return() that was removed by commit
e751732486eb.  The other obvious case I see is the emulator's access of guest
FPU state, which will effectively require reverting commit 6ab0b9feb82a
("x86,kvm: remove KVM emulator get_fpu / put_fpu") along with new
implementations of the hooks to handle TIF_NEED_FPU_LOAD.

> > Then stashed the patch, reverted ca7e6b286333, compile, reboot.
> >
> > Guest is running stable now on 5.3.11.  Did test my CAD under the guest, did
> > not experience the crashes that had me stuck at 5.1.
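As a rough illustration of why doing switch_fpu_return() only after IRQs are disabled in vcpu_enter_guest() closes the window, here is a second hypothetical user-space model in C; again not kernel code, with entry_cpu, entry_irq_uses_fpu() and entry_switch_fpu_return() as invented stand-ins. It assumes an interrupt that borrows the FPU can only run while IRQs are enabled. The direct-reclaim path in the trace above presumably runs in task context after a VM-exit, so it too completes before the next IRQs-off check at entry.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical user-space model, not kernel code.  All names are
 * illustrative stand-ins for the kernel concepts noted beside them.
 */
struct entry_cpu {
	bool irqs_enabled;    /* local IRQ state */
	bool need_fpu_load;   /* TIF_NEED_FPU_LOAD */
	bool guest_regs_live; /* guest FPU state is in the registers */
};

/*
 * An interrupt (or softirq on IRQ exit) that borrows the FPU.  It can
 * only run while IRQs are enabled; otherwise it stays pending.
 */
static void entry_irq_uses_fpu(struct entry_cpu *cpu)
{
	if (!cpu->irqs_enabled)
		return;
	cpu->need_fpu_load = true;
	cpu->guest_regs_live = false;
}

/* Stand-in for switch_fpu_return(): reload the saved guest state. */
static void entry_switch_fpu_return(struct entry_cpu *cpu)
{
	cpu->guest_regs_live = true;
	cpu->need_fpu_load = false;
}

/* Check the flag first, then disable IRQs: the interrupt fits in between. */
static bool entry_check_then_disable(void)
{
	struct entry_cpu cpu = { .irqs_enabled = true,
				 .need_fpu_load = false,
				 .guest_regs_live = true };

	if (cpu.need_fpu_load)
		entry_switch_fpu_return(&cpu);
	entry_irq_uses_fpu(&cpu);     /* lands in the window */
	cpu.irqs_enabled = false;     /* local_irq_disable() */
	return cpu.guest_regs_live;   /* state at VM-entry */
}

/* Disable IRQs first, then check: nothing can clobber after the check. */
static bool entry_disable_then_check(void)
{
	struct entry_cpu cpu = { .irqs_enabled = true,
				 .need_fpu_load = false,
				 .guest_regs_live = true };

	entry_irq_uses_fpu(&cpu);     /* runs before IRQs are disabled */
	cpu.irqs_enabled = false;     /* local_irq_disable() */
	if (cpu.need_fpu_load)
		entry_switch_fpu_return(&cpu);
	entry_irq_uses_fpu(&cpu);     /* held off: IRQs are disabled */
	assert(!cpu.need_fpu_load);
	return cpu.guest_regs_live;   /* state at VM-entry */
}
```

entry_check_then_disable() returns false and entry_disable_then_check() returns true. The emulator hooks mentioned above would presumably need the same property: the reload and the guest FPU accesses must sit inside a window that no other FPU user can interrupt.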