Hi, I'm running an x86_64 CentOS 7 kernel (3.10.0-327.28.3.7, slightly modified). We have not modified the kvm module. This machine is acting as an OpenStack compute node, using libvirt/qemu. While doing a live-migration stress test (where I just repeatedly live migrate instances between two compute nodes) I can reliably trigger the BUG below. I realize this is CentOS and not mainline, but I was wondering if anyone had seen something like this and fixed it in mainline. Based on the Oops, the value of %rdi is 0x1af08000, which is obviously not a valid kernel pointer for x86_64. I *think* it's choking while dereferencing "_spte_" in the second line of the for_each_rmap_spte() macro, which is called by __rmap_clear_dirty(). However, I don't have the context to know why this would be corrupted. Anyone have any ideas? The initial assembly in __rmap_clear_dirty() looks like this. The line at 0xffffffffa134b0ea is the one corresponding to __rmap_clear_dirty+0x4a. (This is from a different run, so the addresses are slightly different from the Oops below.) 0xffffffffa134b0a0: nopl 0x0(%rax,%rax,1) 0xffffffffa134b0a5: push %rbp 0xffffffffa134b0a6: mov %rsp,%rbp 0xffffffffa134b0a9: push %rbx 0xffffffffa134b0aa: sub $0x18,%rsp 0xffffffffa134b0ae: mov %gs:0x28,%rax 0xffffffffa134b0b7: mov %rax,-0x10(%rbp) 0xffffffffa134b0bb: xor %eax,%eax 0xffffffffa134b0bd: mov (%rsi),%rdi # rdi is *rmapp 0xffffffffa134b0c0: test %rdi,%rdi # "if (!rmap)" in rmap_get_first() 0xffffffffa134b0c3: je 0xffffffffa134b158 0xffffffffa134b0c9: test $0x1,%dil # "if (!(rmap & 1))" in rmap_get_first() 0xffffffffa134b0cd: je 0xffffffffa134b178 0xffffffffa134b0d3: and $0xfffffffffffffffe,%rdi # "iter->desc = (struct pte_list_desc *)(rmap & ~1ul);" in rmap_get_first() 0xffffffffa134b0d7: movl $0x0,-0x18(%rbp) 0xffffffffa134b0de: mov %rdi,-0x20(%rbp) 0xffffffffa134b0e2: mov (%rdi),%rdi # "return iter->desc->sptes[iter->pos];" in rmap_get_first() 0xffffffffa134b0e5: test %rdi,%rdi # bail out if _spte_ is NULL? 0xffffffffa134b0e8: je 0xffffffffa134b158 0xffffffffa134b0ea: mov (%rdi),%rdx # calculating *_spte_ in the 2nd line of for_each_rmap_spte(), I think. This chokes. 0xffffffffa134b0ed: test $0x1,%dl 0xffffffffa134b0f0: je 0xffffffffa134b14f Here's the BUG/Oops: 2016-10-06T23:57:39.264 controller-1 kernel: alert [260915.368163] BUG: unable to handle kernel paging request at 000000001af08000 2016-10-06T23:57:39.264 controller-1 kernel: alert [260915.376067] IP: [<ffffffffa12cb0ea>] __rmap_clear_dirty+0x4a/0xf0 [kvm] 2016-10-06T23:57:39.264 controller-1 kernel: warning [260915.383590] PGD 0 2016-10-06T23:57:39.264 controller-1 kernel: warning [260915.385947] Oops: 0000 [#1] PREEMPT SMP 2016-10-06T23:57:39.264 controller-1 kernel: warning [260915.390461] Modules linked in: target_core_pscsi target_core_file target_core_iblock iscsi_target_mod target_core_mod dm_thin_pool dm_persistent_data dm_bio_prison dm_bufio iptable_raw xt_CHECKSUM xt_connmark iptable_mangle nbd ebtable_filter ebtables igb_uio(OE) uio drbd(OE) libcrc32c nf_log_ipv6 nf_conntrack_ipv6 nf_defrag_ipv6 ip6table_filter ip6_tables nf_log_ipv4 nf_log_common xt_LOG xt_limit xt_conntrack iptable_filter xt_nat xt_comment xt_multiport iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack virtio_net veth nfsv3 nfs fscache 8021q garp stp mrp llc cls_u32 sch_sfq sch_htb dm_mod iTCO_wdt iTCO_vendor_support ipmi_devintf intel_powerclamp coretemp kvm_intel kvm crc32_pclmul ghash_clmulni_intel aesni_intel glue_helper lrw gf128mul ablk_helper cryptd lpc_ich mfd_core mei_me mei i2c_i801 ipmi_si ipmi_msghandler acpi_power_meter wrs_avp(OE) nfsd auth_rpcgss nfs_acl lockd grace sunrpc ip_tables ex t4 jbd2 sd_mod crc_t10dif crct10dif_generic crct10dif_pclmul crct10dif_common crc32c_intel ixgbe(OE) ahci libahci libata igb i2c_algo_bit i2c_core dca i40e(OE) vxlan ip6_udp_tunnel udp_tunnel ptp pps_core 2016-10-06T23:57:39.264 controller-1 kernel: warning [260915.503652] CPU: 0 PID: 134880 Comm: qemu-kvm Tainted: G OE ------------ 3.10.0-327.28.3.7.tis.x86_64 #1 2016-10-06T23:57:39.264 controller-1 kernel: warning [260915.515605] Hardware name: Intel Corporation S2600WT2R/S2600WT2R, BIOS SE5C610.86B.01.01.0016.033120161139 03/31/2016 2016-10-06T23:57:39.264 controller-1 kernel: warning [260915.527559] task: ffff880334019420 ti: ffff8803540cc000 task.ti: ffff8803540cc000 2016-10-06T23:57:39.264 controller-1 kernel: warning [260915.536018] RIP: 0010:[<ffffffffa12cb0ea>] [<ffffffffa12cb0ea>] __rmap_clear_dirty+0x4a/0xf0 [kvm] 2016-10-06T23:57:39.264 controller-1 kernel: warning [260915.546264] RSP: 0018:ffff8803540cfbb8 EFLAGS: 00010206 2016-10-06T23:57:39.282 controller-1 kernel: warning [260915.552286] RAX: 0000000000000000 RBX: ffff8803572ac000 RCX: 0000000000000000 2016-10-06T23:57:39.282 controller-1 kernel: warning [260915.560357] RDX: 0000000000000ef6 RSI: ffffc9002bab1ea0 RDI: 000000001af08000 2016-10-06T23:57:39.282 controller-1 kernel: warning [260915.568432] RBP: ffff8803540cfbd8 R08: 0000000000000000 R09: 0000000000000001 2016-10-06T23:57:39.282 controller-1 kernel: warning [260915.568433] R10: 000000000000bbe8 R11: 0000000000000001 R12: 0000000000000000 2016-10-06T23:57:39.282 controller-1 kernel: warning [260915.568433] R13: ffffffffa12cb0a0 R14: ffff8803540cffd8 R15: 0000000000000000 2016-10-06T23:57:39.282 controller-1 kernel: warning [260915.568435] FS: 00007fb05e1ff700(0000) GS:ffff88103f200000(0000) knlGS:0000000000000000 2016-10-06T23:57:39.282 controller-1 kernel: warning [260915.568436] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 2016-10-06T23:57:39.282 controller-1 kernel: warning [260915.568436] CR2: 000000001af08000 CR3: 0000000359e03000 CR4: 00000000003427f0 2016-10-06T23:57:39.282 controller-1 kernel: warning [260915.568437] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 2016-10-06T23:57:39.282 controller-1 kernel: warning [260915.568438] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 2016-10-06T23:57:39.282 controller-1 kernel: warning [260915.568439] Stack: 2016-10-06T23:57:39.282 controller-1 kernel: warning [260915.568446] 0000000000000000 ffff8803540cfbd8 000000004c224769 ffff8803572ac000 2016-10-06T23:57:39.282 controller-1 kernel: warning [260915.568452] ffff8803540cfc60 ffffffffa12ca9c8 000000008108f82a ffff8803540cfd60 2016-10-06T23:57:39.282 controller-1 kernel: warning [260915.568458] 0000000000000100 000000000001ffff 0000000100000001 00000000000038d4 2016-10-06T23:57:39.282 controller-1 kernel: warning [260915.568459] Call Trace: 2016-10-06T23:57:39.282 controller-1 kernel: warning [260915.568481] [<ffffffffa12ca9c8>] slot_handle_level_range+0x98/0x110 [kvm] 2016-10-06T23:57:39.282 controller-1 kernel: warning [260915.568495] [<ffffffffa12cbc04>] kvm_mmu_slot_leaf_clear_dirty+0x54/0x80 [kvm] 2016-10-06T23:57:39.282 controller-1 kernel: warning [260915.568502] [<ffffffffa14901b7>] vmx_slot_enable_log_dirty+0x17/0x30 [kvm_intel] 2016-10-06T23:57:39.282 controller-1 kernel: warning [260915.568515] [<ffffffffa12c7778>] kvm_arch_commit_memory_region+0x58/0xe0 [kvm] 2016-10-06T23:57:39.282 controller-1 kernel: warning [260915.568525] [<ffffffffa12af9eb>] __kvm_set_memory_region+0x60b/0x980 [kvm] 2016-10-06T23:57:39.283 controller-1 kernel: warning [260915.568536] [<ffffffffa12afd8b>] kvm_set_memory_region+0x2b/0x40 [kvm] 2016-10-06T23:57:39.283 controller-1 kernel: warning [260915.568547] [<ffffffffa12b02e4>] kvm_vm_ioctl+0x544/0x760 [kvm] 2016-10-06T23:57:39.283 controller-1 kernel: warning [260915.568553] [<ffffffff811b6295>] do_vfs_ioctl+0x2e5/0x4c0 2016-10-06T23:57:39.283 controller-1 kernel: warning [260915.568558] [<ffffffff8104b040>] ? __do_page_fault+0x1a0/0x3a0 2016-10-06T23:57:39.283 controller-1 kernel: warning [260915.568562] [<ffffffff811b6511>] SyS_ioctl+0xa1/0xc0 2016-10-06T23:57:39.283 controller-1 kernel: warning [260915.568566] [<ffffffff81678f89>] system_call_fastpath+0x16/0x1b 2016-10-06T23:57:39.283 controller-1 kernel: warning [260915.568588] Code: 3e 48 85 ff 0f 84 8f 00 00 00 40 f6 c7 01 0f 84 a5 00 00 00 48 83 e7 fe c7 45 e8 00 00 00 00 48 89 7d e0 48 8b 3f 48 85 ff 74 6e <48> 8b 17 f6 c2 01 74 5d 48 8b 05 37 ae 03 00 48 89 d1 31 db 48 2016-10-06T23:57:39.283 controller-1 kernel: alert [260915.568600] RIP [<ffffffffa12cb0ea>] __rmap_clear_dirty+0x4a/0xf0 [kvm] 2016-10-06T23:57:39.283 controller-1 kernel: warning [260915.568601] RSP <ffff8803540cfbb8> 2016-10-06T23:57:39.283 controller-1 kernel: warning [260915.568601] CR2: 000000001af08000 2016-10-06T23:57:39.284 controller-1 kernel: warning [260915.569212] ---[ end trace 7c95db025611e651 ]--- 2016-10-06T23:57:39.284 controller-1 kernel: info [260915.627598] note: qemu-kvm[134880] exited with preempt_count 1 Thanks, Chris -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html