Hi,

On Tue, Jan 23, 2018 at 10:35:24AM -0800, Nathan March wrote:
> > Thanks for the heads-up. It's been running through XenServer's tests
> > as well as the XenProject's "osstest" -- I haven't heard of any
> > additional issues, but I'll ask.
>
> Looks like I can reproduce this pretty easily, this happened upon ssh'ing
> into the server while I had a VM migrating into it. The system goes
> completely unresponsive (can't even enter a keystroke via console):
>
> [64722.291300] vlan208: port 4(vif5.0) entered forwarding state
> [64722.291695] NOHZ: local_softirq_pending 08
> [64929.006981] BUG: unable to handle kernel paging request at
> 0000000000002260
> [64929.007020] IP: [<ffffffff81533a24>] n_tty_receive_buf_common+0xa4/0x1f0
> [64929.007049] PGD 1f7a53067 [64929.007057] PUD 1ee0d4067
> PMD 0 [64929.007069]
> [64929.007077] Oops: 0000 [#1] SMP
> [64929.007088] Modules linked in: ebt_ip6 ebt_ip ebtable_filter ebtables
> arptable_filter arp_tables bridge xen_pciback xen_gntalloc nfsd auth_rpcgss
> nfsv3 nfs_acl nfs fscache lockd sunrpc grace 8021q mrp garp stp llc bonding
> xen_acpi_processor blktap xen_netback xen_blkback xen_gntdev xen_evtchn
> xenfs xen_privcmd dcdbas fjes pcspkr ipmi_devintf ipmi_si ipmi_msghandler
> joydev i2c_i801 i2c_smbus lpc_ich shpchp mei_me mei ioatdma ixgbe mdio igb
> dca ptp pps_core uas usb_storage wmi ttm
> [64929.007327] CPU: 15 PID: 17696 Comm: kworker/u48:0 Not tainted
> 4.9.75-30.el6.x86_64 #1
> [64929.007343] Hardware name: Dell Inc. PowerEdge C6220/03C9JJ, BIOS 2.7.1
> 03/04/2015
> [64929.007362] Workqueue: events_unbound flush_to_ldisc
> [64929.007376] task: ffff8801fbc70580 task.stack: ffffc90048af8000
> [64929.007415] RIP: e030:[<ffffffff81533a24>] [<ffffffff81533a24>]
> n_tty_receive_buf_common+0xa4/0x1f0
> [64929.007465] RSP: e02b:ffffc90048afbb08 EFLAGS: 00010296
> [64929.007476] RAX: 0000000000002260 RBX: 0000000000000000 RCX:
> 0000000000000002
> [64929.007519] RDX: 0000000000000000 RSI: ffff8801dc0f3c20 RDI:
> ffff8801f9b8acd8
> [64929.007563] RBP: ffffc90048afbb78 R08: 0000000000000001 R09:
> ffffffff8210f1c0
> [64929.007577] R10: 0000000000007ff0 R11: 0000000000000000 R12:
> 0000000000000002
> [64929.007620] R13: ffff8801f9b8ac00 R14: 0000000000000000 R15:
> ffff8801dc0f3c20
> [64929.007675] FS: 00007fcfc0af8700(0000) GS:ffff880204dc0000(0000)
> knlGS:0000000000000000
> [64929.007718] CS: e033 DS: 0000 ES: 0000 CR0: 0000000080050033
> [64929.007759] CR2: 0000000000002260 CR3: 00000001f067b000 CR4:
> 0000000000042660
> [64929.007782] Stack:
> [64929.007806] ffffc90048afbb38 0000000000000000 ffff8801f9b8acd8
> 0000000104dda030
> [64929.007858] 0000000000002260 00000000fbc72700 ffff880204dc48c0
> 0000000000000000
> [64929.007941] ffff880204dce890 ffff8801dc0f3c00 ffff8801f7f25c00
> ffffc90048afbbf8
> [64929.007994] Call Trace:
> [64929.008008] [<ffffffff81533b84>] n_tty_receive_buf2+0x14/0x20
> [64929.008048] [<ffffffff81536763>] tty_ldisc_receive_buf+0x23/0x50
>

Hmm, isn't this the ldisc bug that was discussed on this list a few months
ago, for which a patch was also applied to the virt-sig kernel? The call
trace looks similar.

-- Pasi

> [64929.008088] [<ffffffff81536b88>] flush_to_ldisc+0xc8/0x100
> [64929.008133] [<ffffffff8102eb7b>] ? __switch_to+0x20b/0x690
> [64929.008176] [<ffffffff81025375>] ? xen_clocksource_read+0x15/0x20
> [64929.008222] [<ffffffff810c0030>] process_one_work+0x170/0x500
> [64929.008268] [<ffffffff818dac28>] ? __schedule+0x238/0x530
> [64929.008310] [<ffffffff818db00a>] ? schedule+0x3a/0xa0
> [64929.008324] [<ffffffff810c1ca6>] worker_thread+0x166/0x530
> [64929.008368] [<ffffffff810e6a69>] ? put_prev_entity+0x29/0x140
> [64929.008412] [<ffffffff818dac28>] ? __schedule+0x238/0x530
> [64929.008458] [<ffffffff810d4082>] ? default_wake_function+0x12/0x20
> [64929.008502] [<ffffffff810c1b40>] ? maybe_create_worker+0x120/0x120
> [64929.008518] [<ffffffff818db00a>] ? schedule+0x3a/0xa0
> [64929.008555] [<ffffffff818dedf6>] ? _raw_spin_unlock_irqrestore+0x16/0x20
> [64929.008599] [<ffffffff810c1b40>] ? maybe_create_worker+0x120/0x120
> [64929.008616] [<ffffffff810c6ae5>] kthread+0xe5/0x100
> [64929.008630] [<ffffffff810d1a16>] ? schedule_tail+0x56/0xc0
> [64929.008643] [<ffffffff810c6a00>] ? __kthread_init_worker+0x40/0x40
> [64929.008659] [<ffffffff810d1a16>] ? schedule_tail+0x56/0xc0
> [64929.008673] [<ffffffff818df5a1>] ret_from_fork+0x41/0x50
> [64929.008685] Code: 89 fe 4c 89 ef 89 45 98 e8 aa fb ff ff 8b 45 98 48 63
> d0 48 85 db 48 8d 0c 13 48 0f 45 d9 01 45 bc 49 01 d7 41 29 c4 48 8b 45 b0
> <48> 8b 30 48 89 75 c0 49 8b 0e 8d 96 00 10 00 00 29 ca 41 f6 85
> [64929.008894] RIP [<ffffffff81533a24>] n_tty_receive_buf_common+0xa4/0x1f0
> [64929.008914] RSP <ffffc90048afbb08>
> [64929.008923] CR2: 0000000000002260
> [64929.009641] ---[ end trace e1da1cdf77fed144 ]---
> [64929.009785] BUG: unable to handle kernel paging request at
> ffffffffffffffd8
> [64929.009804] IP: [<ffffffff810c62c0>] kthread_data+0x10/0x20
> [64929.009823] PGD 200d067 [64929.009831] PUD 200f067
> PMD 0 [64929.009842]
> [64929.009850] Oops: 0000 [#2] SMP
> [64929.009864] Modules linked in: ebt_ip6 ebt_ip ebtable_filter ebtables
> arptable_filter arp_tables bridge xen_pciback xen_gntalloc nfsd auth_rpcgss
> nfsv3 nfs_acl nfs fscache lockd sunrpc grace 8021q mrp garp stp llc bonding
> xen_acpi_processor blktap xen_netback xen_blkback xen_gntdev xen_evtchn
> xenfs xen_privcmd dcdbas fjes pcspkr ipmi_devintf ipmi_si ipmi_msghandler
> joydev i2c_i801 i2c_smbus lpc_ich shpchp mei_me mei ioatdma ixgbe mdio igb
> dca ptp pps_core uas usb_storage wmi ttm
> [64929.010054] CPU: 15 PID: 17696 Comm: kworker/u48:0 Tainted: G D
> 4.9.75-30.el6.x86_64 #1
> [64929.010068] Hardware name: Dell Inc. PowerEdge C6220/03C9JJ, BIOS 2.7.1
> 03/04/2015
> [64929.010127] task: ffff8801fbc70580 task.stack: ffffc90048af8000
> [64929.010138] RIP: e030:[<ffffffff810c62c0>] [<ffffffff810c62c0>]
> kthread_data+0x10/0x20
> [64929.010153] RSP: e02b:ffffc90048afbdd8 EFLAGS: 00010086
> [64929.010162] RAX: 0000000000000000 RBX: ffff880204dd9fc0 RCX:
> 000000000000000f
> [64929.010174] RDX: ffff880200409400 RSI: ffff8801fbc70580 RDI:
> ffff8801fbc70580
> [64929.010185] RBP: ffffc90048afbdd8 R08: ffff880204dc0000 R09:
> 00000006401f55c3
> [64929.010197] R10: dead000000000200 R11: dead000000000200 R12:
> 0000000000019fc0
> [64929.010208] R13: ffff8801fbc70580 R14: 0000000000000000 R15:
> ffff8801fbc70f40
> [64929.010229] FS: 00007fcfc0af8700(0000) GS:ffff880204dc0000(0000)
> knlGS:0000000000000000
> [64929.010241] CS: e033 DS: 0000 ES: 0000 CR0: 0000000080050033
> [64929.010251] CR2: 0000000000000028 CR3: 00000001f067b000 CR4:
> 0000000000042660
> [64929.010270] Stack:
> [64929.010275] ffffc90048afbe08 ffffffff810bda72 ffffc90048afbdf8
> ffff880204dd9fc0
> [64929.010295] 0000000000019fc0 ffff8801fbc70580 ffffc90048afbe78
> ffffffff818dae04
> [64929.010314] 0000000000000001 ffff8801ef0a2400 ffffc90048afbe48
> ffff8801f0e67708
> [64929.010333] Call Trace:
> [64929.010340] [<ffffffff810bda72>] wq_worker_sleeping+0x12/0xa0
> [64929.010352] [<ffffffff818dae04>] __schedule+0x414/0x530
> [64929.010362] [<ffffffff810d63ec>] do_task_dead+0x3c/0x40
> [64929.010373] [<ffffffff810aaade>] do_exit+0x24e/0x480
> [64929.010383] [<ffffffff810c6ae5>] ? kthread+0xe5/0x100
> [64929.010393] [<ffffffff810d1a16>] ? schedule_tail+0x56/0xc0
> [64929.010403] [<ffffffff810c6a00>] ? __kthread_init_worker+0x40/0x40
> [64929.010415] [<ffffffff818e0db7>] rewind_stack_do_exit+0x17/0x20
> [64929.010425] Code: 48 09 00 00 48 8b 40 c8 c9 48 c1 e8 02 83 e0 01 c3 66
> 2e 0f 1f 84 00 00 00 00 00 55 48 89 e5 66 66 66 66 90 48 8b 87 48 09 00 00
> <48> 8b 40 d8 c9 c3 66 2e 0f 1f 84 00 00 00 00 00 55 48 89 e5 66
> [64929.010607] RIP [<ffffffff810c62c0>] kthread_data+0x10/0x20
> [64929.010620] RSP <ffffc90048afbdd8>
> [64929.010626] CR2: ffffffffffffffd8
> [64929.010638] ---[ end trace e1da1cdf77fed145 ]---
> [64929.010647] Fixing recursive fault but reboot is needed!
>
> This is a centos 6 system booting with:
>
> kernel /boot/xen.gz dom0_mem=6144M,max:6144M cpuinfo com1=115200,8n1
> console=com1,tty loglvl=all guest_loglvl=all msi=off com2=115200,8n1
> console=com2,tty1
> module /boot/vmlinuz-4.9.75-30.el6.x86_64 ro
> root=UUID=ffab1fdf-28a4-4239-b112-5e920e3d6c36 rd_NO_LUKS rd_NO_LVM
> LANG=en_US.UTF-8 rd_NO_MD SYSFONT=latarcyrheb-sun16 crashkernel=auto
> KEYBOARDTYPE=pc KEYTABLE=us rd_NO_DM biosdevname=0 nomodeset max_loop=128
> max_loop=512 xencons=ttyS1 console=hvc0,tty0
> module /boot/initramfs-4.9.75-30.el6.x86_64.img
>
> Running xen-4.6.6-9.el6. Note that I have msi=off set in the xen parameters
> due to hitting the same network card bug that Kevin Stange was hitting.
>
> Happy to grab any further info as required, or let me know if this is better
> suited on xen-devel.
>
> Cheers,
> Nathan
>

_______________________________________________
CentOS-virt mailing list
CentOS-virt@xxxxxxxxxx
https://lists.centos.org/mailman/listinfo/centos-virt