On Sat, Mar 12, 2016 at 11:47 PM, Sarah Newman <srn@xxxxxxxxx> wrote:
> On 03/10/2016 12:05 AM, Sarah Newman wrote:
>> On 03/09/2016 08:15 PM, Sarah Newman wrote:
>>> I've been running 3.18.25-18.el6.x86_64 + our build of xen 4.4.3-9 on one host for the last couple of weeks and have gotten several soft lockups
>>> within the last 24 hours. I am posting here first in case anyone else has experienced the same issue.
>>>
>>
>> Here is mpstat from around the time of the issue:
>>
>> 0:08:56 PM   CPU    %usr   %nice    %sys %iowait    %irq   %soft  %steal  %guest   %idle
>> 10:09:10 PM  all    0.00    0.00   66.67    0.00    0.00   33.33    0.00    0.00    0.00
>> 10:09:11 PM  all    2.17    0.00    5.43   32.61    0.00   58.70    1.09    0.00    0.00
>> 10:09:12 PM  all    0.00    0.00    1.15    0.00    0.00   85.06    0.00    0.00   13.79
>> 10:09:13 PM  all    0.00    0.00    1.08    0.00    0.00   83.87    0.00    0.00   15.05
>> 10:09:14 PM  all    0.00    0.00    1.10    0.00    0.00   83.52    0.00    0.00   15.38
>> 10:09:15 PM  all    1.09    0.00    1.09    0.00    0.00   85.87    0.00    0.00   11.96
>> 10:09:51 PM  all    0.00    0.00    1.09    0.00    0.00   84.78    1.09    0.00   13.04
>> Message from syslogd at Mar 9 22:09:51 ...
>> kernel:NMI watchdog: BUG: soft lockup - CPU#0 stuck for 22s! [swapper/0:0]
>> 10:10:02 PM  all    0.00    0.00   33.33   50.00    0.00   16.67    0.00    0.00    0.00
>> 10:10:03 PM  all    3.16    0.00   10.53    8.42    0.00    2.11    1.05    0.00   74.74
>> 10:10:04 PM  all    0.00    0.00    3.23   38.71    0.00    1.08    1.08    0.00   55.91
>> 10:10:05 PM  all    0.00    0.00    4.30   11.83    0.00    3.23    1.08    0.00   79.57
>>
>> Typical load:
>>
>> 10:22:15 PM  CPU    %usr   %nice    %sys %iowait    %irq   %soft  %steal  %guest   %idle
>> 10:22:16 PM  all    0.00    0.00    1.02    0.00    0.00    1.02    0.00    0.00   97.96
>> 10:22:17 PM  all    0.00    0.00    0.00    0.00    0.00    0.00    1.04    0.00   98.96
>> 10:22:18 PM  all    0.00    0.00    0.00    0.00    0.00    1.01    1.01    0.00   97.98
>> 10:22:19 PM  all    0.00    0.00    1.01    0.00    0.00    1.01    0.00    0.00   97.98
>> 10:22:20 PM  all    0.00    0.00    0.00    0.00    0.00    0.00    1.02    0.00   98.98
>> 10:22:21 PM  all    0.00    0.00    1.02    0.00    0.00    1.02    0.00    0.00   97.96
>> 10:22:22 PM  all    0.00    0.00    0.00    0.00    0.00    1.01    1.01    0.00   97.98
>>
>>
>> I reverted to an older kernel since the older kernel had run for a couple of months without issues.
>
>
> This did not fix it. I isolated the issue to a vif rate limit of 100Mb/s being applied to one of the guests and am now able to reproduce on a
> different machine.
>
> I will look into whether this has been fixed already; if so I will submit a pull request for the Xen4CentOS kernel and if not I will take it up with
> the xen-devel list.

Yes, I was going to suggest posting this to xen-users -- it's not unlikely someone has already run across this.

 -George
_______________________________________________
CentOS-virt mailing list
CentOS-virt@xxxxxxxxxx
https://lists.centos.org/mailman/listinfo/centos-virt