> -----Original Message-----
> From: Sitsofe Wheeler
> Sent: Tuesday, August 26, 2014 1:42 AM
>
> > > [ 7.645526] hv_vmbus: registering driver hyperv_fb
> > > [ 7.657553] BUG: unable to handle kernel paging request at
> > > ffff880077800004
> > > [ 7.658224] IP: [<ffffffff8159a7ac>] hv_ringbuffer_write+0x7c/0x150
> > > [ 7.658224] PGD 2da9067 PUD 2dac067 PMD 7fa27067 PTE
> > > 8000000077800060
> > > [ 7.658224] Oops: 0000 [#1] SMP DEBUG_PAGEALLOC
> > It seems
> > hv_ringbuffer_write() ->
> > hv_get_ringbuffer_availbytes():
> > reading rbi->ring_buffer->read_index causes a page fault.
> >
> > It looks rbi->ring_buffer was unmapped somehow according to the
> > semantics of CONFIG_DEBUG_PAGEALLOC??? Or, was there a memory
> > corruption somewhere?
> >
> > It looks the panic will disappear if the guest isn't configured with a
> > "Network Adapter ".

IMO it has nothing to do with the hyperv netvsc, as here hypervfb is the
first one to invoke vmbus_open(), and hyperv netvsc's vmbus_open() hasn't
been invoked yet.

> This sounds very fishy as if network setup has left things in a bad
> state.

Ditto. I doubt the network driver causes the issue.

> What is baffles me is the whole UP vs SMP thing - why would UP
> make this show up consistently? Perhaps some assertions could be added
> to check that rbi->ring_buffer still has sane values in it after
> operations on it are finished?

With more tests, I found that vcpus=2 hits the same issue, though only
rarely. vcpus=4 seems fine in my limited tests.

> I guess you could try switching things around and using
> kmemcheck (https://www.kernel.org/doc/Documentation/kmemcheck.txt).
> If the whole area close to rbi->ring_buffer->read_index is being stomped on
> it should show up. If it's just being set to a duff value or freed that
> going to be harder to track down although poisoning before freeing
> should allow us to distinguish that case...

Thanks for the info.

Actually, I found the direct cause of the panic: sometimes vmbus_post_msg()
returns 4 (HV_STATUS_INVALID_ALIGNMENT), but vmbus_open() doesn't propagate
this error to its caller, synthvid_connect_vsp(). Instead, vmbus_open()
does "goto error1" and frees the ring buffer! The later access to
ring_buffer->read_index is then caught by CONFIG_DEBUG_PAGEALLOC.

I don't see any "invalid alignment" here... and I can't explain why
vcpus=4 seems OK... Debugging is still in progress.

BTW, please try the attached patch. With it, the VM doesn't panic on my
side with vcpus=1 and can boot to a shell prompt (though the boot-up looks
very slow; I have to wait several minutes...).

> From your analysis this doesn't sound framebuffer related - perhaps we
> could drop the linuxfb CC's on these mails going forward?

OK. I removed linuxfb and Jean.

Thanks,
-- Dexuan

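
For illustration only, the failure mode described above (an error path that
frees the ring buffer but still reports success to the caller) can be
sketched in plain userspace C. This is neither the kernel code nor the
attached patch: every name below (demo_ring, demo_channel, demo_post_msg,
demo_open_buggy, demo_open_fixed) is hypothetical, and the -1 error value
is an assumption chosen only for the sketch.

```c
#include <stdlib.h>

/* Hypothetical stand-ins, not the real vmbus structures. */
struct demo_ring {
    unsigned int write_index;
    unsigned int read_index;
};

struct demo_channel {
    struct demo_ring *ring;
};

/* Pretends to post a message to the host and returns the status it is
 * given: 0 means success, any positive value (e.g. 4) means failure. */
int demo_post_msg(int simulated_status)
{
    return simulated_status;
}

/* Buggy shape: the failure path frees the ring but still returns 0. */
int demo_open_buggy(struct demo_channel *ch, int simulated_status)
{
    int err = 0;

    ch->ring = calloc(1, sizeof(*ch->ring));
    if (!ch->ring)
        return -1;

    if (demo_post_msg(simulated_status) != 0)
        goto error1;        /* err was never set, so it is still 0 */

    return 0;

error1:
    free(ch->ring);
    ch->ring = NULL;        /* cleared here so the sketch stays safe to inspect */
    return err;             /* the caller sees "success" */
}

/* Fixed shape: record a nonzero error before jumping to the cleanup. */
int demo_open_fixed(struct demo_channel *ch, int simulated_status)
{
    int err;

    ch->ring = calloc(1, sizeof(*ch->ring));
    if (!ch->ring)
        return -1;

    if (demo_post_msg(simulated_status) != 0) {
        err = -1;           /* assumed error value, for the sketch only */
        goto error1;
    }

    return 0;

error1:
    free(ch->ring);
    ch->ring = NULL;
    return err;             /* the caller now sees the failure */
}
```

With a simulated status of 4, demo_open_buggy() returns 0 even though the
ring is gone, so a caller that trusts the return value would go on to use
it. demo_open_fixed() returns nonzero in the same situation, which lets the
caller stop before touching the ring.
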
Attachment: fix_vmbus_open.patch