On 28/08/2012 01:59, James Harper wrote:
On 27/08/2012 21:18, Joseph Glanville wrote:
On 28 August 2012 06:15, Jonathan Tripathy <jonnyt@xxxxxxxxxxx> wrote:
On 27/08/2012 21:07, Jonathan Tripathy wrote:
On 27/08/2012 21:00, Jonathan Tripathy wrote:
2) The Windows setup didn't complain that it couldn't install on
the LV, but once I clicked 'next', the Dom0 crashed and the server
rebooted. A lot of output was displayed on screen but quickly
vanished as the system rebooted. I'm trying to see if the output
was saved anywhere. Any ideas why this could have happened and/or
where the output might be saved?
I'd also like to add that after the server came back up, the md
RAID array started rebuilding. I'm wondering whether that's just a
coincidence (due to the forced reboot) or a sign of something
wrong with the md integration with bcache.
I'm going to see if Windows installs natively on the md array (it's
RAID 10, btw) and post back here.
OK, so trying to install Windows directly onto the spindles causes
the same thing to happen. I'm going to try booting into the
non-bcache kernel (the default Ubuntu one) and see if it works
there. If it fails there too, then this is clearly a Xen and/or mdraid issue...
Thanks
OK, so after booting into the default Ubuntu kernel, the Windows
installation seems to progress just fine.
Does this mean there is something wrong with the mdraid code in the
bcache kernel?
Actually, I'm not telling the whole story. The kernel I'm using is
the bcache-3.2 tree (from evilpirate.org) with changes merged in from
kernel.org's 3.2.27 tree. There were no merge conflicts when I did
the git merge.
What do you think I should do?
Thanks
--
To unsubscribe from this list: send the line "unsubscribe
linux-bcache" in the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
I would recommend booting with the raw bcache-3.2 branch before
applying the stable patches (even though they should be fine) and
trying to catch the panic.
This is easiest with a serial port, set as the kernel console on the
kernel command line in GRUB.
Joseph.
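Capturing the panic over a serial line needs a second machine reading the
port. Below is a minimal receiving-side sketch in C, assuming a null-modem
cable and that the crashing host sends its console at 115200 baud, 8N1 (for
a plain kernel that is typically console=ttyS0,115200n8 on the kernel line;
under Xen the serial settings usually go on the hypervisor line instead,
e.g. com1=115200,8n1 console=com1, with console=hvc0 for the dom0 kernel).
The function names and the device path are illustrative only; the flag
clearing mirrors what cfmakeraw(3) is documented to do.

```c
#define _DEFAULT_SOURCE 1
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <termios.h>
#include <unistd.h>

/* Hypothetical helper: put a termios structure into raw 115200 baud, 8N1
 * mode. Returns 0 on success, -1 if the speed could not be set. */
static int serial_make_raw_115200(struct termios *t)
{
    assert(t != NULL);
    t->c_iflag &= ~(IGNBRK | BRKINT | PARMRK | ISTRIP | INLCR | IGNCR | ICRNL | IXON);
    t->c_oflag &= ~OPOST;                       /* no output post-processing */
    t->c_lflag &= ~(ECHO | ECHONL | ICANON | ISIG | IEXTEN);
    t->c_cflag &= ~(CSIZE | PARENB | CSTOPB);   /* no parity, one stop bit */
    t->c_cflag |= CS8 | CLOCAL | CREAD;         /* 8 data bits, ignore modem lines, enable receiver */
    t->c_cc[VMIN] = 1;                          /* read() waits for at least one byte */
    t->c_cc[VTIME] = 0;
    if (cfsetispeed(t, B115200) != 0 || cfsetospeed(t, B115200) != 0)
        return -1;
    return 0;
}

/* Hypothetical helper: open a serial device (e.g. "/dev/ttyS0" on the
 * capturing machine) and apply the settings above. Returns an fd or -1. */
static int serial_open_console(const char *path)
{
    struct termios t;
    int fd = open(path, O_RDONLY | O_NOCTTY);
    if (fd < 0)
        return -1;
    if (tcgetattr(fd, &t) != 0 || serial_make_raw_115200(&t) != 0 ||
        tcsetattr(fd, TCSANOW, &t) != 0) {
        close(fd);
        return -1;
    }
    return fd;
}

/* Copy everything received on fd to stdout until read() fails or hits EOF. */
static void serial_dump(int fd)
{
    char buf[512];
    ssize_t n;
    while ((n = read(fd, buf, sizeof buf)) > 0)
        if (write(STDOUT_FILENO, buf, (size_t)n) != n)
            break;
}
```

In a main() on the capturing machine, opening the port with
serial_open_console() and passing the fd to serial_dump() streams the
console to stdout; redirecting that to a file keeps the trace after the
crashing host reboots.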
Hi There,
I can confirm that the problem occurs even when using the raw bcache-3.2
branch from evilpirate.org. Just to clarify, I am trying to install Windows
Server 2008 in a Xen HVM DomU, onto an LV which sits on top of an MDRAID 10
array. Using the bcache-3.2 kernel, the system panics and reboots as soon as
I click 'next' after selecting the drive to install Windows onto. Using the
standard Ubuntu kernel, everything works as normal. This leads me to believe
that there is an issue with the mdraid code inside the bcache-3.2 tree. I'd
like to stress that I wasn't using bcache for caching at all during this
test.
FWIW, I'm using the 3.2 patches applied to a Debian kernel with LVM on RAID1 (not RAID10) on bcache, and it's all been working fine since I changed to a 512-byte block size. I haven't installed Server 2008, only 2003, but there don't seem to be any problems.
What should my next step be? Try and find a serial cable to capture the
debug output?
Before tinkering with a serial cable, see if the system is alive enough to use netconsole - it can be a bit of a timesaver.
James
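netconsole sends kernel messages as plain UDP datagrams to a target host
(by default to port 6666, configured with a parameter of the form
netconsole=[src-port]@[src-ip]/[<dev>],[tgt-port]@<tgt-ip>/[tgt-macaddr]),
so any UDP listener on a second machine can collect them. A minimal C
sketch of such a listener follows; the function names are hypothetical and
it simply hands back each datagram as received.

```c
#define _DEFAULT_SOURCE 1
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: bind a UDP socket on all local IPv4 addresses at
 * `port` (pass 6666 for netconsole's default). Returns the fd or -1. */
static int netconsole_listen(uint16_t port)
{
    struct sockaddr_in addr;
    int fd = socket(AF_INET, SOCK_DGRAM, 0);
    if (fd < 0)
        return -1;
    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);
    addr.sin_port = htons(port);
    if (bind(fd, (struct sockaddr *)&addr, sizeof addr) != 0) {
        close(fd);
        return -1;
    }
    return fd;
}

/* Hypothetical helper: receive one datagram into buf and NUL-terminate it
 * (the payload is plain text). Returns the payload length or -1. */
static ssize_t netconsole_recv(int fd, char *buf, size_t len)
{
    ssize_t n;
    assert(buf != NULL && len > 0);
    n = recv(fd, buf, len - 1, 0);   /* leave room for the terminator */
    if (n < 0)
        return -1;
    buf[n] = '\0';
    return n;
}
```

Calling netconsole_listen(6666) and then looping on netconsole_recv(),
appending each buffer to a file, keeps the trace even though the crashing
host reboots.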
Hi Everyone,
Here is the trace as captured by netconsole:
[ 130.844069] ------------[ cut here ]------------
[ 130.844165] kernel BUG at fs/bio.c:420!
[ 130.844232] invalid opcode: 0000 [#1] SMP
[ 130.844404] CPU 4
[ 130.844448] Modules linked in: xen_netback xen_blkback 8021q garp xt_physdev bridge stp ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter ip_tables x_tables xen_gntdev xen_evtchn xenfs nls_iso8859_1 nls_cp437 vfat fat netconsole configfs psmouse lp joydev parport video mac_hid serio_raw usb_storage uas raid456 async_pq async_xor xor async_memcpy async_raid6_recov usbhid hid raid10 e1000e raid6_pq async_tx raid1 raid0 multipath linear
[ 130.846688]
[ 130.846746] Pid: 0, comm: swapper/4 Not tainted 3.2.0+ #1 Supermicro X9SCI/X9SCA/X9SCI/X9SCA
[ 130.846956] RIP: e030:[<ffffffff811a6ef7>] [<ffffffff811a6ef7>] bio_put+0x27/0x30
[ 130.847089] RSP: e02b:ffff88005ff3cb80 EFLAGS: 00010246
[ 130.847155] RAX: 0000000000000000 RBX: 00000000fffffffb RCX: 00000000000003a6
[ 130.847224] RDX: 00000000000003a5 RSI: 0000000000016c00 RDI: ffff880039b58918
[ 130.847293] RBP: ffff88005ff3cb80 R08: ffffffff81115e67 R09: 0000000000000100
[ 130.847362] R10: ffff88001a16eea0 R11: 0000000000000000 R12: ffff880017fd4018
[ 130.847431] R13: ffff880039b58918 R14: ffff880017220028 R15: ffff88001a6dd400
[ 130.847504] FS: 00007f507e004700(0000) GS:ffff88005ff39000(0000) knlGS:0000000000000000
[ 130.847590] CS: e033 DS: 002b ES: 002b CR0: 000000008005003b
[ 130.847656] CR2: 00007f507d6910b0 CR3: 000000003989a000 CR4: 0000000000002660
[ 130.847725] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 130.847794] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[ 130.847863] Process swapper/4 (pid: 0, threadinfo ffff88003d646000, task ffff88003d648000)
[ 130.847952] Stack:
[ 130.848012] ffff88005ff3cbc0 ffffffff8150238e ffff88003d6e40b0 ffff880039b58900
[ 130.848258] ffff880039b58920 ffff88001a278100 0000000000000018 0000000000000001
[ 130.848505] ffff88005ff3cbd0 ffffffff811a5f5d ffff88005ff3cbf0 ffffffff811a6f3e
[ 130.848749] Call Trace:
[ 130.848810] <IRQ>
[ 130.848914] [<ffffffff8150238e>] clone_endio+0x8e/0xd0
[ 130.848979] [<ffffffff811a5f5d>] bio_endio+0x1d/0x40
[ 130.849047] [<ffffffff811a6f3e>] bio_pair_release+0x3e/0x50
[ 130.849113] [<ffffffff811a6f6f>] bio_pair_end+0x1f/0x30
[ 130.849180] [<ffffffff811a5f5d>] bio_endio+0x1d/0x40
[ 130.849248] [<ffffffffa00dd3d2>] raid_end_bio_io+0xf2/0x100 [raid10]
[ 130.849319] [<ffffffffa00ddf38>] one_write_done+0x38/0x50 [raid10]
[ 130.849390] [<ffffffffa00deec4>] raid10_end_write_request+0xc4/0x130 [raid10]
[ 130.849476] [<ffffffff811a5f5d>] bio_endio+0x1d/0x40
[ 130.849543] [<ffffffff812e7a03>] req_bio_endio.isra.49+0xa3/0xe0
[ 130.849614] [<ffffffff812e839d>] blk_update_request+0xfd/0x480
[ 130.849681] [<ffffffff812e8751>] blk_update_bidi_request+0x31/0x90
[ 130.849751] [<ffffffff812e9a4c>] blk_end_bidi_request+0x2c/0x80
[ 130.849819] [<ffffffff812e9ae0>] blk_end_request+0x10/0x20
[ 130.849888] [<ffffffff8141d89f>] scsi_io_completion+0xaf/0x630
[ 130.849960] [<ffffffff8165c88e>] ? _raw_spin_unlock_irqrestore+0x1e/0x30
[ 130.850031] [<ffffffff81413d31>] scsi_finish_command+0xc1/0x120
[ 130.850098] [<ffffffff8141d6fe>] scsi_softirq_done+0x13e/0x150
[ 130.850167] [<ffffffff812ef9f3>] blk_done_softirq+0x83/0xa0
[ 130.850237] [<ffffffff8106d4d8>] __do_softirq+0xa8/0x210
[ 130.850304] [<ffffffff8139ddf7>] ? __xen_evtchn_do_upcall+0x207/0x250
[ 130.850373] [<ffffffff816669ac>] call_softirq+0x1c/0x30
[ 130.850442] [<ffffffff81015195>] do_softirq+0x65/0xa0
[ 130.850507] [<ffffffff8106d8be>] irq_exit+0x8e/0xb0
[ 130.850574] [<ffffffff8139fbb5>] xen_evtchn_do_upcall+0x35/0x50
[ 130.850643] [<ffffffff816669fe>] xen_do_hypervisor_callback+0x1e/0x30
[ 130.850711] <EOI>
[ 130.850810] [<ffffffff810013aa>] ? hypercall_page+0x3aa/0x1000
[ 130.850881] [<ffffffff810013aa>] ? hypercall_page+0x3aa/0x1000
[ 130.850950] [<ffffffff8100a170>] ? xen_safe_halt+0x10/0x20
[ 130.851017] [<ffffffff8101b5a3>] ? default_idle+0x53/0x1d0
[ 130.851085] [<ffffffff81012236>] ? cpu_idle+0xd6/0x120
[ 130.851153] [<ffffffff8100a9a9>] ? xen_irq_enable_direct_reloc+0x4/0x4
[ 130.851223] [<ffffffff81643992>] ? cpu_bringup_and_idle+0xe/0x10
[ 130.851291] Code: 00 00 00 00 55 48 89 e5 66 66 66 66 90 8b 47 40 85 c0 74 17 f0 ff 4f 40 0f 94 c0 84 c0 75 05 5d c3 0f 1f 00 e8 2b ff ff ff 5d c3 <0f> 0b 0f 1f 80 00 00 00 00 55 48 89 e5 53 48 83 ec 08 66 66 66
[ 130.854057] RIP [<ffffffff811a6ef7>] bio_put+0x27/0x30
[ 130.854163] RSP <ffff88005ff3cb80>
[ 130.854226] ---[ end trace fa3fbcc21926358a ]---
[ 130.855127] Kernel panic - not syncing: Fatal exception in interrupt
[ 130.855198] Pid: 0, comm: swapper/4 Tainted: G D 3.2.0+ #1
[ 130.855267] Call Trace:
[ 130.855327] <IRQ> [<ffffffff816516ef>] panic+0x91/0x1a7
[ 130.855438] [<ffffffff8165d85a>] oops_end+0xea/0xf0
[ 130.855505] [<ffffffff81016708>] die+0x58/0x90
[ 130.855571] [<ffffffff8165d194>] do_trap+0xc4/0x170
[ 130.855636] [<ffffffff81013e25>] do_invalid_op+0x95/0xb0
[ 130.855702] [<ffffffff811a6ef7>] ? bio_put+0x27/0x30
[ 130.855767] [<ffffffff8100a23d>] ? xen_force_evtchn_callback+0xd/0x10
[ 130.855838] [<ffffffff8100aa02>] ? check_events+0x12/0x20
[ 130.855906] [<ffffffff8166672b>] invalid_op+0x1b/0x20
[ 130.855974] [<ffffffff81115e67>] ? mempool_free_slab+0x17/0x20
[ 130.856041] [<ffffffff811a6ef7>] ? bio_put+0x27/0x30
[ 130.856109] [<ffffffff8150238e>] clone_endio+0x8e/0xd0
[ 130.856175] [<ffffffff811a5f5d>] bio_endio+0x1d/0x40
[ 130.856241] [<ffffffff811a6f3e>] bio_pair_release+0x3e/0x50
[ 130.856308] [<ffffffff811a6f6f>] bio_pair_end+0x1f/0x30
[ 130.856374] [<ffffffff811a5f5d>] bio_endio+0x1d/0x40
[ 130.856443] [<ffffffffa00dd3d2>] raid_end_bio_io+0xf2/0x100 [raid10]
[ 130.856513] [<ffffffffa00ddf38>] one_write_done+0x38/0x50 [raid10]
[ 130.856584] [<ffffffffa00deec4>] raid10_end_write_request+0xc4/0x130 [raid10]
[ 130.856671] [<ffffffff811a5f5d>] bio_endio+0x1d/0x40
[ 130.856738] [<ffffffff812e7a03>] req_bio_endio.isra.49+0xa3/0xe0
[ 130.856807] [<ffffffff812e839d>] blk_update_request+0xfd/0x480
[ 130.856877] [<ffffffff812e8751>] blk_update_bidi_request+0x31/0x90
[ 130.856946] [<ffffffff812e9a4c>] blk_end_bidi_request+0x2c/0x80
[ 130.857014] [<ffffffff812e9ae0>] blk_end_request+0x10/0x20
[ 130.857081] [<ffffffff8141d89f>] scsi_io_completion+0xaf/0x630
[ 130.857150] [<ffffffff8165c88e>] ? _raw_spin_unlock_irqrestore+0x1e/0x30
[ 130.857219] [<ffffffff81413d31>] scsi_finish_command+0xc1/0x120
[ 130.857287] [<ffffffff8141d6fe>] scsi_softirq_done+0x13e/0x150
[ 130.857356] [<ffffffff812ef9f3>] blk_done_softirq+0x83/0xa0
[ 130.857423] [<ffffffff8106d4d8>] __do_softirq+0xa8/0x210
[ 130.857491] [<ffffffff8139ddf7>] ? __xen_evtchn_do_upcall+0x207/0x250
[ 130.857562] [<ffffffff816669ac>] call_softirq+0x1c/0x30
[ 130.857631] [<ffffffff81015195>] do_softirq+0x65/0xa0
[ 130.857697] [<ffffffff8106d8be>] irq_exit+0x8e/0xb0
[ 130.857761] [<ffffffff8139fbb5>] xen_evtchn_do_upcall+0x35/0x50
[ 130.857829] [<ffffffff816669fe>] xen_do_hypervisor_callback+0x1e/0x30
[ 130.857897] <EOI> [<ffffffff810013aa>] ? hypercall_page+0x3aa/0x1000
[ 130.858010] [<ffffffff810013aa>] ? hypercall_page+0x3aa/0x1000
[ 130.858080] [<ffffffff8100a170>] ? xen_safe_halt+0x10/0x20
[ 130.858146] [<ffffffff8101b5a3>] ? default_idle+0x53/0x1d0
[ 130.858214] [<ffffffff81012236>] ? cpu_idle+0xd6/0x120
[ 130.858282] [<ffffffff8100a9a9>] ? xen_irq_enable_direct_reloc+0x4/0x4
[ 130.858351] [<ffffffff81643992>] ? cpu_bringup_and_idle+0xe/0x10
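A note on reading this trace: the Code: bytes show that, in this kernel,
bio_put() loads a 32-bit counter at offset 0x40 of the bio it was passed
(RDI), tests it for zero and, if it is zero, jumps straight to the ud2 at
bio_put+0x27, which is what "kernel BUG at fs/bio.c:420" and "invalid
opcode" report. Otherwise it atomically decrements the counter and frees the
bio once it reaches zero. That matches the BIO_BUG_ON(!atomic_read(&bio->bi_cnt))
check in mainline 3.2's bio_put(), so, assuming the bcache-3.2 tree keeps
that check, the panic appears to mean that clone_endio() (device-mapper,
which provides the LV) dropped a reference to a bio whose count was already
zero, while raid10 was completing a split bio (the bio_pair_end,
bio_pair_release, bio_endio frames). The user-space model below only
illustrates that invariant; struct fake_bio and its helpers are hypothetical
stand-ins, not kernel code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a bio: only the reference count matters here. */
struct fake_bio {
    int cnt;      /* plays the role of the counter bio_put() checks */
    bool freed;   /* set when the last reference is dropped */
};

enum put_result { PUT_OK, PUT_FREED, PUT_WOULD_BUG };

/* A newly allocated bio starts out holding one reference. */
static void fake_bio_init(struct fake_bio *b)
{
    assert(b != NULL);
    b->cnt = 1;
    b->freed = false;
}

/* Take an extra reference (the kernel's counterpart is bio_get()). */
static void fake_bio_get(struct fake_bio *b)
{
    b->cnt++;
}

/* Drop a reference. The first check models the zero-count test that ends
 * in BUG() in the trace; here it is reported instead of crashing.
 * Otherwise decrement, and mark the object freed when the count hits zero. */
static enum put_result fake_bio_put(struct fake_bio *b)
{
    if (b->cnt == 0)
        return PUT_WOULD_BUG;
    if (--b->cnt == 0) {
        b->freed = true;
        return PUT_FREED;
    }
    return PUT_OK;
}
```

With one get and two puts the object is freed cleanly; a third put, one more
than the references taken, returns PUT_WOULD_BUG, the model's equivalent of
the BUG() hit at clone_endio+0x8e.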
I hope this helps. Let me know if you need me to do any further testing,
or if you have any other questions regarding my environment.
Thanks