Re: Issue #5876 : assertion failure in rbd_img_obj_callback()

On Tuesday 25 March 2014 at 22:54 +0100, Olivier Bonvalet wrote:
> On Tuesday 25 March 2014 at 23:49 +0200, Ilya Dryomov wrote:
> > On Tue, Mar 25, 2014 at 11:41 PM, Olivier Bonvalet <ceph.list@xxxxxxxxx> wrote:
> > > Hmm, the cluster seems to be in a really bad state now: all hosts are
> > > hanging. Is it possible that mounting images without the rbd_assert(0)
> > > broke some images?
> > >
> > 
> > I don't think so.  As far as I can tell all occurrences that you
> > reported tripped over one of the asserts.  It's probably just that for
> > some reason you are now hitting this bug much more frequently than once
> > a week.
> > 
> > Thanks,
> > 
> >                 Ilya
> > --
> 
> OK thanks, I'm «reassured».
> 
> After a reboot the VMs generate much more I/O, because of the cache flush.
> That is probably why it now hangs so often.
> 
> I have to wait a little between starting each VM.
> 
> --

I now hit this one very often (here, 5 minutes after the host booted):

Mar 25 23:14:45 rurkh kernel: [  330.054196] rbd_img_obj_callback: bad image object request information:
Mar 25 23:14:45 rurkh kernel: [  330.054205] obj_request ffff88025f3df058
Mar 25 23:14:45 rurkh kernel: [  330.054209]     ->object_name <(null)>
Mar 25 23:14:45 rurkh kernel: [  330.054211]     ->offset 0
Mar 25 23:14:45 rurkh kernel: [  330.054213]     ->length 4096
Mar 25 23:14:45 rurkh kernel: [  330.054216]     ->type 0x1
Mar 25 23:14:45 rurkh kernel: [  330.054218]     ->flags 0x3
Mar 25 23:14:45 rurkh kernel: [  330.054220]     ->which 4294967295
Mar 25 23:14:45 rurkh kernel: [  330.054222]     ->xferred 4096
Mar 25 23:14:45 rurkh kernel: [  330.054224]     ->result 0
Mar 25 23:14:45 rurkh kernel: [  330.054227] img_request ffff8802731f8448
Mar 25 23:14:45 rurkh kernel: [  330.054229]     ->snap 0xfffffffffffffffe
Mar 25 23:14:45 rurkh kernel: [  330.054231]     ->offset 2508181504
Mar 25 23:14:45 rurkh kernel: [  330.054233]     ->length 16384
Mar 25 23:14:45 rurkh kernel: [  330.054235]     ->flags 0x0
Mar 25 23:14:45 rurkh kernel: [  330.054237]     ->obj_request_count 0
Mar 25 23:14:45 rurkh kernel: [  330.054239]     ->next_completion 2
Mar 25 23:14:45 rurkh kernel: [  330.054241]     ->xferred 16384
Mar 25 23:14:45 rurkh kernel: [  330.054243]     ->result 0
Mar 25 23:14:45 rurkh kernel: [  330.054247] 
Mar 25 23:14:45 rurkh kernel: [  330.054247] Assertion failure in rbd_img_obj_callback() at line 2159:
Mar 25 23:14:45 rurkh kernel: [  330.054247] 
Mar 25 23:14:45 rurkh kernel: [  330.054247] 	rbd_assert(0);
Mar 25 23:14:45 rurkh kernel: [  330.054247] 
Mar 25 23:14:45 rurkh kernel: [  330.054495] ------------[ cut here ]------------
Mar 25 23:14:45 rurkh kernel: [  330.054585] kernel BUG at drivers/block/rbd.c:2159!
Mar 25 23:14:45 rurkh kernel: [  330.054676] invalid opcode: 0000 [#1] SMP 
Mar 25 23:14:45 rurkh kernel: [  330.054874] Modules linked in: cbc rbd libceph xen_gntdev xt_physdev iptable_filter ip_tables x_tables xfs libcrc32c bridge loop iTCO_wdt gpio_ich iTCO_vendor_support serio_raw sb_edac edac_core evdev i2c_i801 lpc_ich mfd_core ioatdma shpchp wmi ipmi_si ipmi_msghandler ac button dm_mod hid_generic usbhid hid sg sd_mod crc_t10dif crct10dif_common megaraid_sas isci ahci libsas libahci libata scsi_transport_sas ehci_pci ehci_hcd scsi_mod usbcore igb usb_common i2c_algo_bit ixgbe i2c_core dca ptp pps_core mdio
Mar 25 23:14:45 rurkh kernel: [  330.058433] CPU: 2 PID: 6365 Comm: kworker/2:3 Not tainted 3.13-dae-dom0 #22
Mar 25 23:14:45 rurkh kernel: [  330.058528] Hardware name: Supermicro X9DRW-7TPF+/X9DRW-7TPF+, BIOS 3.0 07/24/2013
Mar 25 23:14:45 rurkh kernel: [  330.058659] Workqueue: ceph-msgr con_work [libceph]
Mar 25 23:14:45 rurkh kernel: [  330.058805] task: ffff88026da5b820 ti: ffff88025dfe2000 task.ti: ffff88025dfe2000
Mar 25 23:14:45 rurkh kernel: [  330.058922] RIP: e030:[<ffffffffa0309cd9>]  [<ffffffffa0309cd9>] rbd_img_obj_callback+0x282/0x523 [rbd]
Mar 25 23:14:45 rurkh kernel: [  330.059107] RSP: e02b:ffff88025dfe3ce8  EFLAGS: 00010082
Mar 25 23:14:45 rurkh kernel: [  330.059199] RAX: 000000000000004c RBX: ffff88025f3df058 RCX: 0000000000000007
Mar 25 23:14:45 rurkh kernel: [  330.059300] RDX: 0000000000000006 RSI: 0000000000000000 RDI: ffff88025dfe00a8
Mar 25 23:14:45 rurkh kernel: [  330.059397] RBP: ffff8802731f8448 R08: 0000000000000000 R09: 0000000000000000
Mar 25 23:14:45 rurkh kernel: [  330.059491] R10: 0000000000000000 R11: ffff88025f712d66 R12: 0000000000000001
Mar 25 23:14:45 rurkh kernel: [  330.059587] R13: 0000000000000000 R14: ffff88025f712ad0 R15: 0000000000000000
Mar 25 23:14:45 rurkh kernel: [  330.059689] FS:  00007f2fd8882700(0000) GS:ffff88027fe40000(0000) knlGS:0000000000000000
Mar 25 23:14:45 rurkh kernel: [  330.059807] CS:  e033 DS: 0000 ES: 0000 CR0: 0000000080050033
Mar 25 23:14:45 rurkh kernel: [  330.059899] CR2: 00007f7a1e28f000 CR3: 000000000160c000 CR4: 0000000000042660
Mar 25 23:14:45 rurkh kernel: [  330.059997] Stack:
Mar 25 23:14:45 rurkh kernel: [  330.060086]  ffff8802731f8484 ffff8802730f2c45 ffffffffffffffff ffff8802730f2c10
Mar 25 23:14:45 rurkh kernel: [  330.060339]  ffff88025f712ac8 ffff8802703b4718 0000000000000000 ffff88025f712ad0
Mar 25 23:14:45 rurkh kernel: [  330.060573]  0000000000000000 ffffffffa02f5595 0000000000000015 ffff8802703b4770
Mar 25 23:14:45 rurkh kernel: [  330.060811] Call Trace:
Mar 25 23:14:45 rurkh kernel: [  330.060878]  [<ffffffffa02f5595>] ? dispatch+0x3e4/0x55e [libceph]
Mar 25 23:14:45 rurkh kernel: [  330.060954]  [<ffffffffa02f00fc>] ? con_work+0xf6e/0x1a65 [libceph]
Mar 25 23:14:45 rurkh kernel: [  330.061029]  [<ffffffff81051f83>] ? mmdrop+0xd/0x1c
Mar 25 23:14:45 rurkh kernel: [  330.061098]  [<ffffffff8105265e>] ? finish_task_switch+0x4d/0x83
Mar 25 23:14:45 rurkh kernel: [  330.061171]  [<ffffffff810484d7>] ? process_one_work+0x15a/0x214
Mar 25 23:14:45 rurkh kernel: [  330.061243]  [<ffffffff8104895b>] ? worker_thread+0x139/0x1de
Mar 25 23:14:45 rurkh kernel: [  330.061313]  [<ffffffff81048822>] ? rescuer_thread+0x26e/0x26e
Mar 25 23:14:45 rurkh kernel: [  330.061385]  [<ffffffff8104cff6>] ? kthread+0x9e/0xa6
Mar 25 23:14:45 rurkh kernel: [  330.061454]  [<ffffffff8104cf58>] ? __kthread_parkme+0x55/0x55
Mar 25 23:14:45 rurkh kernel: [  330.061530]  [<ffffffff8137260c>] ? ret_from_fork+0x7c/0xb0
Mar 25 23:14:45 rurkh kernel: [  330.061606]  [<ffffffff8104cf58>] ? __kthread_parkme+0x55/0x55
Mar 25 23:14:45 rurkh kernel: [  330.061677] Code: cc 30 a0 31 c0 e8 8b e4 05 e1 48 c7 c1 5c cd 30 a0 31 c0 ba 6f 08 00 00 48 c7 c6 80 da 30 a0 48 c7 c7 1f c1 30 a0 e8 6a e4 05 e1 <0f> 0b 41 8b 45 5c ff c8 39 43 40 41 0f 92 c5 48 8b 5b 30 41 ff 
Mar 25 23:14:45 rurkh kernel: [  330.064345] RIP  [<ffffffffa0309cd9>] rbd_img_obj_callback+0x282/0x523 [rbd]
Mar 25 23:14:45 rurkh kernel: [  330.064481]  RSP <ffff88025dfe3ce8>
Mar 25 23:14:45 rurkh kernel: [  330.064562] ---[ end trace 74103a003e0d553e ]---
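
A note for reading the dump above: a few of the raw values look like
sentinels rather than real data. The Python sketch below only reinterprets
numbers copied from the log. The mapping to kernel constants (BAD_WHICH in
drivers/block/rbd.c, CEPH_NOSNAP in the ceph headers) is my reading of the
sources and should be checked against the 3.13 tree. The helper names
as_signed and imm32_le are made up for this illustration.

```python
def as_signed(value, bits):
    """Reinterpret an unsigned integer as a two's-complement signed one."""
    return value - (1 << bits) if value >= 1 << (bits - 1) else value


def imm32_le(hex_bytes):
    """Decode a 32-bit little-endian immediate from a hex byte string."""
    return int.from_bytes(bytes.fromhex(hex_bytes), "little")


# obj_request->which as printed in the log (an unsigned 32-bit field).
# Assumption: all-ones is the "not part of an image request" marker (BAD_WHICH).
print(as_signed(4294967295, 32))          # -1

# img_request->snap as printed in the log (an unsigned 64-bit field).
# Assumption: (u64)-2 is CEPH_NOSNAP, i.e. the image head, not a snapshot.
print(as_signed(0xfffffffffffffffe, 64))  # -2

# The "ba 6f 08 00 00" bytes in the Code: line are a mov of an immediate
# into %edx; decoded, it matches the line number in the assertion message.
print(imm32_le("6f 08 00 00"))            # 2159
```

If that reading holds, the completed object request carries the "no index"
marker in ->which while the image request already reports
obj_request_count 0.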


--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html



