Re: Collection of strange lockups on 0.51

On Wed, Sep 12, 2012 at 10:33 AM, Andrey Korolyov <andrey@xxxxxxx> wrote:
> Hi,
> This is completely off-topic for the list, but I'm asking because only Ceph triggers
> such a bug :) .
>
> With 0.51, the following happens: if I kill an OSD, one or more neighboring
> nodes may go into a hung state with CPU lockups. This is unrelated to
> temperature, overall interrupt count, or load average, and it happens randomly
> across the 16-node cluster. I'm almost sure that Ceph is triggering some hardware
> bug, but I'm not quite sure of its origin. Also, shortly after a reset
> from such a crash, any action may trigger a new lockup.

From the log, it looks like your Ethernet driver is crapping out.

[172517.057886] NETDEV WATCHDOG: eth0 (igb): transmit queue 7 timed out
...
[172517.058622]  [<ffffffff812b2975>] ? netif_tx_lock+0x40/0x76

etc.

The later oopses mention paravirt_write_msr etc., which makes
me think you're using Xen? You probably don't want to run Ceph servers
inside virtualization (for production).

[172696.503900]  [<ffffffff8100d025>] ? paravirt_write_msr+0xb/0xe
[172696.503942]  [<ffffffff810325f3>] ? leave_mm+0x3e/0x3e

and *then* you get

[172695.041709] sd 0:2:0:0: [sda] megasas: RESET cmd=2a retries=0
[172695.041745] megasas: [ 0]waiting for 35 commands to complete
[172696.045602] megaraid_sas: no pending cmds after reset
[172696.045644] megasas: reset successful

which just adds more awesomeness to the soup -- though I do wonder if
this could be caused by the soft hang from earlier.