[Issue] Ceph cluster hang due to network partition

Hi Sage,
      We recently hit a network partition problem that left our Ceph
cluster unable to serve RBD I/O.
      We are running 0.67.5 on a customer cluster. The partition was
such that 3 OSDs could reach the mon, but could not reach any of the
other OSDs.
      Many PGs then fell into the peering state, and RBD I/O hung.

        Before working on the cluster, I set the noout flag and stopped
the 3 OSDs. After the memory maintenance and OS boot on those 3 hosts,
the network was still partitioned. When the 3 OSDs started, many PGs
went into peering.
        I then stopped the 3 OSD processes again, but the PGs stayed
stuck in peering.
        After the network partition was fixed, all PGs became
active+clean and everything was OK.
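
        For reference, the maintenance procedure was roughly the
following. The OSD ids and the sysvinit invocation below are
placeholders, not the exact commands from our runbook:

    # prevent the cluster from marking the OSDs out and rebalancing
    ceph osd set noout

    # stop the 3 OSDs (osd.10/11/12 stand in for the real ids)
    /etc/init.d/ceph stop osd.10
    /etc/init.d/ceph stop osd.11
    /etc/init.d/ceph stop osd.12

    # ... memory maintenance, then OS reboot on the 3 hosts ...

    # start the OSDs again once the hosts are back
    /etc/init.d/ceph start osd.10
    /etc/init.d/ceph start osd.11
    /etc/init.d/ceph start osd.12

    # clear the flag after everything has recovered
    ceph osd unset noout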

        I can't explain this, because I thought an OSD can judge whether
the other OSDs are alive, and I could see the 3 OSDs marked down in
'ceph osd tree'.
        Why did these PGs get stuck in peering?
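
        (For anyone looking into this: a quick way to see which PGs are
stuck and in what state is something like the commands below; the pgid
2.3f is just an example, not one of our real PGs.)

    # list PGs stuck in an inactive state (peering shows up here)
    ceph pg dump_stuck inactive

    # query one stuck PG to see its peering / recovery state
    ceph pg 2.3f query

    # confirm which OSDs the cluster currently marks down
    ceph osd tree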

Thanks!
Ketor



