Re: Replacing a failed disk/OSD: unfound object

On Wed, 13 Jul 2011 09:19:40 -0700, Tommi Virtanen wrote:
> On Wed, Jul 13, 2011 at 03:15, Meng Zhao <mzhao@xxxxxxxxxxxx> wrote:
>> active+clean; 349 MB data, 1394 MB used, 408 MB / 2046 MB avail; 49/224
>> degraded (21.875%)
>> => for some reason osd2 failed during object replication
>
> If you lose osds while in degraded mode, you very much can lose
> objects permanently. Degraded means the replication has not completed.
> It's like losing a second disk in a RAID5 before it has healed, though
> the scope of the loss is individual objects not the whole filesystem.


Thanks for going through the long log.

Please note that the system was considered clean (by ceph -w) before osd0 was shut down:

2011-07-13 15:01:17.355846 pg v1099: 602 pgs: 602 active+clean; 349 MB data, 1778 MB used, 920 MB / 3069 MB avail

The degradation only starts after the 5-minute timeout, when osd0 is marked out:

2011-07-13 16:18:03.746935 pg v1104: 602 pgs: 233 active+clean, 369 active+clean+degraded; 349 MB data, 1795 MB used, 910 MB / 3069 MB avail; 67/224 degraded (29.911%)

Halfway into the replication, osd2 is then considered out as well. However, the osd2 log shows that it had already been in a bad state for half an hour, but the system did not escalate that information or take any action until the replication started and eventually crashed. In other words, by the time more than one OSD has cumulatively failed, it is already too late. It seems that a self-diagnosis mechanism is needed so that each OSD can check itself periodically.
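
For reference, this is roughly how I would inspect the degraded/unfound
objects and shorten the down->out window. The command and option names are
from my reading of the docs for a recent tree, so treat this as a sketch
rather than a recipe (<pgid> is a placeholder for the placement group id):

    # show which PGs are degraded and whether any objects are unfound
    ceph health detail
    ceph pg dump | grep -E 'degraded|unfound'

    # for a PG that reports unfound objects, list exactly what is missing
    ceph pg <pgid> list_missing

    # last resort, once the data is confirmed gone, so the PG can go
    # active+clean again
    ceph pg <pgid> mark_unfound_lost revert

The 5-minute window before a down OSD is marked out appears to correspond
to the monitor option below, which can be tuned in ceph.conf:

    [mon]
        ; default is 300 seconds
        mon osd down out interval = 300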







