Re: What happens if all replica OSDs journals are broken?

Ok, thanks for your explanation!
I read those warnings about size 2 + min_size 1 (our OSDs are backed by ZFS in its RAID6-like configuration, called raidz2).
Time to raise replication!
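For the record, raising replication per pool should amount to roughly the following (a sketch only; "rbd" is a placeholder pool name and would be repeated for each pool):

    ceph osd pool set rbd size 3       # keep three copies of every object
    ceph osd pool set rbd min_size 2   # refuse I/O when fewer than two copies are available
    ceph osd pool get rbd size         # verify the change

Expect backfill traffic while the extra copies are created.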

Kevin

2016-12-13 0:00 GMT+01:00 Christian Balzer <chibi@xxxxxxx>:
On Mon, 12 Dec 2016 22:41:41 +0100 Kevin Olbrich wrote:

> Hi,
>
> just in case: What happens when all replica journal SSDs are broken at once?
>
That would be bad, as in BAD.

In theory you just "lost" all the associated OSDs and their data.

In practice, everything but the in-flight data at the time of failure is still on
the actual OSDs (HDDs), but it is inconsistent and inaccessible as far as
Ceph is concerned.

So with some trickery and an experienced data-recovery Ceph consultant you
_may_ get things running with limited data loss/corruption, but that's
speculation and may be wishful thinking on my part.

Another data point in favor of deploying only well-known/monitored/trusted SSDs
and using 3x replication.
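(As a rough illustration of the "monitored" part, assuming smartmontools and an
SSD that exposes some wear/endurance attribute; the device path is a placeholder:)

    smartctl -H /dev/sdX                                         # overall health verdict
    smartctl -A /dev/sdX | egrep -i 'wear|percent|media_wearout' # wear-related attributes, names vary by vendor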

> The PGs most likely will be stuck inactive but as I read, the journals just
> need to be replaced (http://ceph.com/planet/ceph-recover-osds-after-ssd-journal-failure/).
>
> Does this also work in this case?
>
Not really, no.

The above works because there are still operational OSDs with a valid state
from which the "broken" one can recover.
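(For reference, as I understand it that procedure boils down to roughly the
following, assuming FileStore OSDs, healthy replicas elsewhere in the cluster,
and placeholder OSD id/device; a sketch, not a recipe:)

    ceph osd set noout               # keep CRUSH from rebalancing while you work
    systemctl stop ceph-osd@<id>     # stop the OSD whose journal SSD died
    # partition the replacement SSD and point the OSD's journal symlink at it, then:
    ceph-osd -i <id> --mkjournal     # create a fresh, empty journal
    systemctl start ceph-osd@<id>    # the OSD peers and recovers from the surviving replicas
    ceph osd unset noout

Which is exactly why it cannot help when every replica's journal is gone at once.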

Christian
--
Christian Balzer        Network/Systems Engineer
chibi@xxxxxxx           Global OnLine Japan/Rakuten Communications
http://www.gol.com/

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
