Ok, thanks for your explanation!
I read those warnings about size 2 + min_size 1 (our OSDs sit on ZFS raidz2, the RAID6 equivalent).
Time to raise replication!
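If I read the docs right, that should just be a matter of something like the
following per pool (the pool name "rbd" is only a placeholder for ours), plus
sitting out the backfill while the extra copies are created:

    ceph osd pool set rbd size 3
    ceph osd pool set rbd min_size 2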
Kevin
2016-12-13 0:00 GMT+01:00 Christian Balzer <chibi@xxxxxxx>:
On Mon, 12 Dec 2016 22:41:41 +0100 Kevin Olbrich wrote:
> Hi,
>
> just in case: What happens when all replica journal SSDs are broken at once?
>
That would be bad, as in BAD.
In theory you just "lost" all the associated OSDs and their data.
In practice everything but the in-flight data at the time is still on
the actual OSDs (HDDs), but it's inconsistent and inaccessible as far as
Ceph is concerned.
So with some trickery and an experienced data-recovery Ceph consultant you
_may_ get things running with limited data loss/corruption, but that's
speculation and may be wishful thinking on my part.
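Should you ever end up there, the first step is simply assessing the damage,
something along these lines (standard commands, nothing exotic), which will
show the affected OSDs down and their PGs stuck:

    ceph health detail
    ceph osd tree
    ceph pg dump_stuck inactive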
Another data point in favor of deploying only well known/monitored/trusted
SSDs and having 3x replication.
> The PGs most likely will be stuck inactive but as I read, the journals just
> need to be replaced
> (http://ceph.com/planet/ceph-recover-osds-after-ssd-journal-failure/).
>
> Does this also work in this case?
>
Not really, no.
The above works because the cluster still has a valid state and operational
OSDs from which the "broken" one can recover.
Christian
--
Christian Balzer Network/Systems Engineer
chibi@xxxxxxx Global OnLine Japan/Rakuten Communications
http://www.gol.com/