Re: What happens if all replica OSDs journals are broken?

Hello,

On Wed, 14 Dec 2016 00:06:14 +0100 Kevin Olbrich wrote:

> Ok, thanks for your explanation!
> I read those warnings about size 2 + min_size 1 (we are using ZFS as RAID6,
> called zraid2) as OSDs.
>
This is similar to my RAID6- or RAID10-backed OSDs with regard to having
very resilient, extremely unlikely to fail OSDs.
As such, a Ceph replication of 2 with min_size 1 is a calculated risk,
acceptable for me and others in certain use cases.
This is also with very few (2-3) journals per SSD.

If:

1. Your journal SSDs are well trusted and monitored (Intel DC S36xx, 37xx)
2. Your failure domain represented by a journal SSD is small enough
(meaning that replicating the lost OSDs can be done quickly)

it may be an acceptable risk for you as well.
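For point 1, a minimal sketch of what "monitored" can mean in practice, using smartmontools (the device path and attribute names are examples; SMART attribute labels vary by vendor, and on Intel DC drives the wear counter is typically Media_Wearout_Indicator):

```shell
# Check wear and error counters on a journal SSD.
# /dev/sdX is a placeholder -- substitute your journal device.
sudo smartctl -A /dev/sdX \
  | egrep 'Media_Wearout_Indicator|Reallocated_Sector_Ct|Power_On_Hours'
```

Feeding these counters into your existing monitoring (Nagios, Zabbix, etc.) and alerting well before the wear indicator bottoms out is what makes the calculated risk calculable.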

> Time to raise replication!
>
If you can afford that (money, space, latency), definitely go for it.
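For reference, raising replication on an existing pool is two commands (pool name "rbd" is just an example; expect significant backfill traffic while the third replicas are created, so consider doing it off-peak):

```shell
# Raise the replica count and the minimum for serving I/O.
ceph osd pool set rbd size 3
ceph osd pool set rbd min_size 2

# Watch the backfill/recovery progress.
ceph -s
```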
 
Christian
> Kevin
> 
> 2016-12-13 0:00 GMT+01:00 Christian Balzer <chibi@xxxxxxx>:
> 
> > On Mon, 12 Dec 2016 22:41:41 +0100 Kevin Olbrich wrote:
> >
> > > Hi,
> > >
> > > just in case: What happens when all replica journal SSDs are broken at
> > once?
> > >
> > That would be bad, as in BAD.
> >
> > In theory you just "lost" all the associated OSDs and their data.
> >
> > In practice everything but the in-flight data at the time is still on
> > the actual OSDs (HDDs), but it's inconsistent and inaccessible as far as
> > Ceph is concerned.
> >
> > So with some trickery and an experienced data-recovery Ceph consultant you
> > _may_ get things running with limited data loss/corruption, but that's
> > speculation and may be wishful thinking on my part.
> >
> > Another data point in favor of deploying only well-known, monitored and
> > trusted SSDs and of having 3x replication.
> >
> > > The PGs most likely will be stuck inactive but as I read, the
> > > journals just need to be replaced
> > > (http://ceph.com/planet/ceph-recover-osds-after-ssd-journal-failure/).
> > >
> > > Does this also work in this case?
> > >
> > Not really, no.
> >
> > The above works by still having a valid cluster state and operational
> > OSDs from which the "broken" one can recover.
> >
> > Christian
> > --
> > Christian Balzer        Network/Systems Engineer
> > chibi@xxxxxxx           Global OnLine Japan/Rakuten Communications
> > http://www.gol.com/
> >


-- 
Christian Balzer        Network/Systems Engineer                
chibi@xxxxxxx   	Global OnLine Japan/Rakuten Communications
http://www.gol.com/
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


