> On 7 December 2016 at 21:18, Kevin Olbrich <ko@xxxxxxx> wrote:
>
> Is Ceph accepting this OSD if the other (newer) replica is down?
> In this case I would assume that my cluster is instantly broken when rack
> _after_ rack fails (power outage) and I just start them in random order.
> We have at least one MON on a stand-alone UPS to resolve such an issue - I
> just assumed this is safe regardless of a full outage.
>

No, those PGs will stay down, waiting for the OSD with the newest data to
come back online. You could force Ceph to accept the old data, but that is
a manual operation. Without doing anything the PGs will be marked as
down+incomplete.

Wido

> With kind regards / best regards,
> Kevin Olbrich.
>
> 2016-12-07 21:10 GMT+01:00 Wido den Hollander <wido@xxxxxxxx>:
> >
> > > On 7 December 2016 at 21:04, "Will.Boege" <Will.Boege@xxxxxxxxxx> wrote:
> > >
> > > Hi Wido,
> > >
> > > Just curious how blocking IO to the final replica provides protection
> > > from data loss? I've never really understood why this is a Ceph best
> > > practice. In my head all 3 replicas would be on devices that have
> > > roughly the same odds of physically failing or getting logically
> > > corrupted in any given minute. Not sure how blocking IO prevents this.
> >
> > Say, disk #1 fails and you have #2 and #3 left. Now #2 fails, leaving
> > only #3.
> >
> > By blocking I/O you know that #2 and #3 still have the same data.
> > Although #2 failed, it could be that only the host went down while the
> > disk itself is just fine. Maybe the SATA cable broke, you never know.
> >
> > If disk #3 now fails you can still continue your operation if you bring
> > #2 back. It has the same data on disk as #3 had before it failed, since
> > you didn't allow any I/O on #3 after #2 went down earlier.
> >
> > If you had accepted writes on #3 while #1 and #2 were gone, you would
> > have invalid/old data on #2 by the time it comes back. Writes were made
> > on #3, but that one really broke down. You managed to get #2 back, but
> > it doesn't have the changes which #3 had.
> >
> > The result is corrupted data.
> >
> > Does this make sense?
> >
> > Wido
> >
> > > On 12/7/16, 9:11 AM, "ceph-users on behalf of LOIC DEVULDER"
> > > <ceph-users-bounces@xxxxxxxxxxxxxx on behalf of loic.devulder@xxxxxxxx>
> > > wrote:
> > >
> > > > -----Original Message-----
> > > > From: Wido den Hollander [mailto:wido@xxxxxxxx]
> > > > Sent: Wednesday, 7 December 2016 16:01
> > > > To: ceph-users@xxxxxxxx; LOIC DEVULDER - U329683 <loic.devulder@xxxxxxxx>
> > > > Subject: RE: 2x replication: A BIG warning
> > > >
> > > > > On 7 December 2016 at 15:54, LOIC DEVULDER <loic.devulder@xxxxxxxx> wrote:
> > > > >
> > > > > Hi Wido,
> > > > >
> > > > > > As a Ceph consultant I get numerous calls throughout the year to
> > > > > > help people with getting their broken Ceph clusters back online.
> > > > > >
> > > > > > The causes of downtime vary vastly, but one of the biggest causes
> > > > > > is that people use 2x replication: size = 2, min_size = 1.
> > > > >
> > > > > We are building a Ceph cluster for our OpenStack and for data
> > > > > integrity reasons we have chosen to set size=3. But we want to
> > > > > continue to access data if 2 of our 3 OSD servers are dead, so we
> > > > > decided to set min_size=1.
> > > > >
> > > > > Is it a (very) bad idea?
> > > >
> > > > I would say so. Yes, downtime is annoying on your cloud, but data
> > > > loss is even worse, much worse.
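> > > >
> > > > For example, checking and fixing this on an existing pool would look
> > > > something like the following (the pool name 'volumes' is only a
> > > > placeholder here, use your own pool names):
> > > >
> > > >   # check the current replication settings
> > > >   ceph osd pool get volumes size
> > > >   ceph osd pool get volumes min_size
> > > >
> > > >   # keep 3 copies and require at least 2 to serve I/O
> > > >   ceph osd pool set volumes size 3
> > > >   ceph osd pool set volumes min_size 2
> > > >
> > > > And for newly created pools you can set the defaults in ceph.conf:
> > > >
> > > >   [global]
> > > >   osd pool default size = 3
> > > >   osd pool default min_size = 2
> > > >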
> > > > I would always run with min_size = 2 and manually switch to
> > > > min_size = 1 if the situation really requires it at that moment.
> > > >
> > > > Losing two disks at the same time is something which doesn't happen
> > > > that often, but if it happens you don't want to modify any data on
> > > > the only copy which you still have left.
> > > >
> > > > Setting min_size to 1 should be a manual action imho when size = 3
> > > > and you lose two copies. In that case YOU decide at that moment if
> > > > it is the right course of action.
> > > >
> > > > Wido
> > >
> > > Thanks for your quick response!
> > >
> > > That makes sense, I will try to convince my colleagues :-)
> > >
> > > Loic
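
PS: for completeness, the manual switch described above would be done with
something like this ('volumes' is again only an example pool name, and you
should only do this once you are sure the remaining copy is the one you
want to keep):

  # first see which PGs are down/incomplete and why
  ceph health detail

  # temporarily allow I/O on a single remaining copy
  ceph osd pool set volumes min_size 1

  # once the failed OSDs are back and recovery has finished
  ceph osd pool set volumes min_size 2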