Re: how possible is that ceph cluster crash

Seems like that would be helpful.  I'm not really familiar with
ceph-disk though.
-Sam

On Wed, Nov 23, 2016 at 5:24 AM, Nick Fisk <nick@xxxxxxxxxx> wrote:
> Hi Sam,
>
> Would a check in ceph-disk for "nobarrier" in the osd_mount_options_{fstype} variable be a good idea? It could either strip it out or
> fail to start the OSD unless an override flag is specified somewhere.
>
> Looking at the ceph-disk code, I would imagine around here would be the right place to put the check:
> https://github.com/ceph/ceph/blob/master/src/ceph-disk/ceph_disk/main.py#L2642
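>
> Something along these lines is roughly what I had in mind -- just a quick, untested sketch, and the function and flag names below are
> made up rather than anything that exists in ceph-disk today:
>
> def check_mount_options(options, allow_nobarrier=False):
>     """Strip nobarrier from an OSD mount option string, or refuse to continue."""
>     opts = [o.strip() for o in options.split(',') if o.strip()]
>     # 'nobarrier' is the xfs spelling; ext4 also accepts 'barrier=0'
>     bad = [o for o in opts if o in ('nobarrier', 'barrier=0')]
>     if bad and not allow_nobarrier:
>         raise ValueError('refusing to activate OSD: %s found in mount '
>                          'options (pass an explicit override to force)'
>                          % ','.join(bad))
>     return ','.join(o for o in opts if o not in ('nobarrier', 'barrier=0'))
>
> Whether to strip silently or fail hard is the open question -- failing with an explicit override flag is probably safer, since silently
> changing what the admin asked for has its own surprise factor.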
>
> I don't mind trying to get this done if it's felt to be worthwhile.
>
> Nick
>
>> -----Original Message-----
>> From: ceph-users [mailto:ceph-users-bounces@xxxxxxxxxxxxxx] On Behalf Of Samuel Just
>> Sent: 19 November 2016 00:31
>> To: Nick Fisk <nick@xxxxxxxxxx>
>> Cc: ceph-users@xxxxxxxxxxxxxx
>> Subject: Re:  how possible is that ceph cluster crash
>>
>> Many reasons:
>>
>> 1) You will eventually get a DC-wide power event anyway, at which point probably most of the OSDs will have hopelessly corrupted
>> internal xfs structures (yes, I have seen this happen to a poor soul with a DC with redundant power).
>> 2) Even in the case of a single rack/node power failure, the biggest danger isn't that the OSDs don't start.  It's that they *do
>> start*, but have forgotten or arbitrarily corrupted a random subset of the transactions they told other osds and clients they had
>> committed.  The exact impact would be random, but for sure, any guarantees Ceph normally provides would be out the window.  RBD
>> devices could have random byte ranges zapped back in time (not great if they're the offsets assigned to your database or fs
>> journal...) for instance.
>> 3) Deliberately powercycling a node counts as a power failure if you don't stop services and sync etc first.
>>
>> In other words, don't mess with the definition of "committing a transaction" if you value your data.
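>>
>> If you want to sanity-check whether anything on a box is currently mounted with barriers disabled, a few lines of Python over
>> /proc/mounts will tell you (quick sketch, nothing ceph-specific about it):
>>
>> #!/usr/bin/env python
>> # Warn about any filesystem mounted with write barriers disabled.
>> with open('/proc/mounts') as mounts:
>>     for line in mounts:
>>         device, mountpoint, fstype, options = line.split()[:4]
>>         if set(options.split(',')) & {'nobarrier', 'barrier=0'}:
>>             print('WARNING: %s on %s (%s) has barriers disabled'
>>                   % (device, mountpoint, fstype))
>>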
>> -Sam "just say no" Just
>>
>> On Fri, Nov 18, 2016 at 4:04 PM, Nick Fisk <nick@xxxxxxxxxx> wrote:
>> > Yes, because these things happen
>> >
>> > http://www.theregister.co.uk/2016/11/15/memset_power_cut_service_interruption/
>> >
>> > We had customers who had kit in this DC.
>> >
>> > To use your analogy, it's like crossing the road at traffic lights but
>> > not checking that the cars have stopped. You might be OK 99% of the time, but
>> > sooner or later it will bite you in the arse and it won't be pretty.
>> >
>> > ________________________________
>> > From: "Brian ::" <bc@xxxxxxxx>
>> > Sent: 18 Nov 2016 11:52 p.m.
>> > To: sjust@xxxxxxxxxx
>> > Cc: Craig Chi; ceph-users@xxxxxxxxxxxxxx; Nick Fisk
>> > Subject: Re:  how possible is that ceph cluster crash
>> >
>> >>
>> >> This is like your mother telling you not to cross the road when you
>> >> were 4 years of age, but not telling you it was because you could be
>> >> flattened by a car :)
>> >>
>> >> Can you expand on your answer? If you are in a DC with AB power,
>> >> redundant UPS, dual feed from the electric company, onsite
>> >> generators, dual PSU servers, is it still a bad idea?
>> >>
>> >>
>> >>
>> >>
>> >> On Fri, Nov 18, 2016 at 6:52 PM, Samuel Just <sjust@xxxxxxxxxx> wrote:
>> >>>
>> >>> Never *ever* use nobarrier with ceph under *any* circumstances.  I
>> >>> cannot stress this enough.
>> >>> -Sam
>> >>>
>> >>> On Fri, Nov 18, 2016 at 10:39 AM, Craig Chi <craigchi@xxxxxxxxxxxx>
>> >>> wrote:
>> >>>>
>> >>>> Hi Nick and other Cephers,
>> >>>>
>> >>>> Thanks for your reply.
>> >>>>
>> >>>>> 2) Config Errors
>> >>>>> This can be an easy one to say you are safe from. But I would say
>> >>>>> most outages and data loss incidents I have seen on the mailing
>> >>>>> lists have been due to poor hardware choice or configuring options
>> >>>>> such as size=2, min_size=1 or enabling stuff like nobarrier.
>> >>>>
>> >>>>
>> >>>> I am wondering about the pros and cons of the nobarrier option when used with Ceph.
>> >>>>
>> >>>> It is well known that nobarrier is dangerous when a power outage
>> >>>> happens, but if we already have replicas in different racks or
>> >>>> PDUs, will Ceph reduce the risk of data loss with this option?
>> >>>>
>> >>>> I have seen many performance tuning articles recommending the
>> >>>> nobarrier option for xfs, but not many of them mention the
>> >>>> trade-offs of nobarrier.
>> >>>>
>> >>>> Is it really unacceptable to use nobarrier in a production
>> >>>> environment? I would be very grateful if you guys are willing to
>> >>>> share any experiences with nobarrier and xfs.
>> >>>>
>> >>>> Sincerely,
>> >>>> Craig Chi (Product Developer)
>> >>>> Synology Inc. Taipei, Taiwan. Ext. 361
>> >>>>
>> >>>> On 2016-11-17 05:04, Nick Fisk <nick@xxxxxxxxxx> wrote:
>> >>>>
>> >>>>> -----Original Message-----
>> >>>>> From: ceph-users [mailto:ceph-users-bounces@xxxxxxxxxxxxxx] On
>> >>>>> Behalf Of Pedro Benites
>> >>>>> Sent: 16 November 2016 17:51
>> >>>>> To: ceph-users@xxxxxxxxxxxxxx
>> >>>>> Subject:  how possible is that ceph cluster crash
>> >>>>>
>> >>>>> Hi,
>> >>>>>
>> >>>>> I have a ceph cluster with 50 TB across 15 osds. It has been working
>> >>>>> fine for one year and I would like to grow it and migrate all my old
>> >>>>> storage, about 100 TB, to ceph, but I have a doubt. How possible is it
>> >>>>> that the cluster fails and everything goes very bad?
>> >>>>
>> >>>>
>> >>>> Everything is possible; I think there are 3 main risks:
>> >>>>
>> >>>> 1) Hardware failure
>> >>>> I would say Ceph is probably one of the safest options in regards to
>> >>>> hardware failures, certainly if you start using 4TB+ disks.
>> >>>>
>> >>>> 2) Config Errors
>> >>>> This can be an easy one to think you are safe from, but I would say most
>> >>>> outages and data loss incidents I have seen on the mailing lists have
>> >>>> been due to poor hardware choices or to configuring options such as
>> >>>> size=2, min_size=1, or enabling stuff like nobarrier (there's a quick
>> >>>> audit sketch for the size/min_size part after point 3 below).
>> >>>>
>> >>>> 3) Ceph Bugs
>> >>>> Probably the rarest, but potentially the scariest, as you have less
>> >>>> control. They do happen and it's something to be aware of.
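>> >>>>
>> >>>> On the size/min_size point in (2): a quick way to audit your pools is to
>> >>>> parse 'ceph osd dump --format json' -- rough sketch below, and the exact
>> >>>> JSON field names may differ a little between releases:
>> >>>>
>> >>>> #!/usr/bin/env python
>> >>>> # Flag pools that look risky for durability (size < 3 or min_size < 2).
>> >>>> import json
>> >>>> import subprocess
>> >>>>
>> >>>> osdmap = json.loads(subprocess.check_output(
>> >>>>     ['ceph', 'osd', 'dump', '--format', 'json']).decode('utf-8'))
>> >>>> for pool in osdmap['pools']:
>> >>>>     if pool['size'] < 3 or pool['min_size'] < 2:
>> >>>>         print('pool %s: size=%d min_size=%d -- reconsider'
>> >>>>               % (pool['pool_name'], pool['size'], pool['min_size']))
>> >>>>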
>> >>>>
>> >>>>> How reliable is ceph? What is the risk of losing my data? Is it
>> >>>>> necessary to back up my data?
>> >>>>
>> >>>>
>> >>>> Yes, always back up your data, no matter what solution you use. Just as
>> >>>> RAID != backup, neither is Ceph.
>> >>>>
>> >>>>>
>> >>>>> Regards.
>> >>>>> Pedro.
>> >>>>
>> >>>>
>> >>>>
>> >>>>
>> >>
>> >>
>> >
>> >
>
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


