Thanks Nick / Samuel,

It's definitely worthwhile to explain exactly why this is such a bad idea. I
think it will prevent people from ever doing it, rather than just telling
them not to do it.

On Sat, Nov 19, 2016 at 12:30 AM, Samuel Just <sjust@xxxxxxxxxx> wrote:
> Many reasons:
>
> 1) You will eventually get a DC-wide power event anyway, at which point
> probably most of the OSDs will have hopelessly corrupted internal xfs
> structures (yes, I have seen this happen to a poor soul with a DC with
> redundant power).
> 2) Even in the case of a single rack/node power failure, the biggest
> danger isn't that the OSDs don't start. It's that they *do* start, but
> have forgotten or arbitrarily corrupted a random subset of the
> transactions they told other OSDs and clients they had committed. The
> exact impact would be random, but for sure any guarantees Ceph normally
> provides would be out the window. RBD devices could have random byte
> ranges zapped back in time (not great if they're the offsets assigned
> to your database or fs journal...), for instance.
> 3) Deliberately power-cycling a node counts as a power failure if you
> don't stop services and sync etc. first.
>
> In other words, don't mess with the definition of "committing a
> transaction" if you value your data.
> -Sam "just say no" Just
>
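To make Sam's "committing a transaction" point concrete, here is a minimal
sketch, in plain Python rather than anything from the Ceph tree, of the
contract a journal-style write has to honour: success is reported only after
fsync() returns, and fsync() only means "on stable storage" if the filesystem
still issues cache-flush barriers to the drive, which is exactly what
nobarrier turns off. The file path and record format below are invented for
illustration.

import os


def commit_record(path, payload):
    """Append a record and report success only once it is on stable storage.

    This loosely mirrors the guarantee behind an OSD journal write: the
    write() alone is not a commit, the fsync() is.  With nobarrier, fsync()
    no longer forces the drive's volatile write cache to be flushed, so even
    this code could acknowledge data that a power cut will destroy.
    """
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o600)
    try:
        os.write(fd, payload)   # may still be sitting in page/drive cache
        os.fsync(fd)            # durable only if cache flushes are honoured
    finally:
        os.close(fd)
    # Only now is it safe to tell peers and clients the record is committed.


if __name__ == "__main__":
    commit_record("/tmp/journal.bin", b"txn-0001: write 4k at offset 0\n")

With barriers enabled, that fsync() forces the data out of the drive's
volatile cache; with nobarrier the same call can return while the data is
still only in cache, which is how OSDs end up acknowledging writes they later
lose. A second sketch at the bottom of the thread shows one way to audit
hosts for nobarrier mounts and size=2/min_size=1 pools.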
> On Fri, Nov 18, 2016 at 4:04 PM, Nick Fisk <nick@xxxxxxxxxx> wrote:
>> Yes, because these things happen:
>>
>> http://www.theregister.co.uk/2016/11/15/memset_power_cut_service_interruption/
>>
>> We had customers who had kit in this DC.
>>
>> To use your analogy, it's like crossing the road at traffic lights but not
>> checking that the cars have stopped. You might be OK 99% of the time, but
>> sooner or later it will bite you in the arse and it won't be pretty.
>>
>> ________________________________
>> From: "Brian ::" <bc@xxxxxxxx>
>> Sent: 18 Nov 2016 11:52 p.m.
>> To: sjust@xxxxxxxxxx
>> Cc: Craig Chi; ceph-users@xxxxxxxxxxxxxx; Nick Fisk
>> Subject: Re: how possible is that ceph cluster crash
>>
>>> This is like your mother telling you not to cross the road when you were
>>> 4 years of age, but not telling you it was because you could be flattened
>>> by a car :)
>>>
>>> Can you expand on your answer? If you are in a DC with A/B power,
>>> redundant UPS, dual feed from the electric company, onsite generators,
>>> and dual-PSU servers, is it still a bad idea?
>>>
>>> On Fri, Nov 18, 2016 at 6:52 PM, Samuel Just <sjust@xxxxxxxxxx> wrote:
>>>>
>>>> Never *ever* use nobarrier with ceph under *any* circumstances. I
>>>> cannot stress this enough.
>>>> -Sam
>>>>
>>>> On Fri, Nov 18, 2016 at 10:39 AM, Craig Chi <craigchi@xxxxxxxxxxxx>
>>>> wrote:
>>>>>
>>>>> Hi Nick and other Cephers,
>>>>>
>>>>> Thanks for your reply.
>>>>>
>>>>>> 2) Config Errors
>>>>>> This can be an easy one to say you are safe from. But I would say
>>>>>> most outages and data loss incidents I have seen on the mailing lists
>>>>>> have been due to poor hardware choice or configuring options such as
>>>>>> size=2, min_size=1 or enabling stuff like nobarrier.
>>>>>
>>>>> I am wondering about the pros and cons of the nobarrier option when
>>>>> used with Ceph.
>>>>>
>>>>> It is well known that nobarrier is dangerous when a power outage
>>>>> happens, but if we already have replicas in different racks or on
>>>>> different PDUs, will Ceph reduce the risk of data loss with this
>>>>> option?
>>>>>
>>>>> I have seen many performance tuning articles recommending the
>>>>> nobarrier option for xfs, but not many of them mention its trade-offs.
>>>>>
>>>>> Is it really unacceptable to use nobarrier in a production
>>>>> environment? I would be very grateful if you are willing to share any
>>>>> experiences with nobarrier and xfs.
>>>>>
>>>>> Sincerely,
>>>>> Craig Chi (Product Developer)
>>>>> Synology Inc. Taipei, Taiwan. Ext. 361
>>>>>
>>>>> On 2016-11-17 05:04, Nick Fisk <nick@xxxxxxxxxx> wrote:
>>>>>
>>>>>> -----Original Message-----
>>>>>> From: ceph-users [mailto:ceph-users-bounces@xxxxxxxxxxxxxx] On Behalf
>>>>>> Of Pedro Benites
>>>>>> Sent: 16 November 2016 17:51
>>>>>> To: ceph-users@xxxxxxxxxxxxxx
>>>>>> Subject: how possible is that ceph cluster crash
>>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> I have a ceph cluster with 50 TB and 15 osds. It has been working fine
>>>>>> for one year and I would like to grow it and migrate all my old
>>>>>> storage, about 100 TB, to ceph, but I have a doubt. How possible is it
>>>>>> that the cluster fails and everything goes very bad?
>>>>>
>>>>> Everything is possible. I think there are 3 main risks:
>>>>>
>>>>> 1) Hardware failure
>>>>> I would say Ceph is probably one of the safest options in regards to
>>>>> hardware failures, certainly if you start using 4TB+ disks.
>>>>>
>>>>> 2) Config Errors
>>>>> This can be an easy one to say you are safe from. But I would say most
>>>>> outages and data loss incidents I have seen on the mailing lists have
>>>>> been due to poor hardware choice or configuring options such as
>>>>> size=2, min_size=1 or enabling stuff like nobarrier.
>>>>>
>>>>> 3) Ceph Bugs
>>>>> Probably the rarest, but potentially the most scary as you have less
>>>>> control. They do happen and it's something to be aware of.
>>>>>
>>>>>> How reliable is ceph? What is the risk of losing my data? Is it
>>>>>> necessary to back up my data?
>>>>>
>>>>> Yes, always back up your data, no matter what solution you use. Just
>>>>> as RAID != backup, so Ceph != backup.
>>>>>
>>>>>> Regards.
>>>>>> Pedro.
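Following up on the "Config Errors" point above (size=2, min_size=1, and
nobarrier mounts), a short audit script along the following lines can be run
on each host as a sanity check. This is only a sketch: it assumes the usual
/proc/mounts layout and the JSON field names that "ceph osd dump --format
json" produced around this era (pools, pool_name, size, min_size), and the
size/min_size thresholds simply encode the thread's rule of thumb for
replicated pools rather than a hard requirement.

import json
import subprocess


def barrierless_xfs_mounts(mounts_path="/proc/mounts"):
    """Return (device, mountpoint) pairs for xfs mounts using nobarrier."""
    risky = []
    with open(mounts_path) as mounts:
        for line in mounts:
            device, mountpoint, fstype, options = line.split()[:4]
            if fstype == "xfs" and "nobarrier" in options.split(","):
                risky.append((device, mountpoint))
    return risky


def fragile_pools():
    """Return pools whose replication settings leave little safety margin.

    Field names are assumptions based on the JSON output of
    "ceph osd dump --format json"; the size check is only meaningful for
    replicated (not erasure-coded) pools.
    """
    raw = subprocess.check_output(["ceph", "osd", "dump", "--format", "json"])
    osd_map = json.loads(raw.decode("utf-8"))
    return [pool["pool_name"] for pool in osd_map["pools"]
            if pool.get("size", 0) < 3 or pool.get("min_size", 0) < 2]


if __name__ == "__main__":
    for device, mountpoint in barrierless_xfs_mounts():
        print("WARNING: %s on %s is mounted with nobarrier" % (device, mountpoint))
    for name in fragile_pools():
        print("WARNING: pool %s has size < 3 or min_size < 2" % name)

The script only reports; it changes nothing, so it is safe to run while
deciding how to bring risky settings back in line.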