Since you didn't hear much from the successful crowd, I'll chime in.

At my previous employer, we ran some pretty large clusters (over 1PB) successfully on Hammer. Some were upgraded from Firefly, and by no means do I consider myself to be a developer. We totaled over 15 production clusters. I'm not saying there weren't some rocky times, but they were generally not directly due to Ceph code, but things ancillary to it, like kernel bugs, customers driving traffic, hardware selection/failures, or minor config issues. We never lost a cluster, though we did lose access to them on occasion.

It does require you to stay up to date on what's going on with the community, but I don't think that it's too different from OpenStack in that regard. If support is a concern, there's always the Red Hat option, or purchase a Ceph appliance like the Sandisk Infiniflash, which comes with solid support from folks like Somnath.

FWIW, Hammer's write performance isn't awful. My coworker borrowed some compute nodes, and ran a pretty large scale test with 400 SSDs across 50 nodes, and the results were pretty encouraging.

Warren

On 10/1/15, 10:01 PM, "J David" <j.david.lists@xxxxxxxxx> wrote:

>This is all very helpful feedback, thanks so much.
>
>Also it sounds like you guys have done a lot of work on this, so
>thanks for that as well!
>
>Is Hammer generally considered stable enough for production in an
>RBD-only environment? The perception around here is that the number
>of people who report lost data or inoperable clusters due to bugs in
>Hammer on this list is troubling enough to cause hesitation. There's
>a specific term for overweighting the probability of catastrophic
>negative outcomes, and maybe that's what's happening. People tend not
>to post to the list "Hey we have a cluster, it's running great!"
>instead waiting until things are not great, so the list paints an
>artificially depressing picture of stability.
>But when we ask around
>quietly to other places we know running Ceph in production, which is
>admittedly a very small sample, they're all also still running
>Firefly.
>
>Admittedly, it doesn't help that "On my recommendation, we performed a
>non-reversible upgrade on the production cluster which, despite our
>best testing efforts, wrecked things causing us to lose 4 hours of
>data and requiring 2 days of downtime while we rebuilt the cluster and
>restored the backups" is pretty much guaranteed to be followed by,
>"You're fired."
>
>So, do medium-sized IT organizations (i.e. those without the resources
>to have a Ceph developer on staff) run Hammer-based deployments in
>production successfully?
>
>Please understand this is not meant to be sarcastic or critical of the
>project in any way. Ceph is amazing, and we love it. Some features
>of Ceph, like CephFS, have been considered not-production-quality for
>a long time, and that is to be expected. These things are incredibly
>complex and take time to get right. So organizations in our position
>just don't use that stuff. As a relative outsider for whom the Ceph
>source code is effectively a foreign language, it's just *really* hard
>to tell if Hammer in general is in that same "still baking" category.
>
>Thanks!
>
>
>On Wed, Sep 30, 2015 at 3:33 PM, Somnath Roy <Somnath.Roy@xxxxxxxxxxx>
>wrote:
>> David,
>> You should move to Hammer to get all of the performance benefits. They
>>were all added in Giant and carried into the current Hammer LTS release.
>> FYI, the focus so far has been on read performance, and what we have seen
>>in our environment with 6Gb SAS SSDs is that we are able to saturate the
>>drives bandwidth-wise from 64K onwards. But with smaller blocks like 4K we
>>are not yet able to saturate the SAS SSD drives.
>> But considering Ceph's scale-out nature, you can get some very good
>>numbers out of a cluster.
>>For example, with 8 SAS SSD drives (in a JBOF)
>>and 2 heads in front (so, a 2-node Ceph cluster) we are able to
>>hit ~300K random read (RR) IOPS, while the 8 SSDs' aggregate performance
>>would be ~400K. Not too bad. At this point we are saturating host CPUs.
>> We have seen almost linear scaling if you add similar setups, i.e. with,
>>say, ~3 of the above setups you could hit ~900K RR IOPS. So, I would say
>>it is definitely there in terms of read IOPS, and more improvements are
>>coming.
>> But the write path is very awful compared to read, and that's where the
>>problem is, because in the mainstream no workload is 100% RR (IMO).
>>So, even if you have say 90-10 read/write, the performance numbers would
>>be ~6-7X slower.
>> So, it is very much dependent on your workload/application access
>>pattern and obviously the cost you are willing to spend.
>>
>> Thanks & Regards
>> Somnath
>>
>> -----Original Message-----
>> From: ceph-users [mailto:ceph-users-bounces@xxxxxxxxxxxxxx] On Behalf
>>Of Mark Nelson
>> Sent: Wednesday, September 30, 2015 12:04 PM
>> To: ceph-users@xxxxxxxxxxxxxx
>> Subject: Re: Ceph, SSD, and NVMe
>>
>> On 09/30/2015 09:34 AM, J David wrote:
>>> Because we have a good thing going, our Ceph clusters are still
>>> running Firefly on all of our clusters including our largest, all-SSD
>>> cluster.
>>>
>>> If I understand right, newer versions of Ceph make much better use of
>>> SSDs and give overall much higher performance on the same equipment.
>>> However, the impression I get of newer versions is that they are also
>>> not as stable as Firefly and should only be used with caution.
>>>
>>> Given our storage consumers have an effectively unlimited appetite for
>>> IOPs and throughput, more performance would be very welcome. But not
>>> if it leads to cluster crashes and lost data.
>>>
>>> What really prompts this is that we are starting to see large-scale
>>> NVMe equipment appearing in the channel ( e.g.
>>> http://www.supermicro.com/products/system/1U/1028/SYS-1028U-TN10RT_.cfm ).
>>> The cost is significantly higher with commensurately higher
>>> theoretical performance. But if we're already not pushing our SSDs to
>>> the max over SAS, the added benefit of NVMe would largely be lost.
>>>
>>> On the other hand, if we could safely upgrade to a more recent version
>>> that is as stable and bulletproof as Firefly has been for us, but has
>>> better performance with SSDs, that would not only benefit our current
>>> setup, it would be a necessary first step for moving onto NVMe.
>>>
>>> So this raises three questions:
>>>
>>> 1) Have I correctly understood that one or more post-Firefly releases
>>> exist that (c.p.) perform significantly better with all-SSD setups?
>>>
>>> 2) Is there any such release that (generally) is as rock-solid as
>>> Firefly? Of course this is somewhat situationally dependent, so I
>>> would settle for: is there any such release that doesn't have any
>>> known minding-my-own-business-suddenly-lost-data bugs in a 100% RBD
>>> use case?
>>>
>>> 3) Has anyone done anything with NVMe as storage (not just journals)
>>> who would care to share what kind of performance they experienced?
>>>
>>> (Of course if we do upgrade we will do so carefully, do a test cluster
>>> first, have backups standing by, etc. But if it's already known that
>>> doing so will either not improve anything or is likely to blow up in
>>> our faces, it would be better to leave well enough alone. The current
>>> performance is by no means bad, we're just always greedy for more. :)
>>> )
>>>
>>> Thanks for any advice/suggestions!
>>
>> Hi David,
>>
>> The single biggest performance improvement we've seen for SSDs has
>>resulted from the memory allocator investigation that Chaitanya Hulgol
>>and Somnath Roy spearheaded at Sandisk and others including myself have
>>followed up and tried to expand on since then.
>>
>> See:
>>
>> http://www.spinics.net/lists/ceph-devel/msg25823.html
>> https://www.mail-archive.com/ceph-devel@xxxxxxxxxxxxxxx/msg23100.html
>> http://www.spinics.net/lists/ceph-devel/msg21582.html
>>
>> I haven't tested firefly, but there's a good chance that you may see a
>>significant performance improvement simply by upgrading your systems to
>>tcmalloc 2.4 and loading the OSDs with 128MB of thread cache or
>>LD_PRELOAD jemalloc. This isn't something we officially support in RHCS
>>yet, but we'll likely be moving toward it for future releases based on
>>the very positive results we are seeing. The biggest thing to keep in
>>mind is that this does increase per-OSD memory usage by several hundred
>>MB, so 3-4X IOPS increase does come with a cost. On the plus side, it
>>also reduces CPU usage, sometimes dramatically. You may be able to
>>offset the increased memory usage somewhat by disabling transparent huge
>>pages (especially with jemalloc).
>>
>> See:
>>
>> http://www.spinics.net/lists/ceph-devel/msg26483.html
>>
>> FWIW, between Sage's newstore work, and recent work by Somnath Roy to
>>optimize the write path, we may see further improvement, but neither of
>>those are ready for production yet.
>>
>>> _______________________________________________
>>> ceph-users mailing list
>>> ceph-users@xxxxxxxxxxxxxx
>>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
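
For anyone who wants to try the allocator settings Mark describes in the quoted message (128MB of tcmalloc thread cache, or LD_PRELOAD jemalloc, plus a look at transparent huge pages), here is a rough sketch of what that could look like. It rests on several assumptions: `osd_env` and `thp_mode` are hypothetical helper names, the jemalloc library path is a placeholder that varies by distro, and `TCMALLOC_MAX_TOTAL_THREAD_CACHE_BYTES` is the gperftools environment variable for the thread cache limit. How your OSDs actually pick up their environment depends on your init system and packaging, so treat this as an illustration rather than a recipe.

```python
import os
from pathlib import Path

# 128 MB of tcmalloc thread cache, the figure Mark mentions.
THREAD_CACHE_BYTES = 128 * 1024 * 1024

# Standard Linux sysfs file reporting the transparent huge pages mode.
THP_PATH = Path("/sys/kernel/mm/transparent_hugepage/enabled")


def osd_env(base_env, jemalloc_path=None):
    """Return a copy of base_env with allocator settings for an OSD process.

    jemalloc_path is an assumption: pass the real location of
    libjemalloc on your system if you want the LD_PRELOAD variant.
    """
    env = dict(base_env)
    if jemalloc_path:
        # Variant B: preload jemalloc ahead of anything already preloaded.
        existing = env.get("LD_PRELOAD")
        env["LD_PRELOAD"] = (
            jemalloc_path if not existing else f"{jemalloc_path}:{existing}"
        )
    else:
        # Variant A: raise the tcmalloc thread cache limit (read by
        # gperftools when the process starts).
        env["TCMALLOC_MAX_TOTAL_THREAD_CACHE_BYTES"] = str(THREAD_CACHE_BYTES)
    return env


def thp_mode(text):
    """Parse sysfs text such as 'always madvise [never]'; return the active mode."""
    for word in text.split():
        if word.startswith("[") and word.endswith("]"):
            return word[1:-1]
    return None


if __name__ == "__main__":
    env = osd_env(os.environ)
    print("thread cache bytes:", env["TCMALLOC_MAX_TOTAL_THREAD_CACHE_BYTES"])
    if THP_PATH.exists():
        print("THP mode:", thp_mode(THP_PATH.read_text()))
```

The two variants are kept mutually exclusive here on the reading that Mark offers them as alternatives ("or"). The script only builds an environment and reports the THP mode; it does not change any system setting.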