Re: Ceph, SSD, and NVMe

Mark Nelson <mnelson@xxxxxxxxxx> · Wed, 30 Sep 2015 14:03:57 -0500

On 09/30/2015 09:34 AM, J David wrote:
Because we have a good thing going, all of our Ceph clusters are still
running Firefly, including our largest, all-SSD cluster.

If I understand right, newer versions of Ceph make much better use of
SSDs and give overall much higher performance on the same equipment.
However, the impression I get of newer versions is that they are also
not as stable as Firefly and should only be used with caution.

Given our storage consumers have an effectively unlimited appetite for
IOPs and throughput, more performance would be very welcome.  But not
if it leads to cluster crashes and lost data.

What really prompts this is that we are starting to see large-scale
NVMe equipment appearing in the channel ( e.g.
http://www.supermicro.com/products/system/1U/1028/SYS-1028U-TN10RT_.cfm
).  The cost is significantly higher with commensurately higher
theoretical performance.  But if we're already not pushing our SSDs to
the max over SAS, the added benefit of NVMe would largely be lost.

On the other hand, if we could safely upgrade to a more recent version
that is as stable and bulletproof as Firefly has been for us, but has
better performance with SSDs, that would not only benefit our current
setup, it would be a necessary first step for moving onto NVMe.

So this raises three questions:

1) Have I correctly understood that one or more post-Firefly releases
exist that (c.p.) perform significantly better with all-SSD setups?

2) Is there any such release that (generally) is as rock-solid as
Firefly?  Of course this is somewhat situationally dependent, so I
would settle for: is there any such release that doesn't have any
known minding-my-own-business-suddenly-lost-data bugs in a 100% RBD
use case?

3) Has anyone done anything with NVMe as storage (not just journals)
who would care to share what kind of performance they experienced?

(Of course if we do upgrade we will do so carefully, do a test cluster
first, have backups standing by, etc.  But if it's already known that
doing so will either not improve anything or is likely to blow up in
our faces, it would be better to leave well enough alone.  The current
performance is by no means bad, we're just always greedy for more. :)
)

Thanks for any advice/suggestions!

Hi David,

The single biggest performance improvement we've seen for SSDs has
resulted from the memory allocator investigation that Chaitanya Hulgol
and Somnath Roy spearheaded at SanDisk, which others, including myself,
have since followed up on and tried to expand.

See:

http://www.spinics.net/lists/ceph-devel/msg25823.html
https://www.mail-archive.com/ceph-devel@xxxxxxxxxxxxxxx/msg23100.html
http://www.spinics.net/lists/ceph-devel/msg21582.html

I haven't tested Firefly, but there's a good chance that you will see a
significant performance improvement simply by upgrading your systems to
tcmalloc 2.4 and running the OSDs with 128MB of thread cache, or by
using LD_PRELOAD to load jemalloc.  This isn't something we officially
support in RHCS yet, but we'll likely be moving toward it for future
releases based on the very positive results we are seeing.  The biggest
thing to keep in mind is that this increases per-OSD memory usage by
several hundred MB, so the 3-4X IOPS increase does come with a cost.  On
the plus side, it also reduces CPU usage, sometimes dramatically.  You
may be able to offset the increased memory usage somewhat by disabling
transparent huge pages (especially with jemalloc).

See:

http://www.spinics.net/lists/ceph-devel/msg26483.html
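
As a rough illustration of the two allocator options and the transparent
huge pages check described above, here is a minimal Python sketch. It is
not taken from the linked threads. It assumes the OSD is linked against
gperftools tcmalloc, which reads the
TCMALLOC_MAX_TOTAL_THREAD_CACHE_BYTES environment variable, and that the
kernel exposes the THP setting at
/sys/kernel/mm/transparent_hugepage/enabled. The function names and the
jemalloc library path are hypothetical and will differ by distribution.

```python
import os

# 128 MB in bytes: the thread cache size suggested above.
THREAD_CACHE_BYTES = 128 * 1024 * 1024

# Hypothetical library location; check where your distribution installs jemalloc.
DEFAULT_JEMALLOC_PATH = "/usr/lib/x86_64-linux-gnu/libjemalloc.so.1"


def osd_allocator_env(allocator, base_env=None, jemalloc_path=DEFAULT_JEMALLOC_PATH):
    """Return a copy of the environment with allocator settings for an OSD process.

    allocator is "tcmalloc" (raise the total thread cache) or
    "jemalloc" (preload the jemalloc shared library).
    """
    env = dict(os.environ if base_env is None else base_env)
    if allocator == "tcmalloc":
        # gperftools tcmalloc reads this variable when the process starts.
        env["TCMALLOC_MAX_TOTAL_THREAD_CACHE_BYTES"] = str(THREAD_CACHE_BYTES)
    elif allocator == "jemalloc":
        # Put jemalloc first, keeping anything that was already preloaded.
        existing = env.get("LD_PRELOAD", "")
        env["LD_PRELOAD"] = f"{jemalloc_path} {existing}".strip()
    else:
        raise ValueError(f"unknown allocator: {allocator!r}")
    return env


def active_thp_mode(text):
    """Return the bracketed mode from the THP 'enabled' file contents.

    The kernel marks the active choice with brackets,
    for example "always madvise [never]".
    """
    for token in text.split():
        if token.startswith("[") and token.endswith("]"):
            return token[1:-1]
    return None
```

The returned dictionary can be passed as env= to subprocess.Popen when
starting an OSD by hand on a test box. On a packaged install the same
variables would normally go in whatever environment file the init
scripts read, and that depends on the distribution and Ceph release.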

FWIW, between Sage's newstore work and recent work by Somnath Roy to
optimize the write path, we may see further improvement, but neither of
those is ready for production yet.

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
