Re: NVRAM cards as OSD journals

Hello,

On Fri, 20 May 2016 15:52:45 +0000 EP Komarla wrote:

> Hi,
> 
> I am contemplating using a NVRAM card for OSD journals in place of SSD
> drives in our ceph cluster.
> 
> Configuration:
> 
> *         4 Ceph servers
> 
> *         Each server has 24 OSDs (each OSD is a 1TB SAS drive)
> 
> *         1 PCIe NVRAM card of 16GB capacity per ceph server
> 
> *         Both client & cluster networks are 10Gbps
>
Since you were afraid of losing just 5 OSDs if a single journal SSD were
to fail, putting all your eggs in one NVRAM basket is quite the leap.

Your failure domains should match your cluster size and abilities, and 4
nodes is a small cluster. Losing one because your NVRAM card failed will
have massive impacts during re-balancing, and then you'll have a 3-node
cluster with less overall performance until you can fix things.

And while a node can of course fail in its entirety as well (bad
mainboard, CPU, RAM), these things can often be fixed quickly
(especially if you have spares on hand) and don't need to involve a full
re-balancing if Ceph is configured accordingly
(mon_osd_down_out_subtree_limit = host).
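
For reference, that would look something like this as a minimal ceph.conf
sketch (section placement aside, the option itself is the point):

  [global]
  # if an entire host goes down, don't automatically mark its OSDs "out",
  # so a quick node repair doesn't trigger a full re-balance
  mon osd down out subtree limit = host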

As for your question, this has been discussed to some extent less than two
months ago, especially concerning journal size and usage:
https://www.mail-archive.com/ceph-users@xxxxxxxxxxxxxx/msg28003.html

That being said, it would be best to have a comparison between a
normal-sized journal on a fast SSD/NVMe and the 600MB NVRAM journals.

I'd expect small write IOPS to be faster with the NVRAM and _maybe_ to see
some slowdown compared to SSDs when it comes to large writes, like during a
backfill.
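
If you want hard numbers for that comparison, the usual direct, synchronous
write test with fio against each candidate journal device will do (a
sketch; /dev/XXX is just a placeholder for your NVRAM/SSD device):

  # small sync writes, roughly what the journal sees for client IO
  fio --name=journal-4k --filename=/dev/XXX --direct=1 --sync=1 \
      --rw=write --bs=4k --numjobs=1 --iodepth=1 --runtime=60 --time_based
  # large writes, closer to the backfill case
  fio --name=journal-1m --filename=/dev/XXX --direct=1 --sync=1 \
      --rw=write --bs=1M --numjobs=1 --iodepth=1 --runtime=60 --time_based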

> As per the Ceph documents:
> The expected throughput number should include the expected disk
> throughput (i.e., sustained data transfer rate), and network throughput.
> For example, a 7200 RPM disk will likely have approximately 100 MB/s.
> Taking the min() of the disk and network throughput should provide a
> reasonable expected throughput. Some users just start off with a 10GB
> journal size. For example: osd journal size = 10000
> 
> Given that I have a single 16GB card per server that has to be carved up
> among all 24 OSDs, I will have to configure each OSD journal to be much
> smaller, around 600MB, i.e., 16GB/24 drives.  This value is much smaller
> than the 10GB/OSD journal that is generally used.  So, I am wondering if
> this configuration and journal size is valid.  Is there a performance
> benefit of having a journal that is this small?  Also, do I have to
> reduce the default "filestore max sync interval" from 5 seconds to a
> smaller value, say 2 seconds, to match the smaller journal size?
> 
Yes, just to be on the safe side.
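
As a rough sanity check, the sizing rule from the docs boils down to the
journal holding about twice what the OSD can write during one sync
interval, so working backwards from your 600MB (numbers illustrative,
assuming ~150MB/s per SAS OSD):

  # journal size >= 2 * expected throughput * filestore max sync interval
  # => max sync interval <= 600 MB / (2 * 150 MB/s) = 2 s
  [osd]
  osd journal size            = 600   ; MB, 16GB card / 24 OSDs
  filestore max sync interval = 2     ; seconds, down from the default 5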

Regards,

Christian

> Have people used NVRAM cards in the Ceph clusters as journals?  What is
> their experience?
> 
> Any thoughts?
> 
> 
> 
> Legal Disclaimer:
> The information contained in this message may be privileged and
> confidential. It is intended to be read only by the individual or entity
> to whom it is addressed or by their designee. If the reader of this
> message is not the intended recipient, you are on notice that any
> distribution of this message, in any form, is strictly prohibited. If
> you have received this message in error, please immediately notify the
> sender and delete or destroy any copy of this message!


-- 
Christian Balzer        Network/Systems Engineer                
chibi@xxxxxxx   	Global OnLine Japan/Rakuten Communications
http://www.gol.com/
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


