Hello list,

To confirm what Christian has said: we have been playing with a 3-node cluster with 4 SSDs (3610) per node. With the journals co-located on the OSD SSDs we were getting 770 MB/s sustained on large sequential writes, and 35 MB/s at about 9200 IOPS on small random writes. Moving the journals onto an NVMe decreased the sustained throughput marginally, probably by 40 MB/s, and consistently increased the small random writes by about 10 MB/s and 3100 IOPS or so.

But now, with my small cluster, I've got a huge failure domain in each OSD server. As the number of OSDs increases, I would imagine the value of backing SSDs with NVMe journals diminishes.

B
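P.S. For anyone wanting to repeat this kind of test: figures like the above are typically gathered with rados bench for the large sequential case and fio's rbd engine for the small random case. A rough sketch follows; the pool and image names, block sizes, queue depth and runtimes are placeholders, not the exact commands behind the numbers quoted here:

    # large sequential writes: 4MB objects, 16 writers in flight (pool name is a placeholder)
    rados bench -p testpool 60 write -b 4194304 -t 16 --no-cleanup
    rados -p testpool cleanup

    # small random writes: 4k randwrite against a test RBD image (names are placeholders)
    fio --name=randwrite-4k --ioengine=rbd --clientname=admin --pool=testpool \
        --rbdname=bench-img --rw=randwrite --bs=4k --iodepth=32 \
        --runtime=60 --time_based --direct=1 --group_reporting

Watching iostat on the journal devices while these run makes it easy to see where the small-write IOPS are actually landing.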
On Tue, May 24, 2016 at 3:28 AM, Christian Balzer <chibi@xxxxxxx> wrote:
>
> Hello,
>
> On Fri, 20 May 2016 15:52:45 +0000 EP Komarla wrote:
>
>> Hi,
>>
>> I am contemplating using an NVRAM card for OSD journals in place of SSD
>> drives in our Ceph cluster.
>>
>> Configuration:
>>
>> * 4 Ceph servers
>>
>> * Each server has 24 OSDs (each OSD is a 1TB SAS drive)
>>
>> * 1 PCIe NVRAM card of 16GB capacity per Ceph server
>>
>> * Both client & cluster networks are 10Gbps
>>
> Since you were afraid of losing just 5 OSDs if a single journal SSD were
> to fail, putting all your eggs in one NVRAM basket is quite the leap.
>
> Your failure domains should match your cluster size and abilities. Four
> nodes is a small cluster; losing one because your NVRAM card failed will
> have a massive impact during re-balancing, and then you'll have a 3-node
> cluster with less overall performance until you can fix things.
>
> And while a node can of course fail in its entirety as well (bad
> mainboard, CPU, RAM), those things can often be fixed quickly
> (especially if you have spares on hand) and don't need to involve a full
> re-balancing if Ceph is configured accordingly
> (mon_osd_down_out_subtree_limit = host).
>
> As for your question, this was discussed to some extent less than two
> months ago, especially concerning journal size and usage:
> https://www.mail-archive.com/ceph-users@xxxxxxxxxxxxxx/msg28003.html
>
> That being said, it would be best to have a comparison between a
> normal-sized journal on a fast SSD/NVMe and the 600MB NVRAM journals.
>
> I'd expect small write IOPS to be faster with the NVRAM, and _maybe_ some
> slowdown compared to SSDs when it comes to large writes, such as during a
> backfill.
>
>> As per the Ceph documents:
>> The expected throughput number should include the expected disk
>> throughput (i.e., sustained data transfer rate) and network throughput.
>> For example, a 7200 RPM disk will likely have approximately 100 MB/s.
>> Taking the min() of the disk and network throughput should provide a
>> reasonable expected throughput. Some users just start off with a 10GB
>> journal size. For example:
>>
>>     osd journal size = 10000
>>
>> Given that I have a single 16GB card per server that has to be carved up
>> among all 24 OSDs, I will have to configure each OSD journal to be much
>> smaller, around 600MB, i.e., 16GB/24 drives. This is much smaller than
>> the 10GB/OSD journal that is generally used, so I am wondering if this
>> configuration and journal size is valid. Is there a performance benefit
>> to having a journal that is this small? Also, do I have to reduce the
>> default "filestore max sync interval" from 5 seconds to a smaller value,
>> say 2 seconds, to match the smaller journal size?
>>
> Yes, just to be on the safe side.
>
> Regards,
>
> Christian
>
>> Have people used NVRAM cards in Ceph clusters as journals? What is
>> their experience?
>>
>> Any thoughts?
>>
> --
> Christian Balzer        Network/Systems Engineer
> chibi@xxxxxxx           Global OnLine Japan/Rakuten Communications
> http://www.gol.com/
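For anyone doing the arithmetic at home: 16GB spread across 24 OSDs is roughly 680MB per journal, so 600MB leaves a little headroom. A ceph.conf sketch along the lines discussed above; the values are illustrative only and not something tested on the hardware in this thread:

    [global]
    # keep a failed host from being marked out and re-balanced automatically
    mon osd down out subtree limit = host

    [osd]
    # osd journal size is stated in MB: 16GB NVRAM / 24 OSDs ~= 680MB, rounded down
    osd journal size = 600
    # flush more often than the 5s default so the small journal does not fill up
    filestore max sync interval = 2

As Christian notes, the shorter sync interval is mainly there to stay on the safe side with a journal that small.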