Re: Ceph Journal Disk Size


 




I'd definitely be happy to share what numbers I can get out of it.  I'm still a neophyte with Ceph, and learning how to operate it, set it up, etc.

My limited performance testing to date has been with the "stock" XFS filesystems that ceph-disk builds for the OSDs, basic PG/CRUSH map settings, and "dd" runs across RBD-mounted volumes ...  I'm still learning how to scale it up and start tweaking and tuning.
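For what it's worth, here's a minimal sketch of the kind of "dd" run I've been doing, wrapped in a bit of Python so I can keep the numbers. The mount point and sizes are made up, and it assumes the RBD image is already created, mapped, formatted XFS, and mounted:

#!/usr/bin/env python3
"""Rough sequential-write probe against an RBD-backed filesystem (sketch)."""
import subprocess
import time

MOUNT_POINT = "/mnt/rbdtest"   # hypothetical mount of a mapped RBD image
BLOCK_SIZE_MB = 4              # dd block size
COUNT = 2048                   # 4 MB * 2048 = 8 GiB written

def seq_write_test():
    testfile = f"{MOUNT_POINT}/ddfile"
    cmd = [
        "dd", "if=/dev/zero", f"of={testfile}",
        f"bs={BLOCK_SIZE_MB}M", f"count={COUNT}",
        "oflag=direct",        # bypass the page cache so the I/O actually hits the OSDs
    ]
    start = time.monotonic()
    subprocess.run(cmd, check=True)
    elapsed = time.monotonic() - start
    mb_written = BLOCK_SIZE_MB * COUNT
    print(f"wrote {mb_written} MB in {elapsed:.1f}s -> {mb_written / elapsed:.1f} MB/s")

if __name__ == "__main__":
    seq_write_test()

Nothing fancy - just enough to get a repeatable sequential-write number per volume.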

If anyone on the list is interested in specific tests and can provide specific, detailed instructions on configuration, test patterns, etc., I'm happy to run them if I can ...  We're baking in automation around the Ceph deployment from a fresh build, using the Open Crowbar deployment tooling with a Ceph workload on it.  Right now we're modifying the Ceph workload to work across multiple L3 rack boundaries in the cluster.

Physical servers are Dell R720xd platforms, with 12 spinning (4TB 7200 rpm) data disks and 2x 10k 600 GB mirrored OS disks.  Memory is 128 GB, and dual 6-core HT CPUs.

~~shane 



On 7/1/15, 5:24 PM, "German Anders" <ganders@xxxxxxxxxxxx> wrote:

I'm interested in such a configuration; can you share some performance tests/numbers?

Thanks in advance,

Best regards,

German

2015-07-01 21:16 GMT-03:00 Shane Gibson <Shane_Gibson@xxxxxxxxxxxx>:

It also depends a lot on the size of your cluster ... I have a test cluster I'm standing up right now with 60 nodes - a total of 600 OSDs, each at 4 TB ... If I lose 4 TB, that's a very small fraction of the data.  My replicas are going to be spread out across a lot of spindles, so re-replicating that missing 4 TB isn't much of an issue across 3 racks, each with 80 Gbit/sec of ToR uplinks to the spine.  Each node has 20 Gbit/sec to the ToR in a bond.
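Rough back-of-envelope on why a single 4 TB loss doesn't worry me at this scale - the per-spindle backfill throughput and the fraction of OSDs participating at any one time are just my assumptions:

# Back-of-envelope recovery estimate for one failed 4 TB OSD.
FAILED_TB = 4.0
TOTAL_OSDS = 600
SURVIVING_OSDS = TOTAL_OSDS - 1
PER_DISK_MB_S = 50.0           # assumed usable backfill throughput per spindle

data_mb = FAILED_TB * 1024 * 1024
# Backfill sources/targets are spread over many spindles; even if only a
# tenth of them participate at once, aggregate throughput is large.
participating = SURVIVING_OSDS / 10
aggregate_mb_s = participating * PER_DISK_MB_S
print(f"~{participating:.0f} spindles at {PER_DISK_MB_S} MB/s "
      f"-> {aggregate_mb_s / 1024:.1f} GB/s aggregate")
print(f"rough time to re-replicate 4 TB: {data_mb / aggregate_mb_s / 60:.0f} minutes")

With numbers in that ballpark the re-replication finishes in well under an hour, long before the network links become the bottleneck.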

On the other hand ... if you only have 4 .. or 8 ... or 10 servers ... and a smaller number of OSDs - you have fewer spindles replicating that loss, and it might be more of an issue.  

It just depends on the size/scale of  your environment.  

We're going to 8 TB drives - and that will ultimately be spread over 100 or more physical servers with 10 OSD disks per server.   This will be across 7 to 10 racks (same network topology) ... so an 8 TB drive loss isn't too big of an issue.   Now, that assumes replication actually works well at that cluster size.  We're still sussing out that part of the PoC engagement.

~~shane




On 7/1/15, 5:05 PM, "ceph-users on behalf of German Anders" <ceph-users-bounces@xxxxxxxxxxxxxx on behalf of ganders@xxxxxxxxxxxx> wrote:

Ask the other guys on the list, but for me losing 4TB of data is too much. The cluster will still run fine, but at some point you need to recover that disk, and if you lose one server with all of its 4TB disks, then yes, that will hurt the cluster. Also take into account that with that kind of disk you will get no more than 100-110 IOPS per disk.
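Quick math on that point, assuming 3x replication and a 12-disk box like the one described above:

# Rough IOPS ceiling for a 12-spindle node built from 7200 rpm disks,
# using the ~100 IOPS/disk figure above; replica count of 3 is an assumption.
IOPS_PER_DISK = 100
DISKS_PER_NODE = 12
REPLICAS = 3

raw_node_iops = IOPS_PER_DISK * DISKS_PER_NODE
print(f"raw per-node IOPS:  ~{raw_node_iops}")
# Every client write lands REPLICAS times, so usable write IOPS shrink accordingly.
print(f"usable write IOPS:  ~{raw_node_iops // REPLICAS} per node")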

German Anders
Storage System Engineer Leader
Despegar | IT Team
office +54 11 4894 3500 x3408
mobile +54 911 3493 7262
mail ganders@xxxxxxxxxxxx

2015-07-01 20:54 GMT-03:00 Nate Curry <curry@xxxxxxxxxxxxx>:

4TB is too much to lose?  Why would it matter if you lost one 4TB disk, given the redundancy?  Won't it auto-recover from the disk failure?

Nate Curry

On Jul 1, 2015 6:12 PM, "German Anders" <ganders@xxxxxxxxxxxx> wrote:
I would probably go with smaller OSD disks; 4TB is too much to lose in case of a broken disk, so maybe more OSD daemons with smaller disks, maybe 1TB or 2TB in size. A 4:1 relationship is good enough, and I also think a 200G disk for the journals would be OK, so you can save some money there. Configure the OSDs as JBOD, of course - don't use any RAID under them - and use two different networks for the public and cluster traffic, something like the sketch below.
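A minimal ceph.conf fragment for the two-network split - the subnets and the journal size value are placeholders, not recommendations:

[global]
public network  = 10.10.10.0/24    # client-facing traffic (placeholder subnet)
cluster network = 10.10.20.0/24    # replication/backfill traffic (placeholder subnet)

[osd]
osd journal size = 10240           # journal size in MB if you partition the SSDs yourself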

German

2015-07-01 18:49 GMT-03:00 Nate Curry <curry@xxxxxxxxxxxxx>:
I would like to get some clarification on the size of the journal disks that I should get for the new Ceph cluster I am planning.  I read about the journal settings at http://ceph.com/docs/master/rados/configuration/osd-config-ref/#journal-settings, but that didn't really clarify it for me, or I just didn't get it.  The Learning Ceph book from Packt states that you should have one journal disk for every 4 OSDs.  Using that as a reference, I was planning on getting multiple systems with 8 x 6TB nearline SAS drives for OSDs, two SSDs for journaling per host, 2 hot spares for the 6TB drives, and 2 drives for the OS.  I was thinking of 400GB SSD drives but am wondering if that is too much.  Any informed opinions would be appreciated.
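If I read that page right, the rule of thumb is: osd journal size = 2 * (expected throughput * filestore max sync interval).  Here is my attempt at the math with assumed numbers for a 6TB spinner, which is partly why I'm wondering whether 400GB is overkill:

# Journal sizing rule of thumb from the Ceph docs linked above:
#   osd journal size = 2 * expected_throughput * filestore_max_sync_interval
# The per-disk throughput below is an assumption; 5 s is the documented default sync interval.
DISK_MB_S = 150                 # assumed sustained throughput of one 6TB SAS drive
SYNC_INTERVAL_S = 5             # filestore max sync interval (default)
OSDS_PER_SSD = 4                # the 4:1 ratio from the Learning Ceph book

journal_mb = 2 * DISK_MB_S * SYNC_INTERVAL_S
print(f"per-OSD journal: ~{journal_mb} MB")
print(f"per-SSD (4 journals): ~{journal_mb * OSDS_PER_SSD / 1024:.1f} GB")

By that math the journals themselves only need a few GB each, so I assume a bigger SSD mostly buys endurance and headroom rather than anything the formula requires.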

Thanks,

Nate Curry






_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
