Hello,

On Sun, 27 Mar 2016 18:44:41 +0200 (CEST) Daniel Delin wrote:

> Hi,
>
> I have ordered three 240GB Samsung SM863 SSDs for my 3 OSD hosts, each
> with 4 OSDs, to improve write performance.

Did you test these SSDs in advance?
While I'm pretty sure they are suitable for Ceph journals, I haven't seen
any sync write results for them, so if you did test them, or will when
you get them, by all means share those results with us. (A quick and
dirty sync write test sketch is appended at the bottom of this mail.)

> When looking at the docs, there is a formula for journal size
> (osd journal size = {2 * (expected throughput * filestore max sync
> interval)}) that I intend to use. If I understand this correctly it
> would in my case be (2 * (4 * 100MB/s) * 5 seconds) = 4GB journal size
> if I keep the default filestore max sync interval of 5 seconds. Since
> the SSDs are 240GB, I plan to use significantly larger journals of
> maybe 40GB, and with the above logic I would increase filestore max
> sync interval to 50 seconds. Is this the correct way of calculating?

In essence, yes. (The second sketch at the bottom of this mail walks
through those numbers.)

> Is there any downside to having a long filestore max sync interval?

In and of itself, not so much.
However, your goal here seems to be to not "waste" lots of empty SSD
space and to use it for journaling instead. And a long max sync interval
won't give you that.

For starters, remember that Ceph journals are write-only in normal
operation; they only ever get read from if there was a crash.
Writes happen to the journal(s) of all OSDs involved, then get ACK'ed to
the client, then filestore_min_sync_interval and the various
filestore_queue parameters determine when the data gets written (from
RAM) to the filestore. Which is pretty damn instantly; the reasoning here
by the Ceph developers is to not let the OSD fall behind too much and
then have it overwhelmed by many competing operations.

On a cluster with filestore_min_sync_interval set to 0.5 (up from its
default of 0.01) I still don't see more than 40MB of journal utilization
at peak times (sequential writes at full cluster speed), though I didn't
modify the queue parameters. The largest utilization I've ever seen
(collectd/graphite are your friends) is 100MB, in another cluster when
doing backfills. (The third sketch at the bottom shows one way to poll
those numbers by hand.)

I size my journals at 10-20GB, but that's basically because I have the
space. Since SSD write speeds are pretty much tied to their size (due to
internal parallelism), only moderately large ones give you the speeds
needed to journal for several HDDs, resulting in "waste". That's one of
the reasons I tend to put the OS on the same SSDs as well, in RAID1 or
RAID10 form depending on the number of SSDs.

Christian

> //Daniel

-- 
Christian Balzer        Network/Systems Engineer
chibi@xxxxxxx           Global OnLine Japan/Rakuten Communications
http://www.gol.com/
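
P.S.: On the sync write test: fio is the usual tool, something along the
lines of "fio --name=test --filename=/dev/sdX --direct=1 --sync=1
--rw=write --bs=4k --iodepth=1 --runtime=60". If you want to see what is
actually being measured, here is a minimal Python sketch. Everything in
it is illustrative: the device path is a placeholder, and it uses
O_DSYNC only (no O_DIRECT, to avoid the buffer alignment dance), so
don't expect its numbers to match fio exactly.

#!/usr/bin/env python
# Quick and dirty sync write check: time 4k writes that each wait for
# stable storage, which is the access pattern a Ceph journal cares about.
# WARNING: this overwrites data on DEV; point it at a scratch device.
import os
import time

DEV = "/dev/sdX"       # placeholder device, will be written to
BLOCK = b"\0" * 4096   # 4k blocks, like small journal writes
SECONDS = 30

fd = os.open(DEV, os.O_WRONLY | os.O_DSYNC)
count = 0
start = time.time()
while time.time() - start < SECONDS:
    os.write(fd, BLOCK)
    count += 1
os.close(fd)

elapsed = time.time() - start
print("%d sync 4k writes in %.1fs -> %.0f IOPS, %.2f MB/s"
      % (count, elapsed, count / elapsed, count * 4096.0 / elapsed / 1e6))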
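
P.P.S.: The journal size arithmetic from your mail, spelled out in a few
lines of Python. The inputs are your numbers (4 OSDs sharing one journal
SSD, ~100MB/s expected write throughput per OSD), which reproduce your
4GB and 40GB figures:

# osd journal size = 2 * (expected throughput * filestore max sync interval)
osds = 4
mb_per_sec = 100                  # expected write throughput per OSD
for sync_interval in (5, 50):     # default and your proposed value
    per_osd = 2 * mb_per_sec * sync_interval
    print("sync interval %2ds: %5d MB per OSD journal, %5d MB on the SSD"
          % (sync_interval, per_osd, per_osd * osds))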
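
P.P.P.S.: On watching journal utilization: one way to get at the numbers
I graph via collectd/graphite is each OSD's admin socket. A rough sketch
of polling it by hand follows; note that the perf counter names (and
which section they live under) vary between Ceph releases, so eyeball
your own "perf dump" output first. "osd.0" and the 5 second interval are
arbitrary choices for illustration.

# Poll one OSD's perf counters and print the journal-related ones.
import json
import subprocess
import time

while True:
    dump = json.loads(subprocess.check_output(
        ["ceph", "daemon", "osd.0", "perf", "dump"]).decode())
    filestore = dump.get("filestore", {})  # section name may differ per release
    journal = dict((k, v) for k, v in filestore.items()
                   if k.startswith("journal"))
    print(journal)
    time.sleep(5)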