Hello,

On Fri, 1 May 2015 15:45:41 +0200 Piotr Wachowicz wrote:

> > yes SSD-Journal helps a lot (if you use the right SSDs)
> >
>
> What SSDs to avoid for journaling from your experience? Why?
>
Read the rather countless SSD threads on this ML, use the archive and
your google foo.
Like the _current_ thread:
"Possible improvements for a slow write speed (excluding independent SSD
journals)"

In short anything w/o "power caps" and the resulting speed degradation
when it comes to DSYNC writes.

> >
> > > We're seeing very disappointing Ceph performance. We have 10GigE
> > > interconnect (as a shared public/internal network).
> > Which kind of CPU do you use for the OSD-hosts?
> >
>
>
> Intel(R) Xeon(R) CPU E5-2660 v2 @ 2.20GHz
>
> FYI, we are hosting VMs on our OSD nodes, but the VMs use very small
> amounts of CPUs and RAM
>
Small write IOPS with Ceph cost a LOT of CPU cycles, but I suppose you're
not limited by your CPU, yet. You may be once you have SSD journals.

>
> > > We're wondering whether it makes sense to buy SSDs and put journals
> > > on them. But we're looking for a way to verify that this will
> > > actually help BEFORE we splash cash on SSDs.
> > I can recommend the Intel DC S3700 SSD for journaling! In the beginning
> > I started with different much cheaper models, but this was the wrong
> > decision.
> >
>
> What, apart from the price, made the difference? sustained read/write
> bandwidth? IOPS?
>
See the thread mentioned above.

> We're considering this one (PCI-e SSD). What do you think?
> http://www.plextor-digital.com/index.php/en/M6e-BK/m6e-bk.html
> PX-128M6e-BK
>
I think... Gamer consumer toy.
And a website that doesn't give you actual endurance information, other
than a flimsy 5 year "warranty".

Depending on how many OSDs (how many are in your storage nodes?) you plan
to put behind a journal SSD and the amount of writes you expect, there
will be better, known to work options. Like the Intel DC models.

>
> Also, we're thinking about sharing one SSD between two OSDs. Any reason
> why this would be a bad idea?
>
Failure domain. How many storage nodes and OSDs do you have?
A 1:2 ratio is rather conservative actually, a lot of people are happy
with 1:3 to 1:5. But that means your cluster must be able to survive the
loss of 5 OSDs if that SSD fails and the resulting rebuild storm,
something that small budget clusters don't tend to be capable of.

Other than that, sufficient write bandwidth (about 70MB/s per SATA HDD).
IOPS (see thread above) are unlikely to be the limiting factor with SSD
journals.

>
> > > We're using Ceph for OpenStack storage (kvm). Enabling RBD cache
> > > didn't really help all that much.
> > The read speed can be optimized with an bigger read ahead cache inside
> > the VM, like:
> > echo 4096 > /sys/block/vda/queue/read_ahead_kb
>
Yup, that can help a lot with reads, but I think Nick has your problem
nailed.

Christian

> >
> Thanks, we will try that.

--
Christian Balzer        Network/Systems Engineer
chibi@xxxxxxx           Global OnLine Japan/Fusion Communications
http://www.gol.com/
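
For reference, the kind of DSYNC test meant above can be run with fio along
these lines (a sketch only; /dev/sdX is a placeholder for the SSD being
evaluated, and the run writes straight to the device, destroying whatever is
on it):

  # single-threaded 4k writes with O_DIRECT and O_DSYNC, the access pattern
  # that matters for a filestore journal
  fio --name=journal-test --filename=/dev/sdX --direct=1 --sync=1 \
      --rw=write --bs=4k --numjobs=1 --iodepth=1 \
      --runtime=60 --time_based --group_reporting

Drives without power-loss protection tend to collapse to a few hundred 4k
sync write IOPS in this test, while the Intel DC models stay orders of
magnitude above that, which is exactly the difference that shows up as
journal performance.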
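
To put rough numbers on the bandwidth point: with the ~70MB/s per SATA HDD
figure above, a 1:2 ratio asks the journal SSD for roughly 140MB/s of
sustained writes, 1:3 for about 210MB/s and 1:5 for about 350MB/s, so the
higher ratios only pay off with an SSD (or PCIe card) that can actually
sustain that kind of rate under sync writes.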
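
And since the read_ahead_kb setting quoted above does not survive a reboot,
one way to make it stick inside the VMs is a udev rule, roughly like this
(a sketch; the vd[a-z] match assumes virtio disks named vdX):

  # write a rule that raises read-ahead to 4MB for all virtio block devices
  echo 'SUBSYSTEM=="block", KERNEL=="vd[a-z]", ACTION=="add|change", ATTR{queue/read_ahead_kb}="4096"' > /etc/udev/rules.d/99-read-ahead.rules
  # apply it without rebooting
  udevadm trigger --subsystem-match=block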