Re: OSD Performance

On Wed, 25 Feb 2015 02:50:59 +0400 Kevin Walker wrote:

> Hi Mark
> 
> Thanks for the info, 22k is not bad, but still massively below what a
> PCIe SSD can achieve. Care to expand on why the write IOPS are so low?

Aside from what Mark mentioned in his reply, there's also latency to be
considered in the overall picture.

But my tests (and other people's, including Mark's recent PDF posted here)
clearly indicate where the problem with small (4k) write IOPS lies:
CPU utilization, mostly by Ceph code (but with significant OS time, too).

To quote myself:
I did some brief tests with a machine having 8 DC S3700 100GB SSDs for
OSDs (replica 1) under 0.80.6, and the right (make that wrong) type of
load (small, 4k I/Os) melted all 8 of the 3.5GHz cores in that box,
while never exceeding 15% utilization of the SSDs.
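
For anybody who wants to reproduce that kind of load, rados bench will do;
something along these lines (just a sketch, pool name, runtime and thread
count are placeholders):

  # 4KB writes against a test pool, keep the objects for a read test
  rados bench -p testpool 60 write -b 4096 -t 32 --no-cleanup
  # read the objects back
  rados bench -p testpool 60 seq -t 32

Watching the OSD node with top/atop while that runs makes the CPU
saturation pretty obvious.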

Even with further optimizations, I expect the CPUs to remain the limiting
factor for small write IOPS.
So with that in mind, a pure-SSD storage node design will have to take
this into account and spend money where it actually improves things.

> Was this with a separate RAM disk PCIe device or SLC SSD for the journal?
> 
> That fragmentation percentage looks good. We are considering using just
> SSDs for OSDs and RAM disk PCIe devices for the journals, so this would
> be ok.
> 
For starters, you clearly have too much money.
You're not going to see a good return on investment, as per what I wrote
above. Even faster journals are pointless: having the journal on the
actual OSD SSDs is a non-issue performance-wise and makes things a lot
more straightforward.
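
To illustrate: with ceph-disk, the journal simply ends up on the same
device when you don't point it elsewhere. A rough sketch (device name
and journal size are just examples):

  # ceph.conf
  [osd]
      osd journal size = 10240    ; 10GB journal partition per OSD

  # no separate journal device given, so the journal partition is
  # created on the same SSD as the data partition
  ceph-disk prepare /dev/sdb
  ceph-disk activate /dev/sdb1
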
I could totally see a much more primitive (HDD OSDs, SSD journals) but
more balanced and parallelized cluster outperforming your design at the
same cost (though admittedly with more space usage).

Secondly, why would you even care one iota about file system fragmentation
when using SSDs for all your storage?
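
If you do want to keep an eye on it anyway, the fragmentation numbers
Mark quoted are easy to check, and the knob he mentioned is a one-liner
in ceph.conf. A sketch, the device name is just an example:

  # report the XFS fragmentation factor of an OSD partition (read-only)
  xfs_db -r -c frag /dev/sdb1

  # ceph.conf
  [osd]
      filestore xfs extsize = true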

Regards,

Christian

> Kind regards
> 
> Kevin Walker
> +968 9765 1742
> 
> On 25 Feb 2015, at 02:35, Mark Nelson <mnelson@xxxxxxxxxx> wrote:
> 
> > On 02/24/2015 04:21 PM, Kevin Walker wrote:
> > Hi All
> > 
> > Just recently joined the list and have been reading/learning about ceph
> > for the past few months. Overall it looks to be well suited to our
> > cloud platform but I have stumbled across a few worrying items that
> > hopefully you guys can clarify the status of.
> > 
> > Reading through various mailing list archives, it would seem an OSD
> > caps out at about 3k IOPS. Dieter Kasper from Fujitsu made an
> > interesting observation about the size of the OSD code (20k+ lines
> > at that time); is this being optimized further, and has this IOPS limit
> > been improved in Giant?
> 
> In recent tests under fairly optimal conditions, I'm seeing performance
> topping out at about 4K object writes/s and 22K object reads/s against
> an OSD with a very fast PCIe SSD.  There are several reasons writes are
> slower than reads, but this is something we are working on improving in
> a variety of ways.
> 
> I believe others may have achieved even higher results.
> 
> > 
> > Is there a way to overcome the XFS fragmentation problems other users
> > have experienced?
> 
> Setting the newish filestore_xfs_extsize parameter to true appears to
> help in testing we did a couple months ago.  We filled up a cluster to
> near capacity (~70%) and then did 12 hours of random writes.  After the
> test completed, with filestore_xfs_extsize disabled we were seeing
> something like 13% fragmentation, while with it enabled we were seeing
> around 0.02% fragmentation.
> 
> > 
> > Kind regards
> > 
> > Kevin
> > 
> > 
> 


-- 
Christian Balzer        Network/Systems Engineer                
chibi@xxxxxxx   	Global OnLine Japan/Fusion Communications
http://www.gol.com/
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



