Re: Dramatic performance drop at certain number of objects in pool

Hello Blair, hello Wade (see below),

On Thu, 23 Jun 2016 12:55:17 +1000 Blair Bethwaite wrote:

> On 23 June 2016 at 12:37, Christian Balzer <chibi@xxxxxxx> wrote:
> > Case in point, my main cluster (RBD images only) with 18 5+TB OSDs on 3
> > servers (64GB RAM each) has 1.8 million 4MB RBD objects using about 7%
> > of the available space.
> > Don't think I could hit this problem before running out of space.
> 
> Perhaps. However ~30TB per server is pretty low with present HDD
> sizes. 
These are in fact 24 3TB HDDs per server, but in 6 RAID10s with 4 HDDs
each.

>In the pool on our large cluster where we've seen this issue we
> have 24x 4TB OSDs per server, and we first hit the problem in pre-prod
> testing at about 20% usage (with default 4MB objects). We went to 40 /
> 8. Then as I reported the other day we hit the issue again at
> somewhere around 50% usage. Now we're at 50 / 12.
> 
High-density storage servers come with a number of other gotchas and
tuning requirements; I'd consider this simply another one.

As for increasing the default RBD object size, I'd be wary of the
performance impact, especially if you are ever going to have a cache-tier.

If a cache-tier is definitely not in your future, striping might
counteract the downsides of larger objects, as sketched below.
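
For illustration only, a striped format-2 image with larger objects
could be created roughly like this (image name and sizes are made-up
examples, untested; the stripe unit has to evenly divide the object size):

  # 8 MiB objects (order 23), striped in 1 MiB units across 8 objects
  rbd create --image-format 2 --size 102400 --order 23 \
      --stripe-unit 1048576 --stripe-count 8 rbd/testimg

That way writes still fan out over several objects (and thus PGs) even
though each object is twice the default 4 MiB size.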

> The boxes mentioned above are a couple of years old. Today we're
> buying 2RU servers with 128TB in them (16x 8TB)!
> 
As people (myself included) have noticed and noted, large OSDs are
pushing things in more ways than just this issue.

I know very well how attractive dense storage nodes are from a cost and
rack space (also a cost factor, of course) perspective, but most people
need IOPS more than raw capacity, and that's where smaller, faster OSDs
are better suited, as the Ceph docs have pointed out for a long time.

> Replacing our current NAS on RBD setup with CephFS is now starting to
> scare me...
> 
If that move is going to happen once Bluestore is stable, this
_particular_ problem should hopefully be a non-issue.
I'm sure Murphy will find other amusing ways to keep us entertained
and highly stressed, though.
If nothing else, CephFS itself would scare me more than a by-now
well-known problem that can be tuned away.
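
(For the archives, and assuming Blair's "40 / 8" and "50 / 12" above
refer to filestore_merge_threshold / filestore_split_multiple, that
tuning amounts to something like the following in ceph.conf; note that
raising these after the fact does not re-merge directories that have
already been split:)

  [osd]
  # a PG subdirectory splits at roughly
  # merge_threshold * 16 * split_multiple objects
  filestore merge threshold = 50
  filestore split multiple = 12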


A question/request for Wade: would it be possible to reformat your OSDs
with Ext4 (I know it's deprecated, but if you know what you're doing...)
or BTRFS? I'm wondering whether either of them avoids this behavior, or
hits it at a different point.
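
(If you do try that on Jewel, my understanding is that Ext4 OSDs need
the shortened object name limits from the deprecation notice, and BTRFS
has its own mkfs/mount settings; roughly, and untested by me:

  [osd]
  # ext4 only: stay within its ~255-byte filename limit (fine for RBD)
  osd max object name len = 256
  osd max object namespace len = 64
  # BTRFS alternative:
  #osd mkfs type = btrfs
  #osd mount options btrfs = rw,noatime
)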

Christian
-- 
Christian Balzer        Network/Systems Engineer                
chibi@xxxxxxx   	Global OnLine Japan/Rakuten Communications
http://www.gol.com/
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com