Re: long blocking with writes on rbds

Christian Balzer <chibi@xxxxxxx> · Thu, 9 Apr 2015 16:14:32 +0900

On Thu, 09 Apr 2015 00:25:08 -0400 Jeff Epstein wrote:

Running Ceph on AWS is, as was mentioned before, certainly not going to
improve things when compared to real HW.
At the very least it will make performance unpredictable.

Your 6 OSDs are on a single VM from what I gather?
Aside from being a very small number for something that you seem to be
using in some sort of production environment (Ceph gets faster the more
OSDs you add), where is the redundancy, HA in that?

The number of your PGs and PGPs needs to have at least a semblance of being
correctly sized, as others mentioned before.
You want to re-read the Ceph docs about that and check out the PG
calculator:
http://ceph.com/pgcalc/
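
For a rough idea of what the calculator does, here is a minimal sketch of
the usual rule of thumb (about 100 PGs per OSD, divided by the replica
count, rounded up to a power of two), plus a helper to see how many PG
copies each OSD ends up carrying with many pools. The function names are
mine and the pool count of 50 is a made-up number for illustration, not
something taken from your cluster:

```python
def suggested_total_pgs(num_osds, replicas, target_per_osd=100):
    """Rule-of-thumb PG total for the whole cluster (all pools combined),
    rounded up to the next power of two."""
    raw = num_osds * target_per_osd / replicas
    power = 1
    while power < raw:
        power *= 2
    return power


def pg_copies_per_osd(num_pools, pgs_per_pool, replicas, num_osds):
    """Average number of PG copies each OSD has to carry."""
    return num_pools * pgs_per_pool * replicas / num_osds


# 6 OSDs with 3 replicas: 6 * 100 / 3 = 200, rounded up to 256.
print(suggested_total_pgs(6, 3))          # 256

# Hypothetical: 50 pools of 100 PGs each, 3 replicas, 6 OSDs.
print(pg_copies_per_osd(50, 100, 3, 6))   # 2500.0
```

Compare the second number against the target of roughly 100 per OSD to
see how far off a many-small-pools layout can get.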

> 
> >> Our workload involves creating and destroying a lot of pools. Each
> >> pool has 100 pgs, so it adds up. Could this be causing the problem?
> >> What would you suggest instead?
> >
> > ...this is most likely the cause. Deleting a pool causes the data and
> > pgs associated with it to be deleted asynchronously, which can be a lot
> > of background work for the osds.
> >
> > If you're using the cfq scheduler you can try decreasing the priority 
> > of these operations with the "osd disk thread ioprio..." options:
> >
> > http://ceph.com/docs/master/rados/configuration/osd-config-ref/#operations 
> >
> >
> > If that doesn't help enough, deleting data from pools before deleting
> > the pools might help, since you can control the rate more finely. And
> > of course not creating/deleting so many pools would eliminate the
> > hidden background cost of deleting the pools.
> 
> Thanks for your answer. Some follow-up questions:
> 
> - I wouldn't expect that pool deletion is the problem, since our pools, 
> although many, don't contain much data. Typically, we will have one rbd 
> per pool, several GB in size, but in practice containing little data. 
> Would you expect that performance penalty from deleting pool to be 
> relative to the requested size of the rbd, or relative to the quantity 
> of data actually stored in it?
> 
Since RBDs are sparsely allocated, the actual data used is the key factor.
But you're adding the pool removal overhead to this.

> - Rather than creating and deleting multiple pools, each containing a 
> single rbd, do you think we would see a speed-up if we were to instead 
> have one pool, containing multiple (frequently created and deleted) 
> rbds? Does the performance penalty stem only from deleting pools 
> themselves, or from deleting objects within the pool as well?
> 
Both, and the fact that you have overloaded the PGs by nearly a factor of
10 (or 20 if you're actually using a replica of 3 and not 1) doesn't help
one bit.
And let's clarify what objects are in the Ceph/RBD context: they're the (by
default) 4MB blobs that make up an RBD image.
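
To put numbers on that, a back-of-the-envelope sketch, assuming the
default 4MB object size. The 10GB image and the 200MB of written data are
invented for illustration, and the helper names are mine:

```python
import math

OBJECT_SIZE = 4 * 1024**2   # default RBD object size, 4MB
MIB = 1024**2
GIB = 1024**3


def max_objects(image_bytes, object_size=OBJECT_SIZE):
    """Objects a fully written image of this size would consist of."""
    return math.ceil(image_bytes / object_size)


def min_objects_written(written_bytes, object_size=OBJECT_SIZE):
    """Lower bound on objects that actually exist for this much data.
    Writes scattered across the image touch more objects than this."""
    return math.ceil(written_bytes / object_size)


print(max_objects(10 * GIB))            # 2560
print(min_objects_written(200 * MIB))   # 50
```

The deletion work scales with the objects that really exist, which sits
somewhere between those two figures depending on how scattered the writes
were.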

> - Somewhat off-topic, but for my own curiosity: Why is deleting data so 
> slow, in terms of ceph's architecture? Shouldn't it just be a matter of 
> flagging a region as available and allowing it to be overwritten, as 
> would a traditional file system?
> 
Apples and oranges, as RBD is block storage, not a FS.
That said, a traditional FS is local and updates an inode or equivalent
bit.
For Ceph to delete an RBD image, it has to go to all cluster nodes with
OSDs that have PGs that contain objects of that image. Then those objects
have to be deleted on the local filesystem of the OSD and various maps
updated cluster wide. Rinse and repeat until all objects have been dealt
with.
Quite a bit more involved, but that's the price you have to pay when you
have a DISTRIBUTED storage architecture that doesn't rely on a single item
(like an inode) to reflect things for the whole system.
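
A toy model of that fan-out, if it helps to picture it. This is NOT how
Ceph places data (the real thing uses its own hash and CRUSH); the md5
hashing, the "consecutive OSDs" placement and the object naming below are
all simplifications I made up for the sketch:

```python
import hashlib


def pg_for_object(name, pg_num):
    """Toy object-to-PG mapping: hash the object name, modulo pg_num."""
    digest = hashlib.md5(name.encode()).hexdigest()
    return int(digest, 16) % pg_num


def osds_for_pg(pg, num_osds, replicas):
    """Toy PG-to-OSD mapping: consecutive OSDs (real Ceph uses CRUSH)."""
    return [(pg + i) % num_osds for i in range(replicas)]


def deletion_fanout(image, num_objects, pg_num, num_osds, replicas):
    """Count how many object deletions each OSD has to perform."""
    per_osd = {}
    for i in range(num_objects):
        name = f"{image}.{i:016x}"   # hypothetical object naming
        pg = pg_for_object(name, pg_num)
        for osd in osds_for_pg(pg, num_osds, replicas):
            per_osd[osd] = per_osd.get(osd, 0) + 1
    return per_osd


fanout = deletion_fanout("img", num_objects=50, pg_num=100,
                         num_osds=6, replicas=3)
print("OSDs involved:", len(fanout))
print("local deletions in total:", sum(fanout.values()))
```

Even in this simplified form, every object has to be removed once per
replica, and the work is spread over however many OSDs hold a piece of the
image.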

Christian

> Jeff
> _______________________________________________
> ceph-users mailing list
> ceph-users@xxxxxxxxxxxxxx
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> 

-- 
Christian Balzer        Network/Systems Engineer                
chibi@xxxxxxx   	Global OnLine Japan/Fusion Communications
http://www.gol.com/