On Thu, Jun 4, 2015 at 6:31 AM, Nick Fisk <nick@xxxxxxxxxx> wrote:
>
> Hi All,
>
> I have 2 pools, both on the same set of OSDs. The 1st is the default rbd pool created at installation 3 months ago; the other was created recently to verify performance problems.
>
> As mentioned, both pools are on the same set of OSDs and use the same CRUSH ruleset, and the RBDs on both are identical in size, version and order. The only real difference I can think of is that the existing pool has around 5 million objects on it.
>
> Testing with RBD-enabled fio, I see the newly created pool get the expected random read performance of around 60 IOPS. The existing pool only gets around half of this. For random reads, new pool latency = ~15 ms, old pool latency = ~35 ms.
>
> There is no other IO going on in the cluster while these tests are running.
>
> XFS fragmentation is low, around 1-2% on most of the disks. The only difference I can think of is that the existing pool has data on it, whereas the new one is empty apart from the test RBD. Should this make a difference?
>
> Any ideas?
>
> Any hints on what I can check to see why latency is so high for the existing pool?
>
> Nick

Apart from what Somnath said, depending on your PG counts and configuration you might also have put enough objects into the cluster that the old pool has a multi-level PG folder hierarchy. I wouldn't expect that to make a difference, because those folders should be cached in RAM, but if somehow they're not, that would require more disk accesses.

But more likely it's as Somnath suggests: since most of the objects don't exist for images in the new pool, the OSDs are able to return ENOENT on those accesses much more quickly.
-Greg

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
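One rough way to check the multi-level PG folder idea is to estimate how deep each PG's directory tree has grown. This is a back-of-the-envelope model, not Ceph code. It assumes FileStore splits a directory into 16 hashed subdirectories once it holds more than filestore_split_multiple * |filestore_merge_threshold| * 16 objects, and it uses what I believe were the defaults at the time (2 and 10, giving 320; check your own ceph.conf). It also assumes objects hash uniformly across PGs and subdirectories. The helper names and the pg_num values are hypothetical, since the thread doesn't state the actual PG count.

```python
def split_threshold(split_multiple: int = 2, merge_threshold: int = 10) -> int:
    """Objects a FileStore directory may hold before splitting 16 ways.

    Defaults are assumed values; read the real ones from ceph.conf.
    """
    return split_multiple * abs(merge_threshold) * 16


def estimated_depth(total_objects: int, pg_num: int, threshold: int) -> int:
    """Estimate how many levels of hashed subdirectories one PG has grown.

    Assumes objects spread evenly over PGs and over each 16-way split.
    """
    if pg_num <= 0 or threshold <= 0:
        raise ValueError("pg_num and threshold must be positive")
    per_pg = total_objects / pg_num
    depth = 0
    # After `depth` splits, each leaf directory holds about per_pg / 16**depth
    # objects; keep splitting while a leaf would still exceed the threshold.
    while per_pg / 16 ** depth > threshold:
        depth += 1
    return depth


if __name__ == "__main__":
    t = split_threshold()  # 320 with the assumed defaults
    for pg_num in (256, 1024):  # hypothetical PG counts
        print(pg_num, estimated_depth(5_000_000, pg_num, t))
    # prints "256 2" then "1024 1"
```

Under these assumptions, 5 million objects in a 256-PG pool would already be two directory levels deep, while a nearly empty pool stays flat. Whether that costs extra disk accesses depends, as Greg says, on whether those directory entries stay cached in RAM.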