Re: rados -p <pool> cache-flush-evict-all surprisingly slow

My recollection is that the RADOS tool issues a special eviction command for every object in the cache tier, using primitives we don't use elsewhere. Those primitives are currently vestigial remnants of our initial tiering work (rather than the present caching), but I have some hope we'll extend them again in the future.

The usual flushing and eviction routines, meanwhile, run as an agent inside the OSD and are extremely parallel. I think there's documentation on how to flush entire cache pools in preparation for removing them; I'd check that out. :)
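From memory, the documented drain-and-remove sequence looks roughly like this (pool names here are placeholders, and exact flags may differ between releases):

    # stop new writes from landing in the cache tier
    ceph osd tier cache-mode hot-pool forward

    # flush and evict whatever is still in the cache pool
    rados -p hot-pool cache-flush-evict-all

    # confirm the cache pool is empty
    rados -p hot-pool ls

    # detach the tier from the backing pool
    ceph osd tier remove-overlay cold-pool
    ceph osd tier remove cold-pool hot-pool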
-Greg
On Wed, Nov 12, 2014 at 7:46 AM Martin Millnert <martin@xxxxxxxxxxx> wrote:
Dear Cephers,

I have a lab setup of 6x dual-socket hosts, each with 48GB RAM, 2x10Gbps NICs,
2x S3700 100GB SSDs and 4x 500GB HDDs. The HDDs are mapped in a tree under a
'platter' CRUSH root, similar to the guidance from Seb at
http://www.sebastien-han.fr/blog/2014/08/25/ceph-mix-sata-and-ssd-within-the-same-box/ ,
and the SSDs similarly under an 'ssd' root.  Replication is set to 3.
Journals are on tmpfs (simulating NVRAM).
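The CRUSH rules follow that post and look roughly like this (names and ruleset
numbers are illustrative):

    rule ssd {
            ruleset 1
            type replicated
            min_size 1
            max_size 10
            step take ssd
            step chooseleaf firstn 0 type host
            step emit
    }

with an analogous 'platter' rule doing 'step take platter', and pools pointed
at a rule with e.g. 'ceph osd pool set <pool> crush_ruleset 1'.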

I have put an SSD pool as a cache tier in front of an HDD pool ("rbd") and run
fio-rbd against "rbd".  In those benchmarks, at bs=32k, QD=128, from a single
separate client machine, I reached a peak throughput of around 1.2 GB/s, so
there is some capability there.  IOPS-wise I currently see a maximum of around
15k IOPS.
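For reference, the fio job is roughly the following sketch (the image name is
a placeholder):

    [global]
    ioengine=rbd
    clientname=admin
    pool=rbd
    rbdname=testimg
    direct=1
    bs=32k
    iodepth=128

    [randwrite]
    rw=randwrite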

After filling the SSD cache tier, I ran rados -p rbd cache-flush-evict-all and
expected the 6 SSD OSDs to start evicting all the cache-tier PGs to the
underlying pool, rbd, which maps to the HDDs.  I would have expected
parallelism and high throughput, but what I observe instead is an average
flush speed of ~80 MB/s.

Which leads me to the question:  Is "rados -p <pool>
cache-flush-evict-all" supposed to operate in parallel?

Cursory inspection with tcpdump suggests that the eviction operation is
serial, in which case the performance would make some sense, since it would
basically be limited by the write speed of a single HDD.
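If it is serial, one workaround might be to drive the per-object flush/evict
primitives in parallel from the client side, roughly like the following
untested sketch (the cache pool name is a placeholder):

    # flush and then evict every object in the cache pool,
    # 16 workers in parallel
    rados -p ssd-cache ls | \
      xargs -P 16 -I{} sh -c \
        'rados -p ssd-cache cache-flush "$1" && rados -p ssd-cache cache-evict "$1"' _ {}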

What should I see?

If it is indeed a serial operation, does it differ from the regular cache-tier
eviction routines that are triggered by the full ratios, max objects, or max
storage volume?
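For concreteness, by those thresholds I mean the per-pool settings, roughly as
follows (pool name and values illustrative only):

    ceph osd pool set ssd-cache target_max_bytes 100000000000
    ceph osd pool set ssd-cache target_max_objects 1000000
    ceph osd pool set ssd-cache cache_target_dirty_ratio 0.4
    ceph osd pool set ssd-cache cache_target_full_ratio 0.8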

Regards,
Martin
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
