Re: rados -p <pool> cache-flush-evict-all surprisingly slow

My recollection is that the RADOS tool is issuing a special eviction command on every object in the cache tier using primitives we don't use elsewhere. Their existence is currently vestigial from our initial tiering work (rather than the present caching), but I have some hope we'll extend them again in the future.

The usual flushing and eviction routines, meanwhile, run as an agent inside of the OSD and are extremely parallel. I think there's documentation about how to flush entire cache pools in preparation for removing them; I'd check those out. :)
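For illustration only, here is a minimal sketch of driving per-object flushes from the client side with several workers instead of one serial pass. It assumes the per-object `rados` subcommands `cache-flush` and `cache-evict` and `rados -p <pool> ls` are available in your release; the function names, the `workers` default and the injectable `run` parameter are hypothetical. Whether this actually beats cache-flush-evict-all has not been measured here, and it ignores namespaces and snapshots.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def list_objects(pool, run=subprocess.run):
    """Return the object names in `pool`, as printed by `rados -p <pool> ls`."""
    result = run(["rados", "-p", pool, "ls"],
                 capture_output=True, text=True, check=True)
    return [line for line in result.stdout.splitlines() if line]


def flush_and_evict(pool, obj, run=subprocess.run):
    """Flush one object to the base tier, then evict it from the cache tier."""
    for op in ("cache-flush", "cache-evict"):
        run(["rados", "-p", pool, op, obj], check=True)
    return obj


def flush_pool_parallel(pool, workers=16, run=subprocess.run):
    """Flush and evict every object in `pool` using `workers` threads.

    `workers` is an arbitrary illustrative default, not a tuned value.
    """
    objects = list_objects(pool, run)
    with ThreadPoolExecutor(max_workers=workers) as executor:
        return list(executor.map(lambda o: flush_and_evict(pool, o, run),
                                 objects))
```

Calling `flush_pool_parallel("<cache pool name>")` would issue the same per-object operations concurrently; passing a stub as `run` lets the logic be exercised without a cluster.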
-Greg

On Wed, Nov 12, 2014 at 7:46 AM Martin Millnert <martin@xxxxxxxxxxx> wrote:
Dear Cephers,

I have a lab setup of 6x dual-socket hosts with 48GB RAM and 2x10Gbps
networking, each equipped with 2x S3700 100GB SSDs and 4x 500GB HDDs. The
HDDs are mapped in a tree under a 'platter' root, similar to the guidance from
Seb at http://www.sebastien-han.fr/blog/2014/08/25/ceph-mix-sata-and-ssd-within-the-same-box/ ,
and the SSDs similarly under an 'ssd' root.  Replication is set to 3.
Journals are on tmpfs (simulating NVRAM).

I have put an SSD pool as a cache tier in front of an HDD pool ("rbd"), and run
fio-rbd against "rbd".  In the benchmarks, at bs=32kb, QD=128 from a
single separate client machine, I reached a peak throughput of around
1.2 GB/s.  So there is some capability.  IOPS-wise I currently see a max of
around 15k IOPS.

After having filled the SSD cache tier, I ran rados -p rbd
cache-flush-evict-all, expecting to see the 6 SSD OSDs start
to evict all the cache-tier PGs to the underlying pool, rbd, which maps
to the HDDs.  I would have expected parallelism and high throughput,
but what I observe instead is an average flush speed of ~80 MB/s.

Which leads me to the question:  Is "rados -p <pool>
cache-flush-evict-all" supposed to work in a parallel manner?

A cursory look in tcpdump suggests to me that the eviction operation is
serial, in which case the performance would make some sense,
since it is basically limited by the write speed of a single HDD.
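
As a rough sanity check on that reasoning, the following back-of-envelope sketch compares the observed rate with a naive ceiling for a fully parallel flush. It assumes each HDD sustains about the same ~80 MB/s seen during the flush (an assumption, not a measurement of the individual disks), that every flushed byte is written to 3 replicas, and that the tmpfs journals add no extra disk writes; the function name is hypothetical.

```python
def parallel_flush_ceiling(hosts, hdds_per_host, per_hdd_mb_s, replication):
    """Naive upper bound on client-visible flush throughput in MB/s.

    Every flushed byte is written `replication` times across the HDDs,
    so the aggregate disk bandwidth is divided by the replica count.
    """
    total_hdds = hosts * hdds_per_host
    return total_hdds * per_hdd_mb_s / replication


observed_mb_s = 80.0  # average flush speed reported above
ceiling_mb_s = parallel_flush_ceiling(
    hosts=6, hdds_per_host=4, per_hdd_mb_s=80.0, replication=3)

print(f"naive parallel ceiling: {ceiling_mb_s:.0f} MB/s")
print(f"observed / ceiling:     {observed_mb_s / ceiling_mb_s:.3f}")
```

Under these assumptions the ceiling is several times the observed rate, which is consistent with the flush being serialized rather than disk-bound across the whole pool.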

What should I see?

If it is indeed a serial operation, is this different from the regular
cache tier eviction routines that are triggered by full_ratios, max
objects or max storage volume?

Regards,
Martin
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com