Re: Write back mode Cache-tier behavior

On Jun 5, 2017, at 6:47 PM, Christian Balzer <chibi@xxxxxxx> wrote:

Personally I avoid odd numbered releases, but my needs for stability
and low update frequency seem to be far off the scale for "normal" Ceph
users.

W/o precise numbers of files and the size of your SSDs (which type?) it is
hard to say, but you're likely to be better off just having all metadata
on an SSD pool instead of cache-tiering.
800MB/s sounds about right for your network and cluster in general (no
telling for sure w/o SSD/HDD details of course).
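
Regarding the all-metadata-on-SSD layout above, a minimal sketch of what
that could look like (assuming an existing CRUSH ruleset, here number 1,
that maps to the SSD OSDs, the default CephFS pool names, and the
Jewel-era "crush_ruleset" spelling of the option):

  $ ceph osd pool set cephfs_metadata crush_ruleset 1   # pin the metadata pool to the SSD ruleset
  $ ceph osd pool get cephfs_metadata crush_ruleset     # verify the change took

Data stays on the HDD pool; only the (small, hot) metadata moves to SSDs.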

As I pointed out before and will try to explain again below, that speed
difference, while pretty daunting, isn't all that surprising. 


SSD: Intel S3520 240GB
HDD: WDC WD5003ABYZ-011FA0 500GB
fio: bs=4m iodepth=32
dd: bs=4m
The test file is 20GB.
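
(For reference, those parameters correspond roughly to invocations like
the following; the mount point, ioengine, and direct-I/O flags are my
assumptions:)

  $ fio --name=seqwrite --filename=/mnt/cephfs/testfile --rw=write \
        --bs=4m --iodepth=32 --ioengine=libaio --direct=1 --size=20g
  $ dd if=/dev/zero of=/mnt/cephfs/testfile bs=4M count=5120 oflag=direct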

No, not quite. Re-read what I wrote; there's a difference between RADOS
object creation and the actual data (contents).
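
You can see that distinction from the command line: rados can create an
object without writing any contents into it (pool and object names below
are just examples):

  $ rados -p cephfs_data create emptyobj              # object now exists, zero length
  $ rados -p cephfs_data stat emptyobj                # reports size 0
  $ rados -p cephfs_data put emptyobj /tmp/somefile   # only now does it get data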

The devs or other people with more code familiarity will correct me, but
essentially, as I understand it, this is what happens when a new RADOS
object gets created in conjunction with a cache-tier:

1. Client (cephfs, rbd, whatever) talks to the cache-tier and the
transaction causes a new object to be created.
Since the tier is an overlay of the actual backing storage, the object
(but not necessarily the current data in it) needs to exist on both; the
overlay setup is sketched after this list.
2. Object gets created on the backing storage, which involves creating
the file (at zero length), any needed directories above it, and the entry
in the OMAP leveldb. All on HDDs, all slow.
I'm pretty sure this needs to be done and finished before the object is
usable; there are no journals to speed this up.
3. Cache-tier pseudo-promotes the new object (it is empty after all) and
starts accepting writes.
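
For context, the overlay relationship in step 1 is what the standard
write-back tier setup establishes (pool names here are placeholders):

  $ ceph osd tier add cephfs_data cache-pool           # attach the tier to the backing pool
  $ ceph osd tier cache-mode cache-pool writeback      # write-back mode
  $ ceph osd tier set-overlay cephfs_data cache-pool   # client I/O now goes through the tier

With the overlay set, clients transparently talk to cache-pool first,
which is why the sequence above starts at the tier.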

This is leaving out any metadata stuff CephFS needs to do for new "blocks"
and files, which may also be more involved than overwrites. 

Christian

You made it clear to me, thanks! I really appreciate your kind explanation.

Thanks,
Ting Yi Lin
