Re: Write back mode Cache-tier behavior

Christian Balzer <chibi@xxxxxxx> · Tue, 6 Jun 2017 11:52:03 +0900

On Tue, 6 Jun 2017 10:25:38 +0800 TYLin wrote:

> > On Jun 5, 2017, at 6:47 PM, Christian Balzer <chibi@xxxxxxx> wrote:
> > 
> > Personally I avoid odd-numbered releases, but my needs for stability
> > and low update frequency seem to be far off the scale for "normal" Ceph
> > users.
> > 
> > W/o precise numbers of files and the size of your SSDs (which type?), it is
> > hard to say, but you're likely to be better off just having all metadata
> > on an SSD pool instead of cache-tiering.
> > 800MB/s sounds about right for your network and cluster in general (no
> > telling for sure w/o SSD/HDD details of course).
> > 
> > As I pointed out before and will try to explain again below, that speed
> > difference, while pretty daunting, isn't all that surprising. 
> >   
> 
> SSD: Intel S3520 240GB

At a theoretical maximum of 300MB/s per drive, this explains your
800MB/s (in conjunction with your network).
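As a back-of-envelope sketch (a simplified model, not a Ceph formula): client write throughput is capped by the smaller of the network bandwidth and the aggregate SSD bandwidth divided by how many times each client byte gets written. The drive count, replica count, network figure and FileStore journal factor used below are assumptions for illustration, not details of this cluster.

```python
def write_ceiling_mb_s(num_ssds, per_drive_mb_s, replicas, network_mb_s,
                       journal_factor=1):
    """Rough upper bound on client write throughput, in MB/s.

    Simplified model (assumption): every client byte is stored `replicas`
    times across the OSDs, and `journal_factor` times on each OSD
    (e.g. 2 if a FileStore journal shares the same SSD).
    """
    disk_limit = num_ssds * per_drive_mb_s / (replicas * journal_factor)
    return min(disk_limit, network_mb_s)


# Hypothetical inputs: 9 SSDs at 300MB/s, 3 replicas, ~1250MB/s (10GbE) network.
print(write_ceiling_mb_s(9, 300, 3, 1250))  # 900.0
```

With the same hypothetical inputs and `journal_factor=2`, the ceiling halves to 450MB/s, which is why co-located journals matter for this kind of estimate.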

These SSDs have an endurance of about 1 DWPD (drive write per day), so I'd
be monitoring them closely for wear-out.
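To put 1 DWPD into numbers, a minimal sketch, assuming the rating applies over a 5-year warranty period (an assumption here) and decimal units: a 240GB drive may take about 240GB/day, or roughly 438TB in total. The `write_amplification` factor is a hypothetical multiplier for extra writes per client byte (journal double-writes and the like); the real figure has to come from measuring, e.g. the wear attributes `smartctl -A` reports.

```python
def rated_tbw(capacity_gb, dwpd, warranty_years=5):
    """Total terabytes written implied by a DWPD rating (decimal units)."""
    return capacity_gb * dwpd * 365 * warranty_years / 1000


def days_until_worn(capacity_gb, dwpd, daily_client_gb,
                    write_amplification=1.0, warranty_years=5):
    """Days until the rated write budget is used up at a steady write rate.

    `write_amplification` is a hypothetical multiplier for extra writes per
    client byte landing on this drive (journal double-writes, etc.).
    """
    budget_gb = rated_tbw(capacity_gb, dwpd, warranty_years) * 1000
    return budget_gb / (daily_client_gb * write_amplification)


print(rated_tbw(240, 1))                # 438.0
print(days_until_worn(240, 1, 240, 2))  # 912.5
```

So writing "only" one drive's worth of client data per day with a 2x amplification would, under these assumptions, exhaust the rated endurance in about two and a half years instead of five.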

Christian

> HDD: WDC WD5003ABYZ-011FA0 500GB
> fio: bs=4m iodepth=32
> dd: bs=4m
> The test file is 20GB.
> 
> > No, not quite. Re-read what I wrote, there's a difference between RADOS
> > object creation and actual data (contents).
> > 
> > The devs or other people with more code familiarity will correct me, but
> > essentially, as I understand it, this happens when a new RADOS object gets
> > created in conjunction with a cache-tier:
> > 
> > 1. Client (cephfs, rbd, whatever) talks to the cache-tier and the
> > transaction causes a new object to be created.
> > Since the tier is an overlay of the actual backing storage, the object
> > (but not necessarily the current data in it) needs to exist on both.
> > 2. The object gets created on the backing storage, which involves creating
> > the file (at zero length), any needed directories above it and the entry
> > in the OMAP leveldb. All on HDDs, all slow.
> > I'm pretty sure this needs to be finished before the object is usable;
> > there are no journals to speed this up.
> > 3. Cache-tier pseudo-promotes the new object (it is empty after all) and
> > starts accepting writes.
> > 
> > This leaves out any metadata work CephFS needs to do for new "blocks"
> > and files, which may also be more involved than for overwrites.
> > 
> > Christian  
> 
> You made it clear to me, thanks! I really appreciate your kind explanation.
> 
> Thanks,
> Ting Yi Lin
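The object-creation path in the quoted steps can be sketched as a toy model. It only illustrates the explanation given in the thread and is not Ceph's code: `BackingPool`, `CacheTier` and the `file.NNNNNNNN` object names are hypothetical, and the 4 MiB object size is an assumption (the usual CephFS/RBD default). It also shows why a fresh 20GB write differs from an overwrite: taken as 20 GiB, the file maps to 5120 new objects, each needing a synchronous create on the HDD pool before its first write, while overwriting the same objects needs none.

```python
import math

OBJECT_SIZE = 4 * 1024 * 1024  # assumed 4 MiB RADOS object size


def objects_for_file(file_bytes, object_size=OBJECT_SIZE):
    """Number of RADOS objects a file of this size is split into."""
    return math.ceil(file_bytes / object_size)


class BackingPool:
    """Toy stand-in for the HDD pool: only tracks which objects exist."""

    def __init__(self):
        self.objects = set()
        self.create_ops = 0

    def create(self, name):
        # Step 2: zero-length file, parent directories and OMAP entry,
        # all on HDDs; must complete before the object is usable.
        self.objects.add(name)
        self.create_ops += 1


class CacheTier:
    """Toy write-back overlay in front of a BackingPool."""

    def __init__(self, backing):
        self.backing = backing
        self.cached = {}

    def write(self, name, data):
        if name not in self.cached:
            if name not in self.backing.objects:
                # Step 1: the write refers to a new object, which must
                # also exist on the backing pool (the tier is an overlay).
                self.backing.create(name)
            # Step 3: pseudo-promote. A brand-new object is empty, so
            # nothing is copied (promoting existing data is not modelled).
            self.cached[name] = b""
        self.cached[name] = data  # the write itself lands in the cache tier


backing = BackingPool()
tier = CacheTier(backing)
n = objects_for_file(20 * 1024**3)  # the 20GB test file, taken as 20 GiB

for i in range(n):                  # first pass: every object is new
    tier.write(f"file.{i:08x}", b"x")
first_pass_creates = backing.create_ops

for i in range(n):                  # second pass: pure overwrites
    tier.write(f"file.{i:08x}", b"y")
second_pass_creates = backing.create_ops - first_pass_creates

print(n, first_pass_creates, second_pass_creates)  # 5120 5120 0
```

In a real cluster each of those creates costs HDD seeks and OMAP updates, which is the overhead the second pass avoids.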

-- 
Christian Balzer        Network/Systems Engineer                
chibi@xxxxxxx   	Rakuten Communications
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com