Re: tgt and krbd

> Hi Jake,
> 
> Good to see it’s not just me.
> 
> I’m guessing that the fact that you are doing 1MB writes means that the latency
> difference has a less noticeable impact on the overall write bandwidth.
> What I have been discovering with Ceph + iSCSI is that, due to all the extra
> hops (client -> iSCSI proxy -> primary OSD -> secondary OSD), you get a lot of latency
> serialisation, which dramatically impacts single-threaded IOPS at small IO sizes.
> 
> That makes sense.  I don't really understand how latency is going down if tgt
> is not really doing caching.
> 
> 
> A few days back I tested adding a tiny SSD write cache on the iSCSI proxy, and
> it had a dramatic effect in “hiding” the latency behind it from the client.
> 
> Nick
> 
> 
> After seeing your results, I've been considering experimenting with
> that.  Currently, my iSCSI proxy nodes are VMs.
> 
> I would like to build a few dedicated servers with fast SSDs or Fusion-io
> devices.  It depends on my budget; it's hard to justify getting a card that costs
> 10x the rest of the server...  I would run all my tgt instances in containers
> pointing to the rbd disk + cache device.  A Fusion-io device could support many
> tgt containers.
> I don't really want to go back to krbd.  I have a few rbds that are format 2
> with striping; there aren't any stable kernels that support that (or any kernels
> at all yet for "fancy striping").  I wish there was a way to incorporate a local
> cache device into tgt with librbd backends.
> 
> Jake
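
To put some rough numbers on the latency serialisation point above, here is a quick back-of-envelope sketch in Python; every latency and bandwidth figure in it is an assumption for illustration, not a measurement from either of our setups:

# Back-of-envelope model of latency serialisation on a single-threaded
# iSCSI-on-RBD write path.  All latency and bandwidth figures below are
# illustrative assumptions, not measurements from this thread.

HOPS_MS = {
    "client -> iSCSI proxy":  0.25,  # assumed network + tgt overhead
    "proxy -> primary OSD":   0.50,  # assumed librbd/RADOS write
    "primary -> replica OSD": 0.50,  # assumed replication hop
}
SSD_JOURNAL_MS = 1.0                 # assumed OSD journal/commit time

def io_time_ms(io_size_bytes, link_mb_per_s=1000.0):
    """Round-trip time for one synchronous IO: per-hop latency plus wire transfer."""
    latency = sum(HOPS_MS.values()) + SSD_JOURNAL_MS
    transfer = (io_size_bytes / (link_mb_per_s * 1e6)) * 1e3  # ms on the wire
    return latency + transfer

for size in (4 * 1024, 1024 * 1024):
    t = io_time_ms(size)
    iops = 1000.0 / t
    mb_s = iops * size / 1e6
    print(f"{size >> 10:5d} KiB IO: {t:6.2f} ms/op, {iops:7.1f} IOPS, {mb_s:6.1f} MB/s")

# With these assumed numbers a 4 KiB write is almost entirely latency-bound
# (a few hundred IOPS single-threaded), while a 1 MiB write amortises the same
# latency over far more data, so the hit to write bandwidth is much smaller.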

Hi Jake,

I spent a bit more time looking at this. The fastest solution is probably a Supermicro 2U twin with a shared SAS backplane and a couple of dual-port SAS SSDs (about £600 each), so that the cache could fail over between servers.

I also looked at doing DRBD with SSDs, but it looks like DRBD also has latency overheads, and I'm not sure it would do enough better than Ceph to make it worthwhile.

The other thing I looked at is that flashcache does everything in 4KB blocks, so even if you write 1MB it issues 256 x 4KB IOs to the SSD; this requires very low latency for the SSD not to become a bottleneck itself. You can limit the maximum IO size that gets cached, but I still feel it’s a limiting factor.

EnhanceIO, on the other hand, looks like it writes the actual IO size down to the SSD, so it may not suffer from this problem.
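
As a quick sanity check on the flashcache vs EnhanceIO point, here is a rough sketch of the arithmetic; the per-op SSD latency is an assumed figure for illustration, not a measurement:

# Rough arithmetic for the flashcache point above: a cache layer that splits
# everything into 4 KiB blocks turns one large write into many small SSD ops,
# so the SSD's per-op latency gets multiplied.

BLOCK = 4 * 1024            # flashcache-style fixed block size
SSD_OP_LATENCY_MS = 0.05    # assumed per-IO latency of the caching SSD

def cache_cost(write_bytes, split_into_blocks):
    """Return (SSD ops, worst-case ms if the ops are handled serially)."""
    ops = (write_bytes + BLOCK - 1) // BLOCK if split_into_blocks else 1
    return ops, ops * SSD_OP_LATENCY_MS

for label, split in (("flashcache-style (4 KiB splits)", True),
                     ("EnhanceIO-style (original IO size)", False)):
    ops, cost = cache_cost(1024 * 1024, split)
    print(f"{label:35s}: {ops:4d} SSD ops, ~{cost:.1f} ms if serialised")

# 1 MiB -> 256 x 4 KiB ops; even at an assumed 50 us per op that is ~12.8 ms
# if the ops serialise, versus a single larger write.  In practice the kernel
# will merge and queue some of this, so treat it as an upper bound, not a
# prediction.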

I'm hoping to be able to invest in 10GbE cards for the ESXi hosts, and I'm waiting for Hammer to see if there are any improvements.

Nick




_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com




