Testing without the cache tiering is the next test I want to do when I have time.

When it's hanging, there is no activity at all on the cluster. Nothing in "ceph -w", nothing in "ceph osd pool stats".

I'll provide an update when I have a chance to test without tiering.
--
David Moreau Simard

> On Nov 18, 2014, at 3:28 PM, Nick Fisk <nick@xxxxxxxxxx> wrote:
>
> Hi David,
>
> Have you tried on a normal replicated pool with no cache? I've seen a number
> of threads recently where caching is causing various things to block/hang.
> It would be interesting to see if this still happens without the caching
> layer; at least it would rule it out.
>
> Also, is there any sign that as the test passes ~50GB the cache might
> start flushing to the backing pool, causing slow performance?
>
> I am planning a deployment very similar to yours, so I am following this with
> great interest. I'm hoping to build a single-node test "cluster" shortly, so
> I might be in a position to work with you on this issue and hopefully get it
> resolved.
>
> Nick
>
> -----Original Message-----
> From: ceph-users [mailto:ceph-users-bounces@xxxxxxxxxxxxxx] On Behalf Of
> David Moreau Simard
> Sent: 18 November 2014 19:58
> To: Mike Christie
> Cc: ceph-users@xxxxxxxxxxxxxx; Christopher Spearman
> Subject: Re: Poor RBD performance as LIO iSCSI target
>
> Thanks guys. I looked at http://tracker.ceph.com/issues/8818 and chatted
> with "dis" on #ceph-devel.
>
> I ran a LOT of tests on a LOT of combinations of kernels (sometimes with
> tunables legacy). I haven't found a magical combination in which the
> following test does not hang:
> fio --name=writefile --size=100G --filesize=100G --filename=/dev/rbd0
> --bs=1M --nrfiles=1 --direct=1 --sync=0 --randrepeat=0 --rw=write
> --refill_buffers --end_fsync=1 --iodepth=200 --ioengine=libaio
>
> Either directly on a mapped rbd device, on a mounted filesystem (over rbd),
> or exported through iSCSI... nothing works.
> I guess that rules out a potential issue with iSCSI overhead.
>
> Now, something I noticed out of pure luck is that I am unable to reproduce
> the issue if I drop the size of the test to 50GB. Tests will complete in
> under 2 minutes.
> 75GB will hang right at the end and take more than 10 minutes.
>
> TL;DR of tests:
> - 3x fio --name=writefile --size=50G --filesize=50G --filename=/dev/rbd0
> --bs=1M --nrfiles=1 --direct=1 --sync=0 --randrepeat=0 --rw=write
> --refill_buffers --end_fsync=1 --iodepth=200 --ioengine=libaio
> -- 1m44s, 1m49s, 1m40s
>
> - 3x fio --name=writefile --size=75G --filesize=75G --filename=/dev/rbd0
> --bs=1M --nrfiles=1 --direct=1 --sync=0 --randrepeat=0 --rw=write
> --refill_buffers --end_fsync=1 --iodepth=200 --ioengine=libaio
> -- 10m12s, 10m11s, 10m13s
>
> Details of the tests here: http://pastebin.com/raw.php?i=3v9wMtYP
>
> Does that ring a bell for you guys?
>
> --
> David Moreau Simard
>
>
>> On Nov 13, 2014, at 3:31 PM, Mike Christie <mchristi@xxxxxxxxxx> wrote:
>>
>> On 11/13/2014 10:17 AM, David Moreau Simard wrote:
>>> Running into weird issues here as well in a test environment. I don't
>>> have a solution either, but perhaps we can find some things in common.
>>>
>>> Setup in a nutshell:
>>> - Ceph cluster: Ubuntu 14.04, Kernel 3.16.7, Ceph 0.87-1 (OSDs with
>>> separate public/cluster networks at 10 Gbps)
>>> - iSCSI proxy node (targetcli/LIO): Ubuntu 14.04, Kernel 3.16.7, Ceph
>>> 0.87-1 (10 Gbps)
>>> - Client node: Ubuntu 12.04, Kernel 3.11 (10 Gbps)
>>>
>>> Relevant cluster config: writeback cache tiering with NVMe PCI-E cards
>>> (2 replicas) in front of an erasure-coded pool (k=3, m=2) backed by spindles.
>>>
>>> I'm following the instructions here:
>>> http://www.hastexo.com/resources/hints-and-kinks/turning-ceph-rbd-images-san-storage-devices
>>> No issues with creating and mapping a 100GB RBD image and then creating
>>> the target.
>>>
>>> I'm interested in finding out the overhead/performance impact of
>>> re-exporting through iSCSI, so the idea is to run benchmarks.
>>> Here's a fio test I'm trying to run on the client node against the mounted
>>> iSCSI device:
>>> fio --name=writefile --size=100G --filesize=100G --filename=/dev/sdu
>>> --bs=1M --nrfiles=1 --direct=1 --sync=0 --randrepeat=0 --rw=write
>>> --refill_buffers --end_fsync=1 --iodepth=200 --ioengine=libaio
>>>
>>> The benchmark will eventually hang towards the end of the test for some
>>> long seconds before completing.
>>> On the proxy node, the kernel complains with iSCSI portal login
>>> timeouts: http://pastebin.com/Q49UnTPr and I also see irqbalance
>>> errors in syslog: http://pastebin.com/AiRTWDwR
>>>
>>
>> You are hitting a different issue. German Anders is most likely
>> correct and you hit the rbd hang. That then caused the iSCSI/SCSI
>> command to time out, which caused the SCSI error handler to run. In your
>> logs we see that the LIO error handler received a task abort from the
>> initiator, and that timed out, which caused the escalation (the iscsi
>> portal login related messages).

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
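
For reference, a minimal sketch of the no-cache comparison David describes at the top of this thread might look like the following. The pool name (rbdbench), the PG count, and the /dev/rbd1 device name are placeholders for a small test cluster, not values taken from the thread.

    # Create a plain replicated pool so I/O bypasses the cache tier entirely
    # (PG count is a guess for a small test cluster)
    ceph osd pool create rbdbench 128 128 replicated

    # Create and map a 100 GB image in it (rbd sizes are given in MB here)
    rbd create rbdbench/bench --size 102400
    rbd map rbdbench/bench

    # Re-run the exact same fio job against the newly mapped device
    fio --name=writefile --size=100G --filesize=100G --filename=/dev/rbd1 \
        --bs=1M --nrfiles=1 --direct=1 --sync=0 --randrepeat=0 --rw=write \
        --refill_buffers --end_fsync=1 --iodepth=200 --ioengine=libaio

If the 75-100GB runs complete in roughly the same time as the 50GB runs against this pool, the cache tier becomes the prime suspect; if they still stall, the problem more likely sits in krbd or below.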
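
On Nick's ~50GB flushing theory: if the cache pool's flush targets happen to sit near that mark, the tiering agent would start writing dirty objects back to the erasure-coded pool right where the longer runs begin to stall. A sketch of how those thresholds are set, assuming a hypothetical cache pool named nvme-cache (the values are examples only):

    # Cap the amount of data the cache tier will hold
    ceph osd pool set nvme-cache target_max_bytes 107374182400   # 100 GB

    # Start flushing dirty objects once the cache holds 40% of target_max_bytes
    ceph osd pool set nvme-cache cache_target_dirty_ratio 0.4

    # Start evicting clean objects once the cache is 80% full
    ceph osd pool set nvme-cache cache_target_full_ratio 0.8

That said, David reports no activity at all in "ceph osd pool stats" during the hang, which would argue against a flush being in progress at that point.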