Re: Poor RBD performance as LIO iSCSI target

David Moreau Simard <dmsimard@xxxxxxxx> · Thu, 20 Nov 2014 20:02:53 +0000

Nick,

Can you share more details on the configuration you are using? I'll try to duplicate it in my environment and see what happens.
I'm mostly interested in:
- Erasure code profile (k, m, plugin, ruleset-failure-domain)
- Cache tiering pool configuration (ex: hit_set_type, hit_set_period, hit_set_count, target_max_objects, target_max_bytes, cache_target_dirty_ratio, cache_target_full_ratio, cache_min_flush_age, cache_min_evict_age)

The crush rulesets would also be helpful.
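All of the settings requested above can be read back with the `ceph` CLI. The following is a minimal sketch, not a tested tool. The profile and pool names passed in are placeholders, and it assumes that `ceph osd pool get` on this release accepts each of the cache-tiering keys listed above.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the EC profile, cache-tier settings and CRUSH
# rules asked for in this thread. Profile/pool names are placeholders.

# Cache-tier keys requested above; each is read with "ceph osd pool get".
TIER_KEYS="hit_set_type hit_set_period hit_set_count
target_max_objects target_max_bytes
cache_target_dirty_ratio cache_target_full_ratio
cache_min_flush_age cache_min_evict_age"

dump_tier_config() {
    local ec_profile="$1" cache_pool="$2" key
    echo "== erasure code profile: $ec_profile"
    ceph osd erasure-code-profile get "$ec_profile"
    echo "== cache pool: $cache_pool"
    for key in $TIER_KEYS; do
        ceph osd pool get "$cache_pool" "$key"
    done
    echo "== CRUSH rules"
    ceph osd crush rule dump
}
```

For example, `dump_tier_config <profile> <cache-pool> > tier-config.txt` would capture everything in one file that can be attached to the thread.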

Thanks,
--
David Moreau Simard

> On Nov 20, 2014, at 12:43 PM, Nick Fisk <nick@xxxxxxxxxx> wrote:
> 
> Hi David,
> 
> I've just finished running the 75GB fio test you posted a few days back on
> my new test cluster.
> 
> The cluster is as follows:
> 
> Single server with 3x HDD and 1 SSD
> Ubuntu 14.04 with 3.16.7 kernel
> 2+1 EC pool on the HDDs, below a 10G SSD cache pool. The SSD is also
> partitioned to provide journals for the HDDs.
> 150G RBD mapped locally
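For anyone trying to reproduce a layout like Nick's, a rough sketch of the commands involved follows. Everything named here is an assumption: the pool, profile and image names, the PG counts, and `SSD_RULESET`, a CRUSH ruleset assumed to already exist and map only to the SSD OSD. The `osd` failure domain is assumed because a 2+1 profile cannot spread across hosts on a single server, and `rbd --size` is taken to be in MB, as in the rbd CLI of this era.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of a single-node 2+1 EC pool behind a writeback SSD cache.
# Pool/profile/image names, PG counts and SSD_RULESET are placeholders,
# not the values used on Nick's cluster.
SSD_RULESET=1                               # assumed CRUSH ruleset mapping only to the SSD OSD
CACHE_BYTES=$((10 * 1024 * 1024 * 1024))    # 10G cache, as described above
IMAGE_MB=$((150 * 1024))                    # 150G image; rbd --size is in MB

setup_ec_with_cache() {
    # 2+1 erasure code profile; failure domain "osd" because there is one host
    ceph osd erasure-code-profile set ec21 k=2 m=1 ruleset-failure-domain=osd
    ceph osd pool create ecpool 128 128 erasure ec21

    # Replicated cache pool on the SSD; one copy, since there is one SSD OSD
    ceph osd pool create cachepool 128 128 replicated
    ceph osd pool set cachepool crush_ruleset "$SSD_RULESET"
    ceph osd pool set cachepool size 1
    ceph osd pool set cachepool min_size 1

    # Put the cache in front of the EC pool in writeback mode
    ceph osd tier add ecpool cachepool
    ceph osd tier cache-mode cachepool writeback
    ceph osd tier set-overlay ecpool cachepool
    ceph osd pool set cachepool hit_set_type bloom
    ceph osd pool set cachepool target_max_bytes "$CACHE_BYTES"

    # Create the image against the base pool and map it with krbd
    rbd create fiotest --pool ecpool --size "$IMAGE_MB"
    rbd map fiotest --pool ecpool
}
```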
> 
> The fio test seemed to run without any problems. I want to run a few more
> tests with different settings to see if I can reproduce your problem. I will
> let you know if I find anything.
> 
> If there is anything you would like me to try, please let me know.
> 
> Nick
> 
> -----Original Message-----
> From: ceph-users [mailto:ceph-users-bounces@xxxxxxxxxxxxxx] On Behalf Of
> David Moreau Simard
> Sent: 19 November 2014 10:48
> To: Ramakrishna Nishtala (rnishtal)
> Cc: ceph-users@xxxxxxxxxxxxxx; Nick Fisk
> Subject: Re: Poor RBD performance as LIO iSCSI target
> 
> Rama,
> 
> Thanks for your reply.
> 
> My end goal is to use iSCSI (with LIO/targetcli) to export RBD block
> devices.
> 
> I was encountering issues with iSCSI, which are explained in my previous
> emails.
> I ended up being able to reproduce the problem at will on various kernel and
> OS combinations, even on raw RBD devices, which rules out iSCSI and points
> to Ceph instead.
> I'm even running 0.88 now and the issue is still there.
> 
> I haven't isolated the issue just yet.
> My next test involves disabling the cache tiering.
> 
> I also have the client krbd cache enabled; I'll try disabling it too if
> disabling cache tiering isn't enough.
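For reference, a writeback cache tier is normally detached in four steps: switch it to forward mode, flush and evict everything, remove the overlay, then remove the tier. The sketch below uses placeholder pool names. One caveat, hedged: on releases of this era RBD cannot write to an erasure-coded pool directly, so removing the tier from an EC base pool is unlikely to leave a usable image; testing "without tiering" probably means a plain replicated pool instead, as Nick suggests elsewhere in this thread.

```bash
#!/usr/bin/env bash
# Hypothetical sketch: detach a writeback cache tier (forward mode,
# flush/evict, remove overlay, remove tier). Pool names are placeholders.
remove_cache_tier() {
    local base_pool="$1" cache_pool="$2"
    # Stop caching new writes; requests are forwarded to the base pool
    ceph osd tier cache-mode "$cache_pool" forward
    # Flush every dirty object to the base pool and evict it from the cache
    rados -p "$cache_pool" cache-flush-evict-all
    # Stop redirecting client traffic to the cache, then detach it
    ceph osd tier remove-overlay "$base_pool"
    ceph osd tier remove "$base_pool" "$cache_pool"
}
```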
> --
> David Moreau Simard
> 
> 
>> On Nov 18, 2014, at 8:10 PM, Ramakrishna Nishtala (rnishtal) <rnishtal@xxxxxxxxx> wrote:
>> 
>> Hi Dave
>> Did you say iSCSI only? The tracker issue does not say so.
>> I am on Giant, with both client and Ceph on RHEL 7, and it seems to work OK,
>> unless I am missing something here. RBD on bare metal with kmod-rbd and
>> caching disabled.
>> 
>> [root@compute4 ~]# time fio --name=writefile --size=100G \
>>     --filesize=100G --filename=/dev/rbd0 --bs=1M --nrfiles=1 --direct=1 \
>>     --sync=0 --randrepeat=0 --rw=write --refill_buffers --end_fsync=1 \
>>     --iodepth=200 --ioengine=libaio
>> writefile: (g=0): rw=write, bs=1M-1M/1M-1M/1M-1M, ioengine=libaio, iodepth=200
>> fio-2.1.11
>> Starting 1 process
>> Jobs: 1 (f=1): [W(1)] [100.0% done] [0KB/853.0MB/0KB /s] [0/853/0 iops] [eta 00m:00s]
>> ...
>> Disk stats (read/write):
>>   rbd0: ios=184/204800, merge=0/0, ticks=70/16164931, in_queue=16164942, util=99.98%
>> 
>> real    1m56.175s
>> user    0m18.115s
>> sys     0m10.430s
>> 
>> Regards,
>> 
>> Rama
>> 
>> 
>> -----Original Message-----
>> From: ceph-users [mailto:ceph-users-bounces@xxxxxxxxxxxxxx] On Behalf 
>> Of David Moreau Simard
>> Sent: Tuesday, November 18, 2014 3:49 PM
>> To: Nick Fisk
>> Cc: ceph-users@xxxxxxxxxxxxxx
>> Subject: Re: Poor RBD performance as LIO iSCSI target
>> 
>> Testing without cache tiering is the next test I want to do when I have time.
>> 
>> When it's hanging, there is no activity at all on the cluster.
>> Nothing in "ceph -w", nothing in "ceph osd pool stats".
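When the cluster itself looks idle, it may help to look at what the kernel RBD client still has outstanding. The sketch below assumes it runs as root on the krbd client with debugfs mounted at `/sys/kernel/debug`; each `osdc` file there lists OSD requests the client is still waiting on. `DEBUGFS` is a hypothetical override used only to point the helper elsewhere.

```bash
#!/usr/bin/env bash
# Hypothetical snapshot helper to run while fio appears hung.
# Assumes root on the krbd client and debugfs mounted at /sys/kernel/debug;
# DEBUGFS is only an override for pointing it somewhere else.
DEBUGFS=${DEBUGFS:-/sys/kernel/debug}

snapshot_hang_state() {
    local f found=0
    echo "== cluster health (look for blocked or slow requests)"
    ceph health detail
    echo "== kernel client in-flight OSD requests"
    for f in "$DEBUGFS"/ceph/*/osdc; do
        # Skip the literal pattern when the glob matches nothing
        [ -r "$f" ] || continue
        found=1
        echo "-- $f"
        cat "$f"
    done
    [ "$found" -eq 1 ] || echo "(no readable osdc files under $DEBUGFS/ceph)"
}
```

On the OSD side, `ceph daemon osd.<id> dump_ops_in_flight`, run on each OSD host, shows the requests each OSD is still processing; comparing the two views should show which side is holding the stuck requests.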
>> 
>> I'll provide an update when I have a chance to test without tiering.
>> --
>> David Moreau Simard
>> 
>> 
>>> On Nov 18, 2014, at 3:28 PM, Nick Fisk <nick@xxxxxxxxxx> wrote:
>>> 
>>> Hi David,
>>> 
>>> Have you tried a normal replicated pool with no cache? I've seen a
>>> number of threads recently where caching causes various things to
>>> block/hang.
>>> It would be interesting to see if this still happens without the
>>> caching layer; at least that would rule it out.
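A minimal sketch of such a baseline, assuming placeholder pool and image names, 128 PGs and a 100G image (`rbd --size` is taken to be in MB, as on this era's CLI):

```bash
#!/usr/bin/env bash
# Hypothetical replicated baseline with no cache tier. Pool/image names, PG
# count and the 100G default size are placeholders; rbd --size is in MB here.
make_replicated_baseline() {
    local pool="${1:-rbdtest}" image="${2:-fiotest}" size_mb="${3:-102400}"
    ceph osd pool create "$pool" 128 128 replicated
    rbd create "$image" --pool "$pool" --size "$size_mb"
    rbd map "$image" --pool "$pool"
}
```

Running the same fio job against the newly mapped device would then show whether the hang follows the cache tier.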
>>> 
>>> Also, is there any sign that, as the test passes ~50GB, the cache
>>> starts flushing to the backing pool and slowing things down?
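One way to check is to compare the point where things slow down with where the tiering agent should start working. The agent begins flushing when dirty data reaches `cache_target_dirty_ratio` of the pool's target size, and evicting at `cache_target_full_ratio`. The sketch below assumes the target is expressed through `target_max_bytes` and uses 0.4 and 0.8 as default ratios, which should be checked against the pool's actual values.

```bash
#!/usr/bin/env bash
# Hypothetical helper: dirty data (GiB) at which flushing should start, and
# cache usage (GiB) at which eviction should start, given target_max_bytes.
# 0.4 / 0.8 are assumed defaults; pass the pool's real ratios if they differ.
tier_thresholds() {
    awk -v t="$1" -v d="${2:-0.4}" -v f="${3:-0.8}" 'BEGIN {
        gib = 1024 * 1024 * 1024
        printf "flush_from_gib=%.1f evict_from_gib=%.1f\n", t * d / gib, t * f / gib
    }'
}
```

For a 10G cache like Nick's, `tier_thresholds 10737418240` prints `flush_from_gib=4.0 evict_from_gib=8.0`. If neither `target_max_bytes` nor `target_max_objects` is set on the cache pool, the agent has no size to measure against, which is also worth ruling out.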
>>> 
>>> I am planning a deployment very similar to yours, so I am following
>>> this with great interest. I'm hoping to build a single-node test
>>> "cluster" shortly, so I might be in a position to work with you on
>>> this issue and hopefully get it resolved.
>>> 
>>> Nick
>>> 
>>> -----Original Message-----
>>> From: ceph-users [mailto:ceph-users-bounces@xxxxxxxxxxxxxx] On 
>>> Behalf Of David Moreau Simard
>>> Sent: 18 November 2014 19:58
>>> To: Mike Christie
>>> Cc: ceph-users@xxxxxxxxxxxxxx; Christopher Spearman
>>> Subject: Re: Poor RBD performance as LIO iSCSI target
>>> 
>>> Thanks guys. I looked at http://tracker.ceph.com/issues/8818 and 
>>> chatted with "dis" on #ceph-devel.
>>> 
>>> I ran a LOT of tests on a LOT of combinations of kernels (sometimes
>>> with legacy tunables). I haven't found a magical combination in
>>> which the following test does not hang:
>>> fio --name=writefile --size=100G --filesize=100G \
>>>     --filename=/dev/rbd0 --bs=1M --nrfiles=1 --direct=1 --sync=0 \
>>>     --randrepeat=0 --rw=write --refill_buffers --end_fsync=1 \
>>>     --iodepth=200 --ioengine=libaio
>>> 
>>> It hangs whether I run it directly on a mapped RBD device, on a
>>> filesystem mounted on top of RBD, or through an iSCSI export.
>>> I guess that rules out a potential issue with iSCSI overhead.
>>> 
>>> Now, something I noticed out of pure luck is that I am unable to
>>> reproduce the issue if I drop the size of the test to 50GB. Those tests
>>> complete in under 2 minutes.
>>> At 75GB, the test hangs right at the end and takes more than 10 minutes.
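Since the behaviour changes somewhere between 50G and 75G, a size sweep might narrow down the threshold. The sketch reuses the exact fio options from this thread. The device path (`DEV`, defaulting to `/dev/rbd0`) and the sizes are assumptions, and the sweep overwrites the device, so it should only be pointed at a scratch image.

```bash
#!/usr/bin/env bash
# Hypothetical size sweep for the write test above. WARNING: it overwrites DEV.
# DEV and the sizes passed in are assumptions; the fio options match the thread.
DEV=${DEV:-/dev/rbd0}

run_writefile() {
    local size="$1"
    fio --name=writefile --size="$size" --filesize="$size" \
        --filename="$DEV" --bs=1M --nrfiles=1 --direct=1 --sync=0 \
        --randrepeat=0 --rw=write --refill_buffers --end_fsync=1 \
        --iodepth=200 --ioengine=libaio
}

sweep_sizes() {
    local size start end
    for size in "$@"; do
        start=$(date +%s)
        # Keep each run's full fio output for later inspection
        run_writefile "$size" > "fio-$size.log" 2>&1
        end=$(date +%s)
        echo "$size $((end - start))s"
    done
}
```

For example, `sweep_sizes 55G 60G 65G 70G` prints one elapsed time per size and keeps each run's output in `fio-<size>.log`.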
>>> 
>>> TL;DR of tests:
>>> - 3x fio --name=writefile --size=50G --filesize=50G \
>>>       --filename=/dev/rbd0 --bs=1M --nrfiles=1 --direct=1 --sync=0 \
>>>       --randrepeat=0 --rw=write --refill_buffers --end_fsync=1 \
>>>       --iodepth=200 --ioengine=libaio
>>>   -- 1m44s, 1m49s, 1m40s
>>> 
>>> - 3x fio --name=writefile --size=75G --filesize=75G \
>>>       --filename=/dev/rbd0 --bs=1M --nrfiles=1 --direct=1 --sync=0 \
>>>       --randrepeat=0 --rw=write --refill_buffers --end_fsync=1 \
>>>       --iodepth=200 --ioengine=libaio
>>>   -- 10m12s, 10m11s, 10m13s
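Turning those wall-clock times into average throughput makes the drop easier to see. This sketch assumes fio's `G` suffix is 1024-based (fio's default `kb_base`) and that times are written as `XmYs`; the helper name is hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper: average throughput (MiB/s) from a fio size in GiB and a
# wall-clock time such as "1m44s" or "1m56.175s". Assumes fio's "G" means GiB.
throughput_mibs() {
    awk -v g="$1" -v t="$2" 'BEGIN {
        m = 0; s = t
        i = index(t, "m")
        if (i > 0) {            # split "10m12s" into minutes and seconds
            m = substr(t, 1, i - 1)
            s = substr(t, i + 1)
        }
        sub(/s$/, "", s)        # drop the trailing "s"
        printf "%.0f\n", g * 1024 / (m * 60 + s)
    }'
}
```

With the runs above, `throughput_mibs 50 1m44s` gives 492 and `throughput_mibs 75 10m12s` gives 125, so the 75G runs average roughly a quarter of the 50G throughput. The same arithmetic on Rama's 100G run (`1m56.175s`) gives 881.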
>>> 
>>> Details of tests here: http://pastebin.com/raw.php?i=3v9wMtYP
>>> 
>>> Does that ring a bell for you guys?
>>> 
>>> --
>>> David Moreau Simard
>>> 
>>> 
>>>> On Nov 13, 2014, at 3:31 PM, Mike Christie <mchristi@xxxxxxxxxx> wrote:
>>>> 
>>>> On 11/13/2014 10:17 AM, David Moreau Simard wrote:
>>>>> Running into weird issues here as well in a test environment. I don't
>>>>> have a solution either, but perhaps we can find some things in common.
>>>>> 
>>>>> Setup in a nutshell:
>>>>> - Ceph cluster: Ubuntu 14.04, kernel 3.16.7, Ceph 0.87-1 (OSDs
>>>>>   with separate public/cluster networks at 10 Gbps)
>>>>> - iSCSI proxy node (targetcli/LIO): Ubuntu 14.04, kernel 3.16.7,
>>>>>   Ceph 0.87-1 (10 Gbps)
>>>>> - Client node: Ubuntu 12.04, kernel 3.11 (10 Gbps)
>>>>> 
>>>>> Relevant cluster config: writeback cache tiering with NVMe PCIe cards
>>>>> (2 replicas) in front of an erasure-coded pool (k=3, m=2) backed by
>>>>> spindles.
>>>>> 
>>>>> I'm following the instructions here:
>>>>> http://www.hastexo.com/resources/hints-and-kinks/turning-ceph-rbd-images-san-storage-devices
>>>>> No issues with creating and mapping a 100GB RBD image and then
>>>>> creating the target.
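For readers who have not followed the hastexo guide, exporting a mapped RBD device through LIO comes down to a backstore, a target, a LUN and an ACL. The sketch assumes the targetcli-fb command layout (older targetcli releases call the backstore `iblock` rather than `block`, and some versions also need a portal created under `tpg1/portals`); both IQNs are made-up placeholders.

```bash
#!/usr/bin/env bash
# Hypothetical sketch: export a mapped krbd device through LIO with targetcli.
# Assumes targetcli-fb paths; both IQNs are made-up placeholders.
TARGET_IQN=${TARGET_IQN:-iqn.2014-11.com.example:rbd-test}
INITIATOR_IQN=${INITIATOR_IQN:-iqn.2014-11.com.example:client}

export_rbd_via_lio() {
    local dev="$1" name="$2"
    # Block backstore on top of the krbd device (e.g. /dev/rbd0)
    targetcli /backstores/block create name="$name" dev="$dev"
    # iSCSI target plus a LUN pointing at the backstore
    targetcli /iscsi create "$TARGET_IQN"
    targetcli "/iscsi/$TARGET_IQN/tpg1/luns" create "/backstores/block/$name"
    # Allow only the test client's initiator to log in
    targetcli "/iscsi/$TARGET_IQN/tpg1/acls" create "$INITIATOR_IQN"
    targetcli saveconfig
}
```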
>>>>> 
>>>>> I'm interested in finding out the overhead/performance impact of
>>>>> re-exporting through iSCSI, so the idea is to run benchmarks.
>>>>> Here's a fio test I'm trying to run on the client node against the
>>>>> iSCSI device:
>>>>> fio --name=writefile --size=100G --filesize=100G \
>>>>>     --filename=/dev/sdu --bs=1M --nrfiles=1 --direct=1 --sync=0 \
>>>>>     --randrepeat=0 --rw=write --refill_buffers --end_fsync=1 \
>>>>>     --iodepth=200 --ioengine=libaio
>>>>> 
>>>>> The benchmark eventually hangs towards the end of the test for a long
>>>>> while (many seconds) before completing.
>>>>> On the proxy node, the kernel complains about an iSCSI portal login
>>>>> timeout: http://pastebin.com/Q49UnTPr and I also see irqbalance
>>>>> errors in syslog: http://pastebin.com/AiRTWDwR
>>>>> 
>>>> 
>>>> You are hitting a different issue. German Anders is most likely
>>>> correct that you hit the rbd hang. That caused the iSCSI/SCSI
>>>> command to time out, which caused the SCSI error handler to run. In
>>>> your logs we see that the LIO error handler received a task abort
>>>> from the initiator and that abort timed out, which caused the
>>>> escalation (the iSCSI portal login related messages).
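On the initiator, the point at which a stalled command triggers the SCSI error handler Mike describes is governed by the per-device timeout in sysfs (`/sys/block/<disk>/device/timeout`, typically 30 seconds). A minimal sketch for reading or changing it follows; `SYSFS_BLOCK` is a hypothetical override for testing. Raising the value would only mask the RBD stall, not fix it.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print (and optionally set) the SCSI command timeout that
# decides when the initiator's error handler starts aborting a stalled command.
# SYSFS_BLOCK is an assumed override, used only to test against a fake tree.
SYSFS_BLOCK=${SYSFS_BLOCK:-/sys/block}

scsi_cmd_timeout() {
    local disk="$1"
    local new_timeout="${2:-}"
    local attr="$SYSFS_BLOCK/$disk/device/timeout"
    if [ ! -e "$attr" ]; then
        echo "no SCSI timeout attribute for $disk" >&2
        return 1
    fi
    if [ -n "$new_timeout" ]; then
        # Writing here requires root on a real system
        echo "$new_timeout" > "$attr"
    fi
    echo "$disk timeout=$(cat "$attr")s"
}
```

For instance, `scsi_cmd_timeout sdu` on the client node above would print the timeout currently applied to the iSCSI disk.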
>>> 
>>> _______________________________________________
>>> ceph-users mailing list
>>> ceph-users@xxxxxxxxxxxxxx
>>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com