Hi David,
This is a long shot, but have you checked the max queue depth on the
iSCSI side? I've got a feeling that LIO might default to 32. That would
definitely have an effect at the high queue depths you are testing
with.
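To illustrate why a target-side queue depth cap would matter at these depths: by Little's law, sustained throughput is bounded by the number of commands in flight divided by the per-command latency. The following is only a rough sketch. The cap of 32 is the unverified guess from above, and the 20 ms latency is a purely hypothetical figure, not a measurement from this cluster.

```python
def effective_depth(requested_depth: int, target_cap: int) -> int:
    """Commands actually in flight: the initiator cannot exceed the target's cap."""
    return min(requested_depth, target_cap)


def max_throughput_mib_s(depth: int, latency_ms: float, block_mib: float = 1.0) -> float:
    """Little's law bound: (in-flight commands / latency) * block size."""
    return depth * 1000.0 / latency_ms * block_mib


if __name__ == "__main__":
    FIO_IODEPTH = 200   # --iodepth used in the fio tests in this thread
    ASSUMED_CAP = 32    # assumption: the unverified LIO default guessed above
    LATENCY_MS = 20.0   # hypothetical per-command latency, not measured

    depth = effective_depth(FIO_IODEPTH, ASSUMED_CAP)
    print("effective depth:", depth)
    print("upper bound MiB/s:", max_throughput_mib_s(depth, LATENCY_MS))
```

In practice latency rises as more commands are queued, so treat this as an upper bound for comparing depths rather than a prediction.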
On 8 Dec 2014 16:53, David Moreau Simard <dmsimard@xxxxxxxx> wrote:
Haven't tried other iSCSI implementations (yet).
LIO/targetcli makes it very easy to
implement/integrate/wrap/automate around so I'm really trying to get
this right.
A PCI-E SSD cache tier in front of a spindle-backed erasure-coded pool,
with 10 Gbps networking across the board, yields results slightly
better than or very similar to two spindles in hardware RAID-0 with
writeback caching. With that in mind, the performance is not outright
awful by any means; there's just a lot of overhead we have to keep in
mind.
What I'd like to further test but am unable to right now is to see
what happens if you scale up the cluster. Right now I'm testing on
only two nodes.
Do IOPS scale linearly with the number of OSDs/servers, or is scaling
out more about capacity?
Perhaps if someone else can chime in, I'm really curious.
--
David Moreau Simard
On Dec 6, 2014, at 11:18 AM, Nick Fisk <nick@xxxxxxxxxx> wrote:
Hi David,
Very strange, but I'm glad you finally managed to get the cluster working
normally. Thank you for posting the benchmark figures; it's interesting to
see the overhead of LIO over pure RBD performance.
I should have the hardware for our cluster up and running early next year,
so I will be in a better position to test the iSCSI performance then. I
will report back once I have some numbers.
Just out of interest, have you tried any of the other iSCSI implementations
to see if they show the same performance drop?
Nick
-----Original Message-----
From: ceph-users [mailto:ceph-users-bounces@xxxxxxxxxxxxxx] On Behalf Of
David Moreau Simard
Sent: 05 December 2014 16:03
To: Nick Fisk
Cc: ceph-users@xxxxxxxxxxxxxx
Subject: Re: Poor RBD performance as LIO iSCSI target
I've flushed everything - data, pools, configs and reconfigured the whole
thing.
I was particularly careful with the cache tiering configuration (leaving
defaults almost wherever possible) and it's not locking up anymore.
It looks like the cache tiering configuration I had was causing the
problem? I can't put my finger on exactly what or why, and I don't have
the luxury of time to do this lengthy testing again.
Here's what I dumped as far as config goes before wiping:
========
# for var in size min_size pg_num pgp_num crush_ruleset \
    erasure_code_profile; do ceph osd pool get volumes $var; done
size: 5
min_size: 2
pg_num: 7200
pgp_num: 7200
crush_ruleset: 1
erasure_code_profile: ecvolumes
# for var in size min_size pg_num pgp_num crush_ruleset hit_set_type \
    hit_set_period hit_set_count target_max_objects target_max_bytes \
    cache_target_dirty_ratio cache_target_full_ratio cache_min_flush_age \
    cache_min_evict_age; do ceph osd pool get volumecache $var; done
size: 2
min_size: 1
pg_num: 7200
pgp_num: 7200
crush_ruleset: 4
hit_set_type: bloom
hit_set_period: 3600
hit_set_count: 1
target_max_objects: 0
target_max_bytes: 100000000000
cache_target_dirty_ratio: 0.5
cache_target_full_ratio: 0.8
cache_min_flush_age: 600
cache_min_evict_age: 1800
# ceph osd erasure-code-profile get ecvolumes
directory=/usr/lib/ceph/erasure-code
k=3
m=2
plugin=jerasure
ruleset-failure-domain=osd
technique=reed_sol_van
========
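As a rough check on the cache pool values above: the dirty and full ratios are fractions of target_max_bytes, so the byte counts at which flushing and eviction begin can be worked out directly. The sketch below does only that arithmetic. It assumes fio's "G" suffix means GiB and that nothing else occupies the cache pool. Whether these thresholds explain the hang is not confirmed anywhere in this thread.

```python
GIB = 1024 ** 3


def cache_thresholds(target_max_bytes: int, dirty_ratio: float, full_ratio: float):
    """Return (flush_start_bytes, evict_start_bytes) for a cache pool."""
    return round(target_max_bytes * dirty_ratio), round(target_max_bytes * full_ratio)


def thresholds_crossed(write_gib: int, target_max_bytes: int,
                       dirty_ratio: float, full_ratio: float):
    """Return (crosses_dirty, crosses_full) for a sequential write of write_gib GiB."""
    flush_at, evict_at = cache_thresholds(target_max_bytes, dirty_ratio, full_ratio)
    written = write_gib * GIB
    return written > flush_at, written > evict_at


if __name__ == "__main__":
    # Values from the volumecache dump taken before the wipe.
    TARGET_MAX_BYTES = 100_000_000_000
    DIRTY_RATIO = 0.5
    FULL_RATIO = 0.8

    print(cache_thresholds(TARGET_MAX_BYTES, DIRTY_RATIO, FULL_RATIO))
    for size_gib in (50, 75, 100):  # fio --size values used in the thread
        print(size_gib, thresholds_crossed(size_gib, TARGET_MAX_BYTES,
                                           DIRTY_RATIO, FULL_RATIO))
```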
And now:
========
# for var in size min_size pg_num pgp_num crush_ruleset \
    erasure_code_profile; do ceph osd pool get volumes $var; done
size: 5
min_size: 3
pg_num: 2048
pgp_num: 2048
crush_ruleset: 1
erasure_code_profile: ecvolumes
# for var in size min_size pg_num pgp_num crush_ruleset hit_set_type \
    hit_set_period hit_set_count target_max_objects target_max_bytes \
    cache_target_dirty_ratio cache_target_full_ratio cache_min_flush_age \
    cache_min_evict_age; do ceph osd pool get volumecache $var; done
size: 2
min_size: 1
pg_num: 2048
pgp_num: 2048
crush_ruleset: 4
hit_set_type: bloom
hit_set_period: 3600
hit_set_count: 1
target_max_objects: 0
target_max_bytes: 150000000000
cache_target_dirty_ratio: 0.5
cache_target_full_ratio: 0.8
cache_min_flush_age: 0
cache_min_evict_age: 1800
# ceph osd erasure-code-profile get ecvolumes
directory=/usr/lib/ceph/erasure-code
k=3
m=2
plugin=jerasure
ruleset-failure-domain=osd
technique=reed_sol_van
========
Crush map hasn't really changed before and after.
FWIW, the benchmarks I pulled out of the setup:
https://gist.github.com/dmsimard/2737832d077cfc5eff34
Definite overhead going from krbd to krbd + LIO...
--
David Moreau Simard
On Nov 20, 2014, at 4:14 PM, Nick Fisk <nick@xxxxxxxxxx> wrote:
Here you go:-
Erasure Profile
k=2
m=1
plugin=jerasure
ruleset-failure-domain=osd
ruleset-root=hdd
technique=reed_sol_van
Cache Settings
hit_set_type: bloom
hit_set_period: 3600
hit_set_count: 1
target_max_objects: 0
target_max_bytes: 1000000000
cache_target_dirty_ratio: 0.4
cache_target_full_ratio: 0.8
cache_min_flush_age: 0
cache_min_evict_age: 0
Crush Dump
# begin crush map
tunable choose_local_tries 0
tunable choose_local_fallback_tries 0
tunable choose_total_tries 50
tunable chooseleaf_descend_once 1
# devices
device 0 osd.0
device 1 osd.1
device 2 osd.2
device 3 osd.3
# types
type 0 osd
type 1 host
type 2 chassis
type 3 rack
type 4 row
type 5 pdu
type 6 pod
type 7 room
type 8 datacenter
type 9 region
type 10 root
# buckets
host ceph-test-hdd {
id -5 # do not change unnecessarily
# weight 2.730
alg straw
hash 0 # rjenkins1
item osd.1 weight 0.910
item osd.2 weight 0.910
item osd.0 weight 0.910
}
root hdd {
id -3 # do not change unnecessarily
# weight 2.730
alg straw
hash 0 # rjenkins1
item ceph-test-hdd weight 2.730
}
host ceph-test-ssd {
id -6 # do not change unnecessarily
# weight 1.000
alg straw
hash 0 # rjenkins1
item osd.3 weight 1.000
}
root ssd {
id -4 # do not change unnecessarily
# weight 1.000
alg straw
hash 0 # rjenkins1
item ceph-test-ssd weight 1.000
}
# rules
rule hdd {
ruleset 0
type replicated
min_size 0
max_size 10
step take hdd
step chooseleaf firstn 0 type osd
step emit
}
rule ssd {
ruleset 1
type replicated
min_size 0
max_size 4
step take ssd
step chooseleaf firstn 0 type osd
step emit
}
rule ecpool {
ruleset 2
type erasure
min_size 3
max_size 20
step set_chooseleaf_tries 5
step take hdd
step chooseleaf indep 0 type osd
step emit
}
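For comparison, the two erasure-code profiles quoted in this thread (k=2, m=1 here and k=3, m=2 on the other cluster) differ in raw-space overhead and in how many OSD failures they tolerate. A small sketch of the standard k/m arithmetic follows. The function name is made up for illustration and is not part of any Ceph tooling.

```python
def ec_profile_summary(k: int, m: int) -> dict:
    """Summarise an erasure-code profile with k data chunks and m coding chunks."""
    return {
        "chunks": k + m,                   # OSDs touched per object (pool "size")
        "raw_per_logical": (k + m) / k,    # raw bytes stored per logical byte
        "usable_fraction": k / (k + m),    # share of raw capacity that is usable
        "tolerated_failures": m,           # chunks that can be lost without data loss
    }


if __name__ == "__main__":
    for k, m in ((2, 1), (3, 2)):
        print(f"k={k} m={m}:", ec_profile_summary(k, m))
```

With ruleset-failure-domain=osd, the tolerated failures are counted in OSDs, not hosts.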
-----Original Message-----
From: ceph-users [mailto:ceph-users-bounces@xxxxxxxxxxxxxx] On Behalf
Of David Moreau Simard
Sent: 20 November 2014 20:03
To: Nick Fisk
Cc: ceph-users@xxxxxxxxxxxxxx
Subject: Re: Poor RBD performance as LIO iSCSI target
Nick,
Can you share more details on the configuration you are using? I'll
try to duplicate those configurations in my environment and see what
happens.
I'm mostly interested in:
- Erasure code profile (k, m, plugin, ruleset-failure-domain)
- Cache tiering pool configuration (ex: hit_set_type, hit_set_period,
hit_set_count, target_max_objects, target_max_bytes,
cache_target_dirty_ratio, cache_target_full_ratio,
cache_min_flush_age, cache_min_evict_age)
The crush rulesets would also be helpful.
Thanks,
--
David Moreau Simard
On Nov 20, 2014, at 12:43 PM, Nick Fisk <nick@xxxxxxxxxx> wrote:
Hi David,
I've just finished running the 75GB fio test you posted a few days
back on my new test cluster.
The cluster is as follows:-
Single server with 3x hdd and 1 ssd
Ubuntu 14.04 with 3.16.7 kernel
2+1 EC pool on hdds below a 10G ssd cache pool. SSD is also
partitioned to provide journals for hdds.
150G RBD mapped locally
The fio test seemed to run without any problems. I want to run a few
more tests with different settings to see if I can reproduce your
problem. I will let you know if I find anything.
If there is anything you would like me to try, please let me know.
Nick
-----Original Message-----
From: ceph-users [mailto:ceph-users-bounces@xxxxxxxxxxxxxx] On Behalf
Of David Moreau Simard
Sent: 19 November 2014 10:48
To: Ramakrishna Nishtala (rnishtal)
Cc: ceph-users@xxxxxxxxxxxxxx; Nick Fisk
Subject: Re: Poor RBD performance as LIO iSCSI target
Rama,
Thanks for your reply.
My end goal is to use iSCSI (with LIO/targetcli) to export rbd block
devices.
I was encountering issues with iSCSI which are explained in my
previous emails.
I ended up being able to reproduce the problem at will on various
kernel and OS combinations, even on raw RBD devices. That rules out
iSCSI as the cause and points to Ceph instead.
I'm even running 0.88 now and the issue is still there.
I haven't isolated the issue just yet.
My next tests involve disabling the cache tiering.
I do have client krbd cache as well; I'll try to disable it too if
disabling cache tiering isn't enough.
--
David Moreau Simard
On Nov 18, 2014, at 8:10 PM, Ramakrishna Nishtala (rnishtal)
<rnishtal@xxxxxxxxx> wrote:
Hi Dave
Did you say iSCSI only? The tracker issue does not say, though.
I am on giant, with both client and ceph on RHEL 7, and it seems to
work ok, unless I am missing something here. RBD on baremetal with
kmod-rbd and caching disabled.
[root@compute4 ~]# time fio --name=writefile --size=100G \
    --filesize=100G --filename=/dev/rbd0 --bs=1M --nrfiles=1 --direct=1 \
    --sync=0 --randrepeat=0 --rw=write --refill_buffers --end_fsync=1 \
    --iodepth=200 --ioengine=libaio
writefile: (g=0): rw=write, bs=1M-1M/1M-1M/1M-1M, ioengine=libaio,
iodepth=200
fio-2.1.11
Starting 1 process
Jobs: 1 (f=1): [W(1)] [100.0% done] [0KB/853.0MB/0KB /s] [0/853/0
iops] [eta 00m:00s] ...
Disk stats (read/write):
rbd0: ios=184/204800, merge=0/0, ticks=70/16164931,
in_queue=16164942, util=99.98%
real 1m56.175s
user 0m18.115s
sys 0m10.430s
Regards,
Rama
-----Original Message-----
From: ceph-users [mailto:ceph-users-bounces@xxxxxxxxxxxxxx] On
Behalf Of David Moreau Simard
Sent: Tuesday, November 18, 2014 3:49 PM
To: Nick Fisk
Cc: ceph-users@xxxxxxxxxxxxxx
Subject: Re: Poor RBD performance as LIO iSCSI target
Testing without the cache tiering is the next test I want to do when
I have time.
When it's hanging, there is no activity at all on the cluster.
Nothing in "ceph -w", nothing in "ceph osd pool stats".
I'll provide an update when I have a chance to test without tiering.
--
David Moreau Simard
On Nov 18, 2014, at 3:28 PM, Nick Fisk <nick@xxxxxxxxxx> wrote:
Hi David,
Have you tried on a normal replicated pool with no cache? I've seen
a number of threads recently where caching is causing various
things to block/hang.
It would be interesting to see if this still happens without the
caching layer; at least it would rule it out.
Also, is there any sign that as the test passes ~50GB the cache
might start flushing to the backing pool, causing slow performance?
I am planning a deployment very similar to yours so I am following
this with great interest. I'm hoping to build a single node test
"cluster" shortly, so I might be in a position to work with you on
this issue and hopefully get it resolved.
Nick
-----Original Message-----
From: ceph-users [mailto:ceph-users-bounces@xxxxxxxxxxxxxx] On
Behalf Of David Moreau Simard
Sent: 18 November 2014 19:58
To: Mike Christie
Cc: ceph-users@xxxxxxxxxxxxxx; Christopher Spearman
Subject: Re: Poor RBD performance as LIO iSCSI target
Thanks guys. I looked at http://tracker.ceph.com/issues/8818 and
chatted with "dis" on #ceph-devel.
I ran a LOT of tests on a LOT of combinations of kernels (sometimes
with tunables legacy). I haven't found a magical combination in
which the following test does not hang:
fio --name=writefile --size=100G --filesize=100G \
    --filename=/dev/rbd0 --bs=1M --nrfiles=1 --direct=1 --sync=0 \
    --randrepeat=0 --rw=write --refill_buffers --end_fsync=1 \
    --iodepth=200 --ioengine=libaio
Whether directly on a mapped rbd device, on a mounted filesystem
(over rbd), or exported through iSCSI, nothing helps.
I guess that rules out a potential issue with iSCSI overhead.
Now, something I noticed out of pure luck is that I am unable to
reproduce the issue if I drop the size of the test to 50GB. Tests
will complete in under 2 minutes.
75GB will hang right at the end and take more than 10 minutes.
TL;DR of tests:
- 3x fio --name=writefile --size=50G --filesize=50G
--filename=/dev/rbd0 --bs=1M --nrfiles=1 --direct=1 --sync=0
--randrepeat=0 --rw=write --refill_buffers --end_fsync=1
--iodepth=200 --ioengine=libaio
-- 1m44s, 1m49s, 1m40s
- 3x fio --name=writefile --size=75G --filesize=75G
--filename=/dev/rbd0 --bs=1M --nrfiles=1 --direct=1 --sync=0
--randrepeat=0 --rw=write --refill_buffers --end_fsync=1
--iodepth=200 --ioengine=libaio
-- 10m12s, 10m11s, 10m13s
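To put those timings in perspective, the average write rate for each run can be derived from the size and the elapsed time. A quick sketch, assuming fio's "G" suffix means GiB and that the quoted times cover the whole write including the final fsync:

```python
import re


def parse_duration(text: str) -> int:
    """Convert a duration such as '10m12s' into seconds."""
    match = re.fullmatch(r"(\d+)m(\d+)s", text.strip())
    if match is None:
        raise ValueError(f"unrecognised duration: {text!r}")
    minutes, seconds = map(int, match.groups())
    return minutes * 60 + seconds


def avg_mib_per_s(size_gib: int, duration: str) -> float:
    """Average throughput in MiB/s for size_gib GiB written in the given time."""
    return size_gib * 1024 / parse_duration(duration)


if __name__ == "__main__":
    runs = {
        50: ["1m44s", "1m49s", "1m40s"],     # 50G runs from the list above
        75: ["10m12s", "10m11s", "10m13s"],  # 75G runs from the list above
    }
    for size_gib, durations in runs.items():
        rates = [round(avg_mib_per_s(size_gib, d), 1) for d in durations]
        print(f"{size_gib}G:", rates, "MiB/s")
```

The 75G runs write only 1.5x the data but take roughly six times as long, so the average rate drops to about a quarter of the 50G figure.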
Details of tests here: http://pastebin.com/raw.php?i=3v9wMtYP
Does that ring a bell for you guys?
--
David Moreau Simard
On Nov 13, 2014, at 3:31 PM, Mike Christie <mchristi@xxxxxxxxxx>
wrote:
On 11/13/2014 10:17 AM, David Moreau Simard wrote:
Running into weird issues here as well in a test environment. I
don't have a solution either but perhaps we can find some things in
common.
Setup in a nutshell:
- Ceph cluster: Ubuntu 14.04, Kernel 3.16.7, Ceph 0.87-1 (OSDs
with separate public/cluster network in 10 Gbps)
- iSCSI Proxy node (targetcli/LIO): Ubuntu 14.04, Kernel 3.16.7,
Ceph 0.87-1 (10 Gbps)
- Client node: Ubuntu 12.04, Kernel 3.11 (10 Gbps)
Relevant cluster config: Writeback cache tiering with NVME PCI-E
cards (2 replica) in front of an erasure-coded pool (k=3,m=2) backed
by spindles.
I'm following the instructions here:
http://www.hastexo.com/resources/hints-and-kinks/turning-ceph-rbd-images-san-storage-devices
No issues with creating and mapping a 100GB RBD image and then
creating the target.
I'm interested in finding out the overhead/performance impact of
re-exporting through iSCSI so the idea is to run benchmarks.
Here's a fio test I'm trying to run on the client node on the
mounted iscsi device:
fio --name=writefile --size=100G --filesize=100G \
    --filename=/dev/sdu --bs=1M --nrfiles=1 --direct=1 --sync=0 \
    --randrepeat=0 --rw=write --refill_buffers --end_fsync=1 \
    --iodepth=200 --ioengine=libaio
The benchmark will eventually hang towards the end of the test
for some long seconds before completing.
On the proxy node, the kernel complains with iscsi portal login
timeout: http://pastebin.com/Q49UnTPr and I also see irqbalance
errors in syslog: http://pastebin.com/AiRTWDwR
You are hitting a different issue. German Anders is most likely
correct and you hit the rbd hang. That then caused the iscsi/scsi
command to time out, which caused the scsi error handler to run. In
your logs we see the LIO error handler has received a task abort
from the initiator and that timed out, which caused the escalation
(iscsi portal login related messages).
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com