Slow Ceph I/O, high IOPS (compared to Hadoop)

Hi all.

Last week I investigated the status of Hadoop on Ceph.
I created some patches to fix a few bugs and crashes.
It looks like it works; even HBase runs on top.

For reference, all sources and patches are here:

https://github.com/octo47/hadoop-common/tree/branch-1.0-ceph
https://github.com/octo47/ceph/tree/v0.40-hadoop

After YCSB and TestDFSIO ran without crashes, I started investigating
performance.

I have a 5-node cluster with 4 SATA disks per node (btrfs), 24 cores on each,
RAID; iozone shows up to 520 MB/s.

Performance differs by 2-3x. After some tests I noticed something strange:
Hadoop drives the disks very much like iozone does: a small number of IOPS
and high throughput (same as iozone).
Ceph drives them very inefficiently: a huge number of IOPS and up to 3x less
throughput (I think because of the high IOPS count).

hadoop dstat output:
sda--sdb--sdc--sdd- ----total-cpu-usage---- -dsk/total- --io/total-
util:util:util:util|usr sys idl wai hiq siq| read  writ| read  writ
 100: 100: 100: 100|  1   5  83  11   0   0|   0   529M|   0   247
 100: 100: 100: 100|  1   0  83  16   0   0|   0   542M|   0   168
 100: 100: 100: 100|  1   0  81  18   0   0|  28k  518M|6.00   149
 100: 100: 100: 100|  1   4  77  17   0   0|   0   533M|   0   243
 100: 100: 100: 100|  1   3  83  13   0   0|   0   523M|   0   264

ceph dstat output:
sda--sdb--sdc--sdd- ----total-cpu-usage---- -dsk/total- --io/total-
util:util:util:util|usr sys idl wai hiq siq| read  writ| read  writ
68.0:70.0:79.0:76.0|  1   2  93   4   0   0|   0   195M|   0  1723
86.0:85.0:93.0:91.0|  1   2  91   5   0   0|   0   226M|   0  1816
85.0:85.0:85.0:84.0|  1   3  92   4   0   0|   0   235M|   0  2316
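
A quick sanity check on those numbers: average request size is throughput divided by IOPS, and it differs by more than an order of magnitude between the two runs. A small sketch of the arithmetic (sample values taken from one line of each dstat output above):

```java
// Average write size = throughput / write IOPS, using one sample per run:
// Hadoop: ~529 MB/s at ~247 write IOPS; Ceph: ~195 MB/s at ~1723 write IOPS.
public class AvgRequestSize {
    static double avgKb(double mbPerSec, double iops) {
        return mbPerSec * 1024.0 / iops; // average KB per write request
    }

    public static void main(String[] args) {
        // Hadoop issues roughly 2 MB writes, Ceph roughly 116 KB writes.
        System.out.printf("hadoop avg write: %.0f KB%n", avgKb(529, 247));
        System.out.printf("ceph   avg write: %.0f KB%n", avgKb(195, 1723));
    }
}
```

So for the same data volume, the Ceph OSDs issue writes roughly 20x smaller than Hadoop's, which matches the "huge amount of IOPS" symptom.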


So, my question is: can someone point me to:
a) could this be caused by an inefficient buffer size on the OSD side?
(I tried increasing the CephOutputStream buffer to 256 KB; it didn't help)
b) what else could cause this, and what options can I tune
to find out what is going on?
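
For context on (a): the idea behind growing the CephOutputStream buffer is that each buffer flush becomes at most one write to the layer below, so the buffer size bounds how fine-grained the writes get. A minimal sketch with plain java.io streams (BufferSizeDemo and CountingStream are hypothetical illustration classes, not the actual Ceph client code; 256 KB is the size I tried, 4 MB a larger candidate):

```java
import java.io.BufferedOutputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;

// Shows how the client-side buffer size bounds the size of each write that
// reaches the stream below: an N-byte buffer flushes as single N-byte writes.
public class BufferSizeDemo {
    // Counts how many write() calls reach the underlying stream.
    static class CountingStream extends ByteArrayOutputStream {
        int writes = 0;
        @Override public synchronized void write(byte[] b, int off, int len) {
            writes++;
            super.write(b, off, len);
        }
    }

    static int writesFor(int bufferSize, int totalBytes) throws IOException {
        CountingStream sink = new CountingStream();
        OutputStream out = new BufferedOutputStream(sink, bufferSize);
        byte[] chunk = new byte[4096];                // 4 KB application writes
        for (int sent = 0; sent < totalBytes; sent += chunk.length) {
            out.write(chunk);
        }
        out.flush();
        return sink.writes;
    }

    public static void main(String[] args) throws IOException {
        int total = 64 * 1024 * 1024;                 // 64 MB of data
        System.out.println("256KB buffer: " + writesFor(256 * 1024, total));
        System.out.println("4MB   buffer: " + writesFor(4 * 1024 * 1024, total));
    }
}
```

Since 256 KB didn't help, the small writes presumably originate further down (journaling or the OSD filestore), not in the Hadoop client buffer.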

PS: I can't run iozone on a kernel-mounted Ceph filesystem: something
hangs in the kernel and only a reboot helps.
In /var/log/messages I see the attached kern.log.



-- 
Andrey.
Jan 16 13:34:56 ceph06g kernel: [1496687.289893] [garbled libceph message]
<3>[1498294.006046] INFO: rcu_sched_state detected stall on CPU 1 (t=6000 jiffies)
Jan 16 13:36:49 ceph06g kernel: [1498406.412898] btrfs-transacti D ffff88032a4c4890     0  1958      2 0x00000000
Jan 16 13:36:49 ceph06g kernel: [1498406.412902]  ffff880323f63b70 0000000000000046 ffff880323f63b20 ffffffff00000000
Jan 16 13:36:49 ceph06g kernel: [1498406.412906]  [remaining registers and call trace garbled; same btrfs-transacti trace (__wait_on_buffer / write_dev_supers / write_all_supers / transaction_kthread) as the 13:38:49 entry below]
Jan 16 13:38:49 ceph06g kernel: [1498526.228090] btrfs-transacti D ffff88032a4c4890     0  1958      2 0x00000000
Jan 16 13:38:49 ceph06g kernel: [1498526.228094]  ffff880323f63b70 0000000000000046 ffff880323f63b20 ffffffff00000000
Jan 16 13:38:49 ceph06g kernel: [1498526.228098]  0000000000012a40 ffff880323f63fd8 ffff880323f62010 0000000000012a40
Jan 16 13:38:49 ceph06g kernel: [1498526.228101]  ffff880323f63fd8 0000000000012a40 ffff88032d1c0000 ffff88032a4c44d0
Jan 16 13:38:49 ceph06g kernel: [1498526.228104] Call Trace:
Jan 16 13:38:49 ceph06g kernel: [1498526.228114]  [<ffffffff8119d3b0>] ? __wait_on_buffer+0x30/0x30
Jan 16 13:38:49 ceph06g kernel: [1498526.228120]  [<ffffffff815e464c>] io_schedule+0x8c/0xd0
Jan 16 13:38:49 ceph06g kernel: [1498526.228122]  [<ffffffff8119d3be>] sleep_on_buffer+0xe/0x20
Jan 16 13:38:49 ceph06g kernel: [1498526.228125]  [<ffffffff815e4ddf>] __wait_on_bit+0x5f/0x90
Jan 16 13:38:49 ceph06g kernel: [1498526.228128]  [<ffffffff8119d3b0>] ? __wait_on_buffer+0x30/0x30
Jan 16 13:38:49 ceph06g kernel: [1498526.228130]  [<ffffffff815e4e88>] out_of_line_wait_on_bit+0x78/0x90
Jan 16 13:38:49 ceph06g kernel: [1498526.228135]  [<ffffffff810847d0>] ? autoremove_wake_function+0x40/0x40
Jan 16 13:38:49 ceph06g kernel: [1498526.228138]  [<ffffffff8119d3ae>] __wait_on_buffer+0x2e/0x30
Jan 16 13:38:49 ceph06g kernel: [1498526.228158]  [<ffffffffa01d918c>] write_dev_supers+0x2ac/0x340 [btrfs]
Jan 16 13:38:49 ceph06g kernel: [1498526.228168]  [<ffffffffa01d9461>] write_all_supers+0x241/0x2d0 [btrfs]
Jan 16 13:38:49 ceph06g kernel: [1498526.228178]  [<ffffffffa01d9503>] write_ctree_super+0x13/0x20 [btrfs]
Jan 16 13:38:49 ceph06g kernel: [1498526.228188]  [remaining frames garbled: btrfs_commit_transaction / transaction_kthread / kernel_thread_helper]
Jan 16 13:40:49 ceph06g kernel: [1498646.043163] INFO: task btrfs-transacti:1958 blocked for more than 120 seconds.
Jan 16 13:40:49 ceph06g kernel: [1498646.043330] btrfs-transacti D ffff88032a4c4890     0  1958      2 0x00000000
Jan 16 13:40:49 ceph06g kernel: [1498646.043334]  ffff880323f63b70 0000000000000046 ffff880323f63b20 ffffffff00000000
Jan 16 13:40:49 ceph06g kernel: [1498646.043338]  [remaining registers and call trace garbled; same btrfs-transacti trace as the 13:38:49 entry above]
Jan 16 13:40:57 ceph06g kernel: [1498654.050695] INFO: rcu_sched_state detected stall on CPU 1 (t
