Hi all. Last week I've investigate the status for hadoop on ceph. I create some patches to remove some bugs and crashes. Looks like it works. Even hbase works on top. For reference all sources and patches are here https://github.com/octo47/hadoop-common/tree/branch-1.0-ceph https://github.com/octo47/ceph/tree/v0.40-hadoop After YCSB and TestDSFIO work without crashes i start investigate performance. I have 5node cluster with 4 sata disks. btrfs. 24core on each. raid. iozone shows up to 520MB/s. Performance differs in 2-3 times. After some tests i see strange thing. hadoop uses disk very close to iozone: small amount and iops and high throughtput (same as iozone). ceph uses very inefficient: huge amount of iops, up to 3 times less throughtput (i think because of high amount of iops). hadoop dstat output: sda--sdb--sdc--sdd- ----total-cpu-usage---- -dsk/total- --io/total- util:util:util:util|usr sys idl wai hiq siq| read writ| read writ 100: 100: 100: 100| 1 5 83 11 0 0| 0 529M| 0 247 100: 100: 100: 100| 1 0 83 16 0 0| 0 542M| 0 168 100: 100: 100: 100| 1 0 81 18 0 0| 28k 518M|6.00 149 100: 100: 100: 100| 1 4 77 17 0 0| 0 533M| 0 243 100: 100: 100: 100| 1 3 83 13 0 0| 0 523M| 0 264 ceph dstat output: =================================================== sda--sdb--sdc--sdd- ----total-cpu-usage---- -dsk/total- --io/total- util:util:util:util|usr sys idl wai hiq siq| read writ| read writ 68.0:70.0:79.0:76.0| 1 2 93 4 0 0| 0 195M| 0 1723 86.0:85.0:93.0:91.0| 1 2 91 5 0 0| 0 226M| 0 1816 85.0:85.0:85.0:84.0| 1 3 92 4 0 0| 0 235M| 0 2316 So, my question is: can someone point me: a) can it be because of inefficient buffer size on osd part (i tried to increase CephOutputStream buffer to 256kb, not helps) b) what other problems can be and what options can i tune to find out what is going on. PS: i can't use iozone on kernel mounted fs. something hang in kernel, only reboot helps. in /var/log/messages i see attached kern.log. -- Andrey.
Jan 16 13:34:56 ceph06g kernel: [1496687.289893] licp:mn 4.0.6.7:79ssinetbihd<3>[1498294.006046] INFO: rcu_sched_state detected stall on CPU 1 (t=6000 jiffies) Jan 16 13:36:49 ceph06g kernel: [1498406.412898] btrfs-transacti D ffff88032a4c4890 0 1958 2 0x00000000 Jan 16 13:36:49 ceph06g kernel: [1498406.412902] ffff880323f63b70 0000000000000046 ffff880323f63b20 ffffffff00000000 Jan 16 13:36:49 ceph06g kernel: [1498406.412906 0000001a0ff802f3d ff83360000000024 Jan 16 13:36:49 ceph06g 4[480.199 ff802f3d 0000024 ff803dc00ff883ac40<>1946421]Cl rc:<>1946422] <ffff1930]?_wi_bfe+x0030 Jan 16 13:36:49 ceph06g 4[480.199 [ffff8548> u_fln_ato_i+x809 Jan 16 13:36:49 ceph06g 4[480.14] <ffff1870]?auoeoewk_ucin04/x0<>19464196 [ffff81da> _ato_ufr0203 Jan 16 13:36:49 ceph06g 4[1946426] [fffffa198> rt_e_ues02c030[rs Jan 16 13:36:49 ceph06g 4[480.195 [ffffa196> rt_l_ues021020[tf] Jan 16 13:36:49 ceph06g 4[480.102 [ffff8049> aeu_i+x004 Jan 16 13:36:49 ceph06g 4[4840.101 [ffffa184> rnato_tra+x8/xa brs Jan 16 13:36:49 ceph06g 4[480.101 [ffffa18c> tf_i_q_n_o09/x0[tf]<>1946433] <ffff1826]kha+x60a Jan 16 13:36:49 ceph06g 4[480.107 [ffff85f6> enltra_epr0401 Jan 16 13:36:49 ceph06g 4[480.100 [ffff804a> tra_okrf+x9/x9 Jan 16 13:36:49 ceph06g 4[480.103 [ffff85f6> scag+x301 Jan 16 13:38:49 ceph06g kernel: [1498526.228090] btrfs-transacti D ffff88032a4c4890 0 1958 2 0x00000000 Jan 16 13:38:49 ceph06g kernel: [1498526.228094] ffff880323f63b70 0000000000000046 ffff880323f63b20 ffffffff00000000 Jan 16 13:38:49 ceph06g kernel: [1498526.228098] 0000000000012a40 ffff880323f63fd8 ffff880323f62010 0000000000012a40 Jan 16 13:38:49 ceph06g kernel: [1498526.228101] ffff880323f63fd8 0000000000012a40 ffff88032d1c0000 ffff88032a4c44d0 Jan 16 13:38:49 ceph06g kernel: [1498526.228104] Call Trace: Jan 16 13:38:49 ceph06g kernel: [1498526.228114] [<ffffffff8119d3b0>] ? __wait_on_buffer+0x30/0x30 Jan 16 13:38:49 ceph06g kernel: [1498526.228120] [<ffffffff815e464c>] io_schedule+0x8c/0xd0 Jan 16 13:38:49 ceph06g kernel: [1498526.228122] [<ffffffff8119d3be>] sleep_on_buffer+0xe/0x20 Jan 16 13:38:49 ceph06g kernel: [1498526.228125] [<ffffffff815e4ddf>] __wait_on_bit+0x5f/0x90 Jan 16 13:38:49 ceph06g kernel: [1498526.228128] [<ffffffff8119d3b0>] ? __wait_on_buffer+0x30/0x30 Jan 16 13:38:49 ceph06g kernel: [1498526.228130] [<ffffffff815e4e88>] out_of_line_wait_on_bit+0x78/0x90 Jan 16 13:38:49 ceph06g kernel: [1498526.228135] [<ffffffff810847d0>] ? autoremove_wake_function+0x40/0x40 Jan 16 13:38:49 ceph06g kernel: [1498526.228138] [<ffffffff8119d3ae>] __wait_on_buffer+0x2e/0x30 Jan 16 13:38:49 ceph06g kernel: [1498526.228158] [<ffffffffa01d918c>] write_dev_supers+0x2ac/0x340 [btrfs] Jan 16 13:38:49 ceph06g kernel: [1498526.228168] [<ffffffffa01d9461>] write_all_supers+0x241/0x2d0 [btrfs] Jan 16 13:38:49 ceph06g kernel: [1498526.228178] [<ffffffffa01d9503>] write_ctree_super+0x13/0x20 [btrfs] Jan 16 13:38:49 ceph06g kernel: [1498526.228188] [<ffffffffa01df58> tf_omttascin063080[tf]<>1956289] <ffff1e5e]?_a_pnlc+x/x0<>1956280] <ffff0dc9]?jitascin06/x7 brs Jan 16 13:38:49 ceph06g 4[1956280] [ffff8049> ae_upbt04/x0<>1956281] <ffff0d83]tascinkhed023020[tf]<>1956282] <ffffa0d50]?brsbow_n_ox009 brs Jan 16 13:40:49 ceph06g 4[482.221 [fffff1ea4]kre_hedhle+x/x0<>1956283] <ffff1810]?khedwre_n010010<>14956283] <ffff1ea0]?g_hne01/x3<3>[1498646.043163] INFO: task btrfs-transacti:1958 blocked for more than 120 seconds. Jan 16 13:40:49 ceph06g kernel: [1498646.043330] btrfs-transacti D ffff88032a4c4890 0 1958 2 0x00000000 Jan 16 13:40:49 ceph06g kernel: [1498646.043334] ffff880323f63b70 0000000000000046 ffff880323f63b20 ffffffff00000000 Jan 16 13:40:49 ceph06g kernel: [1498646.043338] 0000000000012a40 fff88336f8ff802f21 000001a0<>1966034] ff802f3d 000001a0ff802100ff8024c4d Jan 16 13:40:49 ceph06g 4[484.434 alTae Jan 16 13:40:49 ceph06g 4[484.433 [ffff81db> _ato_ufr03/x0<>1966035] [<ffff1e6c]i_ceue08/x0<>1966036] <ffff193b> le_nbfe+x/x0<>1966036] [<ffff1edf]_wi_nbt05/x0<>1966037 [ffff81db> _ato_ufr03/x0<>19660437] <ffff1ee8]oto_iewi_nbt07/x0<>1966037] <ffff1870]?atrmv_aefnton+x004 Jan 16 13:40:49 ceph06g 4[484.437 [ffff81da> _ato_ufr02/x0<>1966039] <ffff0d1c]wiede_ues02c030[tf]<>1966030] <ffff0d41]wiealspr+x4/xd [tf]<>1966031] [ffffa190> rt_te_ue+x302 brs Jan 16 13:40:49 ceph06g 4[484.446 [ffffa1f8> tf_omttascin063/080[tf]<>1966033] <ffff1e5e]?_a_pnlc+x/x0<>1966033] <ffff0dc9]?jin_rnato+x9020[tf]<>1966034] <ffff1870]?wk_pbt04/x0<>1966035] <ffffa0d83]tasto_tra+x8/xa brs Jan 16 13:40:57 ceph06g 4[484.442 [ffffa18c> tf_i_qedi+x009 brf]<>1966036] <ffff1826]khed09/x0<>1966036] <ffff1ea4]kre_hedhle+x/x0<>1966037] <ffff1810]?khedwre_n010010<>1966037] <ffff1ea0]?g_hne01/x3<3>[1498654.050695] INFO: rcu_sched_state detected stall on CPU 1 (t