Well, as you've noted you're getting some slow requests on the OSDs when they turn back on; and then the iSCSI gateway is panicking (probably because the block device write request is just hanging). We've gotten prior reports that iSCSI is a lot more sensitive to a few slow requests than most use cases, and OSDs coming back in can cause some slow requests, but if it's a common case for you then there's probably something that can be done to optimize that recovery. Have you checked into what's blocking the slow operations or why the PGs are taking so long to get ready? -Greg Software Engineer #42 @ http://inktank.com | http://ceph.com On Tue, Nov 5, 2013 at 1:33 AM, James Wilkins <James.Wilkins@xxxxxxxxxxxxx> wrote: > Hello, > > Wondering if anyone else has come over an issue we're having with our POC CEPH Cluster at the moment. > > Some details about its setup; > > 6 x Dell R720 (20 x 1TB Drives, 4 xSSD CacheCade), 4 x 10GB Nics > 4 x Generic white label server (24 x 2 4TB Disk Raid-0 ), 4 x 10GB Nics > 3 x Dell R620 - Acting as ISCSI Heads (targetcli / Linux kernel ISCSI) - 4 x 10GB Nics. An RBD device is mounted and exported via targetcli, this is then mounted on a client device to push backup data. > > All machines are running Ubuntu 12.04.3 LTS and ceph 0.67.4 > > Machines are split over two racks (distinct layer 2 domains) using a leaf/spine model and we use ECMP/quagga on the ISCSI heads to reach the CEPH Cluster. > > Crush map has racks defined to spread data over 2 racks - I've attached the ceph.conf > > The cluster performs great normally, and we only have issues when simulating rack failure. > > The issue comes when the following steps are taken > > o) Initiate load against the cluster (backups going via ISCSI) > o) ceph osd set noout > o) Reboot 2 x Generic Servers / 3 x Dell Servers (basically all the nodes in 1 Rack) > o) Cluster goes degraded, as expected > > cluster 55dcf929-fca5-49fe-99d0-324a19afd5b4 > health HEALTH_WARN 7056 pgs degraded; 282 pgs stale; 2842 pgs stuck unclean; recovery 1286582/2700870 degraded (47.636%); 108/216 in osds are down; noout flag(s) set > monmap e3: 5 mons at {fh-ceph01-mon-01=172.17.12.224:6789/0,fh-ceph01-mon-02=172.17.12.225:6789/0,fh-ceph01-mon-03=172.17.11.224:6789/0,fh-ceph01-mon-04=172.17.11.225:6789/0,fh-ceph01-mon-05=172.17.12.226:6789/0}, election epoch 74, quorum 0,1,2,3,4 fh-ceph01-mon-01,fh-ceph01-mon-02,fh-ceph01-mon-03,fh-ceph01-mon-04,fh-ceph01-mon-05 > osdmap e4237: 216 osds: 108 up, 216 in > pgmap v117686: 7328 pgs: 266 active+clean, 6 stale+active+clean, 6780 active+degraded, 276 stale+active+degraded; 3511 GB data, 10546 GB used, 794 TB / 805 TB avail; 1286582/2700870 degraded (47.636%) > mdsmap e1: 0/0/1 up > > > 2013-11-05 08:51:44.830393 mon.0 [INF] pgmap v117685: 7328 pgs: 1489 active+clean, 1289 stale+active+clean, 3215 active+degraded, 1335 stale+active+degraded; 3511 GB data, 10546 GB used, 794 TB / 805 TB avail; 1048742/2700870 degraded (38.830%); recovering 7 o/s, 28969KB/s > > o) As OSDS start returning > > 2013-11-05 08:52:42.019295 mon.0 [INF] osd.165 172.17.11.9:6864/6074 boot > 2013-11-05 08:52:42.023055 mon.0 [INF] osd.154 172.17.11.9:6828/5943 boot > 2013-11-05 08:52:42.024226 mon.0 [INF] osd.159 172.17.11.9:6816/5820 boot > 2013-11-05 08:52:42.031996 mon.0 [INF] osd.161 172.17.11.9:6856/6059 boot > > o) We then see some slow requests; > > 2013-11-05 08:53:11.677044 osd.153 [WRN] 6 slow requests, 6 included below; oldest blocked for > 30.409992 secs > 2013-11-05 08:53:11.677052 osd.153 [WRN] slow request 30.409992 seconds old, received at 2013-11-05 08:52:41.266994: osd_op(client.16010.1:13441679 rb.0.21ec.238e1f29.00000012fa28 [write 2854912~4096] 3.516ef071 RETRY=-1 e4240) currently reached pg > 2013-11-05 08:53:11.677056 osd.153 [WRN] slow request 30.423024 seconds old, received at 2013-11-05 08:52:41.253962: osd_op(client.15755.1:13437999 rb.0.21ec.238e1f29.00000012fa28 [write 0~233472] 3.516ef071 RETRY=1 e4240) v4 currently reached pg > > o) A few minutes , the ISCSI heads start panicking > > Nov 5 08:56:06 fh-ceph01-iscsi-01 kernel: [69081.664305] ------------[ cut here ]------------ > Nov 5 08:56:06 fh-ceph01-iscsi-01 kernel: [69081.664313] WARNING: at /build/buildd/linux-lts-raring-3.8.0/kernel/watchdog.c:246 wat > chdog_overflow_callback+0x9a/0xc0() > Nov 5 08:56:06 fh-ceph01-iscsi-01 kernel: [69081.664315] Hardware name: PowerEdge R620 > Nov 5 08:56:06 fh-ceph01-iscsi-01 kernel: [69081.664317] Watchdog detected hard LOCKUP on cpu 6 > Nov 5 08:56:06 fh-ceph01-iscsi-01 kernel: [69081.664318] Modules linked in: ib_srpt(F) tcm_qla2xxx(F) tcm_loop(F) tcm_fc(F) iscsi_t > arget_mod(F) target_core_pscsi(F) target_core_file(F) target_core_iblock(F) target_core_mod(F) rbd(F) libceph(F) ipmi_devintf(F) ipm > i_si(F) ipmi_msghandler(F) qla2xxx(F) libfc(F) scsi_transport_fc(F) scsi_tgt(F) configfs(F) dell_rbu(F) ib_iser(F) rdma_cm(F) ib_cm( > F) iw_cm(F) ib_sa(F) ib_mad(F) ib_core(F) ib_addr(F) ext2(F) iscsi_tcp(F) libiscsi_tcp(F) libiscsi(F) scsi_transport_iscsi(F) corete > mp(F) kvm_intel(F) kvm(F) ghash_clmulni_intel(F) aesni_intel(F) ablk_helper(F) cryptd(F) lrw(F) aes_x86_64(F) xts(F) gf128mul(F) gpi > o_ich(F) dcdbas(F) microcode(F) joydev(F) shpchp(F) sb_edac(F) wmi(F) edac_core(F) acpi_power_meter(F) mei(F) lpc_ich(F) mac_hid(F) > 8021q(F) garp(F) stp(F) llc(F) lp(F) parport(F) hid_generic(F) usbhid(F) hid(F) ahci(F) libahci(F) ixgbe(F) dca(F) megaraid_sas(F) m > dio(F) tg3(F) ptp(F) pps_core(F) btrfs(F) zlib_deflate(F) libcrc32c(F) [last unloaded: target_core_mod] > Nov 5 08:56:06 fh-ceph01-iscsi-01 kernel: [69081.664387] Pid: 460, comm: kworker/u:5 Tainted: GF W 3.8.0-31-generic #46~pr > ecise1-Ubuntu > Nov 5 08:56:06 fh-ceph01-iscsi-01 kernel: [69081.664389] Call Trace: > Nov 5 08:56:06 fh-ceph01-iscsi-01 kernel: [69081.664391] <NMI> [<ffffffff81059b2f>] warn_slowpath_common+0x7f/0xc0 > Nov 5 08:56:06 fh-ceph01-iscsi-01 kernel: [69081.664402] [<ffffffff81059c26>] warn_slowpath_fmt+0x46/0x50 > Nov 5 08:56:06 fh-ceph01-iscsi-01 kernel: [69081.664406] [<ffffffff810ed6fa>] watchdog_overflow_callback+0x9a/0xc0 > Nov 5 08:56:06 fh-ceph01-iscsi-01 kernel: [69081.664410] [<ffffffff8112c75c>] __perf_event_overflow+0x9c/0x220 > Nov 5 08:56:06 fh-ceph01-iscsi-01 kernel: [69081.664415] [<ffffffff81025aca>] ? x86_perf_event_set_period+0xda/0x150 > Nov 5 08:56:06 fh-ceph01-iscsi-01 kernel: [69081.664418] [<ffffffff8112cf54>] perf_event_overflow+0x14/0x20 > Nov 5 08:56:06 fh-ceph01-iscsi-01 kernel: [69081.664423] [<ffffffff8102b8db>] intel_pmu_handle_irq+0x19b/0x250 > Nov 5 08:56:06 fh-ceph01-iscsi-01 kernel: [69081.664429] [<ffffffff816f5c71>] perf_event_nmi_handler+0x21/0x30 > Nov 5 08:56:06 fh-ceph01-iscsi-01 kernel: [69081.664432] [<ffffffff816f53a9>] nmi_handle.isra.0+0x59/0x80 > Nov 5 08:56:06 fh-ceph01-iscsi-01 kernel: [69081.664435] [<ffffffff816f54b8>] default_do_nmi+0x58/0x240 > Nov 5 08:56:06 fh-ceph01-iscsi-01 kernel: [69081.664438] [<ffffffff816f5730>] do_nmi+0x90/0xd0 > Nov 5 08:56:06 fh-ceph01-iscsi-01 kernel: [69081.664442] [<ffffffff816f49c1>] end_repeat_nmi+0x1e/0x2e > Nov 5 08:56:06 fh-ceph01-iscsi-01 kernel: [69081.664447] [<ffffffff81045c42>] ? __ticket_spin_lock+0x22/0x30 > Nov 5 08:56:06 fh-ceph01-iscsi-01 kernel: [69081.664450] [<ffffffff81045c42>] ? __ticket_spin_lock+0x22/0x30 > Nov 5 08:56:06 fh-ceph01-iscsi-01 kernel: [69081.664453] [<ffffffff81045c42>] ? __ticket_spin_lock+0x22/0x30 > Nov 5 08:56:06 fh-ceph01-iscsi-01 kernel: [69081.664454] <<EOE>> [<ffffffff81045cd9>] default_spin_lock_flags+0x9/0x10 > Nov 5 08:56:06 fh-ceph01-iscsi-01 kernel: [69081.664460] [<ffffffff816f400e>] _raw_spin_lock_irqsave+0x2e/0x40 > Nov 5 08:56:06 fh-ceph01-iscsi-01 kernel: [69081.664466] [<ffffffff8108a442>] __wake_up+0x32/0x70 > Nov 5 08:56:06 fh-ceph01-iscsi-01 kernel: [69081.664477] [<ffffffffa03687d2>] iscsit_add_cmd_to_response_queue+0xb2/0xe0 [iscsi_ta > rget_mod] > Nov 5 08:56:06 fh-ceph01-iscsi-01 kernel: [69081.664483] [<ffffffffa0374e28>] lio_queue_tm_rsp+0x28/0x30 [iscsi_target_mod] > Nov 5 08:56:06 fh-ceph01-iscsi-01 kernel: [69081.664492] [<ffffffffa03a79ee>] target_tmr_work+0x6e/0xf0 [target_core_mod] > Nov 5 08:56:06 fh-ceph01-iscsi-01 kernel: [69081.664496] [<ffffffff81078d01>] process_one_work+0x141/0x490 > Nov 5 08:56:06 fh-ceph01-iscsi-01 kernel: [69081.664500] [<ffffffff81079cc8>] worker_thread+0x168/0x400 > Nov 5 08:56:06 fh-ceph01-iscsi-01 kernel: [69081.664503] [<ffffffff81079b60>] ? manage_workers+0x120/0x120 > Nov 5 08:56:06 fh-ceph01-iscsi-01 kernel: [69081.664506] [<ffffffff8107f1d0>] kthread+0xc0/0xd0 > Nov 5 08:56:06 fh-ceph01-iscsi-01 kernel: [69081.664510] [<ffffffff8107f110>] ? flush_kthread_worker+0xb0/0xb0 > Nov 5 08:56:06 fh-ceph01-iscsi-01 kernel: [69081.664514] [<ffffffff816fcb2c>] ret_from_fork+0x7c/0xb0 > Nov 5 08:56:06 fh-ceph01-iscsi-01 kernel: [69081.664517] [<ffffffff8107f110>] ? flush_kthread_worker+0xb0/0xb0 > Nov 5 08:56:06 fh-ceph01-iscsi-01 kernel: [69081.664519] ---[ end trace 9b6438fe7cb20627 ]--- > Nov 5 08:56:06 fh-ceph01-iscsi-01 kernel: [69125.602470] INFO: rcu_sched detected stalls on CPUs/tasks: { 6} (detected by 21, t=150 > 02 jiffies, g=18619, c=18618, q=7934) > Nov 5 08:56:06 fh-ceph01-iscsi-01 kernel: [69125.602570] sending NMI to all CPUs: > Nov 5 08:56:06 fh-ceph01-iscsi-01 kernel: [69125.602574] NMI backtrace for cpu 0 > Nov 5 08:56:06 fh-ceph01-iscsi-01 kernel: [69125.602577] CPU 0 > Nov 5 08:56:06 fh-ceph01-iscsi-01 kernel: [69125.602581] Pid: 0, comm: swapper/0 Tainted: GF W 3.8.0-31-generic #46~precis > e1-Ubuntu Dell Inc. PowerEdge R620/0PXXHP > Nov 5 08:56:06 fh-ceph01-iscsi-01 kernel: [69125.602583] RIP: 0010:[<ffffffff8158b2e2>] [<ffffffff8158b2e2>] poll_idle+0x42/0x90 > Nov 5 08:56:06 fh-ceph01-iscsi-01 kernel: [69125.602590] RSP: 0018:ffffffff81c01db8 EFLAGS: 00000246 > Nov 5 08:56:06 fh-ceph01-iscsi-01 kernel: [69125.602592] RAX: 0000000400000000 RBX: 0000000000000000 RCX: 0000000000000018 > Nov 5 08:56:06 fh-ceph01-iscsi-01 kernel: [69125.602594] RDX: ffffffff81c01fd8 RSI: ffffffff81c7a380 RDI: ffffffff81c1af00 > Nov 5 08:56:06 fh-ceph01-iscsi-01 kernel: [69125.602596] RBP: ffffffff81c01df8 R08: 0000000000000020 R09: 0000000000000000 > Nov 5 08:56:06 fh-ceph01-iscsi-01 kernel: [69125.602598] R10: 0000000000002000 R11: 0000000000000000 R12: ffff880fffc1b370 > Nov 5 08:56:06 fh-ceph01-iscsi-01 kernel: [69125.602599] R13: 00003ef672bf9148 R14: ffffffff81c7a380 R15: 0000000000000000 > Nov 5 08:56:06 fh-ceph01-iscsi-01 kernel: [69125.602602] FS: 0000000000000000(0000) GS:ffff880fffc00000(0000) knlGS:00000000000000 > 00 > Nov 5 08:56:06 fh-ceph01-iscsi-01 kernel: [69125.602604] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > Nov 5 08:56:06 fh-ceph01-iscsi-01 kernel: [69125.602605] CR2: 00007fffd8f0f000 CR3: 0000000fda277000 CR4: 00000000000407f0 > Nov 5 08:56:06 fh-ceph01-iscsi-01 kernel: [69125.602607] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 > Nov 5 08:56:06 fh-ceph01-iscsi-01 kernel: [69125.602609] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 > Nov 5 08:56:06 fh-ceph01-iscsi-01 kernel: [69125.602611] Process swapper/0 (pid: 0, threadinfo ffffffff81c00000, task ffffffff81c15 > 440) > Nov 5 08:56:06 fh-ceph01-iscsi-01 kernel: [69125.602613] Stack: > Nov 5 08:56:06 fh-ceph01-iscsi-01 kernel: [69125.602614] 0000000000000000 0000000006e86f73 ffffffff81c01dd8 ffffffff8101ba59 > Nov 5 08:56:06 fh-ceph01-iscsi-01 kernel: [69125.602620] ffffffff81c01e08 ffff880fffc1b370 00003ef672bf9129 ffffffff8158ae40 > Nov 5 08:56:06 fh-ceph01-iscsi-01 kernel: [69125.602625] ffffffff81c01e08 ffffffff8158ae59 ffffffff81c01e68 ffffffff8158b8d1 > Nov 5 08:56:06 fh-ceph01-iscsi-01 kernel: [69125.602629] Call Trace: > Nov 5 08:56:06 fh-ceph01-iscsi-01 kernel: [69125.602634] [<ffffffff8101ba59>] ? read_tsc+0x9/0x20 > Nov 5 08:56:06 fh-ceph01-iscsi-01 kernel: [69125.602638] [<ffffffff8158ae40>] ? centrino_target+0x360/0x360 > Nov 5 08:56:06 fh-ceph01-iscsi-01 kernel: [69125.602641] [<ffffffff8158ae59>] cpuidle_enter+0x19/0x20 > Nov 5 08:56:06 fh-ceph01-iscsi-01 kernel: [69125.602645] [<ffffffff8158b8d1>] cpuidle_wrap_enter+0x41/0xa0 > Nov 5 08:56:06 fh-ceph01-iscsi-01 kernel: [69125.602648] [<ffffffff8158cd1e>] ? menu_select+0x16e/0x2b0 > Nov 5 08:56:06 fh-ceph01-iscsi-01 kernel: [69125.602651] [<ffffffff8158b940>] cpuidle_enter_tk+0x10/0x20 > Nov 5 08:56:06 fh-ceph01-iscsi-01 kernel: [69125.602654] [<ffffffff8158b4ff>] cpuidle_idle_call+0xaf/0x2b0 > Nov 5 08:56:06 fh-ceph01-iscsi-01 kernel: [69125.602658] [<ffffffff8101da8f>] cpu_idle+0xcf/0x120 > Nov 5 08:56:06 fh-ceph01-iscsi-01 kernel: [69125.602664] [<ffffffff816c7082>] rest_init+0x72/0x80 > Nov 5 08:56:06 fh-ceph01-iscsi-01 kernel: [69125.602669] [<ffffffff81d05c4f>] start_kernel+0x3d1/0x3de > Nov 5 08:56:06 fh-ceph01-iscsi-01 kernel: [69125.602673] [<ffffffff81d057ff>] ? pass_bootoption.constprop.2+0xd3/0xd3 > Nov 5 08:56:06 fh-ceph01-iscsi-01 kernel: [69125.602676] [<ffffffff81d05397>] x86_64_start_reservations+0x131/0x135 > Nov 5 08:56:06 fh-ceph01-iscsi-01 kernel: [69125.602680] [<ffffffff81d05120>] ? early_idt_handlers+0x120/0x120 > Nov 5 08:56:06 fh-ceph01-iscsi-01 kernel: [69125.602683] [<ffffffff81d05468>] x86_64_start_kernel+0xcd/0xdc > Nov 5 08:56:06 fh-ceph01-iscsi-01 kernel: [69125.602685] Code: e8 84 36 b2 ff 49 89 c5 fb 66 66 90 66 66 90 65 48 8b 14 25 30 c7 00 > 00 48 8b 82 38 e0 ff ff a8 08 75 13 66 0f 1f 44 00 00 f3 90 <48> 8b 82 38 e0 ff ff a8 08 74 f3 e8 4e 36 b2 ff 4c 29 e8 48 89 > Nov 5 08:56:06 fh-ceph01-iscsi-01 kernel: [69125.602718] NMI backtrace for cpu 1 > Nov 5 08:56:06 fh-ceph01-iscsi-01 kernel: [69125.602721] CPU 1 > Nov 5 08:56:06 fh-ceph01-iscsi-01 kernel: [69125.602725] Pid: 0, comm: swapper/1 Tainted: GF W 3.8.0-31-generic #46~precis > e1-Ubuntu Dell Inc. PowerEdge R620/0PXXHP > Nov 5 08:56:06 fh-ceph01-iscsi-01 kernel: [69125.602728] RIP: 0010:[<ffffffff8158b2e9>] [<ffffffff8158b2e9>] poll_idle+0x49/0x90 > Nov 5 08:56:06 fh-ceph01-iscsi-01 kernel: [69125.602735] RSP: 0018:ffff880fe9545de8 EFLAGS: 00000246 > Nov 5 08:56:06 fh-ceph01-iscsi-01 kernel: [69125.602737] RAX: 0000000400000000 RBX: 0000000000000000 RCX: 0000000000000018 > Nov 5 08:56:06 fh-ceph01-iscsi-01 kernel: [69125.602739] RDX: ffff880fe9545fd8 RSI: ffffffff81c7a380 RDI: ffffffff81c1af00 > Nov 5 08:56:06 fh-ceph01-iscsi-01 kernel: [69125.602741] RBP: ffff880fe9545e28 R08: 0000000000000020 R09: 0000000000000000 > Nov 5 08:56:06 fh-ceph01-iscsi-01 kernel: [69125.602743] R10: 0000000000002000 R11: 0000000000000000 R12: ffff88203fc1b370 > Nov 5 08:56:06 fh-ceph01-iscsi-01 kernel: [69125.602745] R13: 00003ef672bf7e9f R14: ffffffff81c7a380 R15: 0000000000000000 > Nov 5 08:56:06 fh-ceph01-iscsi-01 kernel: [69125.602747] FS: 0000000000000000(0000) GS:ffff88203fc00000(0000) knlGS:00000000000000 > 00 > Nov 5 08:56:06 fh-ceph01-iscsi-01 kernel: [69125.602749] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > Nov 5 08:56:06 fh-ceph01-iscsi-01 kernel: [69125.602751] CR2: 00007f1dc4053038 CR3: 0000001fd97aa000 CR4: 00000000000407e0 > Nov 5 08:56:06 fh-ceph01-iscsi-01 kernel: [69125.602753] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 > Nov 5 08:56:06 fh-ceph01-iscsi-01 kernel: [69125.602755] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 > Nov 5 08:56:06 fh-ceph01-iscsi-01 kernel: [69125.602757] Process swapper/1 (pid: 0, threadinfo ffff880fe9544000, task ffff880fe9549 > 740) > > I'm in the process of building a separate iscsi head on different hardware, just incase it's something related to the M620's but wanted to see if anyone had any ideas? Any input greatly appreciated. I can pretty much replicate this on demand at the moment by enacting the above steps. > > Cheers, > > > > > _______________________________________________ > ceph-users mailing list > ceph-users@xxxxxxxxxxxxxx > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com > _______________________________________________ ceph-users mailing list ceph-users@xxxxxxxxxxxxxx http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com