It looks like deep scrub cause the disk busy and some threads blocking on this. Maybe you could lower the scrub related configurations and see the disk util when deep-scrubing. On Sat, Apr 11, 2015 at 3:01 AM, Andrei Mikhailovsky <andrei@xxxxxxxxxx> wrote: > Hi guys, > > I was wondering if anyone noticed that the deep scrubbing process causes > some osd to go down? > > I have been keeping an eye on a few remaining stability issues in my test > cluster. One of the unsolved issues is the occasional reporting of osd(s) > going down and coming back up after about 20-30 seconds. This happens to > various osds throughout the cluster. I have a small cluster of just 2 osd > servers with 9 osds each. > > The common trend that i see week after week is that whenever there is a long > deep scrubbing activity on the cluster it triggers one or more osds to go > down for a short period of time. After the osd is marked down, it goes back > up after about 20 seconds. Obviously there is a repair process that kicks in > which causes more load on the cluster. While looking at the logs, i've not > seen the osds being marked down when the cluster is not deep scrubbing. It > _always_ happens when there is a deep scrub activity. I am seeing the > reports of osds going down about 3-4 times a week. > > The latest happened just recently with the following log entries: > > 2015-04-10 19:32:48.330430 mon.0 192.168.168.13:6789/0 3441533 : cluster > [INF] pgmap v50849466: 8508 pgs: 8506 active+clean, 2 > active+clean+scrubbing+deep; 13213 GB data, 26896 GB used, 23310 GB / 50206 > GB avail; 1005 B/s rd, 1005 > B/s wr, 0 op/s > 2015-04-10 19:32:52.950633 mon.0 192.168.168.13:6789/0 3441542 : cluster > [INF] osd.6 192.168.168.200:6816/3738 failed (5 reports from 5 peers after > 60.747890 >= grace 46.701350) > 2015-04-10 19:32:53.121904 mon.0 192.168.168.13:6789/0 3441544 : cluster > [INF] osdmap e74309: 18 osds: 17 up, 18 in > 2015-04-10 19:32:53.231730 mon.0 192.168.168.13:6789/0 3441545 : cluster > [INF] pgmap v50849467: 8508 pgs: 599 stale+active+clean, 7907 active+clean, > 1 stale+active+clean+scrubbing+deep, 1 active+clean+scrubbing+deep; 13213 GB > data, 26896 GB used, 23310 GB / 50206 GB avail; 375 B/s rd, 0 op/s > > > osd.6 logs around the same time are: > > 2015-04-10 19:16:29.110617 7fad6d5ec700 0 log_channel(default) log [INF] : > 5.3d7 deep-scrub ok > 2015-04-10 19:27:47.561389 7fad6bde9700 0 log_channel(default) log [INF] : > 5.276 deep-scrub ok > 2015-04-10 19:31:11.611321 7fad6d5ec700 0 log_channel(default) log [INF] : > 5.287 deep-scrub ok > 2015-04-10 19:31:53.339881 7fad7ce0b700 1 heartbeat_map is_healthy > 'OSD::osd_op_tp thread 0x7fad735f8700' had timed out after 15 > 2015-04-10 19:31:53.339887 7fad7ce0b700 1 heartbeat_map is_healthy > 'OSD::osd_op_tp thread 0x7fad745fa700' had timed out after 15 > 2015-04-10 19:31:53.339890 7fad7ce0b700 1 heartbeat_map is_healthy > 'OSD::osd_op_tp thread 0x7fad705f2700' had timed out after 15 > 2015-04-10 19:31:53.340050 7fad7e60e700 1 heartbeat_map is_healthy > 'OSD::osd_op_tp thread 0x7fad735f8700' had timed out after 15 > 2015-04-10 19:31:53.340053 7fad7e60e700 1 heartbeat_map is_healthy > 'OSD::osd_op_tp thread 0x7fad745fa700' had timed out after 15 > > [.....] > > 2015-04-10 19:32:53.010609 7fad7e60e700 1 heartbeat_map is_healthy > 'FileStore::op_tp thread 0x7fad86132700' had timed out after 60 > 2015-04-10 19:32:53.010611 7fad7e60e700 1 heartbeat_map is_healthy > 'FileStore::op_tp thread 0x7fad88937700' had timed out after 60 > 2015-04-10 19:32:53.111470 7fad66ed2700 0 -- 192.168.168.200:6817/3738 >> > 192.168.168.201:6837/4409 pipe(0x2b793b80 sd=179 :6817 s=2 pgs=5 cs=1 l=0 > c=0x21e8b420).fault with nothing to send, going to standby > 2015-04-10 19:32:53.111496 7fad6329d700 0 -- 192.168.168.200:6817/3738 >> > 192.168.168.201:6827/4208 pipe(0x2b793600 sd=172 :6817 s=2 pgs=7 cs=1 l=0 > c=0x1791ab00).fault with nothing to send, going to standby > 2015-04-10 19:32:53.111463 7fad55bd0700 0 -- 192.168.168.200:6817/3738 >> > 192.168.168.201:6822/3910 pipe(0x2cb55dc0 sd=262 :6817 s=2 pgs=8 cs=1 l=0 > c=0xe7802c0).fault with nothing to send, going to standby > 2015-04-10 19:32:53.121815 7fad6218c700 0 -- 192.168.168.200:6817/3738 >> > 192.168.168.201:6807/3575 pipe(0x2cf8e080 sd=294 :6817 s=2 pgs=4 cs=1 l=0 > c=0x138669a0).fault with nothing to send, going to standby > 2015-04-10 19:32:53.121856 7fad67bdf700 0 -- 192.168.168.200:6817/3738 >> > 192.168.168.201:6842/4442 pipe(0x2b792580 sd=190 :6817 s=2 pgs=9 cs=1 l=0 > c=0x138922c0).fault with nothing to send, going to standby > 2015-04-10 19:32:53.123545 7fad651bc700 0 -- 192.168.168.200:6817/3738 >> > 192.168.168.201:6801/3053 pipe(0x15e538c0 sd=260 :6817 s=2 pgs=1 cs=1 l=0 > c=0x16bf09a0).fault with nothing to send, going to standby > 2015-04-10 19:32:53.128729 7fad53eb3700 0 -- 192.168.168.200:6817/3738 >> > 192.168.168.201:6832/4257 pipe(0x37dcb80 sd=311 :6817 s=2 pgs=3 cs=1 l=0 > c=0x1131f420).fault with nothing to send, going to standby > 2015-04-10 19:32:53.132691 7fad53fb4700 0 -- 192.168.168.200:6817/3738 >> > 192.168.168.201:6812/3674 pipe(0x37dbdc0 sd=417 :6817 s=2 pgs=3 cs=1 l=0 > c=0x1c0badc0).fault with nothing to send, going to standby > 2015-04-10 19:32:53.139856 7fad65cc7700 0 -- 192.168.168.200:6817/3738 >> > 192.168.168.201:6817/3864 pipe(0x2cf8eb80 sd=365 :6817 s=2 pgs=5 cs=1 l=0 > c=0xbd1f2c0).fault with nothing to send, going to standby > > > [ followed by about 500 or so lines which look similar to:] > 2015-04-10 19:33:09.021252 7fad65cc7700 0 -- 192.168.168.200:6817/3738 >> > 192.168.168.201:6842/4442 pipe(0x2b792580 sd=24 :45523 s=2 pgs=1721 cs=3371 > l=0 c=0x138922c0).fault, initiating reconnect > 2015-04-10 19:33:09.021280 7fad53eb3700 0 -- 192.168.168.200:6817/3738 >> > 192.168.168.201:6807/3575 pipe(0x2cf8e080 sd=37 :51859 s=2 pgs=1755 cs=3449 > l=0 c=0x138669a0).fault, initiating reconnect > 2015-04-10 19:33:09.021443 7fad66ed2700 0 -- 192.168.168.200:6817/3738 >> > 192.168.168.201:6812/3674 pipe(0x37dbdc0 sd=18 :46453 s=2 pgs=1782 cs=3489 > l=0 c=0x1c0badc0).fault, initiating reconnect > 2015-04-10 19:33:09.021702 7fad6218c700 0 -- 192.168.168.200:6817/3738 >> > 192.168.168.201:6837/4409 pipe(0x2b793b80 sd=38 :53724 s=2 pgs=1757 cs=3449 > l=0 c=0x21e8b420).fault, initiating reconnect > > > [followed by:] > > 2015-04-10 19:33:09.021830 7fad8e142700 0 log_channel(default) log [WRN] : > 6 slow requests, 6 included below; oldest blocked for > 94.284115 secs > 2015-04-10 19:33:09.021834 7fad67bdf700 0 -- 192.168.168.200:6817/3738 >> > 192.168.168.201:6817/3864 pipe(0x2cf8eb80 sd=29 :50115 s=2 pgs=1717 cs=3395 > l=0 c=0xbd1f2c0).fault, initiating reconnect > 2015-04-10 19:33:09.021840 7fad8e142700 0 log_channel(default) log [WRN] : > slow request 62.902554 seconds old, received at 2015-04-10 19:32:06.119164: > osd_op(client.64965167.0:1562644 rbd_data.1f264b2ae8944a.0000000000000400 > [set-alloc-hint object_size 4194304 write_size 4194304,write 1155072~4096] > 5.f4cd34bb ack+ondisk+write+known_if_redirected e74308) currently waiting > for subops from 13 > 2015-04-10 19:33:09.021846 7fad8e142700 0 log_channel(default) log [WRN] : > slow request 63.004012 seconds old, received at 2015-04-10 19:32:06.017706: > osd_op(client.69637439.0:278581863 rbd_data.265f967a5f7514.0000000000064e2e > [set-alloc-hint object_size 4194304 write_size 4194304,write 1040384~4096] > 5.bf3d2652 snapc ac=[ac] ack+ondisk+write+known_if_redirected e74308) > currently waiting for subops from 11 > 2015-04-10 19:33:09.021851 7fad8e142700 0 log_channel(default) log [WRN] : > slow request 94.284115 seconds old, received at 2015-04-10 19:31:34.737603: > osd_sub_op(unknown.0.0:0 5.299 0//0//-1 [scrub-map] v 0'0 snapset=0=[]:[] > snapc=0=[]) currently started > 2015-04-10 19:33:09.021856 7fad8e142700 0 log_channel(default) log [WRN] : > slow request 92.669022 seconds old, received at 2015-04-10 19:31:36.352696: > osd_op(client.69645050.0:2750479 rbd_data.1b704a8f51a837.000000000000023d > [stat,set-alloc-hint object_size 4194304 write_size 4194304,write > 1449984~4096] 5.44a257d7 snapc 9cc=[9cc,9c1,9b5,9a9,99d,991,984,972,953,2f8] > ack+ondisk+write+known_if_redirected e74308) currently waiting for subops > from 17 > 2015-04-10 19:33:09.021861 7fad8e142700 0 log_channel(default) log [WRN] : > slow request 76.733358 seconds old, received at 2015-04-10 19:31:52.288360: > osd_op(client.64987185.0:26635606 rbd_data.1b652a2ae8944a.0000000000000623 > [set-alloc-hint object_size 4194304 write_size 4194304,write 1282048~4096] > 5.64d42a36 ack+ondisk+write+known_if_redirected e74308) currently waiting > for subops from 17 > > > > [and finally] > > 2015-04-10 19:33:09.024361 7fad5c83b700 0 -- 192.168.168.200:6817/3738 >> > 192.168.168.201:6832/4257 pipe(0x37dcb80 sd=35 :55060 s=2 pgs=1719 cs=3359 > l=0 c=0x1131f420).fault, initiating reconnect > 2015-04-10 19:33:09.024418 7fad82616700 0 log_channel(default) log [WRN] : > map e74309 wrongly marked me down > 2015-04-10 19:33:09.024466 7fad53fb4700 0 -- 192.168.168.200:6817/3738 >> > 192.168.168.201:6827/4208 pipe(0x2b793600 sd=33 :36954 s=2 pgs=1731 cs=3403 > l=0 c=0x1791ab00).fault, initiating reconnect > 2015-04-10 19:33:09.024476 7fad67bdf700 0 -- 192.168.168.200:6817/3738 >> > 192.168.168.201:6817/3864 pipe(0x2cf8eb80 sd=24 :50133 s=2 pgs=1720 cs=3401 > l=0 c=0xbd1f2c0).fault, initiating reconnect > 2015-04-10 19:33:09.024553 7fad65cc7700 0 -- 192.168.168.200:6817/3738 >> > 192.168.168.201:6842/4442 pipe(0x2b792580 sd=29 :45548 s=2 pgs=1725 cs=3379 > l=0 c=0x138922c0).fault, initiating reconnect > 2015-04-10 19:33:09.024629 7fad53eb3700 0 -- 192.168.168.200:6817/3738 >> > 192.168.168.201:6807/3575 pipe(0x2cf8e080 sd=37 :51884 s=2 pgs=1759 cs=3457 > l=0 c=0x138669a0).fault, initiating reconnect > 2015-04-10 19:33:09.024783 7fad650bb700 0 -- 192.168.168.200:6817/3738 >> > 192.168.168.201:6842/4442 pipe(0x2b792580 sd=29 :45548 s=1 pgs=1725 cs=3380 > l=0 c=0x138922c0).fault > 2015-04-10 19:33:09.025056 7fad6218c700 0 -- 192.168.168.200:6817/3738 >> > 192.168.168.201:6837/4409 pipe(0x2b793b80 sd=38 :53747 s=2 pgs=1761 cs=3457 > l=0 c=0x21e8b420).fault, initiating reconnect > 2015-04-10 19:33:09.025062 7fad61580700 0 -- 192.168.168.200:6817/3738 >> > 192.168.168.201:6817/3864 pipe(0x2cf8eb80 sd=24 :50140 s=4 pgs=1720 cs=3402 > l=0 c=0xbd1f2c0).connect got RESETSESSION but no longer connecting > 2015-04-10 19:33:16.999741 7fad5f964700 0 -- 192.168.168.200:6846/1003738 >>> 192.168.168.200:6837/4266 pipe(0xa711580 sd=257 :6846 s=0 pgs=0 cs=0 l=0 > c=0x1afbf840).accept connect_seq 0 vs existing 0 state connecting > 2015-04-10 19:33:16.999883 7fad5f863700 0 -- 192.168.168.200:6846/1003738 >>> 192.168.168.200:6827/3938 pipe(0x2c48a8c0 sd=260 :6846 s=0 pgs=0 cs=0 l=0 > c=0x18a85580).accept connect_seq 0 vs existing 0 state connecting > 2015-04-10 19:36:26.247402 7fad6bde9700 0 log_channel(default) log [INF] : > 5.299 deep-scrub ok > 2015-04-10 19:40:18.577423 7fad725f6700 0 <cls> cls/rgw/cls_rgw.cc:1458: > gc_iterate_entries end_key=1_01428691218.577419000 > > 2015-04-10 19:42:22.709291 7fad6edef700 0 log_channel(default) log [INF] : > 5.2ae deep-scrub ok > 2015-04-10 19:45:51.790224 7fad6bde9700 0 log_channel(default) log [INF] : > 5.2bb deep-scrub ok > 2015-04-10 19:48:17.022413 7fad5f863700 0 -- 192.168.168.200:6846/1003738 >>> 192.168.168.200:6827/3938 pipe(0x2c48a8c0 sd=260 :6846 s=2 pgs=63 cs=1 > l=0 c=0x3f81160).fault with nothing to send, going to standby > > > > > In the past I have been asked to verify that the networking is working okay. > I've ran days and days of various tests and verified that the network is not > the cause of the problems. The network is running perfectly well with no > errors reported on the interfaces nor any delays. I ran multitude of tcp and > udp tests including firing hundreds to thousands connections per second for > weeks with 0% packet loss or any other network related errors. > > Does anyone have any idea what could be causing the osds to be marked down > during the deep scrub activity? > > I have enabled the following tuning options in the ceph.conf: > > #change to decrease rebuild/recovery priority (should make guest vms more > responsive) and decrease scrubbing priority > osd recovery max chunk = 8388608 > osd recovery op priority = 1 > osd max backfills = 1 > osd recovery max active = 1 > osd recovery threads = 1 > osd_disk_thread_ioprio_priority = 7 > osd_disk_thread_ioprio_class = idle > > #reporting of down osds > mon osd min down reporters = 6 > mon osd report timeout = 1800 > mon osd min down reports = 7 > osd heartbeat grace = 60 > > > > Many thanks for your help > > Andrei > > > > > > _______________________________________________ > ceph-users mailing list > ceph-users@xxxxxxxxxxxxxx > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com > -- Best Regards, Wheat _______________________________________________ ceph-users mailing list ceph-users@xxxxxxxxxxxxxx http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com