Re: deep scrubbing causes osd down

Haomai Wang <haomaiwang@xxxxxxxxx> · Sat, 11 Apr 2015 12:23:27 +0800

It looks like deep scrub cause the disk busy and some threads blocking on this.

Maybe you could lower the scrub related configurations and see the
disk util when deep-scrubing.

On Sat, Apr 11, 2015 at 3:01 AM, Andrei Mikhailovsky <andrei@xxxxxxxxxx> wrote:
> Hi guys,
>
> I was wondering if anyone noticed that the deep scrubbing process causes
> some osd to go down?
>
> I have been keeping an eye on a few remaining stability issues in my test
> cluster. One of the unsolved issues is the occasional reporting of osd(s)
> going down and coming back up after about 20-30 seconds. This happens to
> various osds throughout the cluster. I have a small cluster of just 2 osd
> servers with 9 osds each.
>
> The common trend that i see week after week is that whenever there is a long
> deep scrubbing activity on the cluster it triggers one or more osds to go
> down for a short period of time. After the osd is marked down, it goes back
> up after about 20 seconds. Obviously there is a repair process that kicks in
> which causes more load on the cluster. While looking at the logs, i've not
> seen the osds being marked down when the cluster is not deep scrubbing. It
> _always_ happens when there is a deep scrub activity. I am seeing the
> reports of osds going down about 3-4 times a week.
>
> The latest happened just recently with the following log entries:
>
> 2015-04-10 19:32:48.330430 mon.0 192.168.168.13:6789/0 3441533 : cluster
> [INF] pgmap v50849466: 8508 pgs: 8506 active+clean, 2
> active+clean+scrubbing+deep; 13213 GB data, 26896 GB used, 23310 GB / 50206
> GB avail; 1005 B/s rd, 1005
> B/s wr, 0 op/s
> 2015-04-10 19:32:52.950633 mon.0 192.168.168.13:6789/0 3441542 : cluster
> [INF] osd.6 192.168.168.200:6816/3738 failed (5 reports from 5 peers after
> 60.747890 >= grace 46.701350)
> 2015-04-10 19:32:53.121904 mon.0 192.168.168.13:6789/0 3441544 : cluster
> [INF] osdmap e74309: 18 osds: 17 up, 18 in
> 2015-04-10 19:32:53.231730 mon.0 192.168.168.13:6789/0 3441545 : cluster
> [INF] pgmap v50849467: 8508 pgs: 599 stale+active+clean, 7907 active+clean,
> 1 stale+active+clean+scrubbing+deep, 1 active+clean+scrubbing+deep; 13213 GB
> data, 26896 GB used, 23310 GB / 50206 GB avail; 375 B/s rd, 0 op/s
>
>
> osd.6 logs around the same time are:
>
> 2015-04-10 19:16:29.110617 7fad6d5ec700  0 log_channel(default) log [INF] :
> 5.3d7 deep-scrub ok
> 2015-04-10 19:27:47.561389 7fad6bde9700  0 log_channel(default) log [INF] :
> 5.276 deep-scrub ok
> 2015-04-10 19:31:11.611321 7fad6d5ec700  0 log_channel(default) log [INF] :
> 5.287 deep-scrub ok
> 2015-04-10 19:31:53.339881 7fad7ce0b700  1 heartbeat_map is_healthy
> 'OSD::osd_op_tp thread 0x7fad735f8700' had timed out after 15
> 2015-04-10 19:31:53.339887 7fad7ce0b700  1 heartbeat_map is_healthy
> 'OSD::osd_op_tp thread 0x7fad745fa700' had timed out after 15
> 2015-04-10 19:31:53.339890 7fad7ce0b700  1 heartbeat_map is_healthy
> 'OSD::osd_op_tp thread 0x7fad705f2700' had timed out after 15
> 2015-04-10 19:31:53.340050 7fad7e60e700  1 heartbeat_map is_healthy
> 'OSD::osd_op_tp thread 0x7fad735f8700' had timed out after 15
> 2015-04-10 19:31:53.340053 7fad7e60e700  1 heartbeat_map is_healthy
> 'OSD::osd_op_tp thread 0x7fad745fa700' had timed out after 15
>
> [.....]
>
> 2015-04-10 19:32:53.010609 7fad7e60e700  1 heartbeat_map is_healthy
> 'FileStore::op_tp thread 0x7fad86132700' had timed out after 60
> 2015-04-10 19:32:53.010611 7fad7e60e700  1 heartbeat_map is_healthy
> 'FileStore::op_tp thread 0x7fad88937700' had timed out after 60
> 2015-04-10 19:32:53.111470 7fad66ed2700  0 -- 192.168.168.200:6817/3738 >>
> 192.168.168.201:6837/4409 pipe(0x2b793b80 sd=179 :6817 s=2 pgs=5 cs=1 l=0
> c=0x21e8b420).fault with nothing to send, going to standby
> 2015-04-10 19:32:53.111496 7fad6329d700  0 -- 192.168.168.200:6817/3738 >>
> 192.168.168.201:6827/4208 pipe(0x2b793600 sd=172 :6817 s=2 pgs=7 cs=1 l=0
> c=0x1791ab00).fault with nothing to send, going to standby
> 2015-04-10 19:32:53.111463 7fad55bd0700  0 -- 192.168.168.200:6817/3738 >>
> 192.168.168.201:6822/3910 pipe(0x2cb55dc0 sd=262 :6817 s=2 pgs=8 cs=1 l=0
> c=0xe7802c0).fault with nothing to send, going to standby
> 2015-04-10 19:32:53.121815 7fad6218c700  0 -- 192.168.168.200:6817/3738 >>
> 192.168.168.201:6807/3575 pipe(0x2cf8e080 sd=294 :6817 s=2 pgs=4 cs=1 l=0
> c=0x138669a0).fault with nothing to send, going to standby
> 2015-04-10 19:32:53.121856 7fad67bdf700  0 -- 192.168.168.200:6817/3738 >>
> 192.168.168.201:6842/4442 pipe(0x2b792580 sd=190 :6817 s=2 pgs=9 cs=1 l=0
> c=0x138922c0).fault with nothing to send, going to standby
> 2015-04-10 19:32:53.123545 7fad651bc700  0 -- 192.168.168.200:6817/3738 >>
> 192.168.168.201:6801/3053 pipe(0x15e538c0 sd=260 :6817 s=2 pgs=1 cs=1 l=0
> c=0x16bf09a0).fault with nothing to send, going to standby
> 2015-04-10 19:32:53.128729 7fad53eb3700  0 -- 192.168.168.200:6817/3738 >>
> 192.168.168.201:6832/4257 pipe(0x37dcb80 sd=311 :6817 s=2 pgs=3 cs=1 l=0
> c=0x1131f420).fault with nothing to send, going to standby
> 2015-04-10 19:32:53.132691 7fad53fb4700  0 -- 192.168.168.200:6817/3738 >>
> 192.168.168.201:6812/3674 pipe(0x37dbdc0 sd=417 :6817 s=2 pgs=3 cs=1 l=0
> c=0x1c0badc0).fault with nothing to send, going to standby
> 2015-04-10 19:32:53.139856 7fad65cc7700  0 -- 192.168.168.200:6817/3738 >>
> 192.168.168.201:6817/3864 pipe(0x2cf8eb80 sd=365 :6817 s=2 pgs=5 cs=1 l=0
> c=0xbd1f2c0).fault with nothing to send, going to standby
>
>
> [ followed by about 500 or so lines which look similar to:]
> 2015-04-10 19:33:09.021252 7fad65cc7700  0 -- 192.168.168.200:6817/3738 >>
> 192.168.168.201:6842/4442 pipe(0x2b792580 sd=24 :45523 s=2 pgs=1721 cs=3371
> l=0 c=0x138922c0).fault, initiating reconnect
> 2015-04-10 19:33:09.021280 7fad53eb3700  0 -- 192.168.168.200:6817/3738 >>
> 192.168.168.201:6807/3575 pipe(0x2cf8e080 sd=37 :51859 s=2 pgs=1755 cs=3449
> l=0 c=0x138669a0).fault, initiating reconnect
> 2015-04-10 19:33:09.021443 7fad66ed2700  0 -- 192.168.168.200:6817/3738 >>
> 192.168.168.201:6812/3674 pipe(0x37dbdc0 sd=18 :46453 s=2 pgs=1782 cs=3489
> l=0 c=0x1c0badc0).fault, initiating reconnect
> 2015-04-10 19:33:09.021702 7fad6218c700  0 -- 192.168.168.200:6817/3738 >>
> 192.168.168.201:6837/4409 pipe(0x2b793b80 sd=38 :53724 s=2 pgs=1757 cs=3449
> l=0 c=0x21e8b420).fault, initiating reconnect
>
>
> [followed by:]
>
> 2015-04-10 19:33:09.021830 7fad8e142700  0 log_channel(default) log [WRN] :
> 6 slow requests, 6 included below; oldest blocked for > 94.284115 secs
> 2015-04-10 19:33:09.021834 7fad67bdf700  0 -- 192.168.168.200:6817/3738 >>
> 192.168.168.201:6817/3864 pipe(0x2cf8eb80 sd=29 :50115 s=2 pgs=1717 cs=3395
> l=0 c=0xbd1f2c0).fault, initiating reconnect
> 2015-04-10 19:33:09.021840 7fad8e142700  0 log_channel(default) log [WRN] :
> slow request 62.902554 seconds old, received at 2015-04-10 19:32:06.119164:
> osd_op(client.64965167.0:1562644 rbd_data.1f264b2ae8944a.0000000000000400
> [set-alloc-hint object_size 4194304 write_size 4194304,write 1155072~4096]
> 5.f4cd34bb ack+ondisk+write+known_if_redirected e74308) currently waiting
> for subops from 13
> 2015-04-10 19:33:09.021846 7fad8e142700  0 log_channel(default) log [WRN] :
> slow request 63.004012 seconds old, received at 2015-04-10 19:32:06.017706:
> osd_op(client.69637439.0:278581863 rbd_data.265f967a5f7514.0000000000064e2e
> [set-alloc-hint object_size 4194304 write_size 4194304,write 1040384~4096]
> 5.bf3d2652 snapc ac=[ac] ack+ondisk+write+known_if_redirected e74308)
> currently waiting for subops from 11
> 2015-04-10 19:33:09.021851 7fad8e142700  0 log_channel(default) log [WRN] :
> slow request 94.284115 seconds old, received at 2015-04-10 19:31:34.737603:
> osd_sub_op(unknown.0.0:0 5.299 0//0//-1 [scrub-map] v 0'0 snapset=0=[]:[]
> snapc=0=[]) currently started
> 2015-04-10 19:33:09.021856 7fad8e142700  0 log_channel(default) log [WRN] :
> slow request 92.669022 seconds old, received at 2015-04-10 19:31:36.352696:
> osd_op(client.69645050.0:2750479 rbd_data.1b704a8f51a837.000000000000023d
> [stat,set-alloc-hint object_size 4194304 write_size 4194304,write
> 1449984~4096] 5.44a257d7 snapc 9cc=[9cc,9c1,9b5,9a9,99d,991,984,972,953,2f8]
> ack+ondisk+write+known_if_redirected e74308) currently waiting for subops
> from 17
> 2015-04-10 19:33:09.021861 7fad8e142700  0 log_channel(default) log [WRN] :
> slow request 76.733358 seconds old, received at 2015-04-10 19:31:52.288360:
> osd_op(client.64987185.0:26635606 rbd_data.1b652a2ae8944a.0000000000000623
> [set-alloc-hint object_size 4194304 write_size 4194304,write 1282048~4096]
> 5.64d42a36 ack+ondisk+write+known_if_redirected e74308) currently waiting
> for subops from 17
>
>
>
> [and finally]
>
> 2015-04-10 19:33:09.024361 7fad5c83b700  0 -- 192.168.168.200:6817/3738 >>
> 192.168.168.201:6832/4257 pipe(0x37dcb80 sd=35 :55060 s=2 pgs=1719 cs=3359
> l=0 c=0x1131f420).fault, initiating reconnect
> 2015-04-10 19:33:09.024418 7fad82616700  0 log_channel(default) log [WRN] :
> map e74309 wrongly marked me down
> 2015-04-10 19:33:09.024466 7fad53fb4700  0 -- 192.168.168.200:6817/3738 >>
> 192.168.168.201:6827/4208 pipe(0x2b793600 sd=33 :36954 s=2 pgs=1731 cs=3403
> l=0 c=0x1791ab00).fault, initiating reconnect
> 2015-04-10 19:33:09.024476 7fad67bdf700  0 -- 192.168.168.200:6817/3738 >>
> 192.168.168.201:6817/3864 pipe(0x2cf8eb80 sd=24 :50133 s=2 pgs=1720 cs=3401
> l=0 c=0xbd1f2c0).fault, initiating reconnect
> 2015-04-10 19:33:09.024553 7fad65cc7700  0 -- 192.168.168.200:6817/3738 >>
> 192.168.168.201:6842/4442 pipe(0x2b792580 sd=29 :45548 s=2 pgs=1725 cs=3379
> l=0 c=0x138922c0).fault, initiating reconnect
> 2015-04-10 19:33:09.024629 7fad53eb3700  0 -- 192.168.168.200:6817/3738 >>
> 192.168.168.201:6807/3575 pipe(0x2cf8e080 sd=37 :51884 s=2 pgs=1759 cs=3457
> l=0 c=0x138669a0).fault, initiating reconnect
> 2015-04-10 19:33:09.024783 7fad650bb700  0 -- 192.168.168.200:6817/3738 >>
> 192.168.168.201:6842/4442 pipe(0x2b792580 sd=29 :45548 s=1 pgs=1725 cs=3380
> l=0 c=0x138922c0).fault
> 2015-04-10 19:33:09.025056 7fad6218c700  0 -- 192.168.168.200:6817/3738 >>
> 192.168.168.201:6837/4409 pipe(0x2b793b80 sd=38 :53747 s=2 pgs=1761 cs=3457
> l=0 c=0x21e8b420).fault, initiating reconnect
> 2015-04-10 19:33:09.025062 7fad61580700  0 -- 192.168.168.200:6817/3738 >>
> 192.168.168.201:6817/3864 pipe(0x2cf8eb80 sd=24 :50140 s=4 pgs=1720 cs=3402
> l=0 c=0xbd1f2c0).connect got RESETSESSION but no longer connecting
> 2015-04-10 19:33:16.999741 7fad5f964700  0 -- 192.168.168.200:6846/1003738
>>> 192.168.168.200:6837/4266 pipe(0xa711580 sd=257 :6846 s=0 pgs=0 cs=0 l=0
> c=0x1afbf840).accept connect_seq 0 vs existing 0 state connecting
> 2015-04-10 19:33:16.999883 7fad5f863700  0 -- 192.168.168.200:6846/1003738
>>> 192.168.168.200:6827/3938 pipe(0x2c48a8c0 sd=260 :6846 s=0 pgs=0 cs=0 l=0
> c=0x18a85580).accept connect_seq 0 vs existing 0 state connecting
> 2015-04-10 19:36:26.247402 7fad6bde9700  0 log_channel(default) log [INF] :
> 5.299 deep-scrub ok
> 2015-04-10 19:40:18.577423 7fad725f6700  0 <cls> cls/rgw/cls_rgw.cc:1458:
> gc_iterate_entries end_key=1_01428691218.577419000
>
> 2015-04-10 19:42:22.709291 7fad6edef700  0 log_channel(default) log [INF] :
> 5.2ae deep-scrub ok
> 2015-04-10 19:45:51.790224 7fad6bde9700  0 log_channel(default) log [INF] :
> 5.2bb deep-scrub ok
> 2015-04-10 19:48:17.022413 7fad5f863700  0 -- 192.168.168.200:6846/1003738
>>> 192.168.168.200:6827/3938 pipe(0x2c48a8c0 sd=260 :6846 s=2 pgs=63 cs=1
> l=0 c=0x3f81160).fault with nothing to send, going to standby
>
>
>
>
> In the past I have been asked to verify that the networking is working okay.
> I've ran days and days of various tests and verified that the network is not
> the cause of the problems. The network is running perfectly well with no
> errors reported on the interfaces nor any delays. I ran multitude of tcp and
> udp tests including firing hundreds to thousands connections per second for
> weeks with 0% packet loss or any other network related errors.
>
> Does anyone have any idea what could be causing the osds to be marked down
> during the deep scrub activity?
>
> I have enabled the following tuning options in the ceph.conf:
>
> #change to decrease rebuild/recovery priority (should make guest vms more
> responsive) and decrease scrubbing priority
> osd recovery max chunk = 8388608
> osd recovery op priority = 1
> osd max backfills = 1
> osd recovery max active = 1
> osd recovery threads = 1
> osd_disk_thread_ioprio_priority = 7
> osd_disk_thread_ioprio_class = idle
>
> #reporting of down osds
> mon osd min down reporters = 6
> mon osd report timeout = 1800
> mon osd min down reports = 7
> osd heartbeat grace = 60
>
>
>
> Many thanks for your help
>
> Andrei
>
>
>
>
>
> _______________________________________________
> ceph-users mailing list
> ceph-users@xxxxxxxxxxxxxx
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>

-- 
Best Regards,

Wheat
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com