Re: deep scrubbing causes osd down

It looks like the deep scrub is keeping the disk busy and some threads are
blocking on it.

Maybe you could lower the scrub-related configuration options and watch the
disk utilization while deep-scrubbing.
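
For example, something like this might help (the exact values are only a
starting point, and option availability varies a bit between releases):

  # keep at most one scrub per osd and sleep briefly between scrub chunks
  ceph tell osd.* injectargs '--osd_max_scrubs 1 --osd_scrub_sleep 0.1'
  # read deep-scrub data in smaller chunks (the default stride is 512 KiB)
  ceph tell osd.* injectargs '--osd_deep_scrub_stride 131072'

Then watch the disks on the osd host while a deep scrub runs, e.g.:

  # %util near 100 on the scrubbing osd's disk means it is saturated
  iostat -x 2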

On Sat, Apr 11, 2015 at 3:01 AM, Andrei Mikhailovsky <andrei@xxxxxxxxxx> wrote:
> Hi guys,
>
> I was wondering if anyone has noticed that the deep scrubbing process causes
> some osds to go down?
>
> I have been keeping an eye on a few remaining stability issues in my test
> cluster. One of the unsolved issues is the occasional reporting of osd(s)
> going down and coming back up after about 20-30 seconds. This happens to
> various osds throughout the cluster. I have a small cluster of just 2 osd
> servers with 9 osds each.
>
> The common trend that I see week after week is that whenever there is long
> deep scrubbing activity on the cluster, it triggers one or more osds to go
> down for a short period of time. After the osd is marked down, it comes back
> up after about 20 seconds, and a repair process then kicks in, which causes
> more load on the cluster. While looking at the logs, I've not seen the osds
> being marked down when the cluster is not deep scrubbing; it _always_
> happens during deep scrub activity. I am seeing reports of osds going down
> about 3-4 times a week.
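>
> (A quick way to check this correlation, assuming the default cluster log
> location on the monitor host, is:
>
> grep -E 'failed|deep-scrub' /var/log/ceph/ceph.log
>
> which lines the "failed" events up against the scrub activity.)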
>
> The latest incident happened just recently, with the following log entries:
>
> 2015-04-10 19:32:48.330430 mon.0 192.168.168.13:6789/0 3441533 : cluster
> [INF] pgmap v50849466: 8508 pgs: 8506 active+clean, 2
> active+clean+scrubbing+deep; 13213 GB data, 26896 GB used, 23310 GB / 50206
> GB avail; 1005 B/s rd, 1005
> B/s wr, 0 op/s
> 2015-04-10 19:32:52.950633 mon.0 192.168.168.13:6789/0 3441542 : cluster
> [INF] osd.6 192.168.168.200:6816/3738 failed (5 reports from 5 peers after
> 60.747890 >= grace 46.701350)
> 2015-04-10 19:32:53.121904 mon.0 192.168.168.13:6789/0 3441544 : cluster
> [INF] osdmap e74309: 18 osds: 17 up, 18 in
> 2015-04-10 19:32:53.231730 mon.0 192.168.168.13:6789/0 3441545 : cluster
> [INF] pgmap v50849467: 8508 pgs: 599 stale+active+clean, 7907 active+clean,
> 1 stale+active+clean+scrubbing+deep, 1 active+clean+scrubbing+deep; 13213 GB
> data, 26896 GB used, 23310 GB / 50206 GB avail; 375 B/s rd, 0 op/s
>
>
> osd.6 logs around the same time are:
>
> 2015-04-10 19:16:29.110617 7fad6d5ec700  0 log_channel(default) log [INF] :
> 5.3d7 deep-scrub ok
> 2015-04-10 19:27:47.561389 7fad6bde9700  0 log_channel(default) log [INF] :
> 5.276 deep-scrub ok
> 2015-04-10 19:31:11.611321 7fad6d5ec700  0 log_channel(default) log [INF] :
> 5.287 deep-scrub ok
> 2015-04-10 19:31:53.339881 7fad7ce0b700  1 heartbeat_map is_healthy
> 'OSD::osd_op_tp thread 0x7fad735f8700' had timed out after 15
> 2015-04-10 19:31:53.339887 7fad7ce0b700  1 heartbeat_map is_healthy
> 'OSD::osd_op_tp thread 0x7fad745fa700' had timed out after 15
> 2015-04-10 19:31:53.339890 7fad7ce0b700  1 heartbeat_map is_healthy
> 'OSD::osd_op_tp thread 0x7fad705f2700' had timed out after 15
> 2015-04-10 19:31:53.340050 7fad7e60e700  1 heartbeat_map is_healthy
> 'OSD::osd_op_tp thread 0x7fad735f8700' had timed out after 15
> 2015-04-10 19:31:53.340053 7fad7e60e700  1 heartbeat_map is_healthy
> 'OSD::osd_op_tp thread 0x7fad745fa700' had timed out after 15
>
> [.....]
>
> 2015-04-10 19:32:53.010609 7fad7e60e700  1 heartbeat_map is_healthy
> 'FileStore::op_tp thread 0x7fad86132700' had timed out after 60
> 2015-04-10 19:32:53.010611 7fad7e60e700  1 heartbeat_map is_healthy
> 'FileStore::op_tp thread 0x7fad88937700' had timed out after 60
> 2015-04-10 19:32:53.111470 7fad66ed2700  0 -- 192.168.168.200:6817/3738 >>
> 192.168.168.201:6837/4409 pipe(0x2b793b80 sd=179 :6817 s=2 pgs=5 cs=1 l=0
> c=0x21e8b420).fault with nothing to send, going to standby
> 2015-04-10 19:32:53.111496 7fad6329d700  0 -- 192.168.168.200:6817/3738 >>
> 192.168.168.201:6827/4208 pipe(0x2b793600 sd=172 :6817 s=2 pgs=7 cs=1 l=0
> c=0x1791ab00).fault with nothing to send, going to standby
> 2015-04-10 19:32:53.111463 7fad55bd0700  0 -- 192.168.168.200:6817/3738 >>
> 192.168.168.201:6822/3910 pipe(0x2cb55dc0 sd=262 :6817 s=2 pgs=8 cs=1 l=0
> c=0xe7802c0).fault with nothing to send, going to standby
> 2015-04-10 19:32:53.121815 7fad6218c700  0 -- 192.168.168.200:6817/3738 >>
> 192.168.168.201:6807/3575 pipe(0x2cf8e080 sd=294 :6817 s=2 pgs=4 cs=1 l=0
> c=0x138669a0).fault with nothing to send, going to standby
> 2015-04-10 19:32:53.121856 7fad67bdf700  0 -- 192.168.168.200:6817/3738 >>
> 192.168.168.201:6842/4442 pipe(0x2b792580 sd=190 :6817 s=2 pgs=9 cs=1 l=0
> c=0x138922c0).fault with nothing to send, going to standby
> 2015-04-10 19:32:53.123545 7fad651bc700  0 -- 192.168.168.200:6817/3738 >>
> 192.168.168.201:6801/3053 pipe(0x15e538c0 sd=260 :6817 s=2 pgs=1 cs=1 l=0
> c=0x16bf09a0).fault with nothing to send, going to standby
> 2015-04-10 19:32:53.128729 7fad53eb3700  0 -- 192.168.168.200:6817/3738 >>
> 192.168.168.201:6832/4257 pipe(0x37dcb80 sd=311 :6817 s=2 pgs=3 cs=1 l=0
> c=0x1131f420).fault with nothing to send, going to standby
> 2015-04-10 19:32:53.132691 7fad53fb4700  0 -- 192.168.168.200:6817/3738 >>
> 192.168.168.201:6812/3674 pipe(0x37dbdc0 sd=417 :6817 s=2 pgs=3 cs=1 l=0
> c=0x1c0badc0).fault with nothing to send, going to standby
> 2015-04-10 19:32:53.139856 7fad65cc7700  0 -- 192.168.168.200:6817/3738 >>
> 192.168.168.201:6817/3864 pipe(0x2cf8eb80 sd=365 :6817 s=2 pgs=5 cs=1 l=0
> c=0xbd1f2c0).fault with nothing to send, going to standby
>
>
> [followed by about 500 or so lines which look similar to:]
> 2015-04-10 19:33:09.021252 7fad65cc7700  0 -- 192.168.168.200:6817/3738 >>
> 192.168.168.201:6842/4442 pipe(0x2b792580 sd=24 :45523 s=2 pgs=1721 cs=3371
> l=0 c=0x138922c0).fault, initiating reconnect
> 2015-04-10 19:33:09.021280 7fad53eb3700  0 -- 192.168.168.200:6817/3738 >>
> 192.168.168.201:6807/3575 pipe(0x2cf8e080 sd=37 :51859 s=2 pgs=1755 cs=3449
> l=0 c=0x138669a0).fault, initiating reconnect
> 2015-04-10 19:33:09.021443 7fad66ed2700  0 -- 192.168.168.200:6817/3738 >>
> 192.168.168.201:6812/3674 pipe(0x37dbdc0 sd=18 :46453 s=2 pgs=1782 cs=3489
> l=0 c=0x1c0badc0).fault, initiating reconnect
> 2015-04-10 19:33:09.021702 7fad6218c700  0 -- 192.168.168.200:6817/3738 >>
> 192.168.168.201:6837/4409 pipe(0x2b793b80 sd=38 :53724 s=2 pgs=1757 cs=3449
> l=0 c=0x21e8b420).fault, initiating reconnect
>
>
> [followed by:]
>
> 2015-04-10 19:33:09.021830 7fad8e142700  0 log_channel(default) log [WRN] :
> 6 slow requests, 6 included below; oldest blocked for > 94.284115 secs
> 2015-04-10 19:33:09.021834 7fad67bdf700  0 -- 192.168.168.200:6817/3738 >>
> 192.168.168.201:6817/3864 pipe(0x2cf8eb80 sd=29 :50115 s=2 pgs=1717 cs=3395
> l=0 c=0xbd1f2c0).fault, initiating reconnect
> 2015-04-10 19:33:09.021840 7fad8e142700  0 log_channel(default) log [WRN] :
> slow request 62.902554 seconds old, received at 2015-04-10 19:32:06.119164:
> osd_op(client.64965167.0:1562644 rbd_data.1f264b2ae8944a.0000000000000400
> [set-alloc-hint object_size 4194304 write_size 4194304,write 1155072~4096]
> 5.f4cd34bb ack+ondisk+write+known_if_redirected e74308) currently waiting
> for subops from 13
> 2015-04-10 19:33:09.021846 7fad8e142700  0 log_channel(default) log [WRN] :
> slow request 63.004012 seconds old, received at 2015-04-10 19:32:06.017706:
> osd_op(client.69637439.0:278581863 rbd_data.265f967a5f7514.0000000000064e2e
> [set-alloc-hint object_size 4194304 write_size 4194304,write 1040384~4096]
> 5.bf3d2652 snapc ac=[ac] ack+ondisk+write+known_if_redirected e74308)
> currently waiting for subops from 11
> 2015-04-10 19:33:09.021851 7fad8e142700  0 log_channel(default) log [WRN] :
> slow request 94.284115 seconds old, received at 2015-04-10 19:31:34.737603:
> osd_sub_op(unknown.0.0:0 5.299 0//0//-1 [scrub-map] v 0'0 snapset=0=[]:[]
> snapc=0=[]) currently started
> 2015-04-10 19:33:09.021856 7fad8e142700  0 log_channel(default) log [WRN] :
> slow request 92.669022 seconds old, received at 2015-04-10 19:31:36.352696:
> osd_op(client.69645050.0:2750479 rbd_data.1b704a8f51a837.000000000000023d
> [stat,set-alloc-hint object_size 4194304 write_size 4194304,write
> 1449984~4096] 5.44a257d7 snapc 9cc=[9cc,9c1,9b5,9a9,99d,991,984,972,953,2f8]
> ack+ondisk+write+known_if_redirected e74308) currently waiting for subops
> from 17
> 2015-04-10 19:33:09.021861 7fad8e142700  0 log_channel(default) log [WRN] :
> slow request 76.733358 seconds old, received at 2015-04-10 19:31:52.288360:
> osd_op(client.64987185.0:26635606 rbd_data.1b652a2ae8944a.0000000000000623
> [set-alloc-hint object_size 4194304 write_size 4194304,write 1282048~4096]
> 5.64d42a36 ack+ondisk+write+known_if_redirected e74308) currently waiting
> for subops from 17
>
>
>
> [and finally]
>
> 2015-04-10 19:33:09.024361 7fad5c83b700  0 -- 192.168.168.200:6817/3738 >>
> 192.168.168.201:6832/4257 pipe(0x37dcb80 sd=35 :55060 s=2 pgs=1719 cs=3359
> l=0 c=0x1131f420).fault, initiating reconnect
> 2015-04-10 19:33:09.024418 7fad82616700  0 log_channel(default) log [WRN] :
> map e74309 wrongly marked me down
> 2015-04-10 19:33:09.024466 7fad53fb4700  0 -- 192.168.168.200:6817/3738 >>
> 192.168.168.201:6827/4208 pipe(0x2b793600 sd=33 :36954 s=2 pgs=1731 cs=3403
> l=0 c=0x1791ab00).fault, initiating reconnect
> 2015-04-10 19:33:09.024476 7fad67bdf700  0 -- 192.168.168.200:6817/3738 >>
> 192.168.168.201:6817/3864 pipe(0x2cf8eb80 sd=24 :50133 s=2 pgs=1720 cs=3401
> l=0 c=0xbd1f2c0).fault, initiating reconnect
> 2015-04-10 19:33:09.024553 7fad65cc7700  0 -- 192.168.168.200:6817/3738 >>
> 192.168.168.201:6842/4442 pipe(0x2b792580 sd=29 :45548 s=2 pgs=1725 cs=3379
> l=0 c=0x138922c0).fault, initiating reconnect
> 2015-04-10 19:33:09.024629 7fad53eb3700  0 -- 192.168.168.200:6817/3738 >>
> 192.168.168.201:6807/3575 pipe(0x2cf8e080 sd=37 :51884 s=2 pgs=1759 cs=3457
> l=0 c=0x138669a0).fault, initiating reconnect
> 2015-04-10 19:33:09.024783 7fad650bb700  0 -- 192.168.168.200:6817/3738 >>
> 192.168.168.201:6842/4442 pipe(0x2b792580 sd=29 :45548 s=1 pgs=1725 cs=3380
> l=0 c=0x138922c0).fault
> 2015-04-10 19:33:09.025056 7fad6218c700  0 -- 192.168.168.200:6817/3738 >>
> 192.168.168.201:6837/4409 pipe(0x2b793b80 sd=38 :53747 s=2 pgs=1761 cs=3457
> l=0 c=0x21e8b420).fault, initiating reconnect
> 2015-04-10 19:33:09.025062 7fad61580700  0 -- 192.168.168.200:6817/3738 >>
> 192.168.168.201:6817/3864 pipe(0x2cf8eb80 sd=24 :50140 s=4 pgs=1720 cs=3402
> l=0 c=0xbd1f2c0).connect got RESETSESSION but no longer connecting
> 2015-04-10 19:33:16.999741 7fad5f964700  0 -- 192.168.168.200:6846/1003738
> >> 192.168.168.200:6837/4266 pipe(0xa711580 sd=257 :6846 s=0 pgs=0 cs=0 l=0
> c=0x1afbf840).accept connect_seq 0 vs existing 0 state connecting
> 2015-04-10 19:33:16.999883 7fad5f863700  0 -- 192.168.168.200:6846/1003738
> >> 192.168.168.200:6827/3938 pipe(0x2c48a8c0 sd=260 :6846 s=0 pgs=0 cs=0 l=0
> c=0x18a85580).accept connect_seq 0 vs existing 0 state connecting
> 2015-04-10 19:36:26.247402 7fad6bde9700  0 log_channel(default) log [INF] :
> 5.299 deep-scrub ok
> 2015-04-10 19:40:18.577423 7fad725f6700  0 <cls> cls/rgw/cls_rgw.cc:1458:
> gc_iterate_entries end_key=1_01428691218.577419000
>
> 2015-04-10 19:42:22.709291 7fad6edef700  0 log_channel(default) log [INF] :
> 5.2ae deep-scrub ok
> 2015-04-10 19:45:51.790224 7fad6bde9700  0 log_channel(default) log [INF] :
> 5.2bb deep-scrub ok
> 2015-04-10 19:48:17.022413 7fad5f863700  0 -- 192.168.168.200:6846/1003738
> >> 192.168.168.200:6827/3938 pipe(0x2c48a8c0 sd=260 :6846 s=2 pgs=63 cs=1
> l=0 c=0x3f81160).fault with nothing to send, going to standby
>
>
>
>
> In the past I have been asked to verify that the networking is working okay.
> I've run days and days of various tests and verified that the network is not
> the cause of the problems. The network is running perfectly well, with no
> errors reported on the interfaces and no delays. I ran a multitude of TCP
> and UDP tests, including firing hundreds to thousands of connections per
> second for weeks, with 0% packet loss and no other network-related errors.
>
> Does anyone have any idea what could be causing the osds to be marked down
> during the deep scrub activity?
>
> I have enabled the following tuning options in the ceph.conf:
>
> # decrease rebuild/recovery priority (should make guest VMs more
> responsive) and decrease scrubbing priority
> osd recovery max chunk = 8388608
> osd recovery op priority = 1
> osd max backfills = 1
> osd recovery max active = 1
> osd recovery threads = 1
> osd_disk_thread_ioprio_priority = 7
> osd_disk_thread_ioprio_class = idle
>
> #reporting of down osds
> mon osd min down reporters = 6
> mon osd report timeout = 1800
> mon osd min down reports = 7
> osd heartbeat grace = 60
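>
> For reference, the values a running osd has actually picked up can be
> dumped over its admin socket on the osd host, e.g. using osd.6 from the
> logs above:
>
> # show effective scrub- and heartbeat-related settings on a live daemon
> ceph daemon osd.6 config show | grep -E 'scrub|heartbeat'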
>
>
>
> Many thanks for your help
>
> Andrei
>
>
>
>
>



-- 
Best Regards,

Wheat
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



