Hi Andrei,
1) What Ceph version are you running?
2) What distro and version are you running?
3) Have you checked that the disk elevator for the OSD devices is set to cfq?
4) Have you considered exploring the following parameters for further tuning? (A rough sketch of the settings follows below.)
   - osd_scrub_chunk_min: lower the default value of 5, e.g. to 1
   - osd_scrub_chunk_max: lower the default value of 25, e.g. to 5
   - osd_deep_scrub_stride: if you have lowered the parameters above, you can play with this one to best fit your physical disk behaviour
   - osd_scrub_sleep: introduce a sleep between scrub chunks, e.g. 0.5 to start with a half-second delay
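If it helps, here is a minimal sketch of where those would go (the numbers are only starting points, not recommendations tuned to your hardware), plus how they can be pushed into running OSDs:

   [osd]
   osd_scrub_chunk_min = 1        # default 5
   osd_scrub_chunk_max = 5        # default 25
   osd_scrub_sleep = 0.5          # seconds to pause between scrub chunks
   #osd_deep_scrub_stride = ...   # only once the above are lowered; match it to your disks

   # apply to all running OSDs without a restart:
   ceph tell osd.* injectargs '--osd_scrub_sleep 0.5 --osd_scrub_chunk_min 1 --osd_scrub_chunk_max 5'

   # and for question 3, the elevator check per OSD data disk (sdb is just an example):
   cat /sys/block/sdb/queue/scheduler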
Cheers JC
Hi guys,
I was wondering if anyone has noticed the deep scrubbing process causing OSDs to go down?
I have been keeping an eye on a few remaining stability issues in my test cluster. One of the unsolved issues is the occasional reporting of OSD(s) going down and coming back up after about 20-30 seconds. This happens to various OSDs throughout the cluster. I have a small cluster of just 2 OSD servers with 9 OSDs each.
The common trend that I see week after week is that whenever there is long deep scrubbing activity on the cluster, it triggers one or more OSDs to go down for a short period of time. After the OSD is marked down, it comes back up after about 20 seconds. Obviously, a repair process then kicks in, which causes more load on the cluster. Looking at the logs, I've not seen the OSDs being marked down when the cluster is not deep scrubbing. It _always_ happens when there is deep scrub activity. I am seeing reports of OSDs going down about 3-4 times a week.
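For what it's worth, this is roughly how I've been correlating the two, in case anyone wants to check their own cluster (default log locations assumed; osd.6 as an example):

   # on a monitor host: when were OSDs reported failed / the osdmap changed?
   grep -E 'osd\.[0-9]+ .* failed|osdmap' /var/log/ceph/ceph.log

   # on the OSD host: deep-scrub activity and heartbeat timeouts around those times
   grep -E 'deep-scrub|heartbeat_map is_healthy|marked me down' /var/log/ceph/ceph-osd.6.log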
The latest happened just recently with the following log entries:
2015-04-10 19:32:48.330430 mon.0 192.168.168.13:6789/0 3441533 : cluster [INF] pgmap v50849466: 8508 pgs: 8506 active+clean, 2 active+clean+scrubbing+deep; 13213 GB data, 26896 GB used, 23310 GB / 50206 GB avail; 1005 B/s rd, 1005 B/s wr, 0 op/s
2015-04-10 19:32:52.950633 mon.0 192.168.168.13:6789/0 3441542 : cluster [INF] osd.6 192.168.168.200:6816/3738 failed (5 reports from 5 peers after 60.747890 >= grace 46.701350)
2015-04-10 19:32:53.121904 mon.0 192.168.168.13:6789/0 3441544 : cluster [INF] osdmap e74309: 18 osds: 17 up, 18 in
2015-04-10 19:32:53.231730 mon.0 192.168.168.13:6789/0 3441545 : cluster [INF] pgmap v50849467: 8508 pgs: 599 stale+active+clean, 7907 active+clean, 1 stale+active+clean+scrubbing+deep, 1 active+clean+scrubbing+deep; 13213 GB data, 26896 GB used, 23310 GB / 50206 GB avail; 375 B/s rd, 0 op/s
osd.6 logs around the same time are:
2015-04-10 19:16:29.110617 7fad6d5ec700 0 log_channel(default) log [INF] : 5.3d7 deep-scrub ok
2015-04-10 19:27:47.561389 7fad6bde9700 0 log_channel(default) log [INF] : 5.276 deep-scrub ok
2015-04-10 19:31:11.611321 7fad6d5ec700 0 log_channel(default) log [INF] : 5.287 deep-scrub ok
2015-04-10 19:31:53.339881 7fad7ce0b700 1 heartbeat_map is_healthy 'OSD::osd_op_tp thread 0x7fad735f8700' had timed out after 15
2015-04-10 19:31:53.339887 7fad7ce0b700 1 heartbeat_map is_healthy 'OSD::osd_op_tp thread 0x7fad745fa700' had timed out after 15
2015-04-10 19:31:53.339890 7fad7ce0b700 1 heartbeat_map is_healthy 'OSD::osd_op_tp thread 0x7fad705f2700' had timed out after 15
2015-04-10 19:31:53.340050 7fad7e60e700 1 heartbeat_map is_healthy 'OSD::osd_op_tp thread 0x7fad735f8700' had timed out after 15
2015-04-10 19:31:53.340053 7fad7e60e700 1 heartbeat_map is_healthy 'OSD::osd_op_tp thread 0x7fad745fa700' had timed out after 15
[.....]
2015-04-10 19:32:53.010609 7fad7e60e700 1 heartbeat_map is_healthy 'FileStore::op_tp thread 0x7fad86132700' had timed out after 60
2015-04-10 19:32:53.010611 7fad7e60e700 1 heartbeat_map is_healthy 'FileStore::op_tp thread 0x7fad88937700' had timed out after 60
2015-04-10 19:32:53.111470 7fad66ed2700 0 -- 192.168.168.200:6817/3738 >> 192.168.168.201:6837/4409 pipe(0x2b793b80 sd=179 :6817 s=2 pgs=5 cs=1 l=0 c=0x21e8b420).fault with nothing to send, going to standby
2015-04-10 19:32:53.111496 7fad6329d700 0 -- 192.168.168.200:6817/3738 >> 192.168.168.201:6827/4208 pipe(0x2b793600 sd=172 :6817 s=2 pgs=7 cs=1 l=0 c=0x1791ab00).fault with nothing to send, going to standby
2015-04-10 19:32:53.111463 7fad55bd0700 0 -- 192.168.168.200:6817/3738 >> 192.168.168.201:6822/3910 pipe(0x2cb55dc0 sd=262 :6817 s=2 pgs=8 cs=1 l=0 c=0xe7802c0).fault with nothing to send, going to standby
2015-04-10 19:32:53.121815 7fad6218c700 0 -- 192.168.168.200:6817/3738 >> 192.168.168.201:6807/3575 pipe(0x2cf8e080 sd=294 :6817 s=2 pgs=4 cs=1 l=0 c=0x138669a0).fault with nothing to send, going to standby
2015-04-10 19:32:53.121856 7fad67bdf700 0 -- 192.168.168.200:6817/3738 >> 192.168.168.201:6842/4442 pipe(0x2b792580 sd=190 :6817 s=2 pgs=9 cs=1 l=0 c=0x138922c0).fault with nothing to send, going to standby
2015-04-10 19:32:53.123545 7fad651bc700 0 -- 192.168.168.200:6817/3738 >> 192.168.168.201:6801/3053 pipe(0x15e538c0 sd=260 :6817 s=2 pgs=1 cs=1 l=0 c=0x16bf09a0).fault with nothing to send, going to standby
2015-04-10 19:32:53.128729 7fad53eb3700 0 -- 192.168.168.200:6817/3738 >> 192.168.168.201:6832/4257 pipe(0x37dcb80 sd=311 :6817 s=2 pgs=3 cs=1 l=0 c=0x1131f420).fault with nothing to send, going to standby
2015-04-10 19:32:53.132691 7fad53fb4700 0 -- 192.168.168.200:6817/3738 >> 192.168.168.201:6812/3674 pipe(0x37dbdc0 sd=417 :6817 s=2 pgs=3 cs=1 l=0 c=0x1c0badc0).fault with nothing to send, going to standby
2015-04-10 19:32:53.139856 7fad65cc7700 0 -- 192.168.168.200:6817/3738 >> 192.168.168.201:6817/3864 pipe(0x2cf8eb80 sd=365 :6817 s=2 pgs=5 cs=1 l=0 c=0xbd1f2c0).fault with nothing to send, going to standby
[followed by about 500 or so lines which look similar to:]
2015-04-10 19:33:09.021252 7fad65cc7700 0 -- 192.168.168.200:6817/3738 >> 192.168.168.201:6842/4442 pipe(0x2b792580 sd=24 :45523 s=2 pgs=1721 cs=3371 l=0 c=0x138922c0).fault, initiating reconnect
2015-04-10 19:33:09.021280 7fad53eb3700 0 -- 192.168.168.200:6817/3738 >> 192.168.168.201:6807/3575 pipe(0x2cf8e080 sd=37 :51859 s=2 pgs=1755 cs=3449 l=0 c=0x138669a0).fault, initiating reconnect
2015-04-10 19:33:09.021443 7fad66ed2700 0 -- 192.168.168.200:6817/3738 >> 192.168.168.201:6812/3674 pipe(0x37dbdc0 sd=18 :46453 s=2 pgs=1782 cs=3489 l=0 c=0x1c0badc0).fault, initiating reconnect
2015-04-10 19:33:09.021702 7fad6218c700 0 -- 192.168.168.200:6817/3738 >> 192.168.168.201:6837/4409 pipe(0x2b793b80 sd=38 :53724 s=2 pgs=1757 cs=3449 l=0 c=0x21e8b420).fault, initiating reconnect
[followed by:]
2015-04-10 19:33:09.021830 7fad8e142700 0 log_channel(default) log [WRN] : 6 slow requests, 6 included below; oldest blocked for > 94.284115 secs
2015-04-10 19:33:09.021834 7fad67bdf700 0 -- 192.168.168.200:6817/3738 >> 192.168.168.201:6817/3864 pipe(0x2cf8eb80 sd=29 :50115 s=2 pgs=1717 cs=3395 l=0 c=0xbd1f2c0).fault, initiating reconnect
2015-04-10 19:33:09.021840 7fad8e142700 0 log_channel(default) log [WRN] : slow request 62.902554 seconds old, received at 2015-04-10 19:32:06.119164: osd_op(client.64965167.0:1562644 rbd_data.1f264b2ae8944a.0000000000000400 [set-alloc-hint object_size 4194304 write_size 4194304,write 1155072~4096] 5.f4cd34bb ack+ondisk+write+known_if_redirected e74308) currently waiting for subops from 13
2015-04-10 19:33:09.021846 7fad8e142700 0 log_channel(default) log [WRN] : slow request 63.004012 seconds old, received at 2015-04-10 19:32:06.017706: osd_op(client.69637439.0:278581863 rbd_data.265f967a5f7514.0000000000064e2e [set-alloc-hint object_size 4194304 write_size 4194304,write 1040384~4096] 5.bf3d2652 snapc ac=[ac] ack+ondisk+write+known_if_redirected e74308) currently waiting for subops from 11
2015-04-10 19:33:09.021851 7fad8e142700 0 log_channel(default) log [WRN] : slow request 94.284115 seconds old, received at 2015-04-10 19:31:34.737603: osd_sub_op(unknown.0.0:0 5.299 0//0//-1 [scrub-map] v 0'0 snapset=0=[]:[] snapc=0=[]) currently started
2015-04-10 19:33:09.021856 7fad8e142700 0 log_channel(default) log [WRN] : slow request 92.669022 seconds old, received at 2015-04-10 19:31:36.352696: osd_op(client.69645050.0:2750479 rbd_data.1b704a8f51a837.000000000000023d [stat,set-alloc-hint object_size 4194304 write_size 4194304,write 1449984~4096] 5.44a257d7 snapc 9cc=[9cc,9c1,9b5,9a9,99d,991,984,972,953,2f8] ack+ondisk+write+known_if_redirected e74308) currently waiting for subops from 17
2015-04-10 19:33:09.021861 7fad8e142700 0 log_channel(default) log [WRN] : slow request 76.733358 seconds old, received at 2015-04-10 19:31:52.288360: osd_op(client.64987185.0:26635606 rbd_data.1b652a2ae8944a.0000000000000623 [set-alloc-hint object_size 4194304 write_size 4194304,write 1282048~4096] 5.64d42a36 ack+ondisk+write+known_if_redirected e74308) currently waiting for subops from 17
[and finally]
2015-04-10 19:33:09.024361 7fad5c83b700 0 -- 192.168.168.200:6817/3738 >> 192.168.168.201:6832/4257 pipe(0x37dcb80 sd=35 :55060 s=2 pgs=1719 cs=3359 l=0 c=0x1131f420).fault, initiating reconnect
2015-04-10 19:33:09.024418 7fad82616700 0 log_channel(default) log [WRN] : map e74309 wrongly marked me down
2015-04-10 19:33:09.024466 7fad53fb4700 0 -- 192.168.168.200:6817/3738 >> 192.168.168.201:6827/4208 pipe(0x2b793600 sd=33 :36954 s=2 pgs=1731 cs=3403 l=0 c=0x1791ab00).fault, initiating reconnect
2015-04-10 19:33:09.024476 7fad67bdf700 0 -- 192.168.168.200:6817/3738 >> 192.168.168.201:6817/3864 pipe(0x2cf8eb80 sd=24 :50133 s=2 pgs=1720 cs=3401 l=0 c=0xbd1f2c0).fault, initiating reconnect
2015-04-10 19:33:09.024553 7fad65cc7700 0 -- 192.168.168.200:6817/3738 >> 192.168.168.201:6842/4442 pipe(0x2b792580 sd=29 :45548 s=2 pgs=1725 cs=3379 l=0 c=0x138922c0).fault, initiating reconnect
2015-04-10 19:33:09.024629 7fad53eb3700 0 -- 192.168.168.200:6817/3738 >> 192.168.168.201:6807/3575 pipe(0x2cf8e080 sd=37 :51884 s=2 pgs=1759 cs=3457 l=0 c=0x138669a0).fault, initiating reconnect
2015-04-10 19:33:09.024783 7fad650bb700 0 -- 192.168.168.200:6817/3738 >> 192.168.168.201:6842/4442 pipe(0x2b792580 sd=29 :45548 s=1 pgs=1725 cs=3380 l=0 c=0x138922c0).fault
2015-04-10 19:33:09.025056 7fad6218c700 0 -- 192.168.168.200:6817/3738 >> 192.168.168.201:6837/4409 pipe(0x2b793b80 sd=38 :53747 s=2 pgs=1761 cs=3457 l=0 c=0x21e8b420).fault, initiating reconnect
2015-04-10 19:33:09.025062 7fad61580700 0 -- 192.168.168.200:6817/3738 >> 192.168.168.201:6817/3864 pipe(0x2cf8eb80 sd=24 :50140 s=4 pgs=1720 cs=3402 l=0 c=0xbd1f2c0).connect got RESETSESSION but no longer connecting
2015-04-10 19:33:16.999741 7fad5f964700 0 -- 192.168.168.200:6846/1003738 >> 192.168.168.200:6837/4266 pipe(0xa711580 sd=257 :6846 s=0 pgs=0 cs=0 l=0 c=0x1afbf840).accept connect_seq 0 vs existing 0 state connecting
2015-04-10 19:33:16.999883 7fad5f863700 0 -- 192.168.168.200:6846/1003738 >> 192.168.168.200:6827/3938 pipe(0x2c48a8c0 sd=260 :6846 s=0 pgs=0 cs=0 l=0 c=0x18a85580).accept connect_seq 0 vs existing 0 state connecting
2015-04-10 19:36:26.247402 7fad6bde9700 0 log_channel(default) log [INF] : 5.299 deep-scrub ok
2015-04-10 19:40:18.577423 7fad725f6700 0 <cls> cls/rgw/cls_rgw.cc:1458: gc_iterate_entries end_key=1_01428691218.577419000
2015-04-10 19:42:22.709291 7fad6edef700 0 log_channel(default) log [INF] : 5.2ae deep-scrub ok
2015-04-10 19:45:51.790224 7fad6bde9700 0 log_channel(default) log [INF] : 5.2bb deep-scrub ok
2015-04-10 19:48:17.022413 7fad5f863700 0 -- 192.168.168.200:6846/1003738 >> 192.168.168.200:6827/3938 pipe(0x2c48a8c0 sd=260 :6846 s=2 pgs=63 cs=1 l=0 c=0x3f81160).fault with nothing to send, going to standby
In the past I have been asked to verify that the networking is working okay. I've run days and days of various tests and verified that the network is not the cause of the problems. The network is running perfectly well, with no errors reported on the interfaces and no delays. I ran a multitude of TCP and UDP tests, including firing hundreds to thousands of connections per second for weeks, with 0% packet loss and no other network-related errors.
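(As an example of the kind of test, roughly: iperf between the two OSD hosts, several parallel TCP streams plus UDP runs to get a loss figure; the exact flags and durations varied.)

   # on one OSD server
   iperf -s
   # on the other: 8 parallel TCP streams for 60s, then a UDP run that reports loss at the end
   iperf -c 192.168.168.201 -t 60 -P 8
   iperf -c 192.168.168.201 -u -b 500M -t 60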
Does anyone have any idea what could be causing the osds to be marked down during the deep scrub activity?
I have enabled the following tuning options in the ceph.conf:
# change to decrease rebuild/recovery priority (should make guest VMs more responsive) and decrease scrubbing priority
osd recovery max chunk = 8388608
osd recovery op priority = 1
osd max backfills = 1
osd recovery max active = 1
osd recovery threads = 1
osd_disk_thread_ioprio_priority = 7
osd_disk_thread_ioprio_class = idle
# reporting of down osds
mon osd min down reporters = 6
mon osd report timeout = 1800
mon osd min down reports = 7
osd heartbeat grace = 60
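To confirm the OSDs are actually running with these values (and not just carrying them in ceph.conf), I check via the admin socket; osd.6 and the default socket path are just an example:

   ceph --admin-daemon /var/run/ceph/ceph-osd.6.asok config show | grep -E 'heartbeat_grace|ioprio|recovery_max|max_backfills'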
Many thanks for your help
Andrei