Re: OSD went down but no idea why

"blackpiglet J." <blackpigletbruce@xxxxxxxxx> · Tue, 30 Jan 2018 16:59:02 +0800

I found some logs says osd.91 is down. I think that should be same for osd.9
I am not sure what will cause the OSD process treated by peers as down.

2018-01-30 06:39:33.767747 7f2402ef9700  1 mon.ubuntuser8@0(leader).log v424396 check_sub sending message to client.164108 10.1.248.8:0/3388257888 with 1 entries (version 424396) 
2018-01-30 06:39:37.623969 7f2409f07700  0 log_channel(cluster) log [INF] : Health check cleared: REQUEST_SLOW (was: 2 slow requests are blocked > 32 sec) 
2018-01-30 06:39:37.751100 7f2402ef9700  1 mon.ubuntuser8@0(leader).log v424397 check_sub sending message to client.164108 10.1.248.8:0/3388257888 with 1 entries (version 424397) 
2018-01-30 06:39:53.535500 7f2402ef9700  1 mon.ubuntuser8@0(leader).log v424398 check_sub sending message to client.164108 10.1.248.8:0/3388257888 with 0 entries (version 424398) 
2018-01-30 06:40:09.251953 7f2409f07700  0 mon.ubuntuser8@0(leader).data_health(33) update_stats avail 86% total 806 GB, used 70945 MB, avail 696 GB
2018-01-30 06:40:11.668097 7f2409f07700  0 log_channel(cluster) log [WRN] : Health check failed: 1 slow requests are blocked > 32 sec (REQUEST_SLOW)
2018-01-30 06:40:11.794779 7f2402ef9700  1 mon.ubuntuser8@0(leader).log v424399 check_sub sending message to client.164108 10.1.248.8:0/3388257888 with 1 entries (version 424399) 
2018-01-30 06:40:18.599837 7f2407702700  1 mon.ubuntuser8@0(leader).osd e3016 prepare_failure osd.90 10.1.248.4:6830/3552 from osd.0 10.1.248.1:6871/3727 is reporting failure:1
2018-01-30 06:40:18.599861 7f2407702700  0 log_channel(cluster) log [DBG] : osd.90 10.1.248.4:6830/3552 reported failed by osd.0 10.1.248.1:6871/3727
2018-01-30 06:40:18.783394 7f2402ef9700  1 mon.ubuntuser8@0(leader).log v424400 check_sub sending message to client.164108 10.1.248.8:0/3388257888 with 0 entries (version 424400) 
2018-01-30 06:40:19.290525 7f2407702700  1 mon.ubuntuser8@0(leader).osd e3016 prepare_failure osd.90 10.1.248.4:6830/3552 from osd.61 10.1.248.3:6842/3728 is reporting failure:1
2018-01-30 06:40:19.290546 7f2407702700  0 log_channel(cluster) log [DBG] : osd.90 10.1.248.4:6830/3552 reported failed by osd.61 10.1.248.3:6842/3728
2018-01-30 06:40:19.316574 7f2407702700  1 mon.ubuntuser8@0(leader).osd e3016 prepare_failure osd.90 10.1.248.4:6830/3552 from osd.39 10.1.248.2:6822/3732 is reporting failure:1
2018-01-30 06:40:19.316592 7f2407702700  0 log_channel(cluster) log [DBG] : osd.90 10.1.248.4:6830/3552 reported failed by osd.39 10.1.248.2:6822/3732
2018-01-30 06:40:19.361317 7f2407702700  1 mon.ubuntuser8@0(leader).osd e3016 prepare_failure osd.90 10.1.248.4:6830/3552 from osd.44 10.1.248.2:6800/3683 is reporting failure:1
2018-01-30 06:40:19.361335 7f2407702700  0 log_channel(cluster) log [DBG] : osd.90 10.1.248.4:6830/3552 reported failed by osd.44 10.1.248.2:6800/3683
2018-01-30 06:40:19.918065 7f2402ef9700  1 mon.ubuntuser8@0(leader).log v424401 check_sub sending message to client.164108 10.1.248.8:0/3388257888 with 0 entries (version 424401) 
2018-01-30 06:40:20.075119 7f2407702700  1 mon.ubuntuser8@0(leader).osd e3016 prepare_failure osd.90 10.1.248.4:6830/3552 from osd.41 10.1.248.2:6830/3677 is reporting failure:1
2018-01-30 06:40:20.075141 7f2407702700  0 log_channel(cluster) log [DBG] : osd.90 10.1.248.4:6830/3552 reported failed by osd.41 10.1.248.2:6830/3677
2018-01-30 06:40:20.493001 7f2407702700  1 mon.ubuntuser8@0(leader).osd e3016 prepare_failure osd.90 10.1.248.4:6830/3552 from osd.56 10.1.248.3:6800/3713 is reporting failure:1
2018-01-30 06:40:20.493023 7f2407702700  0 log_channel(cluster) log [DBG] : osd.90 10.1.248.4:6830/3552 reported failed by osd.56 10.1.248.3:6800/3713
2018-01-30 06:40:20.787217 7f2407702700  1 mon.ubuntuser8@0(leader).osd e3016 prepare_failure osd.90 10.1.248.4:6830/3552 from osd.48 10.1.248.3:6815/3670 is reporting failure:1
2018-01-30 06:40:20.787238 7f2407702700  0 log_channel(cluster) log [DBG] : osd.90 10.1.248.4:6830/3552 reported failed by osd.48 10.1.248.3:6815/3670
2018-01-30 06:40:20.982180 7f2407702700  1 mon.ubuntuser8@0(leader).osd e3016 prepare_failure osd.90 10.1.248.4:6830/3552 from osd.21 10.1.248.1:6800/3711 is reporting failure:1
2018-01-30 06:40:20.982198 7f2407702700  0 log_channel(cluster) log [DBG] : osd.90 10.1.248.4:6830/3552 reported failed by osd.21 10.1.248.1:6800/3711
2018-01-30 06:40:20.984336 7f2402ef9700  1 mon.ubuntuser8@0(leader).log v424402 check_sub sending message to client.164108 10.1.248.8:0/3388257888 with 0 entries (version 424402) 
2018-01-30 06:40:21.152463 7f2407702700  1 mon.ubuntuser8@0(leader).osd e3016 prepare_failure osd.90 10.1.248.4:6830/3552 from osd.13 10.1.248.1:6842/3700 is reporting failure:1
2018-01-30 06:40:21.152480 7f2407702700  0 log_channel(cluster) log [DBG] : osd.90 10.1.248.4:6830/3552 reported failed by osd.13 10.1.248.1:6842/3700
2018-01-30 06:40:22.083900 7f2407702700  1 mon.ubuntuser8@0(leader).osd e3016 prepare_failure osd.90 10.1.248.4:6830/3552 from osd.38 10.1.248.2:6876/3764 is reporting failure:1
2018-01-30 06:40:22.083920 7f2407702700  0 log_channel(cluster) log [DBG] : osd.90 10.1.248.4:6830/3552 reported failed by osd.38 10.1.248.2:6876/3764
2018-01-30 06:40:22.117579 7f2402ef9700  1 mon.ubuntuser8@0(leader).log v424403 check_sub sending message to client.164108 10.1.248.8:0/3388257888 with 0 entries (version 424403) 
2018-01-30 06:40:22.179713 7f2407702700  1 mon.ubuntuser8@0(leader).osd e3016 prepare_failure osd.90 10.1.248.4:6830/3552 from osd.89 10.1.248.3:6817/3679 is reporting failure:1
2018-01-30 06:40:22.179737 7f2407702700  0 log_channel(cluster) log [DBG] : osd.90 10.1.248.4:6830/3552 reported failed by osd.89 10.1.248.3:6817/3679
2018-01-30 06:40:22.318619 7f2407702700  1 mon.ubuntuser8@0(leader).osd e3016 prepare_failure osd.90 10.1.248.4:6830/3552 from osd.69 10.1.248.3:6812/3716 is reporting failure:1
2018-01-30 06:40:22.318644 7f2407702700  0 log_channel(cluster) log [DBG] : osd.90 10.1.248.4:6830/3552 reported failed by osd.69 10.1.248.3:6812/3716
2018-01-30 06:40:22.698652 7f2407702700  1 mon.ubuntuser8@0(leader).osd e3016 prepare_failure osd.90 10.1.248.4:6830/3552 from osd.53 10.1.248.3:6843/3701 is reporting failure:1
2018-01-30 06:40:22.698673 7f2407702700  0 log_channel(cluster) log [DBG] : osd.90 10.1.248.4:6830/3552 reported failed by osd.53 10.1.248.3:6843/3701
2018-01-30 06:40:22.842870 7f2407702700  1 mon.ubuntuser8@0(leader).osd e3016 prepare_failure osd.90 10.1.248.4:6830/3552 from osd.23 10.1.248.1:6831/3663 is reporting failure:1
2018-01-30 06:40:22.842886 7f2407702700  0 log_channel(cluster) log [DBG] : osd.90 10.1.248.4:6830/3552 reported failed by osd.23 10.1.248.1:6831/3663
2018-01-30 06:40:23.183964 7f2402ef9700  1 mon.ubuntuser8@0(leader).log v424404 check_sub sending message to client.164108 10.1.248.8:0/3388257888 with 0 entries (version 424404) 
2018-01-30 06:40:23.259187 7f2407702700  1 mon.ubuntuser8@0(leader).osd e3016 prepare_failure osd.90 10.1.248.4:6830/3552 from osd.25 10.1.248.2:6825/3713 is reporting failure:1
2018-01-30 06:40:23.259203 7f2407702700  0 log_channel(cluster) log [DBG] : osd.90 10.1.248.4:6830/3552 reported failed by osd.25 10.1.248.2:6825/3713
2018-01-30 06:40:23.281029 7f2407702700  1 mon.ubuntuser8@0(leader).osd e3016 prepare_failure osd.90 10.1.248.4:6830/3552 from osd.62 10.1.248.3:6872/3735 is reporting failure:1
2018-01-30 06:40:23.281041 7f2407702700  0 log_channel(cluster) log [DBG] : osd.90 10.1.248.4:6830/3552 reported failed by osd.62 10.1.248.3:6872/3735
2018-01-30 06:40:23.771737 7f2407702700  1 mon.ubuntuser8@0(leader).osd e3016 prepare_failure osd.90 10.1.248.4:6830/3552 from osd.92 10.1.248.4:6801/3581 is reporting failure:1
2018-01-30 06:40:23.771749 7f2407702700  0 log_channel(cluster) log [DBG] : osd.90 10.1.248.4:6830/3552 reported failed by osd.92 10.1.248.4:6801/3581
2018-01-30 06:40:24.317177 7f2402ef9700  1 mon.ubuntuser8@0(leader).log v424405 check_sub sending message to client.164108 10.1.248.8:0/3388257888 with 2 entries (version 424405) 
2018-01-30 06:40:31.694701 7f2409f07700  0 log_channel(cluster) log [WRN] : Health check update: 3 slow requests are blocked > 32 sec (REQUEST_SLOW)
2018-01-30 06:40:31.830914 7f2402ef9700  1 mon.ubuntuser8@0(leader).log v424406 check_sub sending message to client.164108 10.1.248.8:0/3388257888 with 1 entries (version 424406) 
2018-01-30 06:40:37.702118 7f2409f07700  0 log_channel(cluster) log [WRN] : Health check update: 4 slow requests are blocked > 32 sec (REQUEST_SLOW)
2018-01-30 06:40:37.845087 7f2402ef9700  1 mon.ubuntuser8@0(leader).log v424407 check_sub sending message to client.164108 10.1.248.8:0/3388257888 with 1 entries (version 424407) 
2018-01-30 06:40:38.730131 7f2409f07700  1 mon.ubuntuser8@0(leader).osd e3016  we have enough reporters to mark osd.90 down
2018-01-30 06:40:38.730162 7f2409f07700  0 log_channel(cluster) log [INF] : osd.90 failed (root=default,host=ubuntuser4) (4 reporters from different host after 34.958423 >= grace 31.094427)
2018-01-30 06:40:38.730952 7f2409f07700  0 log_channel(cluster) log [WRN] : Health check failed: 1 osds down (OSD_DOWN)
2018-01-30 06:40:38.795202 7f2402ef9700  1 mon.ubuntuser8@0(leader).osd e3017 e3017: 96 total, 92 up, 93 in
2018-01-30 06:40:38.830011 7f2402ef9700  0 log_channel(cluster) log [DBG] : osdmap e3017: 96 total, 92 up, 93 in
2018-01-30 06:40:38.830254 7f2407702700  0 mon.ubuntuser8@0(leader) e1 handle_command mon_command({"prefix": "osd metadata", "id": 17} v 0) v1 
2018-01-30 06:40:38.830299 7f2407702700  0 log_channel(audit) log [DBG] : from='client.164108 10.1.248.8:0/3388257888' entity='mgr.ubuntuser8' cmd=[{"prefix": "osd metadata", "id": 17}]: dispatch
2018-01-30 06:40:38.911919 7f2402ef9700  1 mon.ubuntuser8@0(leader).log v424408 check_sub sending message to client.164108 10.1.248.8:0/3388257888 with 2 entries (version 424408) 
2018-01-30 06:40:39.861824 7f2402ef9700  1 mon.ubuntuser8@0(leader).osd e3018 e3018: 96 total, 92 up, 93 in
2018-01-30 06:40:39.895967 7f2402ef9700  0 log_channel(cluster) log [DBG] : osdmap e3018: 96 total, 92 up, 93 in
2018-01-30 06:40:39.896243 7f2407702700  0 mon.ubuntuser8@0(leader) e1 handle_command mon_command({"prefix": "osd metadata", "id": 17} v 0) v1 
2018-01-30 06:40:39.896291 7f2407702700  0 log_channel(audit) log [DBG] : from='client.164108 10.1.248.8:0/3388257888' entity='mgr.ubuntuser8' cmd=[{"prefix": "osd metadata", "id": 17}]: dispatch
2018-01-30 06:40:39.911825 7f2409f07700  0 log_channel(cluster) log [WRN] : Health check update: Reduced data availability: 7 pgs peering, 10 pgs stale (PG_AVAILABILITY)
2018-01-30 06:40:39.911883 7f2409f07700  0 log_channel(cluster) log [WRN] : Health check failed: Degraded data redundancy: 7 pgs unclean (PG_DEGRADED)
2018-01-30 06:40:40.928452 7f2402ef9700  1 mon.ubuntuser8@0(leader).log v424409 check_sub sending message to client.164108 10.1.248.8:0/3388257888 with 2 entries (version 424409) 
2018-01-30 06:40:43.544073 7f2402ef9700  1 mon.ubuntuser8@0(leader).log v424410 check_sub sending message to client.164108 10.1.248.8:0/3388257888 with 5 entries (version 424410) 
2018-01-30 06:40:43.761429 7f2409f07700  0 log_channel(cluster) log [WRN] : Health check update: 5 slow requests are blocked > 32 sec (REQUEST_SLOW)

2018-01-30 16:07 GMT+08:00 blackpiglet J. <blackpigletbruce@xxxxxxxxx>:
Guys,
We had set up a five nodes Ceph cluster. Four are OSD servers and the other one is MON and MGR.
Recently, during RGW stability test, RGW default pool: default.rgw.buckets.data is accidentally written full. As a result, RGW is stuck. We don't know the exact steps to recover, then we deleted all default RGW pools directly. After RGW restarted, all pools are back.

Then I found 4 OSD processes are down. I am not whether this is related to our RGW pool delete operation. 

This a the log I think may be useful. The full version is in the attachment. 

Any help is appreciated. Thanks in advance.

2018-01-29 20:17:48.896544 7fc111122700  1 heartbeat_map is_healthy 'OSD::osd_op_tp thread 0x7fc0fb912700' had timed out after 15
2018-01-29 20:17:52.360069 7fc110183700  1 heartbeat_map is_healthy 'OSD::osd_op_tp thread 0x7fc0fb912700' had timed out after 15
2018-01-29 20:17:57.360192 7fc110183700  1 heartbeat_map is_healthy 'OSD::osd_op_tp thread 0x7fc0fb912700' had timed out after 15
2018-01-29 20:18:02.360315 7fc110183700  1 heartbeat_map is_healthy 'OSD::osd_op_tp thread 0x7fc0fb912700' had timed out after 15
2018-01-29 20:18:06.029867 7fc0fb912700  1 heartbeat_map reset_timeout 'OSD::osd_op_tp thread 0x7fc0fb912700' had timed out after 15
2018-01-29 20:18:06.030576 7fc0fc914700  1 osd.9 pg_epoch: 2481 pg[6.18( v 2276'10 (0'0,2276'10] local-lis/les=2479/2480 n=3 ec=704/704 lis/c 2479/2470 les/c/f 2480/2471/2312 2481/2481/2476) [9,91,51] r=0 lpr=2481 pi=[2470,2481)/2 luod=
0'0 crt=2276'10 lcod 2151'9 mlcod 0'0 active] start_peering_interval up [9,51] -> [9,91,51], acting [9,51] -> [9,91,51], acting_primary 9 -> 9, up_primary 9 -> 9, role 0 -> 0, features acting 2305244844532236283 upacting 230524484453223
6283
2018-01-29 20:18:06.030774 7fc0fc914700  1 osd.9 pg_epoch: 2481 pg[6.18( v 2276'10 (0'0,2276'10] local-lis/les=2479/2480 n=3 ec=704/704 lis/c 2479/2470 les/c/f 2480/2471/2312 2481/2481/2476) [9,91,51] r=0 lpr=2481 pi=[2470,2481)/2 crt=2
276'10 lcod 2151'9 mlcod 0'0 unknown] state<Start>: transitioning to Primary
2018-01-29 20:18:06.031236 7fc0fc914700  1 osd.9 pg_epoch: 2481 pg[6.1d( v 2316'568 (0'0,2316'568] local-lis/les=2479/2480 n=2 ec=704/704 lis/c 2479/2470 les/c/f 2480/2471/2312 2481/2481/2481) [91,79,9] r=2 lpr=2481 pi=[2470,2481)/2 luo
d=0'0 crt=2316'568 lcod 2312'567 active] start_peering_interval up [79,9] -> [91,79,9], acting [79,9] -> [91,79,9], acting_primary 79 -> 91, up_primary 79 -> 91, role 1 -> 2, features acting 2305244844532236283 upacting 2305244844532236
283
2018-01-29 20:18:06.031324 7fc0fc914700  1 osd.9 pg_epoch: 2481 pg[6.1d( v 2316'568 (0'0,2316'568] local-lis/les=2479/2480 n=2 ec=704/704 lis/c 2479/2470 les/c/f 2480/2471/2312 2481/2481/2481) [91,79,9] r=2 lpr=2481 pi=[2470,2481)/2 crt
=2316'568 lcod 2312'567 unknown NOTIFY] state<Start>: transitioning to Stray
2018-01-29 20:18:06.032167 7fc112124700  0 -- 10.1.248.1:6850/6003656 >> 10.1.248.4:6825/1003558 conn(0x556089bf4800 :6850 s=STATE_CONNECTING_WAIT_CONNECT_REPLY_AUTH pgs=5245 cs=1 l=0).handle_connect_reply connect got RESETSESSION
2018-01-29 20:18:06.032184 7fc111923700  0 -- 10.1.248.1:6850/6003656 >> 10.1.248.3:6827/3726 conn(0x5560c0c06000 :-1 s=STATE_CONNECTING_WAIT_CONNECT_REPLY_AUTH pgs=6734 cs=1 l=0).handle_connect_reply connect got RESETSESSION
2018-01-29 20:18:06.053899 7fc103121700  0 log_channel(cluster) log [WRN] : Monitor daemon marked osd.9 down, but it is still running
2018-01-29 20:18:06.053917 7fc103121700  0 log_channel(cluster) log [DBG] : map e2485 wrongly marked me down at e2483
2018-01-29 20:18:06.053922 7fc103121700  0 osd.9 2485 _committed_osd_maps marked down 6 > osd_max_markdown_count 5 in last 600.000000 seconds, shutting down
2018-01-29 20:18:06.053927 7fc103121700  1 osd.9 2485 start_waiting_for_healthy
2018-01-29 20:18:06.064250 7fc0fc914700  1 osd.9 pg_epoch: 2483 pg[10.53( empty local-lis/les=2476/2477 n=0 ec=1815/1815 lis/c 2476/2476 les/c/f 2477/2477/2312 2483/2483/2483) [] r=-1 lpr=2483 pi=[2476,2483)/1 crt=0'0 active] start_peer
ing_interval up [9] -> [], acting [9] -> [], acting_primary 9 -> -1, up_primary 9 -> -1, role 0 -> -1, features acting 2305244844532236283 upacting 2305244844532236283
2018-01-29 20:18:06.064258 7fc0fd115700  1 osd.9 pg_epoch: 2483 pg[3.16( v 2284'372 (0'0,2284'372] local-lis/les=2476/2477 n=2 ec=1758/130 lis/c 2476/2476 les/c/f 2477/2477/2312 2483/2483/1760) [33,85] r=-1 lpr=2483 pi=[2476,2483)/1 luo
d=0'0 crt=2284'372 lcod 2284'371 active] start_peering_interval up [33,9,85] -> [33,85], acting [33,9,85] -> [33,85], acting_primary 33 -> 33, up_primary 33 -> 33, role 1 -> -1, features acting 2305244844532236283 upacting 2305244844532
236283
2018-01-29 20:18:06.064493 7fc0fc914700  1 osd.9 pg_epoch: 2485 pg[10.53( empty local-lis/les=2476/2477 n=0 ec=1815/1815 lis/c 2476/2476 les/c/f 2477/2477/2312 2483/2483/2483) [] r=-1 lpr=2483 pi=[2476,2483)/1 crt=0'0 unknown NOTIFY] st
ate<Start>: transitioning to Stray
2018-01-29 20:18:06.064545 7fc0fd115700  1 osd.9 pg_epoch: 2485 pg[3.16( v 2284'372 (0'0,2284'372] local-lis/les=2476/2477 n=2 ec=1758/130 lis/c 2476/2476 les/c/f 2477/2477/2312 2483/2483/1760) [33,85] r=-1 lpr=2483 pi=[2476,2483)/1 crt
=2284'372 lcod 2284'371 unknown NOTIFY] state<Start>: transitioning to Stray
2018-01-29 20:18:06.064886 7fc0fc914700  1 osd.9 pg_epoch: 2483 pg[10.3f( empty local-lis/les=2476/2477 n=0 ec=1815/1815 lis/c 2476/2476 les/c/f 2477/2477/2312 2483/2483/2483) [] r=-1 lpr=2483 pi=[2476,2483)/1 crt=0'0 active] start_peer
ing_interval up [9] -> [], acting [9] -> [], acting_primary 9 -> -1, up_primary 9 -> -1, role 0 -> -1, features acting 2305244844532236283 upacting 2305244844532236283
2018-01-29 20:18:06.064972 7fc0fd115700  1 osd.9 pg_epoch: 2483 pg[6.42( v 2298'367 (0'0,2298'367] local-lis/les=2476/2477 n=1 ec=1753/704 lis/c 2476/2476 les/c/f 2477/2477/2312 2483/2483/2367) [24,84] r=-1 lpr=2483 pi=[2476,2483)/1 luo
d=0'0 crt=2298'367 lcod 2298'366 active] start_peering_interval up [24,84,9] -> [24,84], acting [24,84,9] -> [24,84], acting_primary 24 -> 24, up_primary 24 -> 24, role 2 -> -1, features acting 2305244844532236283 upacting 2305244844532
236283
2018-01-29 20:18:06.065091 7fc0fc914700  1 osd.9 pg_epoch: 2485 pg[10.3f( empty local-lis/les=2476/2477 n=0 ec=1815/1815 lis/c 2476/2476 les/c/f 2477/2477/2312 2483/2483/2483) [] r=-1 lpr=2483 pi=[2476,2483)/1 crt=0'0 unknown NOTIFY] st
ate<Start>: transitioning to Stray

2018-01-29 20:18:06.072202 7fc0fc914700  1 osd.9 pg_epoch: 2483 pg[5.3c( v 2312'24 (0'0,2312'24] local-lis/les=2476/2477 n=1 ec=1748/700 lis/c 2476/2476 les/c/f 2477/2477/2312 2483/2483/2483) [32,66] r=-1 lpr=2483 pi=[2476,2483)/1 luod=
0'0 crt=2312'24 lcod 2312'23 active] start_peering_interval up [9,32,66] -> [32,66], acting [9,32,66] -> [32,66], acting_primary 9 -> 32, up_primary 9 -> 32, role 0 -> -1, features acting 2305244844532236283 upacting 2305244844532236283
2018-01-29 20:18:06.072324 7fc0fc914700  1 osd.9 pg_epoch: 2485 pg[5.3c( v 2312'24 (0'0,2312'24] local-lis/les=2476/2477 n=1 ec=1748/700 lis/c 2476/2476 les/c/f 2477/2477/2312 2483/2483/2483) [32,66] r=-1 lpr=2483 pi=[2476,2483)/1 crt=2
312'24 lcod 2312'23 unknown NOTIFY] state<Start>: transitioning to Stray
2018-01-29 20:18:06.072463 7fc103121700  0 osd.9 2485 _committed_osd_maps shutdown OSD via async signal
2018-01-29 20:18:06.072570 7fc0f4904700 -1 Fail to open '/proc/0/cmdline' error = (2) No such file or directory
2018-01-29 20:18:06.072597 7fc0f4904700 -1 received  signal: Interrupt from  PID: 0 task name: <unknown> UID: 0
2018-01-29 20:18:06.072603 7fc0f4904700 -1 osd.9 2485 *** Got signal Interrupt ***
2018-01-29 20:18:06.072609 7fc0f4904700  0 osd.9 2485 prepare_to_stop starting shutdown
2018-01-29 20:18:06.072614 7fc0f4904700 -1 osd.9 2485 shutdown
2018-01-29 20:18:06.429238 7fc10d936700  0 log_channel(cluster) log [WRN] : 2 slow requests, 2 included below; oldest blocked for > 55.682037 secs
2018-01-29 20:18:06.429251 7fc10d936700  0 log_channel(cluster) log [WRN] : slow request 55.682037 seconds old, received at 2018-01-29 20:17:10.747125: pg_notify((query:2482 sent:2482 6.18( v 2276'10 (0'0,2276'10] local-lis/les=2473/247
4 n=3 ec=704/704 lis/c 2473/2470 les/c/f 2474/2471/2312 2481/2481/2476))=([2470,2480] intervals=([2473,2475] acting 51,91),([2479,2480] acting 9,51)) epoch 2482) currently wait for new map
2018-01-29 20:18:06.429265 7fc10d936700  0 log_channel(cluster) log [WRN] : slow request 55.130508 seconds old, received at 2018-01-29 20:17:11.298655: pg_notify((query:2482 sent:2482 6.18( v 2276'10 (0'0,2276'10] local-lis/les=2479/248
0 n=3 ec=704/704 lis/c 2479/2470 les/c/f 2480/2471/2312 2481/2481/2476))=([2470,2480] intervals=([2473,2475] acting 51,91),([2479,2480] acting 9,51)) epoch 2482) currently wait for new map
2018-01-29 20:18:08.117736 7fc0f4904700  1 bluestore(/var/lib/ceph/osd/ceph-9) umount
2018-01-29 20:18:08.317882 7fc0f4904700  1 stupidalloc shutdown
2018-01-29 20:18:08.322660 7fc0f4904700  1 freelist shutdown
2018-01-29 20:18:08.322718 7fc0f4904700  4 rocksdb: [/build/ceph-12.2.2/src/rocksdb/db/db_impl.cc:217] Shutdown: canceling all background work
2018-01-29 20:18:08.339096 7fc0f4904700  4 rocksdb: [/build/ceph-12.2.2/src/rocksdb/db/db_impl.cc:343] Shutdown complete
2018-01-29 20:18:08.508354 7fc0f4904700  1 bluefs umount
2018-01-29 20:18:08.508555 7fc0f4904700  1 stupidalloc shutdown
2018-01-29 20:18:08.508571 7fc0f4904700  1 stupidalloc shutdown
2018-01-29 20:18:08.508573 7fc0f4904700  1 stupidalloc shutdown
2018-01-29 20:18:08.508718 7fc0f4904700  1 bdev(0x5560701c7440 /dev/nvme0n1p19) close
2018-01-29 20:18:08.786803 7fc0f4904700  1 bdev(0x5560701c6fc0 /dev/nvme0n1p20) close
2018-01-29 20:18:08.874747 7fc0f4904700  1 bdev(0x5560701c7200 /var/lib/ceph/osd/ceph-9/block) close
2018-01-29 20:18:09.030750 7fc0f4904700  1 bdev(0x5560701c6d80 /var/lib/ceph/osd/ceph-9/block) close

BR,
Bruce J.

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com