Re: Replacing a failed disk/OSD: unfound object

The test setup is as follows:
I built a Ceph 0.30 system with 3 mons, 3 MDSs, and 3 OSDs on 3 machines.
Then I copied some files around on a Ceph client and waited until ceph -w showed a steady, healthy state.
Then I turned one machine off.
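
For reference, this layout corresponds to a ceph.conf roughly like the one below. It is a reconstruction, not the file actually used: the mon names and addresses and the /data/osd.* paths come from the logs further down, while the ceph1/ceph2 hostnames, the mon data path, and the third mds name are placeholders.

[mon]
        mon data = /data/mon.$id            ; placeholder path
[mon.alpha]
        host = ceph0
        mon addr = 192.168.0.135:6789
[mon.beta]
        host = ceph1                        ; hostname assumed
        mon addr = 192.168.0.136:6789
[mon.gamma]
        host = ceph2                        ; hostname assumed
        mon addr = 192.168.0.137:6789
[mds.alpha]
        host = ceph0
[mds.beta]
        host = ceph1
[mds.gamma]
        host = ceph2                        ; third mds name assumed
[osd]
        osd data = /data/osd.$id
        osd journal = /data/osd.$id/journal
[osd.0]
        host = ceph0
[osd.1]
        host = ceph1
[osd.2]
        host = ceph2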

Here are the logs for this unfound-object situation; please look for the "=>" marks, which flag the events of interest.
ceph -w
=========

2011-07-13 14:39:43.157935 mon e1: 3 mons at {alpha=192.168.0.135:6789/0,beta=192.168.0.136:6789/0,gamma=192.168.0.137:6789/0}
2011-07-13 14:59:11.519250   osd e179: 3 osds: 3 up, 3 in
2011-07-13 15:01:17.355846 pg v1099: 602 pgs: 602 active+clean; 349 MB data, 1778 MB used, 920 MB / 3069 MB avail
=>ceph0 shut down (with mon0, mds.alpha, osd0)
2011-07-13 16:17:22.232696 7f3ac1cd1700 monclient: hunting for new mon
2011-07-13 16:17:34.887551 log 2011-07-13 16:17:29.515704 mon1 192.168.0.136:6789/0 2 : [INF] mon.beta calling new monitor election
2011-07-13 16:17:34.887551 log 2011-07-13 16:17:29.633224 mon2 192.168.0.137:6789/0 9 : [INF] mon.gamma calling new monitor election
2011-07-13 16:17:34.887551 log 2011-07-13 16:17:34.530619 mon1 192.168.0.136:6789/0 3 : [INF] mon.beta@1 won leader election with quorum 1,2
2011-07-13 16:17:47.744681 log 2011-07-13 16:17:47.521582 mon1 192.168.0.136:6789/0 4 : [INF] osd0 192.168.0.135:6801/6165 failed (by osd2 192.168.0.137:6801/9292)
2011-07-13 16:17:48.754993 log 2011-07-13 16:17:47.589716 mon1 192.168.0.136:6789/0 5 : [INF] osd0 192.168.0.135:6801/6165 failed (by osd1 192.168.0.136:6801/2984)
2011-07-13 16:17:52.742905   osd e180: 3 osds: 2 up, 3 in
2011-07-13 16:17:52.754959 log 2011-07-13 16:17:52.524626 mon1 192.168.0.136:6789/0 6 : [INF] osd0 192.168.0.135:6801/6165 failed (by osd2 192.168.0.137:6801/9292)
2011-07-13 16:17:52.767674 pg v1100: 602 pgs: 602 active+clean; 349 MB data, 1778 MB used, 920 MB / 3069 MB avail
2011-07-13 16:17:53.751359   osd e181: 3 osds: 2 up, 3 in
2011-07-13 16:17:53.770067 pg v1101: 602 pgs: 602 active+clean; 349 MB data, 1778 MB used, 920 MB / 3069 MB avail
2011-07-13 16:17:54.049235   osd e182: 3 osds: 2 up, 3 in
2011-07-13 16:17:54.062753 mds e108: 1/1/1 up {0=beta=up:replay}, 1 up:standby
2011-07-13 16:17:54.073946 pg v1102: 602 pgs: 602 active+clean; 349 MB data, 1778 MB used, 920 MB / 3069 MB avail
2011-07-13 16:17:58.904299 pg v1103: 602 pgs: 233 active+clean, 369 active+clean+degraded; 349 MB data, 1790 MB used, 914 MB / 3069 MB avail; 67/224 degraded (29.911%)
2011-07-13 16:17:59.083432 mds e109: 1/1/1 up {0=beta=up:reconnect}, 1 up:standby
2011-07-13 16:17:59.105245 mds e110: 1/1/1 up {0=beta=up:rejoin}, 1 up:standby
2011-07-13 16:17:59.121004 mds e111: 1/1/1 up {0=beta=up:active}, 1 up:standby
2011-07-13 16:17:59.147149 log 2011-07-13 16:17:58.970253 mon1 192.168.0.136:6789/0 7 : [INF] mds0 192.168.0.136:6800/2862 up:reconnect
2011-07-13 16:17:59.147149 log 2011-07-13 16:17:58.993575 mon1 192.168.0.136:6789/0 8 : [INF] mds0 192.168.0.136:6800/2862 up:rejoin
2011-07-13 16:17:59.147149 log 2011-07-13 16:17:59.008454 mon1 192.168.0.136:6789/0 9 : [INF] mds0 192.168.0.136:6800/2862 up:active
2011-07-13 16:18:01.522886 log 2011-07-13 16:17:58.980040 mds0 192.168.0.136:6800/2862 1 : [DBG] reconnect by client4101 192.168.0.138:0/384680305 after 0.009430
2011-07-13 16:18:03.746935 pg v1104: 602 pgs: 233 active+clean, 369 active+clean+degraded; 349 MB data, 1795 MB used, 910 MB / 3069 MB avail; 67/224 degraded (29.911%)
2011-07-13 16:18:13.762497 pg v1105: 602 pgs: 233 active+clean, 369 active+clean+degraded; 349 MB data, 1796 MB used, 909 MB / 3069 MB avail; 67/224 degraded (29.911%)
2011-07-13 16:20:03.752175 pg v1106: 602 pgs: 233 active+clean, 369 active+clean+degraded; 349 MB data, 1791 MB used, 909 MB / 3069 MB avail; 67/224 degraded (29.911%)
2011-07-13 16:20:13.796924 pg v1107: 602 pgs: 233 active+clean, 369 active+clean+degraded; 349 MB data, 1786 MB used, 909 MB / 3069 MB avail; 67/224 degraded (29.911%)
=>osd0 is considered out and re-replication starts
2011-07-13 16:22:54.219786   osd e183: 3 osds: 2 up, 2 in
2011-07-13 16:22:54.234591 pg v1108: 602 pgs: 233 active+clean, 369 active+clean+degraded; 349 MB data, 1260 MB used, 537 MB / 2046 MB avail; 67/224 degraded (29.911%)
2011-07-13 16:22:54.254039 log 2011-07-13 16:22:54.116461 mon1 192.168.0.136:6789/0 10 : [INF] osd0 out (down for 301.489741)
2011-07-13 16:23:02.684313   osd e184: 3 osds: 2 up, 2 in
2011-07-13 16:23:02.698807 pg v1109: 602 pgs: 233 active+clean, 369 active+clean+degraded; 349 MB data, 1260 MB used, 537 MB / 2046 MB avail; 67/224 degraded (29.911%)
2011-07-13 16:23:07.995143   osd e185: 3 osds: 2 up, 2 in
2011-07-13 16:23:08.009328 pg v1110: 602 pgs: 9 active, 402 active+clean, 8 peering, 183 active+clean+degraded; 349 MB data, 1273 MB used, 527 MB / 2046 MB avail; 54/224 degraded (24.107%)
2011-07-13 16:23:08.139752 pg v1111: 602 pgs: 41 active, 552 active+clean, 9 peering; 349 MB data, 1288 MB used, 514 MB / 2046 MB avail; 53/224 degraded (23.661%)
2011-07-13 16:23:09.001543   osd e186: 3 osds: 2 up, 2 in
2011-07-13 16:23:09.027121 pg v1112: 602 pgs: 41 active, 552 active+clean, 9 peering; 349 MB data, 1288 MB used, 514 MB / 2046 MB avail; 53/224 degraded (23.661%)
2011-07-13 16:23:09.311325   osd e187: 3 osds: 2 up, 2 in
2011-07-13 16:23:09.433629 pg v1113: 602 pgs: 41 active, 552 active+clean, 9 peering; 349 MB data, 1288 MB used, 514 MB / 2046 MB avail; 53/224 degraded (23.661%)
2011-07-13 16:23:10.399307   osd e188: 3 osds: 2 up, 2 in
2011-07-13 16:23:10.414293 pg v1114: 602 pgs: 41 active, 552 active+clean, 9 peering; 349 MB data, 1288 MB used, 514 MB / 2046 MB avail; 53/224 degraded (23.661%)
2011-07-13 16:23:11.408559   osd e189: 3 osds: 2 up, 2 in
2011-07-13 16:23:11.427273 pg v1115: 602 pgs: 41 active, 552 active+clean, 9 peering; 349 MB data, 1288 MB used, 514 MB / 2046 MB avail; 53/224 degraded (23.661%)
2011-07-13 16:23:12.329049   osd e190: 3 osds: 2 up, 2 in
2011-07-13 16:23:12.365014 pg v1116: 602 pgs: 41 active, 552 active+clean, 9 peering; 349 MB data, 1288 MB used, 514 MB / 2046 MB avail; 53/224 degraded (23.661%)
2011-07-13 16:23:14.973260 pg v1117: 602 pgs: 49 active, 552 active+clean, 1 peering; 349 MB data, 1301 MB used, 504 MB / 2046 MB avail; 66/224 degraded (29.464%)
2011-07-13 16:23:17.129235 pg v1118: 602 pgs: 50 active, 552 active+clean; 349 MB data, 1317 MB used, 489 MB / 2046 MB avail; 67/224 degraded (29.911%)
2011-07-13 16:23:30.004135 pg v1119: 602 pgs: 46 active, 556 active+clean; 349 MB data, 1329 MB used, 474 MB / 2046 MB avail; 62/224 degraded (27.679%)
2011-07-13 16:23:35.271864 pg v1120: 602 pgs: 40 active, 562 active+clean; 349 MB data, 1373 MB used, 428 MB / 2046 MB avail; 53/224 degraded (23.661%)
2011-07-13 16:23:45.093097 pg v1121: 602 pgs: 38 active, 564 active+clean; 349 MB data, 1394 MB used, 408 MB / 2046 MB avail; 49/224 degraded (21.875%)
=>for some reason osd2 failed during object replication
2011-07-13 16:24:00.435405 log 2011-07-13 16:24:00.288017 mon1 192.168.0.136:6789/0 11 : [INF] osd2 192.168.0.137:6801/9292 failed (by osd1 192.168.0.136:6801/2984)
2011-07-13 16:24:05.579361 log 2011-07-13 16:24:05.424065 mon1 192.168.0.136:6789/0 12 : [INF] osd2 192.168.0.137:6801/9292 failed (by osd1 192.168.0.136:6801/2984)
2011-07-13 16:24:10.583691   osd e191: 3 osds: 1 up, 2 in
2011-07-13 16:24:10.594086 log 2011-07-13 16:24:10.455936 mon1 192.168.0.136:6789/0 13 : [INF] osd2 192.168.0.137:6801/9292 failed (by osd1 192.168.0.136:6801/2984)
2011-07-13 16:24:10.614368 pg v1122: 602 pgs: 38 active, 564 active+clean; 349 MB data, 1394 MB used, 408 MB / 2046 MB avail; 49/224 degraded (21.875%)
2011-07-13 16:24:11.590134   osd e192: 3 osds: 1 up, 2 in
2011-07-13 16:24:11.607345 pg v1123: 602 pgs: 38 active, 564 active+clean; 349 MB data, 1394 MB used, 408 MB / 2046 MB avail; 49/224 degraded (21.875%)
2011-07-13 16:24:15.739147 pg v1124: 602 pgs: 25 active+degraded, 577 active+clean+degraded; 349 MB data, 1404 MB used, 402 MB / 2046 MB avail; 143/224 degraded (63.839%); 31/112 unfound (27.679%)
2011-07-13 16:26:15.845867 pg v1125: 602 pgs: 25 active+degraded, 577 active+clean+degraded; 349 MB data, 1402 MB used, 402 MB / 2046 MB avail; 143/224 degraded (63.839%); 31/112 unfound (27.679%)
=>osd2 considered out
2011-07-13 16:29:14.486774   osd e193: 3 osds: 1 up, 1 in
2011-07-13 16:29:14.504393 pg v1126: 602 pgs: 25 active+degraded, 577 active+clean+degraded; 349 MB data, 592 MB used, 308 MB / 1023 MB avail; 143/224 degraded (63.839%); 31/112 unfound (27.679%)
2011-07-13 16:29:14.514343 log 2011-07-13 16:29:14.394234 mon1 192.168.0.136:6789/0 14 : [INF] osd2 out (down for 303.874867)
2011-07-13 16:29:16.077618 pg v1127: 602 pgs: 25 active+degraded, 577 active+clean+degraded; 349 MB data, 594 MB used, 308 MB / 1023 MB avail; 143/224 degraded (63.839%); 31/112 unfound (27.679%)
2011-07-13 16:31:16.098224 pg v1128: 602 pgs: 25 active+degraded, 577 active+clean+degraded; 349 MB data, 592 MB used, 308 MB / 1023 MB avail; 143/224 degraded (63.839%); 31/112 unfound (27.679%)
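
At this point the 25 pgs reported as active+degraded are the ones carrying the 31 unfound objects. A rough sketch of how to list them from the shell (the exact ceph tool syntax differs between 0.30 and newer releases, so treat this as an outline only):

ceph -s                        # overall status, including degraded/unfound totals
ceph pg dump | grep degraded   # pg ids whose state is not active+clean
                               # (older ceph tools may need 'ceph pg dump -o -')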

osd.2.log
=============
2011-07-13 14:51:37.542028 7f3b6de13700 osd2 178 pg[0.53( v 178'1 (0'0,178'1] n=1 ec=2 les/c 166/178 163/163/155) [2,0] r=0 mlcod 0'0 !hml active+clean] sending commit on repgather(0x7f3b64075350 applied 178'1 rep_tid=266 wfack= wfdisk= op=osd_op(client4101.1:375 10000000007.0000000a [write 0~4194304 [1@-1]] 0.43d3 snapc 1=[])) 0x7f3b6411f840
2011-07-13 14:51:40.448487 7f3b6de13700 osd2 178 pg[0.6e( v 178'2 (0'0,178'2] n=2 ec=2 les/c 172/178 170/170/155) [2,1] r=0 mlcod 0'0 !hml active+clean] sending commit on repgather(0x7f3b6415f090 applied 178'2 rep_tid=267 wfack= wfdisk= op=osd_op(client4101.1:376 10000000007.0000000b [write 0~4194304 [1@-1]] 0.ad6e snapc 1=[])) 0x7f3b640d13e0
2011-07-13 14:51:42.524448 7f3b6e614700 osd2 178 pg[0.62( v 178'2 (0'0,178'2] n=2 ec=2 les/c 166/178 163/163/155) [2,0] r=0 mlcod 0'0 !hml active+clean] sending commit on repgather(0x7f3b640dd560 applied 178'2 rep_tid=268 wfack= wfdisk= op=osd_op(client4101.1:377 10000000007.0000000c [write 0~4194304 [1@-1]] 0.7e62 snapc 1=[])) 0x7f3b6414fb80
2011-07-13 14:51:43.264678 7f3b6e614700 osd2 178 pg[0.8( v 178'2 (0'0,178'2] n=2 ec=2 les/c 166/178 163/163/155) [2,0] r=0 mlcod 0'0 !hml active+clean] sending commit on repgather(0x7f3b6011de10 applied 178'2 rep_tid=271 wfack= wfdisk= op=osd_op(client4101.1:398 10000000007.00000021 [write 0~4194304 [1@-1]] 0.ef08 snapc 1=[])) 0x7f3b600e8010
2011-07-13 14:51:43.264801 7f3b6e614700 osd2 178 pg[0.2e( v 178'1 (0'0,178'1] n=1 ec=2 les/c 166/178 163/163/155) [2,0] r=0 mlcod 0'0 !hml active+clean] sending commit on repgather(0x2dc09f0 applied 178'1 rep_tid=273 wfack= wfdisk= op=osd_op(client4101.1:404 10000000007.00000027 [write 0~269567 [1@-1]] 0.1a2e snapc 1=[])) 0x2e4b860
2011-07-13 14:51:43.413590 7f3b6de13700 osd2 178 pg[0.6f( v 178'2 (0'0,178'2] n=2 ec=2 les/c 172/178 170/170/155) [2,1] r=0 mlcod 0'0 !hml active+clean] sending commit on repgather(0x7f3b600b2310 applied 178'2 rep_tid=269 wfack= wfdisk= op=osd_op(client4101.1:379 10000000007.0000000e [write 0~4194304 [1@-1]] 0.8def snapc 1=[])) 0x7f3b6005a340
2011-07-13 14:51:44.581422 7f3b6de13700 osd2 178 pg[0.52( v 178'2 (0'0,178'2] n=2 ec=2 les/c 172/178 170/170/155) [2,1] r=0 mlcod 0'0 !hml active+clean] sending commit on repgather(0x7f3b65d89090 applied 178'2 rep_tid=270 wfack= wfdisk= op=osd_op(client4101.1:393 10000000007.0000001c [write 0~4194304 [1@-1]] 0.9a52 snapc 1=[])) 0x7f3b6406f190
2011-07-13 14:51:44.581630 7f3b6de13700 osd2 178 pg[0.4c( v 178'2 (0'0,178'2] n=2 ec=2 les/c 172/178 170/170/155) [2,1] r=0 mlcod 0'0 !hml active+clean] sending commit on repgather(0x7f3b6008eca0 applied 178'2 rep_tid=272 wfack= wfdisk= op=osd_op(client4101.1:399 10000000007.00000022 [write 0~4194304 [1@-1]] 0.dacc snapc 1=[])) 0x7f3b60139550
=>something is already wrong here, but ceph -w does not show it yet
2011-07-13 14:52:59.465626 7f3b70719700 osd2 178 OSD::ms_handle_reset()
2011-07-13 14:52:59.465731 7f3b70719700 osd2 178 OSD::ms_handle_reset() s=0x2dc0740
2011-07-13 15:06:15.518322 7f3b70719700 osd2 179 OSD::ms_handle_reset()
2011-07-13 15:06:15.518517 7f3b70719700 osd2 179 OSD::ms_handle_reset() s=0x2de6ce0
2011-07-13 15:14:11.831878 7f3b6bb0c700 -- 192.168.0.137:6802/9292 >> 192.168.0.135:6802/6165 pipe(0x7f3b64099ab0 sd=13 pgs=1 cs=1 l=0).fault with nothing to send, going to standby
2011-07-13 15:14:11.880194 7f3b6b304700 -- 192.168.0.137:6802/9292 >> 192.168.0.136:6802/2984 pipe(0x2ce1a20 sd=15 pgs=2 cs=1 l=0).fault with nothing to send, going to standby
2011-07-13 15:21:15.511170 7f3b70719700 osd2 179 OSD::ms_handle_reset()
2011-07-13 15:21:15.511265 7f3b70719700 osd2 179 OSD::ms_handle_reset() s=0x2bbfc10
2011-07-13 15:36:15.613503 7f3b70719700 osd2 179 OSD::ms_handle_reset()
2011-07-13 15:36:15.613602 7f3b70719700 osd2 179 OSD::ms_handle_reset() s=0x2ac40b0
2011-07-13 15:51:15.707005 7f3b70719700 osd2 179 OSD::ms_handle_reset()
2011-07-13 15:51:15.707101 7f3b70719700 osd2 179 OSD::ms_handle_reset() s=0x2827df0
2011-07-13 16:06:15.779902 7f3b70719700 osd2 179 OSD::ms_handle_reset()
2011-07-13 16:06:15.779992 7f3b70719700 osd2 179 OSD::ms_handle_reset() s=0x2ba3650
2011-07-13 16:17:23.667067 7f3b70719700 osd2 179 OSD::ms_handle_reset()
2011-07-13 16:17:23.667128 7f3b70719700 osd2 179 OSD::ms_handle_reset() s=0x2dc09f0
=>ceph0 shut down
2011-07-13 16:17:25.043107 7f3b6b708700 -- 192.168.0.137:6803/9292 >> 192.168.0.135:6803/6165 pipe(0x7f3b640e77c0 sd=14 pgs=3 cs=1 l=0).fault initiating reconnect
2011-07-13 16:17:25.044344 7f3b6bd0e700 -- 192.168.0.137:6803/9292 >> 192.168.0.135:6803/6165 pipe(0x7f3b640e77c0 sd=14 pgs=3 cs=2 l=0).fault first fault
2011-07-13 16:17:44.634390 7f3b76f26700 osd2 179 heartbeat_check: no heartbeat from osd0 since 2011-07-13 16:17:24.048586 (cutoff 2011-07-13 16:17:24.634291)
2011-07-13 16:17:45.143635 7f3b6c610700 osd2 179 heartbeat_check: no heartbeat from osd0 since 2011-07-13 16:17:24.048586 (cutoff 2011-07-13 16:17:25.143606)
2011-07-13 16:17:45.635063 7f3b76f26700 osd2 179 heartbeat_check: no heartbeat from osd0 since 2011-07-13 16:17:24.048586 (cutoff 2011-07-13 16:17:25.635027)
2011-07-13 16:17:46.446478 7f3b6c610700 osd2 179 heartbeat_check: no heartbeat from osd0 since 2011-07-13 16:17:24.048586 (cutoff 2011-07-13 16:17:26.446446)
2011-07-13 16:17:46.635367 7f3b76f26700 osd2 179 heartbeat_check: no heartbeat from osd0 since 2011-07-13 16:17:24.048586 (cutoff 2011-07-13 16:17:26.635334)
2011-07-13 16:17:46.946921 7f3b6c610700 osd2 179 heartbeat_check: no heartbeat from osd0 since 2011-07-13 16:17:24.048586 (cutoff 2011-07-13 16:17:26.946890)
2011-07-13 16:17:47.635921 7f3b76f26700 osd2 179 heartbeat_check: no heartbeat from osd0 since 2011-07-13 16:17:24.048586 (cutoff 2011-07-13 16:17:27.635884)
2011-07-13 16:17:48.347317 7f3b6c610700 osd2 179 heartbeat_check: no heartbeat from osd0 since 2011-07-13 16:17:24.048586 (cutoff 2011-07-13 16:17:28.347285)
2011-07-13 16:17:48.636694 7f3b76f26700 osd2 179 heartbeat_check: no heartbeat from osd0 since 2011-07-13 16:17:24.048586 (cutoff 2011-07-13 16:17:28.636657)
2011-07-13 16:17:49.637227 7f3b76f26700 osd2 179 heartbeat_check: no heartbeat from osd0 since 2011-07-13 16:17:24.048586 (cutoff 2011-07-13 16:17:29.637193)
2011-07-13 16:17:49.647933 7f3b6c610700 osd2 179 heartbeat_check: no heartbeat from osd0 since 2011-07-13 16:17:24.048586 (cutoff 2011-07-13 16:17:29.647916)
2011-07-13 16:17:50.425012 7f3b6c610700 osd2 179 heartbeat_check: no heartbeat from osd0 since 2011-07-13 16:17:24.048586 (cutoff 2011-07-13 16:17:30.424981)
2011-07-13 16:17:50.637671 7f3b76f26700 osd2 179 heartbeat_check: no heartbeat from osd0 since 2011-07-13 16:17:24.048586 (cutoff 2011-07-13 16:17:30.637638)
2011-07-13 16:17:51.025729 7f3b6c610700 osd2 179 heartbeat_check: no heartbeat from osd0 since 2011-07-13 16:17:24.048586 (cutoff 2011-07-13 16:17:31.025698)
2011-07-13 16:17:51.638261 7f3b76f26700 osd2 179 heartbeat_check: no heartbeat from osd0 since 2011-07-13 16:17:24.048586 (cutoff 2011-07-13 16:17:31.638165)
2011-07-13 16:17:51.926444 7f3b6c610700 osd2 179 heartbeat_check: no heartbeat from osd0 since 2011-07-13 16:17:24.048586 (cutoff 2011-07-13 16:17:31.926413)
2011-07-13 16:17:52.526995 7f3b6c610700 osd2 179 heartbeat_check: no heartbeat from osd0 since 2011-07-13 16:17:24.048586 (cutoff 2011-07-13 16:17:32.526963)
2011-07-13 16:17:52.638956 7f3b76f26700 osd2 179 heartbeat_check: no heartbeat from osd0 since 2011-07-13 16:17:24.048586 (cutoff 2011-07-13 16:17:32.638936)
2011-07-13 16:17:52.763523 7f3b6b607700 -- 192.168.0.137:6802/9292 >> 192.168.0.136:6802/2984 pipe(0x2ce1a20 sd=13 pgs=2 cs=2 l=0).connect claims to be 0.0.0.0:6802/2984 not 192.168.0.136:6802/2984 - presumably this is the same node!
2011-07-13 16:17:59.235254 7f3b75723700 osd2 182 pg[1.94( v 182'23 (178'20,182'23] n=1 ec=2 les/c 181/182 180/180/155) [2] r=0 mlcod 0'0 !hml active+clean+degraded] sending commit on repgather(0x7f3b602cfba0 applied 182'23 rep_tid=274 wfack= wfdisk= op=osd_op(mds0.11:21 200.00000000 [writefull 0~84] 1.3494) v2) 0x7f3b601ebb90
*** Caught signal (Aborted) **
 in thread 0x7f3b74721700
=>restart osd2
2011-07-13 17:06:00.120011 7fe0f5c57720 ceph version 0.30.commit: 64b1b2c70f0cde39c72d5d724c65ea8afaaa00b9. process: cosd. pid: 10241
2011-07-13 17:06:00.132026 7fe0f5c57720 filestore(/data/osd.2) mount FIEMAP ioctl is NOT supported
2011-07-13 17:06:00.132100 7fe0f5c57720 filestore(/data/osd.2) mount detected btrfs
2011-07-13 17:06:00.132120 7fe0f5c57720 filestore(/data/osd.2) mount btrfs CLONE_RANGE ioctl is supported
2011-07-13 17:06:00.149449 7fe0f5c57720 filestore(/data/osd.2) mount btrfs SNAP_CREATE is supported
2011-07-13 17:06:00.297455 7fe0f5c57720 filestore(/data/osd.2) mount btrfs SNAP_DESTROY is supported
2011-07-13 17:06:00.297786 7fe0f5c57720 filestore(/data/osd.2) mount btrfs START_SYNC got 0 Success
2011-07-13 17:06:00.297847 7fe0f5c57720 filestore(/data/osd.2) mount btrfs START_SYNC is supported (transid 363)
2011-07-13 17:06:00.300147 7fe0f5c57720 filestore(/data/osd.2) mount btrfs WAIT_SYNC is supported
2011-07-13 17:06:00.301704 7fe0f5c57720 filestore(/data/osd.2) mount btrfs SNAP_CREATE_V2 got 0 Success
2011-07-13 17:06:00.301727 7fe0f5c57720 filestore(/data/osd.2) mount btrfs SNAP_CREATE_V2 is supported
2011-07-13 17:06:00.325592 7fe0f5c57720 filestore(/data/osd.2) mount found snaps <24050,24085>
2011-07-13 17:06:00.376720 7fe0f5c57720 filestore(/data/osd.2) mount: enabling PARALLEL journal mode: btrfs, SNAP_CREATE_V2 detected and 'filestore btrfs snap' mode is enabled
2011-07-13 17:06:00.376809 7fe0f5c57720 journal _open /data/osd.2/journal fd 11: 1048576000 bytes, block size 4096 bytes, directio = 1
2011-07-13 17:06:00.470792 7fe0f5c57720 journal read_entry 593817600 : seq 24086 1049540 bytes
2011-07-13 17:06:00.472089 7fe0f5c57720 journal read_entry 593817600 : seq 24086 1049540 bytes
*** Caught signal (Aborted) **
 in thread 0x7fe0f5c57720
 ceph version 0.30 (commit:64b1b2c70f0cde39c72d5d724c65ea8afaaa00b9)
 1: /usr/bin/cosd() [0x637e6e]
 2: (()+0xf430) [0x7fe0f563a430]
 3: (gsignal()+0x35) [0x7fe0f442c355]
 4: (abort()+0x17f) [0x7fe0f442d5ef]
 5: (__assert_fail()+0xf1) [0x7fe0f44254c1]
 6: (FileStore::_do_transaction(ObjectStore::Transaction&)+0x3d42) [0x5c4b62]
 7: (FileStore::do_transactions(std::list<ObjectStore::Transaction*, std::allocator<ObjectStore::Transaction*> >&, unsigned long)+0x75) [0x5c5135]
 8: (JournalingObjectStore::journal_replay(unsigned long)+0xf60) [0x5d54c0]
 9: (FileStore::mount()+0x17e9) [0x5ae489]
 10: (OSD::init()+0x165) [0x50cfd5]
 11: (main()+0x2424) [0x48ba64]
 12: (__libc_start_main()+0xfd) [0x7fe0f4418d2d]
 13: /usr/bin/cosd() [0x4881e9]



It seems cosd does not escalate such error situations.


On Tue, 12 Jul 2011 09:06:14 -0700 (PDT), Sage Weil wrote:
On Tue, 12 Jul 2011, Meng Zhao wrote:
Thanks Tommi. I rebuilt the ceph cluster a few times just to reproduce the situation. The result seems mixed, more likely btrfs failed (after power
reset). But it does happen anyway.

The big question is: however rare, an unfound-object situation makes the *entire* ceph file system unmountable, which amounts to a total loss of all data. That is quite a risk to take for a production system. Is there a way to recover from such a situation? (e.g. remove the file associated with the missing object)

There isn't an easy way to do this currently, but it is very easy to add.
I've added an issue to the tracker.

The real question is whether the lost object is file data or namespace metadata. If it's namespace metadata, you may lose part of the directory hierarchy (until the fsck/namespace rebuild work is done). For file data,
it's easy enough to just mark the unfound object as deleted.

sage
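
Follow-up note: the facility Sage describes was added in later Ceph releases; it does not exist in the 0.30 build used above. In those later releases the workflow is roughly the following, with 1.94 standing in for one of the affected pg ids from the log:

ceph health detail                      # names the pgs with unfound objects
ceph pg 1.94 list_missing               # show the objects that pg cannot find
ceph pg 1.94 mark_unfound_lost revert   # stop waiting; revert to the prior version
                                        # (newer releases also offer a 'delete' variant)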




On Fri, 8 Jul 2011 10:08:46 -0700, Tommi Virtanen wrote:
> [It seems I dropped the Cc: to ceph-devel, added it back.. Please
> reply to this message instead, and sorry about that. I'm starting to
> dislike Google Apps for mailing list traffic :( ]
>
> On Fri, Jul 8, 2011 at 10:07, Tommi Virtanen
> <tommi.virtanen@xxxxxxxxxxxxx> wrote:
> > On Fri, Jul 8, 2011 at 01:23, Meng Zhao <mzhao@xxxxxxxxxxxx> wrote:
> > > I was trying to replace a disk for an osd by following instruction at:
> > > http://ceph.newdream.net/wiki/Replacing_a_failed_disk/OSD
> > >
> > > Now, ceph -w getting
> > > 2011-07-08 15:52:39.702881    pg v1602: 602 pgs: 49 active+degraded, 553 > > > active+clean+degraded; 349 MB data, 333 MB used, 566 MB / 1023 MB avail;
> > > 167/224 degraded (74.554%); 55/112 unfound (49.107%)
> > >
> > > and a copy operation hang on the ceph client forever. I cannot kill (-9) the
> > > cp process. Is there any hope to recover my ceph filesystem?
> >
> > I'm pretty sure that the cp is hanging because Ceph chooses to wait in
> > case the unfound objects do come back (e.g. an OSD comes back online).
> >
> > Now, in the default configuration, losing a single OSD should not have
> > caused unfound objects in the first place. Can you provide any more
> > information on how you got to that point?
> >
> > > My general question is: How are objects distributed among OSDs? Does
> > > duplication (2x) guarantee that a failure of a single OSD would not lose
> > > data?  It appears to me that the objects are statistically redistributed and
> > > does not guarantee physical separation of replication data location.
> >
> > The CRUSH logic has a special case for that: when picking replicas, if
> > it would pick an already used bucket, it tries again.
> >
> > My understanding is that with the default crushmap, replicas will go
> > to different OSDs, and this is what I see in practice. If you did
> > construct your crushmap, it's possible that you made one that allows
> > multiple replicas on the same drive. Or even configured Ceph to
> > maintain no replicas.
> >
> > It is also possible you got hit by a bug, most likely one not in the
> > placement rules but in the OSDs code.
> >
> > Can you provide more information on your setup and the steps you took?
> >
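
To illustrate the CRUSH point quoted above: replica separation comes from the replicated rule in the crushmap, which in stock maps looks roughly like the rule below (decompiled crushtool syntax from a later release; the bucket name "default" and the choice of "type host" versus "type osd" in the 0.30 default map are assumptions here):

rule data {
        ruleset 0
        type replicated
        min_size 1
        max_size 10
        step take default
        step chooseleaf firstn 0 type host      # pick distinct hosts, then one osd under each
        step emit
}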