Re: Replacing a failed disk/OSD: unfound object

The test setup is as follows:
I built a Ceph 0.30 system with 3 mons, 3 MDSs, and 3 OSDs on 3 machines.
Then I copied some files around on a Ceph client and waited until ceph -w showed a steady, healthy state.
Then I turned one machine off.
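
For reference, this layout corresponds to a ceph.conf roughly like the one below. It is a reconstruction, not the file actually used: the mon names and addresses and the /data/osd.* paths come from the logs further down, while the ceph1/ceph2 hostnames, the mon data path, and the third mds name are placeholders.

[mon]
        mon data = /data/mon.$id            ; placeholder path
[mon.alpha]
        host = ceph0
        mon addr = 192.168.0.135:6789
[mon.beta]
        host = ceph1                        ; hostname assumed
        mon addr = 192.168.0.136:6789
[mon.gamma]
        host = ceph2                        ; hostname assumed
        mon addr = 192.168.0.137:6789
[mds.alpha]
        host = ceph0
[mds.beta]
        host = ceph1
[mds.gamma]
        host = ceph2                        ; third mds name assumed
[osd]
        osd data = /data/osd.$id
        osd journal = /data/osd.$id/journal
[osd.0]
        host = ceph0
[osd.1]
        host = ceph1
[osd.2]
        host = ceph2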

Here are the logs for this unfound-object situation; please look for the "=>" marks, which flag the events of interest.
ceph -w
=========

2011-07-13 14:39:43.157935 mon e1: 3 mons at {alpha=192.168.0.135:6789/0,beta=192.168.0.136:6789/0,gamma=192.168.0.137:6789/0}
2011-07-13 14:59:11.519250   osd e179: 3 osds: 3 up, 3 in
2011-07-13 15:01:17.355846 pg v1099: 602 pgs: 602 active+clean; 349 MB data, 1778 MB used, 920 MB / 3069 MB avail
=>ceph0 shut down (with mon0, mds.alpha, osd0)
2011-07-13 16:17:22.232696 7f3ac1cd1700 monclient: hunting for new mon
2011-07-13 16:17:34.887551 log 2011-07-13 16:17:29.515704 mon1 192.168.0.136:6789/0 2 : [INF] mon.beta calling new monitor election
2011-07-13 16:17:34.887551 log 2011-07-13 16:17:29.633224 mon2 192.168.0.137:6789/0 9 : [INF] mon.gamma calling new monitor election
2011-07-13 16:17:34.887551 log 2011-07-13 16:17:34.530619 mon1 192.168.0.136:6789/0 3 : [INF] mon.beta@1 won leader election with quorum 1,2
2011-07-13 16:17:47.744681 log 2011-07-13 16:17:47.521582 mon1 192.168.0.136:6789/0 4 : [INF] osd0 192.168.0.135:6801/6165 failed (by osd2 192.168.0.137:6801/9292)
2011-07-13 16:17:48.754993 log 2011-07-13 16:17:47.589716 mon1 192.168.0.136:6789/0 5 : [INF] osd0 192.168.0.135:6801/6165 failed (by osd1 192.168.0.136:6801/2984)
2011-07-13 16:17:52.742905   osd e180: 3 osds: 2 up, 3 in
2011-07-13 16:17:52.754959 log 2011-07-13 16:17:52.524626 mon1 192.168.0.136:6789/0 6 : [INF] osd0 192.168.0.135:6801/6165 failed (by osd2 192.168.0.137:6801/9292)
2011-07-13 16:17:52.767674 pg v1100: 602 pgs: 602 active+clean; 349 MB data, 1778 MB used, 920 MB / 3069 MB avail
2011-07-13 16:17:53.751359   osd e181: 3 osds: 2 up, 3 in
2011-07-13 16:17:53.770067 pg v1101: 602 pgs: 602 active+clean; 349 MB data, 1778 MB used, 920 MB / 3069 MB avail
2011-07-13 16:17:54.049235   osd e182: 3 osds: 2 up, 3 in
2011-07-13 16:17:54.062753 mds e108: 1/1/1 up {0=beta=up:replay}, 1 up:standby
2011-07-13 16:17:54.073946 pg v1102: 602 pgs: 602 active+clean; 349 MB data, 1778 MB used, 920 MB / 3069 MB avail
2011-07-13 16:17:58.904299 pg v1103: 602 pgs: 233 active+clean, 369 active+clean+degraded; 349 MB data, 1790 MB used, 914 MB / 3069 MB avail; 67/224 degraded (29.911%)
2011-07-13 16:17:59.083432 mds e109: 1/1/1 up {0=beta=up:reconnect}, 1 up:standby
2011-07-13 16:17:59.105245 mds e110: 1/1/1 up {0=beta=up:rejoin}, 1 up:standby
2011-07-13 16:17:59.121004 mds e111: 1/1/1 up {0=beta=up:active}, 1 up:standby
2011-07-13 16:17:59.147149 log 2011-07-13 16:17:58.970253 mon1 192.168.0.136:6789/0 7 : [INF] mds0 192.168.0.136:6800/2862 up:reconnect
2011-07-13 16:17:59.147149 log 2011-07-13 16:17:58.993575 mon1 192.168.0.136:6789/0 8 : [INF] mds0 192.168.0.136:6800/2862 up:rejoin
2011-07-13 16:17:59.147149 log 2011-07-13 16:17:59.008454 mon1 192.168.0.136:6789/0 9 : [INF] mds0 192.168.0.136:6800/2862 up:active
2011-07-13 16:18:01.522886 log 2011-07-13 16:17:58.980040 mds0 192.168.0.136:6800/2862 1 : [DBG] reconnect by client4101 192.168.0.138:0/384680305 after 0.009430
2011-07-13 16:18:03.746935 pg v1104: 602 pgs: 233 active+clean, 369 active+clean+degraded; 349 MB data, 1795 MB used, 910 MB / 3069 MB avail; 67/224 degraded (29.911%)
2011-07-13 16:18:13.762497 pg v1105: 602 pgs: 233 active+clean, 369 active+clean+degraded; 349 MB data, 1796 MB used, 909 MB / 3069 MB avail; 67/224 degraded (29.911%)
2011-07-13 16:20:03.752175 pg v1106: 602 pgs: 233 active+clean, 369 active+clean+degraded; 349 MB data, 1791 MB used, 909 MB / 3069 MB avail; 67/224 degraded (29.911%)
2011-07-13 16:20:13.796924 pg v1107: 602 pgs: 233 active+clean, 369 active+clean+degraded; 349 MB data, 1786 MB used, 909 MB / 3069 MB avail; 67/224 degraded (29.911%)
=>osd0 is considered out and re-replication starts
2011-07-13 16:22:54.219786   osd e183: 3 osds: 2 up, 2 in
2011-07-13 16:22:54.234591 pg v1108: 602 pgs: 233 active+clean, 369 active+clean+degraded; 349 MB data, 1260 MB used, 537 MB / 2046 MB avail; 67/224 degraded (29.911%)
2011-07-13 16:22:54.254039 log 2011-07-13 16:22:54.116461 mon1 192.168.0.136:6789/0 10 : [INF] osd0 out (down for 301.489741)
2011-07-13 16:23:02.684313   osd e184: 3 osds: 2 up, 2 in
2011-07-13 16:23:02.698807 pg v1109: 602 pgs: 233 active+clean, 369 active+clean+degraded; 349 MB data, 1260 MB used, 537 MB / 2046 MB avail; 67/224 degraded (29.911%)
2011-07-13 16:23:07.995143   osd e185: 3 osds: 2 up, 2 in
2011-07-13 16:23:08.009328 pg v1110: 602 pgs: 9 active, 402 active+clean, 8 peering, 183 active+clean+degraded; 349 MB data, 1273 MB used, 527 MB / 2046 MB avail; 54/224 degraded (24.107%)
2011-07-13 16:23:08.139752 pg v1111: 602 pgs: 41 active, 552 active+clean, 9 peering; 349 MB data, 1288 MB used, 514 MB / 2046 MB avail; 53/224 degraded (23.661%)
2011-07-13 16:23:09.001543   osd e186: 3 osds: 2 up, 2 in
2011-07-13 16:23:09.027121 pg v1112: 602 pgs: 41 active, 552 active+clean, 9 peering; 349 MB data, 1288 MB used, 514 MB / 2046 MB avail; 53/224 degraded (23.661%)
2011-07-13 16:23:09.311325   osd e187: 3 osds: 2 up, 2 in
2011-07-13 16:23:09.433629 pg v1113: 602 pgs: 41 active, 552 active+clean, 9 peering; 349 MB data, 1288 MB used, 514 MB / 2046 MB avail; 53/224 degraded (23.661%)
2011-07-13 16:23:10.399307   osd e188: 3 osds: 2 up, 2 in
2011-07-13 16:23:10.414293 pg v1114: 602 pgs: 41 active, 552 active+clean, 9 peering; 349 MB data, 1288 MB used, 514 MB / 2046 MB avail; 53/224 degraded (23.661%)
2011-07-13 16:23:11.408559   osd e189: 3 osds: 2 up, 2 in
2011-07-13 16:23:11.427273 pg v1115: 602 pgs: 41 active, 552 active+clean, 9 peering; 349 MB data, 1288 MB used, 514 MB / 2046 MB avail; 53/224 degraded (23.661%)
2011-07-13 16:23:12.329049   osd e190: 3 osds: 2 up, 2 in
2011-07-13 16:23:12.365014 pg v1116: 602 pgs: 41 active, 552 active+clean, 9 peering; 349 MB data, 1288 MB used, 514 MB / 2046 MB avail; 53/224 degraded (23.661%)
2011-07-13 16:23:14.973260 pg v1117: 602 pgs: 49 active, 552 active+clean, 1 peering; 349 MB data, 1301 MB used, 504 MB / 2046 MB avail; 66/224 degraded (29.464%)
2011-07-13 16:23:17.129235 pg v1118: 602 pgs: 50 active, 552 active+clean; 349 MB data, 1317 MB used, 489 MB / 2046 MB avail; 67/224 degraded (29.911%)
2011-07-13 16:23:30.004135 pg v1119: 602 pgs: 46 active, 556 active+clean; 349 MB data, 1329 MB used, 474 MB / 2046 MB avail; 62/224 degraded (27.679%)
2011-07-13 16:23:35.271864 pg v1120: 602 pgs: 40 active, 562 active+clean; 349 MB data, 1373 MB used, 428 MB / 2046 MB avail; 53/224 degraded (23.661%)
2011-07-13 16:23:45.093097 pg v1121: 602 pgs: 38 active, 564 active+clean; 349 MB data, 1394 MB used, 408 MB / 2046 MB avail; 49/224 degraded (21.875%)
=>for some reason osd2 failed during object replication
2011-07-13 16:24:00.435405 log 2011-07-13 16:24:00.288017 mon1 192.168.0.136:6789/0 11 : [INF] osd2 192.168.0.137:6801/9292 failed (by osd1 192.168.0.136:6801/2984)
2011-07-13 16:24:05.579361 log 2011-07-13 16:24:05.424065 mon1 192.168.0.136:6789/0 12 : [INF] osd2 192.168.0.137:6801/9292 failed (by osd1 192.168.0.136:6801/2984)
2011-07-13 16:24:10.583691   osd e191: 3 osds: 1 up, 2 in
2011-07-13 16:24:10.594086 log 2011-07-13 16:24:10.455936 mon1 192.168.0.136:6789/0 13 : [INF] osd2 192.168.0.137:6801/9292 failed (by osd1 192.168.0.136:6801/2984)
2011-07-13 16:24:10.614368 pg v1122: 602 pgs: 38 active, 564 active+clean; 349 MB data, 1394 MB used, 408 MB / 2046 MB avail; 49/224 degraded (21.875%)
2011-07-13 16:24:11.590134   osd e192: 3 osds: 1 up, 2 in
2011-07-13 16:24:11.607345 pg v1123: 602 pgs: 38 active, 564 active+clean; 349 MB data, 1394 MB used, 408 MB / 2046 MB avail; 49/224 degraded (21.875%)
2011-07-13 16:24:15.739147 pg v1124: 602 pgs: 25 active+degraded, 577 active+clean+degraded; 349 MB data, 1404 MB used, 402 MB / 2046 MB avail; 143/224 degraded (63.839%); 31/112 unfound (27.679%)
2011-07-13 16:26:15.845867 pg v1125: 602 pgs: 25 active+degraded, 577 active+clean+degraded; 349 MB data, 1402 MB used, 402 MB / 2046 MB avail; 143/224 degraded (63.839%); 31/112 unfound (27.679%)
=>osd2 considered out
2011-07-13 16:29:14.486774   osd e193: 3 osds: 1 up, 1 in
2011-07-13 16:29:14.504393 pg v1126: 602 pgs: 25 active+degraded, 577 active+clean+degraded; 349 MB data, 592 MB used, 308 MB / 1023 MB avail; 143/224 degraded (63.839%); 31/112 unfound (27.679%)
2011-07-13 16:29:14.514343 log 2011-07-13 16:29:14.394234 mon1 192.168.0.136:6789/0 14 : [INF] osd2 out (down for 303.874867)
2011-07-13 16:29:16.077618 pg v1127: 602 pgs: 25 active+degraded, 577 active+clean+degraded; 349 MB data, 594 MB used, 308 MB / 1023 MB avail; 143/224 degraded (63.839%); 31/112 unfound (27.679%)
2011-07-13 16:31:16.098224 pg v1128: 602 pgs: 25 active+degraded, 577 active+clean+degraded; 349 MB data, 592 MB used, 308 MB / 1023 MB avail; 143/224 degraded (63.839%); 31/112 unfound (27.679%)
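
At this point the 25 pgs reported as active+degraded are the ones carrying the 31 unfound objects. A rough sketch of how to list them from the shell (the exact ceph tool syntax differs between 0.30 and newer releases, so treat this as an outline only):

ceph -s                        # overall status, including degraded/unfound totals
ceph pg dump | grep degraded   # pg ids whose state is not active+clean
                               # (older ceph tools may need 'ceph pg dump -o -')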

osd.2.log
=============
2011-07-13 14:51:37.542028 7f3b6de13700 osd2 178 pg[0.53( v 178'1 (0'0,178'1] n=1 ec=2 les/c 166/178 163/163/155) [2,0] r=0 mlcod 0'0 !hml active+clean] sending commit on repgather(0x7f3b64075350 applied 178'1 rep_tid=266 wfack= wfdisk= op=osd_op(client4101.1:375 10000000007.0000000a [write 0~4194304 [1@-1]] 0.43d3 snapc 1=[])) 0x7f3b6411f840
2011-07-13 14:51:40.448487 7f3b6de13700 osd2 178 pg[0.6e( v 178'2 (0'0,178'2] n=2 ec=2 les/c 172/178 170/170/155) [2,1] r=0 mlcod 0'0 !hml active+clean] sending commit on repgather(0x7f3b6415f090 applied 178'2 rep_tid=267 wfack= wfdisk= op=osd_op(client4101.1:376 10000000007.0000000b [write 0~4194304 [1@-1]] 0.ad6e snapc 1=[])) 0x7f3b640d13e0
2011-07-13 14:51:42.524448 7f3b6e614700 osd2 178 pg[0.62( v 178'2 (0'0,178'2] n=2 ec=2 les/c 166/178 163/163/155) [2,0] r=0 mlcod 0'0 !hml active+clean] sending commit on repgather(0x7f3b640dd560 applied 178'2 rep_tid=268 wfack= wfdisk= op=osd_op(client4101.1:377 10000000007.0000000c [write 0~4194304 [1@-1]] 0.7e62 snapc 1=[])) 0x7f3b6414fb80
2011-07-13 14:51:43.264678 7f3b6e614700 osd2 178 pg[0.8( v 178'2 (0'0,178'2] n=2 ec=2 les/c 166/178 163/163/155) [2,0] r=0 mlcod 0'0 !hml active+clean] sending commit on repgather(0x7f3b6011de10 applied 178'2 rep_tid=271 wfack= wfdisk= op=osd_op(client4101.1:398 10000000007.00000021 [write 0~4194304 [1@-1]] 0.ef08 snapc 1=[])) 0x7f3b600e8010
2011-07-13 14:51:43.264801 7f3b6e614700 osd2 178 pg[0.2e( v 178'1 (0'0,178'1] n=1 ec=2 les/c 166/178 163/163/155) [2,0] r=0 mlcod 0'0 !hml active+clean] sending commit on repgather(0x2dc09f0 applied 178'1 rep_tid=273 wfack= wfdisk= op=osd_op(client4101.1:404 10000000007.00000027 [write 0~269567 [1@-1]] 0.1a2e snapc 1=[])) 0x2e4b860
2011-07-13 14:51:43.413590 7f3b6de13700 osd2 178 pg[0.6f( v 178'2 (0'0,178'2] n=2 ec=2 les/c 172/178 170/170/155) [2,1] r=0 mlcod 0'0 !hml active+clean] sending commit on repgather(0x7f3b600b2310 applied 178'2 rep_tid=269 wfack= wfdisk= op=osd_op(client4101.1:379 10000000007.0000000e [write 0~4194304 [1@-1]] 0.8def snapc 1=[])) 0x7f3b6005a340
2011-07-13 14:51:44.581422 7f3b6de13700 osd2 178 pg[0.52( v 178'2 (0'0,178'2] n=2 ec=2 les/c 172/178 170/170/155) [2,1] r=0 mlcod 0'0 !hml active+clean] sending commit on repgather(0x7f3b65d89090 applied 178'2 rep_tid=270 wfack= wfdisk= op=osd_op(client4101.1:393 10000000007.0000001c [write 0~4194304 [1@-1]] 0.9a52 snapc 1=[])) 0x7f3b6406f190
2011-07-13 14:51:44.581630 7f3b6de13700 osd2 178 pg[0.4c( v 178'2 (0'0,178'2] n=2 ec=2 les/c 172/178 170/170/155) [2,1] r=0 mlcod 0'0 !hml active+clean] sending commit on repgather(0x7f3b6008eca0 applied 178'2 rep_tid=272 wfack= wfdisk= op=osd_op(client4101.1:399 10000000007.00000022 [write 0~4194304 [1@-1]] 0.dacc snapc 1=[])) 0x7f3b60139550
=>something is already wrong here, but ceph -w does not show it yet
2011-07-13 14:52:59.465626 7f3b70719700 osd2 178 OSD::ms_handle_reset()
2011-07-13 14:52:59.465731 7f3b70719700 osd2 178 OSD::ms_handle_reset() s=0x2dc0740
2011-07-13 15:06:15.518322 7f3b70719700 osd2 179 OSD::ms_handle_reset()
2011-07-13 15:06:15.518517 7f3b70719700 osd2 179 OSD::ms_handle_reset() s=0x2de6ce0
2011-07-13 15:14:11.831878 7f3b6bb0c700 -- 192.168.0.137:6802/9292 >> 192.168.0.135:6802/6165 pipe(0x7f3b64099ab0 sd=13 pgs=1 cs=1 l=0).fault with nothing to send, going to standby
2011-07-13 15:14:11.880194 7f3b6b304700 -- 192.168.0.137:6802/9292 >> 192.168.0.136:6802/2984 pipe(0x2ce1a20 sd=15 pgs=2 cs=1 l=0).fault with nothing to send, going to standby
2011-07-13 15:21:15.511170 7f3b70719700 osd2 179 OSD::ms_handle_reset()
2011-07-13 15:21:15.511265 7f3b70719700 osd2 179 OSD::ms_handle_reset() s=0x2bbfc10
2011-07-13 15:36:15.613503 7f3b70719700 osd2 179 OSD::ms_handle_reset()
2011-07-13 15:36:15.613602 7f3b70719700 osd2 179 OSD::ms_handle_reset() s=0x2ac40b0
2011-07-13 15:51:15.707005 7f3b70719700 osd2 179 OSD::ms_handle_reset()
2011-07-13 15:51:15.707101 7f3b70719700 osd2 179 OSD::ms_handle_reset() s=0x2827df0
2011-07-13 16:06:15.779902 7f3b70719700 osd2 179 OSD::ms_handle_reset()
2011-07-13 16:06:15.779992 7f3b70719700 osd2 179 OSD::ms_handle_reset() s=0x2ba3650
2011-07-13 16:17:23.667067 7f3b70719700 osd2 179 OSD::ms_handle_reset()
2011-07-13 16:17:23.667128 7f3b70719700 osd2 179 OSD::ms_handle_reset() s=0x2dc09f0
=>ceph0 shut down
2011-07-13 16:17:25.043107 7f3b6b708700 -- 192.168.0.137:6803/9292 >> 192.168.0.135:6803/6165 pipe(0x7f3b640e77c0 sd=14 pgs=3 cs=1 l=0).fault initiating reconnect
2011-07-13 16:17:25.044344 7f3b6bd0e700 -- 192.168.0.137:6803/9292 >> 192.168.0.135:6803/6165 pipe(0x7f3b640e77c0 sd=14 pgs=3 cs=2 l=0).fault first fault
2011-07-13 16:17:44.634390 7f3b76f26700 osd2 179 heartbeat_check: no heartbeat from osd0 since 2011-07-13 16:17:24.048586 (cutoff 2011-07-13 16:17:24.634291)
2011-07-13 16:17:45.143635 7f3b6c610700 osd2 179 heartbeat_check: no heartbeat from osd0 since 2011-07-13 16:17:24.048586 (cutoff 2011-07-13 16:17:25.143606)
2011-07-13 16:17:45.635063 7f3b76f26700 osd2 179 heartbeat_check: no heartbeat from osd0 since 2011-07-13 16:17:24.048586 (cutoff 2011-07-13 16:17:25.635027)
2011-07-13 16:17:46.446478 7f3b6c610700 osd2 179 heartbeat_check: no heartbeat from osd0 since 2011-07-13 16:17:24.048586 (cutoff 2011-07-13 16:17:26.446446)
2011-07-13 16:17:46.635367 7f3b76f26700 osd2 179 heartbeat_check: no heartbeat from osd0 since 2011-07-13 16:17:24.048586 (cutoff 2011-07-13 16:17:26.635334)
2011-07-13 16:17:46.946921 7f3b6c610700 osd2 179 heartbeat_check: no heartbeat from osd0 since 2011-07-13 16:17:24.048586 (cutoff 2011-07-13 16:17:26.946890)
2011-07-13 16:17:47.635921 7f3b76f26700 osd2 179 heartbeat_check: no heartbeat from osd0 since 2011-07-13 16:17:24.048586 (cutoff 2011-07-13 16:17:27.635884)
2011-07-13 16:17:48.347317 7f3b6c610700 osd2 179 heartbeat_check: no heartbeat from osd0 since 2011-07-13 16:17:24.048586 (cutoff 2011-07-13 16:17:28.347285)
2011-07-13 16:17:48.636694 7f3b76f26700 osd2 179 heartbeat_check: no heartbeat from osd0 since 2011-07-13 16:17:24.048586 (cutoff 2011-07-13 16:17:28.636657)
2011-07-13 16:17:49.637227 7f3b76f26700 osd2 179 heartbeat_check: no heartbeat from osd0 since 2011-07-13 16:17:24.048586 (cutoff 2011-07-13 16:17:29.637193)
2011-07-13 16:17:49.647933 7f3b6c610700 osd2 179 heartbeat_check: no heartbeat from osd0 since 2011-07-13 16:17:24.048586 (cutoff 2011-07-13 16:17:29.647916)
2011-07-13 16:17:50.425012 7f3b6c610700 osd2 179 heartbeat_check: no heartbeat from osd0 since 2011-07-13 16:17:24.048586 (cutoff 2011-07-13 16:17:30.424981)
2011-07-13 16:17:50.637671 7f3b76f26700 osd2 179 heartbeat_check: no heartbeat from osd0 since 2011-07-13 16:17:24.048586 (cutoff 2011-07-13 16:17:30.637638)
2011-07-13 16:17:51.025729 7f3b6c610700 osd2 179 heartbeat_check: no heartbeat from osd0 since 2011-07-13 16:17:24.048586 (cutoff 2011-07-13 16:17:31.025698)
2011-07-13 16:17:51.638261 7f3b76f26700 osd2 179 heartbeat_check: no heartbeat from osd0 since 2011-07-13 16:17:24.048586 (cutoff 2011-07-13 16:17:31.638165)
2011-07-13 16:17:51.926444 7f3b6c610700 osd2 179 heartbeat_check: no heartbeat from osd0 since 2011-07-13 16:17:24.048586 (cutoff 2011-07-13 16:17:31.926413)
2011-07-13 16:17:52.526995 7f3b6c610700 osd2 179 heartbeat_check: no heartbeat from osd0 since 2011-07-13 16:17:24.048586 (cutoff 2011-07-13 16:17:32.526963)
2011-07-13 16:17:52.638956 7f3b76f26700 osd2 179 heartbeat_check: no heartbeat from osd0 since 2011-07-13 16:17:24.048586 (cutoff 2011-07-13 16:17:32.638936)
2011-07-13 16:17:52.763523 7f3b6b607700 -- 192.168.0.137:6802/9292 >> 192.168.0.136:6802/2984 pipe(0x2ce1a20 sd=13 pgs=2 cs=2 l=0).connect claims to be 0.0.0.0:6802/2984 not 192.168.0.136:6802/2984 - presumably this is the same node!
2011-07-13 16:17:59.235254 7f3b75723700 osd2 182 pg[1.94( v 182'23 (178'20,182'23] n=1 ec=2 les/c 181/182 180/180/155) [2] r=0 mlcod 0'0 !hml active+clean+degraded] sending commit on repgather(0x7f3b602cfba0 applied 182'23 rep_tid=274 wfack= wfdisk= op=osd_op(mds0.11:21 200.00000000 [writefull 0~84] 1.3494) v2) 0x7f3b601ebb90
*** Caught signal (Aborted) **
 in thread 0x7f3b74721700
=>restart osd2
2011-07-13 17:06:00.120011 7fe0f5c57720 ceph version 0.30.commit: 64b1b2c70f0cde39c72d5d724c65ea8afaaa00b9. process: cosd. pid: 10241
2011-07-13 17:06:00.132026 7fe0f5c57720 filestore(/data/osd.2) mount FIEMAP ioctl is NOT supported
2011-07-13 17:06:00.132100 7fe0f5c57720 filestore(/data/osd.2) mount detected btrfs
2011-07-13 17:06:00.132120 7fe0f5c57720 filestore(/data/osd.2) mount btrfs CLONE_RANGE ioctl is supported
2011-07-13 17:06:00.149449 7fe0f5c57720 filestore(/data/osd.2) mount btrfs SNAP_CREATE is supported
2011-07-13 17:06:00.297455 7fe0f5c57720 filestore(/data/osd.2) mount btrfs SNAP_DESTROY is supported
2011-07-13 17:06:00.297786 7fe0f5c57720 filestore(/data/osd.2) mount btrfs START_SYNC got 0 Success
2011-07-13 17:06:00.297847 7fe0f5c57720 filestore(/data/osd.2) mount btrfs START_SYNC is supported (transid 363)
2011-07-13 17:06:00.300147 7fe0f5c57720 filestore(/data/osd.2) mount btrfs WAIT_SYNC is supported
2011-07-13 17:06:00.301704 7fe0f5c57720 filestore(/data/osd.2) mount btrfs SNAP_CREATE_V2 got 0 Success
2011-07-13 17:06:00.301727 7fe0f5c57720 filestore(/data/osd.2) mount btrfs SNAP_CREATE_V2 is supported
2011-07-13 17:06:00.325592 7fe0f5c57720 filestore(/data/osd.2) mount found snaps <24050,24085>
2011-07-13 17:06:00.376720 7fe0f5c57720 filestore(/data/osd.2) mount: enabling PARALLEL journal mode: btrfs, SNAP_CREATE_V2 detected and 'filestore btrfs snap' mode is enabled
2011-07-13 17:06:00.376809 7fe0f5c57720 journal _open /data/osd.2/journal fd 11: 1048576000 bytes, block size 4096 bytes, directio = 1
2011-07-13 17:06:00.470792 7fe0f5c57720 journal read_entry 593817600 : seq 24086 1049540 bytes
2011-07-13 17:06:00.472089 7fe0f5c57720 journal read_entry 593817600 : seq 24086 1049540 bytes
*** Caught signal (Aborted) **
 in thread 0x7fe0f5c57720
 ceph version 0.30 (commit:64b1b2c70f0cde39c72d5d724c65ea8afaaa00b9)
 1: /usr/bin/cosd() [0x637e6e]
 2: (()+0xf430) [0x7fe0f563a430]
 3: (gsignal()+0x35) [0x7fe0f442c355]
 4: (abort()+0x17f) [0x7fe0f442d5ef]
 5: (__assert_fail()+0xf1) [0x7fe0f44254c1]
 6: (FileStore::_do_transaction(ObjectStore::Transaction&)+0x3d42) [0x5c4b62]
 7: (FileStore::do_transactions(std::list<ObjectStore::Transaction*, std::allocator<ObjectStore::Transaction*> >&, unsigned long)+0x75) [0x5c5135]
 8: (JournalingObjectStore::journal_replay(unsigned long)+0xf60) [0x5d54c0]
 9: (FileStore::mount()+0x17e9) [0x5ae489]
 10: (OSD::init()+0x165) [0x50cfd5]
 11: (main()+0x2424) [0x48ba64]
 12: (__libc_start_main()+0xfd) [0x7fe0f4418d2d]
 13: /usr/bin/cosd() [0x4881e9]



It seems cosd does not escalate such error situations.


On Tue, 12 Jul 2011 09:06:14 -0700 (PDT), Sage Weil wrote:
On Tue, 12 Jul 2011, Meng Zhao wrote:
Thanks Tommi. I rebuilt the ceph cluster a few times just to reproduce the situation. The result seems mixed, more likely btrfs failed (after power
reset). But it does happen anyway.

The big question is: however rare, an unfound-object situation makes the *entire* ceph file system unmountable, which amounts to a total loss of all data. That is quite a risk to take for a production system. Is there a way to recover from such a situation? (e.g. remove the file associated with the missing object)

There isn't an easy way to do this currently, but it is very easy to add.
I've added an issue to the tracker.

The real question is whether the lost object is file data or namespace metadata. If it's namespace metadata, you may lose part of the directory hierarchy (until the fsck/namespace rebuild work is done). For file data,
it's easy enough to just mark the unfound object as deleted.

sage
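
Follow-up note: the facility Sage describes was added in later Ceph releases; it does not exist in the 0.30 build used above. In those later releases the workflow is roughly the following, with 1.94 standing in for one of the affected pg ids from the log:

ceph health detail                      # names the pgs with unfound objects
ceph pg 1.94 list_missing               # show the objects that pg cannot find
ceph pg 1.94 mark_unfound_lost revert   # stop waiting; revert to the prior version
                                        # (newer releases also offer a 'delete' variant)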




On Fri, 8 Jul 2011 10:08:46 -0700, Tommi Virtanen wrote:
> [It seems I dropped the Cc: to ceph-devel, added it back.. Please
> reply to this message instead, and sorry about that. I'm starting to
> dislike Google Apps for mailing list traffic :( ]
>
> On Fri, Jul 8, 2011 at 10:07, Tommi Virtanen
> <tommi.virtanen@xxxxxxxxxxxxx> wrote:
> > On Fri, Jul 8, 2011 at 01:23, Meng Zhao <mzhao@xxxxxxxxxxxx> wrote:
> > > I was trying to replace a disk for an osd by following instruction at:
> > > http://ceph.newdream.net/wiki/Replacing_a_failed_disk/OSD
> > >
> > > Now, ceph -w getting
> > > 2011-07-08 15:52:39.702881    pg v1602: 602 pgs: 49 active+degraded, 553 > > > active+clean+degraded; 349 MB data, 333 MB used, 566 MB / 1023 MB avail;
> > > 167/224 degraded (74.554%); 55/112 unfound (49.107%)
> > >
> > > and a copy operation hang on the ceph client forever. I cannot kill (-9) the
> > > cp process. Is there any hope to recover my ceph filesystem?
> >
> > I'm pretty sure that the cp is hanging because Ceph chooses to wait in
> > case the unfound objects do come back (e.g. an OSD comes back online).
> >
> > Now, in the default configuration, losing a single OSD should not have
> > caused unfound objects in the first place. Can you provide any more
> > information on how you got to that point?
> >
> > > My general question is: How are objects distributed among OSDs? Does
> > > duplication (2x) guarantee that a failure of a single OSD would not lose
> > > data?  It appears to me that the objects are statistically redistributed and
> > > does not guarantee physical separation of replication data location.
> >
> > The CRUSH logic has a special case for that: when picking replicas, if
> > it would pick an already used bucket, it tries again.
> >
> > My understanding is that with the default crushmap, replicas will go
> > to different OSDs, and this is what I see in practice. If you did
> > construct your crushmap, it's possible that you made one that allows
> > multiple replicas on the same drive. Or even configured Ceph to
> > maintain no replicas.
> >
> > It is also possible you got hit by a bug, most likely one not in the
> > placement rules but in the OSDs code.
> >
> > Can you provide more information on your setup and the steps you took?
> >
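
To illustrate the CRUSH point quoted above: replica separation comes from the replicated rule in the crushmap, which in stock maps looks roughly like the rule below (decompiled crushtool syntax from a later release; the bucket name "default" and the choice of "type host" versus "type osd" in the 0.30 default map are assumptions here):

rule data {
        ruleset 0
        type replicated
        min_size 1
        max_size 10
        step take default
        step chooseleaf firstn 0 type host      # pick distinct hosts, then one osd under each
        step emit
}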