Hi Samuel,

I've now applied your fix and it worked. Everything is active+clean
again.

Thanks,
Christian

2011/7/20 Samuel Just <samuelj@xxxxxxxxxxxxxxx>:
> I've now pushed a patch to master implementing a hack to allow a pg to
> be forced into creating.
>
>     ceph pg force_create_pg <pg>
>
> will now reset that pg's status to creating. This should cause the osd
> to recreate the pg.
>
> Let us know if it helps!
>
> 019955a1f40944ed489fd613abe7ef4f3671fb4b implements the hack.
>
> -Sam
>
> On 07/19/2011 03:21 PM, Samuel Just wrote:
>>
>> Sage may correct me on this, but:
>>
>> The trouble is probably that an osd is informed about its
>> responsibilities towards a pg by the osds which previously handled
>> that pg. If all of those have been wiped/removed, the new osd won't
>> be notified. The monitor can send an MOSDPGCreate message, but
>> currently won't, since we don't want to lose data in case the osds
>> with that pg come back. I don't think we have a way to address this
>> at the moment. You might be able to fix it by removing the pools and
>> recreating them if they don't have any data. If you want to wait for
>> a bit, I'm going to add a command to the monitor to allow you to
>> re-mark these as creating.
>>
>> -Sam
>>
>> On 07/19/2011 08:41 AM, Christian Brunner wrote:
>>>
>>> 2011/7/18 Sage Weil <sage@xxxxxxxxxxxx>:
>>>>
>>>> On Mon, 18 Jul 2011, Christian Brunner wrote:
>>>>>>>>>
>>>>>>>>> $ ceph pg dump -o - | grep crashed
>>>>>>>>> pg_stat objects mip degr unf kb bytes log disklog state           v         reported up     acting last_scrub
>>>>>>>>> 1.1ac   0       0   0    0   0  0     0   0       crashed+peering 0'0       5869'576 [3,13] [3,13] 0'0       2011-07-13 17:04:30.221618
>>>>>>>>> 0.1ad   0       0   0    0   0  0     198 198     crashed+peering 3067'1194 5869'515 [3,13] [3,13] 3067'1194 2011-07-13 17:04:29.221726
>>>>>>>>> 2.1ab   0       0   0    0   0  0     0   0       crashed+peering 0'0       5869'576 [3,13] [3,13] 0'0       2011-07-13 17:04:31.222145
>>>>>>>>> 1.6c    0       0   0    0   0  0     0   0       crashed+peering 0'0       5869'577 [3,13] [3,13] 0'0       2011-07-13 17:05:35.237286
>>>>>>>>> 0.6d    0       0   0    0   0  0     198 198     crashed+peering 3067'636  5869'516 [3,13] [3,13] 3067'636  2011-07-13 17:05:34.237024
>>>>>>>>> 2.6b    0       0   0    0   0  0     0   0       crashed+peering 0'0       5869'577 [3,13] [3,13] 0'0       2011-07-13 17:05:37.238474
>>>>>>
>>>>>> Strange, none of these PGs show up in those logs. Can you do
>>>>>>
>>>>>>     ceph pg map 1.1ac
>>>>>>
>>>>>> for each PG and see where the current CRUSH map thinks they should
>>>>>> be stored? That would be the node to look for them on. You may
>>>>>> also want to look for $osd_data/current/$pgid_head on all the OSDs
>>>>>> to see where the copies are.
>>>>>>
>>>>>> The location in the pg dump (from the monitor's PGMap) is just the
>>>>>> last reported location. Primaries for each PG normally send stats
>>>>>> updates several times a minute for each PG that is touched (and
>>>>>> less frequently for those that are not). So it's not necessarily
>>>>>> bad that it doesn't match... but it is strange that no surviving
>>>>>> copy is reporting updated information.
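(Side note for anyone hitting this later: the per-PG check Sage
suggests can be scripted instead of run by hand. A minimal sketch,
assuming the pgid is the first column of each crashed+peering row in
the pg dump listing above:

    ceph pg dump -o - | grep crashed | awk '{print $1}' | while read pgid; do
        ceph pg map "$pgid"
    done
)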
>>>>> pg dump matches the data from pg map:
>>>>>
>>>>> 2011-07-18 09:41:02.340371 mon <- [pg,map,1.1ac]
>>>>> 2011-07-18 09:41:02.410063 mon0 -> 'osdmap e6517 pg 1.1ac (1.1ac) -> up [3,13] acting [3,13]' (0)
>>>>> 2011-07-18 09:41:02.434859 mon <- [pg,map,0.1ad]
>>>>> 2011-07-18 09:41:02.435546 mon1 -> 'osdmap e6517 pg 0.1ad (0.1ad) -> up [3,13] acting [3,13]' (0)
>>>>> 2011-07-18 09:41:02.442316 mon <- [pg,map,2.1ab]
>>>>> 2011-07-18 09:41:02.442839 mon1 -> 'osdmap e6517 pg 2.1ab (2.1ab) -> up [3,13] acting [3,13]' (0)
>>>>> 2011-07-18 09:41:02.449131 mon <- [pg,map,1.6c]
>>>>> 2011-07-18 09:41:02.449679 mon2 -> 'osdmap e6517 pg 1.6c (1.6c) -> up [3,13] acting [3,13]' (0)
>>>>> 2011-07-18 09:41:02.455090 mon <- [pg,map,0.6d]
>>>>> 2011-07-18 09:41:02.455429 mon0 -> 'osdmap e6517 pg 0.6d (0.6d) -> up [3,13] acting [3,13]' (0)
>>>>> 2011-07-18 09:41:02.461530 mon <- [pg,map,2.6b]
>>>>> 2011-07-18 09:41:02.462012 mon2 -> 'osdmap e6517 pg 2.6b (2.6b) -> up [3,13] acting [3,13]' (0)
>>>>>
>>>>> I've also looked at the filesystem: the $pgid_head directories
>>>>> exist neither on osd003 nor on osd013.
>>>>
>>>> Do they exist on any other node?
>>>
>>> No, they don't exist on any node.
>>>
>>>> Did the osd crash you mentioned happen at the end (when you started
>>>> seeing these 6 pgs misbehave), or did it recover fully after that,
>>>> and only do this after a later OSD was reformatted?
>>>
>>> The crash happened during the rebuild. The rebuild finished with
>>> these 6 PGs in state "crashed+peering". Everything else was fine (no
>>> degraded objects).
>>>
>>> My suspicion is that these PGs were skipped in the rebuild from
>>> osd013 to osd003 because of the crash. After that I reformatted
>>> osd013, which might explain why these PGs are missing on osd013, too.
>>>
>>> Is there a way to create a PG manually?
>>>
>>> Thanks,
>>> Christian
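For the archive: with Sam's patch in place, the whole recovery can be
scripted. This is a sketch only, assuming the same pg dump column
layout as in the listing above. Note that force_create_pg gives up any
chance of recovering data from the old copies, so it should only be run
once you are sure, as Christian verified here, that no copy of the PG
survives on any node:

    # re-mark every PG stuck in crashed+peering as creating
    ceph pg dump -o - | grep crashed | awk '{print $1}' | while read pgid; do
        ceph pg map "$pgid"               # confirm where CRUSH maps the pg
        ceph pg force_create_pg "$pgid"   # reset its status to creating
    done

The OSDs in the acting set should then recreate the PGs, and the
cluster should return to active+clean, as reported at the top of this
thread.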