On Mon, 18 Jul 2011, Christian Brunner wrote:
> >> >> $ ceph pg dump -o - | grep crashed
> >> >> pg_stat objects mip degr unf kb bytes log disklog state           v          reported  up      acting  last_scrub
> >> >> 1.1ac   0       0   0    0   0  0     0   0       crashed+peering 0'0        5869'576  [3,13]  [3,13]  0'0        2011-07-13 17:04:30.221618
> >> >> 0.1ad   0       0   0    0   0  0     198 198     crashed+peering 3067'1194  5869'515  [3,13]  [3,13]  3067'1194  2011-07-13 17:04:29.221726
> >> >> 2.1ab   0       0   0    0   0  0     0   0       crashed+peering 0'0        5869'576  [3,13]  [3,13]  0'0        2011-07-13 17:04:31.222145
> >> >> 1.6c    0       0   0    0   0  0     0   0       crashed+peering 0'0        5869'577  [3,13]  [3,13]  0'0        2011-07-13 17:05:35.237286
> >> >> 0.6d    0       0   0    0   0  0     198 198     crashed+peering 3067'636   5869'516  [3,13]  [3,13]  3067'636   2011-07-13 17:05:34.237024
> >> >> 2.6b    0       0   0    0   0  0     0   0       crashed+peering 0'0        5869'577  [3,13]  [3,13]  0'0        2011-07-13 17:05:37.238474
> >
> > Strange, none of these PGs show up in those logs.  Can you do
> >
> >   ceph pg map 1.1ac
> >
> > for each PG and see where the current CRUSH map thinks they should be
> > stored?  That would be the node to look for them on.  You may also want
> > to look for $osd_data/current/$pgid_head on all the OSDs to see where
> > the copies are.
> >
> > The location in the pg dump (from the monitor's PGMap) is just the last
> > reported location.  Primaries normally send stats updates several times
> > a minute for each PG that is touched (and less frequently for those that
> > are not).  So it's not necessarily bad that it doesn't match... but it is
> > strange that no surviving copy is reporting updated information.
>
> pg dump matches the data from pg map:
>
> 2011-07-18 09:41:02.340371 mon <- [pg,map,1.1ac]
> 2011-07-18 09:41:02.410063 mon0 -> 'osdmap e6517 pg 1.1ac (1.1ac) -> up [3,13] acting [3,13]' (0)
> 2011-07-18 09:41:02.434859 mon <- [pg,map,0.1ad]
> 2011-07-18 09:41:02.435546 mon1 -> 'osdmap e6517 pg 0.1ad (0.1ad) -> up [3,13] acting [3,13]' (0)
> 2011-07-18 09:41:02.442316 mon <- [pg,map,2.1ab]
> 2011-07-18 09:41:02.442839 mon1 -> 'osdmap e6517 pg 2.1ab (2.1ab) -> up [3,13] acting [3,13]' (0)
> 2011-07-18 09:41:02.449131 mon <- [pg,map,1.6c]
> 2011-07-18 09:41:02.449679 mon2 -> 'osdmap e6517 pg 1.6c (1.6c) -> up [3,13] acting [3,13]' (0)
> 2011-07-18 09:41:02.455090 mon <- [pg,map,0.6d]
> 2011-07-18 09:41:02.455429 mon0 -> 'osdmap e6517 pg 0.6d (0.6d) -> up [3,13] acting [3,13]' (0)
> 2011-07-18 09:41:02.461530 mon <- [pg,map,2.6b]
> 2011-07-18 09:41:02.462012 mon2 -> 'osdmap e6517 pg 2.6b (2.6b) -> up [3,13] acting [3,13]' (0)
>
> I've also looked at the filesystem: the $pgid_head directories exist
> neither on osd003 nor on osd013.

Do they exist on any other nodes?

Did the OSD crash you mentioned happen right when you started seeing these
6 PGs misbehave, or did the cluster recover fully after the crash and only
start doing this later, after an OSD was reformatted?

sage

> I suspect that the PGs are empty because they belong to a pool that we
> don't use, as we use ceph exclusively for rbd.
>
> Regards,
> Christian
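
(For reference, a minimal sketch of the directory check suggested above.
$osd_data here is an assumption: it stands for whatever "osd data" points to
in ceph.conf on the OSD in question, so adjust the path and run the loop once
on each OSD host.  The PG ids are the six from the dump in this thread.

    for pg in 1.1ac 0.1ad 2.1ab 1.6c 0.6d 2.6b; do
        # report any PG head directory that still exists on this OSD's data disk
        [ -d "$osd_data/current/${pg}_head" ] && echo "$(hostname): has ${pg}_head"
    done

Any host that prints a line still holds an on-disk copy of that PG; if no
host prints anything, there is no surviving copy to recover from.)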