On Fri, 11 Sep 2015, Haomai Wang wrote:
> On Fri, Sep 11, 2015 at 10:09 PM, Sage Weil <sage@xxxxxxxxxxxx> wrote:
> > On Fri, 11 Sep 2015, Haomai Wang wrote:
> >> On Fri, Sep 11, 2015 at 8:56 PM, Sage Weil <sage@xxxxxxxxxxxx> wrote:
> >> On Fri, 11 Sep 2015, ?? wrote:
> >> > Thanks, Sage Weil:
> >> >
> >> > 1. I deleted some testing pools in the past, but that was a long
> >> > time ago (maybe 2 months); I did not delete any pools during the
> >> > recent upgrade.
> >> > 2. ceph osd dump: please see the attachment file
> >> > ceph.osd.dump.log.
> >> > 3. 'debug osd = 20' and 'debug filestore = 20': attachment file
> >> > ceph.osd.5.log.tar.gz.
> >>
> >> This one is failing on pool 54, which has been deleted. In this
> >> case you can work around it by renaming current/54.* out of the
> >> way.
> >>
> >> > 4. I installed ceph-test, but the command produced an error:
> >> > ceph-kvstore-tool /ceph/data5/current/db list
> >> > Invalid argument: /ceph/data5/current/db: does not exist
> >> > (create_if_missing is false)
> >>
> >> Sorry, I should have said current/omap, not current/db. I'm still
> >> curious to see the key dump. I'm not sure why the leveldb key for
> >> these pgs is missing...
> >>
> >> Yesterday I had a chat with wangrui, and the reason is that the
> >> "infos" (legacy oid) object is missing. I'm not sure why it's
> >> missing.
> >
> > Probably
> >
> > https://github.com/ceph/ceph/blob/hammer/src/osd/OSD.cc#L2908
> >
> > Oh, I think I see what happened:
> >
> > - the pg removal was aborted pre-hammer. On pre-hammer, this means
> >   that load_pgs skips it here:
> >
> >   https://github.com/ceph/ceph/blob/firefly/src/osd/OSD.cc#L2121
> >
> > - we upgrade to hammer. We skip this pg (same reason) and don't
> >   upgrade it, but we delete the legacy infos object:
> >
> >   https://github.com/ceph/ceph/blob/hammer/src/osd/OSD.cc#L2908
> >
> > - now we see this crash...
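[For readers hitting the same crash, the rename workaround above can be sketched as follows. This is a demonstration against a throwaway directory standing in for the OSD's data dir; on a real cluster the path would be the affected OSD's store (e.g. /ceph/data5 from this thread), with the OSD stopped first.]

```shell
# Stand-in for the OSD data directory; on a real cluster, stop the OSD
# and point this at its store (e.g. /ceph/data5) instead.
OSD_DIR=$(mktemp -d)
mkdir -p "$OSD_DIR/current/54.0_head" "$OSD_DIR/current/54.2_head"

# Move the deleted pool's PG directories aside rather than deleting
# them, so they can be restored if anything goes wrong.
mkdir -p "$OSD_DIR/removed-pgs"
for d in "$OSD_DIR"/current/54.*; do
  [ -e "$d" ] && mv "$d" "$OSD_DIR/removed-pgs/"
done
ls "$OSD_DIR/removed-pgs"
```

[The key dump Sage asked for would then come from the omap directory, e.g. `ceph-kvstore-tool /ceph/data5/current/omap list` — note current/omap, not current/db.]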
> >
> > I think the fix is, in hammer, to bail out of peek_map_epoch if the
> > infos object isn't present, here:
> >
> > https://github.com/ceph/ceph/blob/hammer/src/osd/PG.cc#L2867
> >
> > Probably we should restructure so we can return a 'fail' value
> > instead of a magic epoch_t meaning the same...
> >
> > This is similar to the bug I'm fixing on master (and I think I just
> > realized what I was doing wrong there).

> Hmm, I got it. So we could skip this assert, or check whether the
> pool exists, like load_pgs does?
>
> I think it's an urgent bug, because I remember several people have
> shown me a similar crash.

Yeah.. take a look at

https://github.com/ceph/ceph/pull/5892

Does that look right to you? Packages are building now...

sage