Re: Failed on starting osd-daemon after upgrade giant-0.87.1 to hammer-0.94.3

On Fri, 11 Sep 2015, Haomai Wang wrote:
> On Fri, Sep 11, 2015 at 10:09 PM, Sage Weil <sage@xxxxxxxxxxxx> wrote:
> > On Fri, 11 Sep 2015, Haomai Wang wrote:
> >> On Fri, Sep 11, 2015 at 8:56 PM, Sage Weil <sage@xxxxxxxxxxxx> wrote:
> >>       On Fri, 11 Sep 2015, Wang Rui wrote:
> >>       > Thanks, Sage Weil:
> >>       >
> >>       > 1. I deleted some testing pools in the past, but that was a
> >>       long time ago (maybe 2 months ago); no pools were deleted in
> >>       the recent upgrade.
> >>       > 2. For 'ceph osd dump', please see the attached file
> >>       ceph.osd.dump.log.
> >>       > 3. With 'debug osd = 20' and 'debug filestore = 20', please
> >>       see the attached file ceph.osd.5.log.tar.gz.
> >>
> >>       This one is failing on pool 54, which has been deleted.  In
> >>       this case you can work around it by renaming current/54.* out
> >>       of the way.
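> >>
> >>       For example, a minimal sketch (stop the OSD first; the data
> >>       path comes from your ceph-kvstore-tool command below, and the
> >>       destination directory is an arbitrary choice):
> >>
> >>           mkdir /ceph/data5/removed-pgs
> >>           mv /ceph/data5/current/54.* /ceph/data5/removed-pgs/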
> >>
> >>       > 4. I installed ceph-test, but the command outputs an error:
> >>       > ceph-kvstore-tool /ceph/data5/current/db list
> >>       > Invalid argument: /ceph/data5/current/db: does not exist
> >>       (create_if_missing is false)
> >>
> >>       Sorry, I should have said current/omap, not current/db.  I'm
> >>       still curious to see the key dump.  I'm not sure why the
> >>       leveldb key for these pgs is missing...
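> >>
> >>       That is, assuming the same data path as above:
> >>
> >>           ceph-kvstore-tool /ceph/data5/current/omap list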
> >>
> >>
> >> Yesterday I had a chat with wangrui, and the reason is that the
> >> "infos" (legacy oid) object is missing. I'm not sure why it's missing.
> >
> > Probably
> >
> > https://github.com/ceph/ceph/blob/hammer/src/osd/OSD.cc#L2908
> >
> > Oh, I think I see what happened:
> >
> >  - the pg removal was aborted pre-hammer.  On pre-hammer, this means that
> > load_pgs skips it here:
> >
> >  https://github.com/ceph/ceph/blob/firefly/src/osd/OSD.cc#L2121
> >
> >  - we upgrade to hammer.  we skip this pg (same reason), don't upgrade it,
> > but delete the legacy infos object
> >
> >  https://github.com/ceph/ceph/blob/hammer/src/osd/OSD.cc#L2908
> >
> >  - now we see this crash...
> >
> > I think the fix is, in hammer, to bail out of peek_map_epoch if the infos
> > object isn't present, here
> >
> >  https://github.com/ceph/ceph/blob/hammer/src/osd/PG.cc#L2867
> >
> > Probably we should restructure so we can return a 'fail' value
> > instead of a magic epoch_t meaning the same thing...
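> >
> > Roughly, as an illustrative sketch (the exact names and signature
> > here are assumptions; see the PR below for the actual change):
> >
> >   // Return 0 on success (filling *pepoch), or -ENOENT when the
> >   // legacy infos object is gone, instead of overloading epoch_t:
> >   static int peek_map_epoch(ObjectStore *store, spg_t pgid,
> >                             epoch_t *pepoch, bufferlist *bl);
> >
> >   // Then load_pgs can skip the pg instead of hitting the assert:
> >   epoch_t map_epoch = 0;
> >   int r = PG::peek_map_epoch(store, pgid, &map_epoch, &bl);
> >   if (r < 0)
> >     continue;  // partially-removed pg; let removal finish later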
> >
> > This is similar to the bug I'm fixing on master (and I think I just
> > realized what I was doing wrong there).
> 
> Hmm, I got it. So we could skip this assert, or check whether the pool
> exists, like load_pgs does?
> 
> I think this is an urgent bug, because I remember several people have
> shown me a similar crash.

Yeah.. take a look at https://github.com/ceph/ceph/pull/5892

Does that look right to you?  Packages are building now...

sage


