Re: [OSDMAP]osdmap did not update after network recovered from failure

On Thu, 21 Jun 2018, cgxu519 wrote:
> On 06/20/2018 10:45 PM, Sage Weil wrote:
> > On Wed, 20 Jun 2018, cgxu519 wrote:
> > > Is there any specific log indicates what was happening?
> > > 
> > > 
> > > On 06/19/2018 09:56 PM, xiangyang yu wrote:
> > > >    Hi cephers,
> > > >      Recently I met a problem in our production environment.
> > > >       My ceph version is hammer 0.94.5 (it's too old, though).
> > > >       The osdmap (in the osd process) did not update its epoch until
> > > > the osd was restarted.
> > > >       The osd log displays "wrong node", because the actual peer
> > > > address is different from the peer address obtained from the old
> > > > osdmap.
> 
> So first of all, I would like to know what is wrong here: the peer's
> address or the troubled osd itself?
> Is there more information?

The troubled osd has a very old osdmap, and when it connects to its peers 
(using old addresses) it finds different, newer instances of osds there 
instead.  It's a symptom of it having old address info.

I think the fix is to strategically place an osdmap_subscribe() call at the 
point where we find that our heartbeat checks are failing (either 
immediately or after some time).  That will query the mon for the latest 
osdmap and bring the osd back up to date.  It should do this periodically, 
but not too frequently, to avoid loading the mon heavily in a large cluster.
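
(For illustration only, a minimal standalone sketch of the throttling side 
of that idea; the real change would live somewhere like OSD::heartbeat_check() 
and call OSDService::osdmap_subscribe(), and the names, interval, and stub 
below are placeholders rather than actual Ceph code:)

    #include <chrono>
    #include <iostream>

    using Clock = std::chrono::steady_clock;

    // Stand-in for OSDService::osdmap_subscribe(epoch + 1, false).
    void osdmap_subscribe_stub(unsigned epoch) {
      std::cout << "requesting osdmaps newer than epoch " << epoch << "\n";
    }

    struct MapSubscribeThrottle {
      std::chrono::seconds min_interval{30};  // tunable; keeps mon load low
      Clock::time_point last_request{};

      // Called from the heartbeat-failure path; only actually subscribes
      // if enough time has passed since the last request.
      void maybe_subscribe(unsigned current_epoch) {
        Clock::time_point now = Clock::now();
        if (last_request == Clock::time_point{} ||
            now - last_request >= min_interval) {
          last_request = now;
          osdmap_subscribe_stub(current_epoch);
        }
      }
    };

    int main() {
      MapSubscribeThrottle throttle;
      // Pretend heartbeat checks fail three times in a row; only the
      // first call within the 30s window reaches the mon.
      for (int i = 0; i < 3; ++i)
        throttle.maybe_subscribe(100);
    }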

sage


> 
> Thanks,
> Chengguang.
> 
> > > >       Before parts of the network (both the public and cluster
> > > > networks for a range of OSDs) went down, everything was working well
> > > > and the osdmap epoch was, for example, 100 at the time.
> > > >       Then parts of the network (both the public and cluster
> > > > networks) went down for 3-5 minutes.
> > > >       The affected OSDs (there are 156 OSDs in total, and 50 of them
> > > > were affected by the failed network) were marked down due to
> > > > heartbeat check failures.
> > > >       After those parts of the network recovered, all affected OSDs
> > > > except one (let's say osd.8) came back online.
> > > >       osd.8 stayed down and would not come back online, although the
> > > > osd.8 process was still running.
> > > >       When I checked the osd.8 log, I found that its osdmap epoch was
> > > > still 100 and did not change any more after the network failure.
> > > >       But in the ceph cluster, the epoch had increased to a higher
> > > > value, like 160.
> > > >       Does anyone know of any bugfixes related to this problem, or
> > > > have any clues?
> > > >       Best wishes,
> > > >       brandy
> > It sounds to me like it got into a (rare) state where it wasn't chatting
> > with the peer OSDs and didn't hear about the OSDMap change.  Perhaps we
> > should add some sort of fail-safe where the OSD pings the mon
> > periodically for a new map if everything seems (too) quiet...
> > 
> 
