Re: [ceph-users] assert(objiter->second->version > last_divergent_update) when testing pull out disk and insert

Gregory Farnum <gfarnum@xxxxxxxxxx> · Fri, 13 Oct 2017 09:33:44 -0700

On Fri, Oct 13, 2017 at 12:48 AM, zhaomingyue <zhao.mingyue@xxxxxxx> wrote:
> Hi,
>     I hit an assert like the one in bug 16279
> (http://tracker.ceph.com/issues/16279) while testing pulling out a disk
> and reinserting it, on ceph version 10.2.5: assert(objiter->second->version >
> last_divergent_update)
>
> According to the osd log, I think this may be due to (log.head !=
> *log.log.rbegin.version.version) after some abnormal condition, such
> as a power-off or pulling out a disk and reinserting it.

I don't think that is supposed to be possible. We apply all changes like
this atomically; FileStore does all its journaling to prevent partial
updates like this.

A few other people have reported the same issue on disk pull, so maybe
there's some *other* issue going on, but the correct fix is to
prevent those two from differing (unless I misunderstand the
context).
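
The invariant in question (log.head should equal the version of the newest
log entry) can be sketched as a simple check. This is only a minimal sketch:
Version, Entry, Log and head_matches_newest_entry are hypothetical,
simplified stand-ins, not the actual Ceph PGLog types or API.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a Ceph log version, written epoch'version.
struct Version {
    uint32_t epoch;
    uint64_t version;
};

inline bool operator==(const Version& a, const Version& b) {
    return a.epoch == b.epoch && a.version == b.version;
}

// Hypothetical log entry; a real entry carries much more state.
struct Entry {
    Version version;
};

// Hypothetical in-memory log: entries are kept oldest first.
struct Log {
    std::vector<Entry> entries;
    Version head{0, 0};
};

// Returns true when head agrees with the newest entry in the log.
// Assumption: an empty log has nothing to compare against, so it passes.
bool head_matches_newest_entry(const Log& log) {
    if (log.entries.empty()) {
        return true;
    }
    return log.head == log.entries.back().version;
}
```

With the values from the report (newest entry 234'1034, log.head =
234'1033) this check would return false, which is the state that should
not be reachable if updates are applied atomically.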

Given one of the reporters on that ticket confirms they also had xfs
issues, I find it vastly more likely that something in your kernel
configuration and hardware stack is not writing out data the way it
claims to. Be very, very sure all that is working correctly!

> In the situation below, merge_log would push 234'1034 onto the divergent
> list, and the divergent list has only one node; this then leads to
> assert(objiter->second->version > last_divergent_update).
>
> olog  ----------------   (0'0, 234'1034)  olog.head = 234'1034
>
> log   ----------------   (0'0, 234'1034)  log.head = 234'1033
>
>
>
> I looked at the osd load_pgs code. In the function PGLog::read_log() there
> is code like this:
>   .....
>   for (p->seek_to_first(); p->valid(); p->next()) {
>     .....
>     log.log.push_back(e);
>     log.head = e.version;  // set for every pg log entry
>   }
>   .....
>   log.head = info.last_update;
>
>
> Two questions:
>
> First: why set (log.head = info.last_update) after all pg log entries
> have been processed (each entry has already updated log.head = e.version)?
>
> Second: can info.last_update ever be less than
> *log.log.rbegin.version, and if so, in what scenario?

I'm looking at the luminous code base right now and things have
changed a bit so I don't have the specifics of your question on hand.

But the general reason we change these versions around is because we
need to reconcile the logs across all OSDs. If one OSD has an entry
for an operation that was never returned to the client, we may need to
declare it divergent and undo it. (In replicated pools, entries are
only divergent if the OSD hosting them was either netsplit from the
primary, or else managed to commit something during a failure event
that its peers didn't, and that operation was then resubmitted under a
different ID by the client on recovery. In erasure-coded pools things
are more complicated because we can only roll operations forward if a
quorum of the shards are present.)
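
As a rough illustration of declaring entries divergent, here is a minimal
sketch that splits a local log at an authoritative head: everything newer
than that head is treated as divergent and removed. Version and
split_divergent are hypothetical names, the log is reduced to a sorted list
of versions, and this is not the actual Ceph merge logic, which has to track
far more state.

```cpp
#include <algorithm>
#include <cstdint>
#include <tuple>
#include <vector>

// Hypothetical stand-in for a Ceph log version, written epoch'version.
struct Version {
    uint32_t epoch;
    uint64_t version;
};

// Order by epoch first, then by version within the epoch.
inline bool operator<(const Version& a, const Version& b) {
    return std::tie(a.epoch, a.version) < std::tie(b.epoch, b.version);
}

// Removes every entry newer than auth_head from local and returns the
// removed entries, oldest first.
// Assumption: local is sorted oldest first, so all entries after the first
// one that is newer than auth_head are newer as well.
std::vector<Version> split_divergent(std::vector<Version>& local,
                                     const Version& auth_head) {
    auto first_newer = std::find_if(
        local.begin(), local.end(),
        [&](const Version& v) { return auth_head < v; });
    std::vector<Version> divergent(first_newer, local.end());
    local.erase(first_newer, local.end());
    return divergent;
}
```

For example, a local log of 234'1032, 234'1033, 234'1034 split at an
authoritative head of 234'1033 leaves the first two entries in place and
returns 234'1034 as the single divergent entry.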
-Greg