Should I try altering the patch with the `!` removed and reloading the OSDs? (It would then read as `if (intersection == cached_removed_snaps) {`.)

Just for my own education on the matter: if there is disagreement between the contents of an OSD and the map, as in this case where a pending request seems to be outstanding, is there no mediation process between the on-disk data (OSD) and metadata (map) services? With XFS being used underneath most of the time, that strikes me as somewhat scary; it's not the most consistent of filesystems on the best of days. With the mismatch between the OSD and the map, but xfs_check coming back clean, should I be worried about a corrupt cluster in the event that I can somehow get it running?

I figure that with Firefly dying and Hammer available from Mirantis, I should upgrade the cluster, but I would like to know what the safest way forward is. I'd really prefer to keep using Ceph; it's been educational and quite handy. But if I have to rebuild the cluster, it will need to keep playing nice with the Fuel-deployed OpenStack. If I can get access to the images stored by Glance and the Swift metadata, I'll gladly export and rebuild clean, presuming I can figure out how. The RBD images are already saved (a manual export, done by tracking the RBD segment hashes from the metadata files bearing volume-UUID designations matching what I saw in Cinder, and dd-ing the chunks into flat files for raw images).

Worst case, if the cluster won't come back up and give me access to the data, what's the process for getting it to a "clean" state such that I can upgrade to Hammer and reseed my Glance, Swift, and volume data from backups/exports? Do I need to remove and re-add OSDs, or is there some darker magic at play to ensure there are no remnants of bad data/messages?
Thank you all
-Boris

On Tue, Jan 12, 2016 at 8:24 AM, Sage Weil <sage@xxxxxxxxxxxx> wrote:
> On Tue, 12 Jan 2016, Mykola Golub wrote:
>> On Mon, Jan 11, 2016 at 09:00:18PM -0500, Boris Lukashev wrote:
>> > In case anyone is following the mailing list later on, we spoke in IRC
>> > and Sage provided a patch - http://fpaste.org/309609/52550203/
>> >
>> > diff --git a/src/osd/PG.cc b/src/osd/PG.cc
>> > index dc18aec..f9ee23c 100644
>> > --- a/src/osd/PG.cc
>> > +++ b/src/osd/PG.cc
>> > @@ -135,8 +135,16 @@ void PGPool::update(OSDMapRef map)
>> >    name = map->get_pool_name(id);
>> >    if (pi->get_snap_epoch() == map->get_epoch()) {
>> >      pi->build_removed_snaps(newly_removed_snaps);
>> > -    newly_removed_snaps.subtract(cached_removed_snaps);
>> > -    cached_removed_snaps.union_of(newly_removed_snaps);
>> > +    interval_set<snapid_t> intersection;
>> > +    intersection.intersection_of(newly_removed_snaps, cached_removed_snaps);
>> > +    if (!(intersection == cached_removed_snaps)) {
>> > +      newly_removed_snaps.subtract(cached_removed_snaps);
>>
>> Sage, won't it still violate the assert?
>> "intersection != cached_removed_snaps" means that cached_removed_snaps
>> contains snapshots missing from newly_removed_snaps, so we can't subtract?
>
> Oops, yeah, just remove the !.
>
> As you can see, the problem is that the OSDMap's removed-snaps set shrank
> somehow. If you crank up logging you can see what the competing sets
> are.
>
> An alternative fix/hack would be to modify the monitor to allow the
> snapids that were previously in the set to be added back into the OSDMap.
> That's arguably a better fix, although it's a bit more work. But even
> then, something like the above will be needed, since there are still
> OSDMaps in the history where the set is smaller.
> sage
>
>> > +      cached_removed_snaps.union_of(newly_removed_snaps);
>> > +    } else {
>> > +      lgeneric_subdout(g_ceph_context, osd, 0) << __func__
>> > +        << " cached_removed_snaps shrank from " << cached_removed_snaps << dendl;
>> > +      cached_removed_snaps = newly_removed_snaps;
>> > +      newly_removed_snaps.clear();
>> > +    }
>> >      snapc = pi->get_snap_context();
>> >    } else {
>> >      newly_removed_snaps.clear();
>>
>> --
>> Mykola Golub