Having added the following diff to the patch stack and rebuilt, I'm still
seeing the two OSDs not come up. However, with Sage's help in IRC, I now
have the following patch, plus changes to the librados hello_world example,
to clean out the bad snaps found via "debug osd = 20" logging:

diff --git a/src/osd/PG.cc b/src/osd/PG.cc
index d7174af..d78ee31 100644
--- a/src/osd/PG.cc
+++ b/src/osd/PG.cc
@@ -137,7 +137,7 @@ void PGPool::update(OSDMapRef map)
     pi->build_removed_snaps(newly_removed_snaps);
     interval_set<snapid_t> intersection;
     intersection.intersection_of(newly_removed_snaps, cached_removed_snaps);
-    if (!(intersection == cached_removed_snaps)) {
+    if (intersection == cached_removed_snaps) {
       newly_removed_snaps.subtract(cached_removed_snaps);
       cached_removed_snaps.union_of(newly_removed_snaps);
     } else {

In hello_world, remove everything after io_ctx is initialized up to the
"out:" section, and just before it add:

/*
 * remove snapshots
 */
{
  io_ctx.selfmanaged_snap_remove(0x19+0xb);
  io_ctx.selfmanaged_snap_remove(0x19+0xc);
  io_ctx.selfmanaged_snap_remove(0x19+0xd);
  io_ctx.selfmanaged_snap_remove(0x19+0xe);
}

with the ids adjusted to point at the offending snaps.

So, with the OSDs still not starting, I'm curious what the next step is:
should I keep trying to get the OSDs up, or should I try to remove the
snaps with the librados binary first and then bring them up? Since I'm
manually deleting things from Ceph, I figure I only get one shot, so
suggestions are very welcome :).

Thanks as always,
-Boris
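For reference, a self-contained version of that snap-removal tool might
look like the sketch below. This is only an illustration, not the exact
binary used here: the client id "admin", the conf path, and the pool name
"rbd" are placeholder assumptions, and the four snap ids are the tail of
the purged_snaps interval 19~f that is missing from cached_removed_snaps
19~b in the log quoted below.

#include <rados/librados.hpp>
#include <cstdint>
#include <iostream>

int main()
{
  librados::Rados rados;

  // Connect to the cluster as client.admin using the local conf
  // (both the id and the path are assumptions).
  if (rados.init("admin") < 0 ||
      rados.conf_read_file("/etc/ceph/ceph.conf") < 0 ||
      rados.connect() < 0) {
    std::cerr << "couldn't connect to cluster" << std::endl;
    return 1;
  }

  // Open the pool that holds the bad snaps; "rbd" is a placeholder name.
  librados::IoCtx io_ctx;
  if (rados.ioctx_create("rbd", io_ctx) < 0) {
    std::cerr << "couldn't open pool" << std::endl;
    rados.shutdown();
    return 1;
  }

  // Re-remove the snap ids the OSDMap lost: purged_snaps ends in 19~f but
  // cached_removed_snaps ends in 19~b, so ids 0x19+0xb through 0x19+0xe
  // are the ones to delete again.
  for (uint64_t snap = 0x19 + 0xb; snap <= 0x19 + 0xe; ++snap) {
    int ret = io_ctx.selfmanaged_snap_remove(snap);
    std::cout << "selfmanaged_snap_remove(" << snap << ") = " << ret
              << std::endl;
  }

  rados.shutdown();
  return 0;
}

Something like "g++ -std=c++11 snap_cleanup.cc -o snap_cleanup -lrados"
should build it against the installed librados.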
On Tue, Jan 12, 2016 at 1:03 PM, Boris Lukashev <blukashev@xxxxxxxxxxxxxxxx> wrote:
> I've put some of the output from debug osd 20 at
> http://pastebin.com/he5snqwF; it seems one of the last operations is
> in fact "activate - purged_snaps [1~5,8~2,b~d,19~f]
> cached_removed_snaps [1~5,8~2,b~d,19~b]", which seems to make sense in
> the context of this mismatch...
> There is an ungodly amount of output from level 20; anything specific
> you'd like me to grep for?
>
> On Tue, Jan 12, 2016 at 12:47 PM, Sage Weil <sage@xxxxxxxxxxxx> wrote:
>> On Tue, 12 Jan 2016, Boris Lukashev wrote:
>>> Should I try altering the patch with the ! removed and reloading the
>>> OSDs? (It would read as 'if (intersection == cached_removed_snaps) {'.)
>>> Just for my own education on the matter: if there is disagreement
>>> between the contents of an OSD and the map, like in this case where a
>>> pending request seems to be outstanding, is there no mediation process
>>> between the on-disk data (OSD) and metadata (map) services? With XFS
>>> being used underneath most of the time, that strikes me as somewhat
>>> scary; it's not the most consistent of filesystems on the best of
>>> days.
>>>
>>> With the mismatch between the OSD and map, but xfs_check coming back
>>> clean, should I be worried about a corrupt cluster in the event that I
>>> can somehow get it running?
>>> I figure that with Firefly dying and Hammer available from Mirantis, I
>>> should upgrade the cluster, but I would like to know what the safest
>>> way forward is. I'd really prefer to keep using Ceph; it's been
>>> educational and quite handy. But if I have to rebuild the cluster,
>>> it'll need to keep playing nice with the Fuel-deployed OpenStack. If I
>>> can get access to the images stored by Glance and the Swift metadata,
>>> I'll gladly export and rebuild clean, presuming I can figure out how.
>>> The RBD images are already saved (manually exported by tracking the
>>> rbd segment hashes from the metadata files bearing volume-UUID
>>> designations matching what I saw in Cinder, and dd-ing chunks into
>>> flat files for raw images). Worst case, if the cluster won't come back
>>> up and give me access to the data, what's the process for getting it
>>> to a "clean" state such that I can upgrade to Hammer and reseed my
>>> Glance, Swift, and volume data from backups/exports? Do I need to
>>> remove and re-add OSDs, or is there some darker magic at play to
>>> ensure there are no remnants of bad data/messages?
>>
>> I think the safe path is to:
>>
>> (1) reproduce with debug osd = 20, so we can see what the removed_snaps
>> set is on the pg vs the one in the osdmap.
>>
>> (2) fix the ! in the patch and restart the osds.
>>
>> (3) re-add the deleted snaps to the osdmap so that things are back in
>> sync. This is possible through the librados API, so it should be pretty
>> simple to fix. But let's look at what (1) shows first.
>>
>> sage
>>
>>> Thank you all
>>> -Boris
>>>
>>> On Tue, Jan 12, 2016 at 8:24 AM, Sage Weil <sage@xxxxxxxxxxxx> wrote:
>>> > On Tue, 12 Jan 2016, Mykola Golub wrote:
>>> >> On Mon, Jan 11, 2016 at 09:00:18PM -0500, Boris Lukashev wrote:
>>> >> > In case anyone is following the mailing list later on, we spoke
>>> >> > in IRC and Sage provided a patch -
>>> >> > http://fpaste.org/309609/52550203/
>>> >>
>>> >> > diff --git a/src/osd/PG.cc b/src/osd/PG.cc
>>> >> > index dc18aec..f9ee23c 100644
>>> >> > --- a/src/osd/PG.cc
>>> >> > +++ b/src/osd/PG.cc
>>> >> > @@ -135,8 +135,16 @@ void PGPool::update(OSDMapRef map)
>>> >> >    name = map->get_pool_name(id);
>>> >> >    if (pi->get_snap_epoch() == map->get_epoch()) {
>>> >> >      pi->build_removed_snaps(newly_removed_snaps);
>>> >> > -    newly_removed_snaps.subtract(cached_removed_snaps);
>>> >> > -    cached_removed_snaps.union_of(newly_removed_snaps);
>>> >> > +    interval_set<snapid_t> intersection;
>>> >> > +    intersection.intersection_of(newly_removed_snaps, cached_removed_snaps);
>>> >> > +    if (!(intersection == cached_removed_snaps)) {
>>> >> > +      newly_removed_snaps.subtract(cached_removed_snaps);
>>> >>
>>> >> Sage, won't it still violate the assert?
>>> >> "intersection != cached_removed_snaps" means that cached_removed_snaps
>>> >> contains snapshots missed in newly_removed_snaps, and we can't
>>> >> subtract?
>>> >
>>> > Oops, yeah, just remove the !.
>>> >
>>> > As you can see, the problem is that the OSDMap's removed snaps shrank
>>> > somehow. If you crank up logging, you can see what the competing sets
>>> > are.
>>> >
>>> > An alternative fix/hack would be to modify the monitor to allow the
>>> > snapids that were previously in the set to be added back into the
>>> > OSDMap. That's arguably a better fix, although it's a bit more work.
>>> > But even then, something like the above will be needed, since there
>>> > are still OSDMaps in the history where the set is smaller.
>>> >
>>> > sage
>>> >
>>> >>
>>> >> > +      cached_removed_snaps.union_of(newly_removed_snaps);
>>> >> > +    } else {
>>> >> > +      lgeneric_subdout(g_ceph_context, osd, 0) << __func__ << " cached_removed_snaps shrank from " << cached_removed_snaps << dendl;
>>> >> > +      cached_removed_snaps = newly_removed_snaps;
>>> >> > +      newly_removed_snaps.clear();
>>> >> > +    }
>>> >> >      snapc = pi->get_snap_context();
>>> >> >    } else {
>>> >> >      newly_removed_snaps.clear();
>>> >>
>>> >> --
>>> >> Mykola Golub
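A side note on why removing the ! is the right fix: with the corrected
condition, the subtract/union branch runs only when cached_removed_snaps
is a subset of the set rebuilt from the new OSDMap, which is exactly the
case interval_set::subtract() can handle without tripping its assert. A
toy standalone illustration (plain std::set standing in for Ceph's
interval_set; the values are made up):

#include <algorithm>
#include <cassert>
#include <iterator>
#include <set>

int main()
{
  std::set<unsigned> newly_removed = {1, 2, 3, 4, 5}; // rebuilt from new map
  std::set<unsigned> cached        = {1, 2, 3};       // cached by this PG

  // interval_set::intersection_of(), modeled with std::set_intersection().
  std::set<unsigned> intersection;
  std::set_intersection(newly_removed.begin(), newly_removed.end(),
                        cached.begin(), cached.end(),
                        std::inserter(intersection, intersection.begin()));

  if (intersection == cached) {
    // cached is a subset of newly_removed, so subtracting it cannot
    // underflow; this mirrors the branch the corrected patch takes.
    for (unsigned s : cached)
      newly_removed.erase(s);
    assert(newly_removed == (std::set<unsigned>{4, 5}));
  } else {
    // cached shrank relative to the map (the state this cluster is in);
    // the patch logs the anomaly and resets the cache instead.
  }
  return 0;
}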
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html