Should I try altering the patch with the `!` removed and reloading the OSDs? (It would then read as `if (intersection == cached_removed_snaps) {`.)

Just for my own education on the matter: if there is disagreement between the contents of an OSD and the map, as in this case where a pending request seems to be outstanding, is there no mediation process between the on-disk data (OSD) and metadata (map) services? With XFS being used underneath most of the time, that strikes me as somewhat scary; it's not the most consistent of filesystems on the best of days. With the mismatch between the OSD and the map, but xfs_check coming back clean, should I be worried about a corrupt cluster in the event that I can somehow get it running?

I figure that with Firefly dying and Hammer available from Mirantis, I should upgrade the cluster, but I would like to know what the safest way forward is. I'd really prefer to keep using Ceph; it's been educational and quite handy. But if I have to rebuild the cluster, it will need to keep playing nice with the Fuel-deployed OpenStack. If I can get access to the images stored by Glance and the Swift metadata, I'll gladly export and rebuild clean, presuming I can figure out how. The RBD images are already saved (a manual export, done by tracking the RBD segment hashes from the metadata files bearing volume-UUID designations matching what I saw in Cinder, and dd-ing the chunks into flat files for raw images).

Worst case, if the cluster won't come back up and give me access to the data, what's the process for getting it to a "clean" state such that I can upgrade to Hammer and reseed my Glance, Swift, and volume data from backups/exports? Do I need to remove and re-add OSDs, or is there some darker magic at play to ensure there are no remnants of bad data/messages?
Thank you all
-Boris

On Tue, Jan 12, 2016 at 8:24 AM, Sage Weil <sage@xxxxxxxxxxxx> wrote:
> On Tue, 12 Jan 2016, Mykola Golub wrote:
>> On Mon, Jan 11, 2016 at 09:00:18PM -0500, Boris Lukashev wrote:
>> > In case anyone is following the mailing list later on, we spoke in IRC
>> > and Sage provided a patch - http://fpaste.org/309609/52550203/
>> >
>> > diff --git a/src/osd/PG.cc b/src/osd/PG.cc
>> > index dc18aec..f9ee23c 100644
>> > --- a/src/osd/PG.cc
>> > +++ b/src/osd/PG.cc
>> > @@ -135,8 +135,16 @@ void PGPool::update(OSDMapRef map)
>> >    name = map->get_pool_name(id);
>> >    if (pi->get_snap_epoch() == map->get_epoch()) {
>> >      pi->build_removed_snaps(newly_removed_snaps);
>> > -    newly_removed_snaps.subtract(cached_removed_snaps);
>> > -    cached_removed_snaps.union_of(newly_removed_snaps);
>> > +    interval_set<snapid_t> intersection;
>> > +    intersection.intersection_of(newly_removed_snaps, cached_removed_snaps);
>> > +    if (!(intersection == cached_removed_snaps)) {
>> > +      newly_removed_snaps.subtract(cached_removed_snaps);
>>
>> Sage, won't it still violate the assert?
>> "intersection != cached_removed_snaps" means that cached_removed_snaps
>> contains snapshots missing from newly_removed_snaps, so we can't subtract?
>
> Oops, yeah, just remove the !.
>
> As you can see, the problem is that the OSDMap's removed-snaps set shrank
> somehow. If you crank up logging you can see what the competing sets
> are.
>
> An alternative fix/hack would be to modify the monitor to allow the
> snapids that were previously in the set to be added back into the OSDMap.
> That's arguably a better fix, although it's a bit more work. But even
> then, something like the above will be needed, since there are still
> OSDMaps in the history where the set is smaller.
> sage
>
>> > +      cached_removed_snaps.union_of(newly_removed_snaps);
>> > +    } else {
>> > +      lgeneric_subdout(g_ceph_context, osd, 0) << __func__
>> > +        << " cached_removed_snaps shrank from " << cached_removed_snaps << dendl;
>> > +      cached_removed_snaps = newly_removed_snaps;
>> > +      newly_removed_snaps.clear();
>> > +    }
>> >      snapc = pi->get_snap_context();
>> >    } else {
>> >      newly_removed_snaps.clear();
>>
>> --
>> Mykola Golub