On 12/23/2014 09:10 PM, Sage Weil wrote:
> This fun issue came up again in the form of 10422:
> http://tracker.ceph.com/issues/10422
> I think we have 3 main options:
> 1. Ask users to do a mon scrub prior to upgrade to
> ensure it is safe. If a mon is out of sync, manually kick it out, blow it
> away, and resync.
> 2. Do a one-time broadcast of the full osdmap across mons to ensure they
> are consistent after upgrade. Bleh.
> 3. Include full encoded OSDMap in txns on updates going forward.
> I like 3 because it solves this and all related problems going forward.
> The local encoding of full osdmaps has proven to be a huge headache.
> And, the patch to do it is remarkably simple
> https://github.com/ceph/ceph/pull/3247/files
> and dovetails well with the new CRC.
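A minimal sketch of the difference between locally encoding full maps and option 3, assuming a toy JSON encoder in place of the real OSDMap encoding. Everything here (the `Mon` class, `encode_full`, the `new_field` key, the store layout) is hypothetical and only illustrates the idea; it is not Ceph code.

```python
import json
import zlib


def encode_full(osdmap: dict, new_release: bool) -> bytes:
    """Toy encoder (hypothetical). `new_release` stands in for a newer
    version that encodes a field an older version does not know about."""
    data = dict(osdmap)
    if new_release:
        data.setdefault("new_field", 0)
    return json.dumps(data, sort_keys=True).encode()


def apply_incremental(osdmap: dict, inc: dict) -> dict:
    """Apply an incremental (epoch plus changed keys) to a full map."""
    new_map = dict(osdmap)
    new_map.update(inc["changes"])
    new_map["epoch"] = inc["epoch"]
    return new_map


class Mon:
    def __init__(self, new_release: bool):
        self.new_release = new_release
        self.osdmap = {"epoch": 1, "max_osd": 3}
        self.store = {}  # key -> bytes, stands in for the mon's key/value store

    def commit_local_encode(self, inc: dict) -> None:
        # Only the incremental travels in the txn; each mon encodes the
        # resulting full map itself, so the stored bytes depend on the encoder.
        self.osdmap = apply_incremental(self.osdmap, inc)
        self.store[f"full_{inc['epoch']}"] = encode_full(self.osdmap, self.new_release)

    def commit_full_in_txn(self, inc: dict, full: bytes, crc: int) -> None:
        # Option 3: the encoded full map travels in the txn and is stored
        # verbatim after a CRC check.
        if zlib.crc32(full) != crc:
            raise ValueError("full map CRC mismatch")
        self.osdmap = apply_incremental(self.osdmap, inc)
        self.store[f"full_{inc['epoch']}"] = full


inc = {"epoch": 2, "changes": {"max_osd": 4}}

# Mixed-version quorum, local encoding: the stored full maps diverge.
a, b = Mon(new_release=False), Mon(new_release=True)
a.commit_local_encode(inc)
b.commit_local_encode(inc)
local_match = a.store["full_2"] == b.store["full_2"]

# Same quorum, full map shipped in the txn: the stored full maps are identical.
leader = Mon(new_release=True)
full = encode_full(apply_incremental(leader.osdmap, inc), leader.new_release)
crc = zlib.crc32(full)
c, d = Mon(new_release=False), Mon(new_release=True)
c.commit_full_in_txn(inc, full, crc)
d.commit_full_in_txn(inc, full, crc)
txn_match = c.store["full_2"] == d.store["full_2"]
```

In this toy model the two mons end up with different bytes for the same epoch when each encodes locally, and with identical bytes when the leader's encoding is carried in the transaction.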
I prefer 3 as well. Below is my reply on the pull request, which I
wrote before addressing this email, and I shall leave it here for posterity!
(Also, I think the approach in the pull request is correct)
As far as I can tell, the whole idea of relying solely on incrementals
to locally build full osdmaps goes as far back as a5e2dcb. This leads me
to believe that, while the idea may have seemed good at the time, it may
not have been based on a real issue.
Anyway, relaying a few MB worth of osdmap (if it gets to that) over
the wire doesn't concern me particularly -- the one thing that may be
annoying is writing them to leveldb.
I fear that writing a just-big-enough map to leveldb may cause a hang;
while we do now have the async mechanism to handle this, I fear that we
may end up waiting for a big transaction to be applied to leveldb before
accepting the value (e.g., in Paxos::handle_begin() we will wait for the
value to be applied to the store before we send out
MMonPaxos::OP_ACCEPT). Then again, this should be easy to work around
by adjusting timeouts if we ever hit it.
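A rough sketch of the timing concern above, assuming a simplified model in which the accept is sent only after the value has been fully written to the store. The `Peon` class, the throughput figure, and the timeout values are made-up illustrations, not measurements or actual Ceph settings.

```python
from dataclasses import dataclass


@dataclass
class Peon:
    # Assumed sustained write throughput of the store, in bytes per second.
    store_bytes_per_sec: float

    def handle_begin(self, value: bytes) -> float:
        """Return the simulated seconds before the accept goes out:
        the value is applied to the store first, then accepted."""
        return len(value) / self.store_bytes_per_sec


def leader_gets_accept(peon: Peon, value: bytes, accept_timeout: float) -> bool:
    """True if the accept would arrive before the leader's timeout expires."""
    return peon.handle_begin(value) <= accept_timeout


slow_peon = Peon(store_bytes_per_sec=1024 * 1024)  # deliberately slow: 1 MiB/s
small_value = bytes(64 * 1024)        # incremental-sized value
big_value = bytes(8 * 1024 * 1024)    # value carrying a large full map

small_ok = leader_gets_accept(slow_peon, small_value, accept_timeout=5.0)
big_ok = leader_gets_accept(slow_peon, big_value, accept_timeout=5.0)
big_ok_longer_timeout = leader_gets_accept(slow_peon, big_value, accept_timeout=10.0)
```

Under these assumed numbers the small value is accepted in time, the large one is not, and raising the timeout is enough to let the large one through.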
-Joao
> What do you think?
> sage
--
Joao Eduardo Luis
Software Engineer | http://ceph.com