Re: full osdmaps in mon txns

On 12/23/2014 09:10 PM, Sage Weil wrote:
This fun issue came up again in the form of 10422:

	http://tracker.ceph.com/issues/10422

I think we have 3 main options:

1. Ask users to do a mon scrub prior to upgrade to ensure it is safe.  If
a mon is out of sync, manually kick it out, blow it away, and resync.

2. Do a one-time broadcast of the full osdmap across mons to ensure they
are consistent after upgrade.  Bleh.

3. Include full encoded OSDMap in txns on updates going forward.

I like 3 because it solves this and all related problems going forward.
The local encoding of full osdmaps has proven to be a huge headache.
And the patch to do it is remarkably simple:

	https://github.com/ceph/ceph/pull/3247/files

and dovetails well with the new CRC.
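
To make the difference between local encoding and option 3 concrete, here is a toy sketch in Python. It is not Ceph code: the JSON "encoders", the dict-based map, and every function name are hypothetical stand-ins, and the assumed failure mode is simply two mons running encoders that produce different bytes for the same map. It shows a CRC over the full map catching a mon whose locally rebuilt map differs, and the mismatch going away once the leader's encoded full map travels in the txn.

```python
import json
import zlib


def encode_v1(osdmap: dict) -> bytes:
    # Hypothetical "old" encoder (stand-in for a mon running older code).
    return json.dumps(osdmap, sort_keys=True).encode()


def encode_v2(osdmap: dict) -> bytes:
    # Hypothetical "new" encoder: same map, but it emits one extra field.
    return json.dumps({**osdmap, "new_field": 0}, sort_keys=True).encode()


def apply_incremental(full: dict, inc: dict) -> dict:
    # Build the next full map from the previous one plus an incremental.
    new = dict(full)
    new.update(inc["changes"])
    new["epoch"] = inc["epoch"]
    return new


def build_txn(leader_full: dict, changes: dict, encode, include_full: bool) -> dict:
    # Leader side: always ship the incremental and a CRC of the full map;
    # optionally ship the encoded full map itself (option 3).
    inc = {"epoch": leader_full["epoch"] + 1, "changes": changes}
    full_bytes = encode(apply_incremental(leader_full, inc))
    txn = {"inc": inc, "full_crc": zlib.crc32(full_bytes)}
    if include_full:
        txn["full"] = full_bytes
    return txn


def apply_txn(local_full: dict, txn: dict, encode):
    # Peon side: store the shipped bytes if present, otherwise rebuild and
    # re-encode locally. Returns (stored bytes, whether the CRC matches).
    if "full" in txn:
        stored = txn["full"]
    else:
        stored = encode(apply_incremental(local_full, txn["inc"]))
    return stored, zlib.crc32(stored) == txn["full_crc"]
```

With a leader using encode_v2 and a peon using encode_v1, apply_txn() reports a CRC mismatch when only the incremental is shipped, and a match when include_full=True, because the peon never re-encodes.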

I prefer 3 as well. Below is my reply on the pull request, which I wrote before addressing this email, and I shall leave it here for posterity!

(Also, I think the approach in the pull request is correct)

As far as I can tell, the whole idea of relying solely on incrementals to locally build full osdmaps goes as far back as a5e2dcb. This leads me to believe that while the idea may have seemed good at the time, it may not have been based on a real issue.

Anyway, relaying a few MB worth of osdmap (if it gets to that) over the wire doesn't concern me particularly -- the one thing that may be annoying is writing them to leveldb.

I fear that writing a just-big-enough map to leveldb may cause a hang. While we do now have the async mechanism to handle this, we may still end up waiting for a big transaction to be applied to leveldb before accepting the value (e.g., in Paxos::handle_begin() we wait for the value to be applied to the store before we send out MMonPaxos::OP_ACCEPT). Then again, this should be easy to work around by adjusting timeouts if we ever hit it.
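
A back-of-the-envelope sketch of that concern, in Python. Everything here is hypothetical: the Peon class, the throughput figure, and the timeout are made-up illustrations, not measurements or actual mon settings. The only thing it models is the ordering described above, where the accept cannot go out until the store write has finished.

```python
from dataclasses import dataclass


@dataclass
class Peon:
    # Hypothetical sustained write throughput of the local store, in bytes/s.
    store_bytes_per_sec: float

    def handle_begin(self, value: bytes) -> float:
        # Toy model: the value must be applied to the store before the
        # accept is sent, so the accept delay is the store write time.
        return len(value) / self.store_bytes_per_sec


def accept_in_time(peon: Peon, value: bytes, accept_timeout_sec: float) -> bool:
    # True if the accept would be sent before the (hypothetical) timeout.
    return peon.handle_begin(value) <= accept_timeout_sec
```

In this model a large enough value misses a fixed timeout, and raising the timeout is enough to make the same value succeed, which is the "adjust timeouts if we ever hit it" escape hatch.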


  -Joao


What do you think?
sage



--
Joao Eduardo Luis
Software Engineer | http://ceph.com