question about monitor and paxos relationship

Thanks, all, for your great explanation.

Regards 
Pragya Jain


On Saturday, 30 August 2014 4:51 PM, Joao Eduardo Luis <joao.luis at inktank.com> wrote:
 

>
>
>On 08/30/2014 08:03 AM, pragya jain wrote:
>> Thanks Greg, Joao and David,
>>
>> The reason why an odd number of monitors is preferred is clear to me,
>> but I am still not clear about the working of the Paxos algorithm:
>>
>> #1. All changes to any of the monitor's data structures, whether the
>> monitor map, OSD map, PG map, MDS map or CRUSH map, are made through
>> the Paxos algorithm, and
>> #2. the Paxos algorithm also establishes a quorum among the monitors
>> for the most recent copy of the cluster map.
>>
>> I am unable to understand how these two things are related and
>> connected. How does Paxos provide these two functionalities?
>
>As Greg mentioned before, Paxos is a consensus algorithm, so we can 
>leverage it for anything that may require consensus.
>
>There are two parts of the monitor that use a modified version of 
>Paxos (but still Paxos in nature): map consensus and elections.
>
>Let me give you a (rough) temporal view of how the monitor applies this 
>once it starts.  Say you have 5 monitors total, 2 of which are down.
>
>1. Alive monitors will "probe" all monitors in the monmap (all 4 
>others) -- the probing phase is independent from anything-Paxos and is 
>meant to raise awareness of which monitors are up, alive and reachable.
>
>2. Once enough monitors to form a quorum (i.e., at least (N+1)/2) reply 
>to the probes, the monitors will enter the election phase.
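>
>For N = 5 that threshold is 3, so the three alive monitors are enough.
>A quick check of the arithmetic (plain Python, purely illustrative):
>
>   def quorum_size(n):
>       # smallest absolute majority of n monitors
>       return n // 2 + 1
>
>   print(quorum_size(5))   # 3, so 2 of the 5 monitors can be down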
>
>3. The election phase is a stripped-down version of Paxos and goes 
>something like this:
>   - mon.a has rank 0 and thinks it must be the leader
>   - mon.b has rank 1 and thinks it must be the leader
>   - mon.c has rank 2 and thinks it must be the leader
>
>   - mon.a receives mon.b's and mon.c's leader proposals and ignores 
>them, as mon.a has a higher rank than mon.b or mon.c (the lower the 
>value, the higher the rank)
>
>   - mon.c receives mon.a's leader proposal and defers to mon.a (a's 
>rank 0 outranks c's rank 2).
>   - mon.c receives mon.b's leader proposal and ignores it, as it has 
>already deferred to a monitor with a higher rank than b's (a's rank 0 
>outranks b's rank 1).
>
>   - mon.b receives mon.a's leader proposal and defers to mon.a (a's 
>rank 0 outranks b's rank 1).
>
>   - mon.a got 3 accepts (mon.a's + mon.b's + mon.c's), which is an 
>absolute majority (3 == (N+1)/2, for N = 5).  mon.a declares itself the 
>leader, and every other monitor declares itself a peon.
>
>The election phase follows Paxos 'prepare', 'promise', 'accept' and 
>'accepted' phases.
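>
>If it helps, here's a toy Python sketch of that deferral logic.  The
>synchronous loop and the message flow are simplifications for
>illustration, not how the monitor actually implements it:
>
>   ranks = {'mon.a': 0, 'mon.b': 1, 'mon.c': 2}   # alive; 2 mons are down
>   N = 5                                          # monitors in the monmap
>
>   # Each monitor starts by proposing itself, then defers to any
>   # proposal from a higher-ranked (lower value) monitor it receives.
>   deferred = {m: m for m in ranks}
>   for receiver in ranks:
>       for proposer in ranks:
>           if ranks[proposer] < ranks[deferred[receiver]]:
>               deferred[receiver] = proposer
>
>   # Count accepts per candidate (a monitor accepts its own proposal).
>   accepts = {m: sum(1 for r in deferred if deferred[r] == m) for m in ranks}
>   leader = max(accepts, key=accepts.get)
>   assert accepts[leader] >= (N + 1) // 2         # absolute majority: 3 of 5
>   print('leader:', leader, 'peons:', sorted(m for m in ranks if m != leader))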
>
>Same goes for maps.  Once the leader has been elected and the peons 
>established, we can state that a quorum was reached.  The quorum is the 
>set of monitors currently participating (the two down monitors are not 
>part of it), and in this case the quorum will be { mon.a, mon.b, mon.c }.
>After a quorum has been established the monitors will be able to allow 
>map modifications as needed.
>
>So say a new OSD is added to the cluster.  The osdmap needs to reflect 
>this.  The leader handles the modification and keeps it in a temporary, 
>to-be-committed osdmap, and proposes the changes to all monitors in the 
>quorum.
>
>1. Leader proposes the modification to all quorum participants.  Each 
>modification is packed with a version and a proposal number.
>
>2. Each monitor will check whether it has seen said proposal number 
>before.  If not, it will take the proposal from the leader, stash it on 
>disk in a temporary location, and will let the leader know that it has 
>been accepted.  If, on the other hand, the monitor sees that said 
>proposal number has been proposed before, it will not accept the 
>proposal and will simply ignore the leader.
>
>3. The leader will collect all 'accepts' from the peons.  If (N+1)/2 
>monitors (counting the leader, which accepts its own proposals by 
>default) accepted the proposal, then the leader will issue a 'commit' 
>instructing everyone to move the proposal from its temporary location to 
>its final location (for instance, from 'stashed_proposal' to 
>'osdmap:version_10').  If by chance not enough monitors accepted the 
>proposal (i.e., fewer than (N+1)/2), eventually a timeout will be 
>triggered and the quorum will undergo a new election.
>
>This also follows Paxos 'prepare', 'promise', 'accept' and 'accepted' 
>phases, even if we cut corners to reduce message passing.
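>
>A similarly rough Python sketch of that propose/accept/commit round
>might look like this.  The 'stash'/'store' names and the osdmap key
>are placeholders, not Ceph's actual structures:
>
>   class Peon:
>       def __init__(self):
>           self.seen = set()   # proposal numbers already seen
>           self.stash = {}     # temporary, to-be-committed values
>           self.store = {}     # final locations, e.g. 'osdmap:version_10'
>
>       def handle_propose(self, pn, key, value):
>           if pn in self.seen:
>               return False                  # seen before: ignore the leader
>           self.seen.add(pn)
>           self.stash[pn] = (key, value)     # stash in a temporary location
>           return True                       # accept
>
>       def handle_commit(self, pn):
>           if pn in self.stash:              # move to the final location
>               key, value = self.stash.pop(pn)
>               self.store[key] = value
>
>   def leader_propose(peons, pn, key, value, n_total):
>       # the leader accepts its own proposal by default, hence the 1 +
>       accepts = 1 + sum(p.handle_propose(pn, key, value) for p in peons)
>       if accepts >= (n_total + 1) // 2:
>           for p in peons:
>               p.handle_commit(pn)
>           return True
>       return False   # too few accepts; a timeout triggers a new election
>
>   peons = [Peon(), Peon()]   # mon.b and mon.c
>   print(leader_propose(peons, 1, 'osdmap:version_10', 'osd added', 5))
>   print(peons[0].store)      # {'osdmap:version_10': 'osd added'}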
>
>Hope this helps.
>
>   -Joao
>
>>
>> Please help to clarify these points.
>>
>> Regards
>> Pragya Jain
>>
>>
>>
>>
>> On Saturday, 30 August 2014 7:29 AM, Joao Eduardo Luis
>> <joao.luis at inktank.com> wrote:
>>
>>
>>
>>     On 08/29/2014 11:22 PM, J David wrote:
>>
>>      > So an even number N of monitors doesn't give you any better fault
>>      > resilience than N-1 monitors.  And the more monitors you have, the
>>      > more traffic there is between them.  So when N is even, N monitors
>>      > consume more resources and provide no extra benefit compared to N-1
>>      > monitors.
>>
>>
>>     Except for more copies ;)
>>
>>     But yeah, if you're going with 2 or 4, you'll be better off with 3
>>     or 5.  As long as you don't go with 1 you should be okay.  Only go
>>     with 1 if you're truly okay with losing whatever you're storing if
>>     that one monitor's disk is fried.
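>>
>>     To put numbers on it: a quorum needs floor(N/2)+1 monitors, so you
>>     can lose N - (floor(N/2)+1) of them.  A quick check (plain Python,
>>     just the arithmetic):
>>
>>         for n in (3, 4, 5, 6):
>>             print(n, 'monitors tolerate', n - (n // 2 + 1), 'failures')
>>         # 3 and 4 both tolerate 1; 5 and 6 both tolerate 2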
>>
>>        -Joao
>>
>>
>>     --
>>     Joao Eduardo Luis
>>     Software Engineer | http://inktank.com | http://ceph.com
>
>>
>>
>>
>
>
>-- 
>Joao Eduardo Luis
>Software Engineer | http://inktank.com | http://ceph.com
>
>
>