Re: Paxos vs Raft

While reading the Raft paper today and remembering the Paxos implementation in Ceph, I was amazed that it looked so similar. Thanks to your explanation I now understand why ;-)

On 14/09/2013 18:48, Gregory Farnum wrote:
> On Fri, Sep 13, 2013 at 11:39 PM, Loic Dachary <loic@xxxxxxxxxxx> wrote:
>> Hi,
>>
>> Ceph ( http://ceph.com/ ) relies on a custom implementation of Paxos to provide exabyte scale distributed storage. Like most people recently exposed to Paxos, I struggle to understand it ... but will keep studying until I get it :-) When a friend mentioned Raft ( http://en.wikipedia.org/wiki/Raft_%28computer_science%29 ), it looked like an easy way out. But it's very recent and I would very much appreciate your opinion. Do you think it is a viable alternative to Paxos?
> 
> Raft *is* the Paxos people use for all intents and purposes. The
> original Paxos paper and the follow-on "Paxos Made Simple" are very
> much mathematical algorithm papers which describe the necessary
> constraints on a system with Paxos' properties, then define a very
> general system which solves them, then describe a somewhat more
> practical leader-based system. Every implementation I've seen in the
> wild takes that leader system and then applies some of the
> simplifications/enhancements which Lamport suggests at the end of his
> original paper and that Raft has more precisely specified: you elect a
> single leader (using what you might consider to be the full Paxos
> system, with very low commit rates!) who is the only one able to
> propose values, then that leader proposes a stream of values which are
> accepted by followers and applied to a shared state (e.g., our LevelDB
> instance), and recovery happens by electing a new leader who gathers
> the log off of all the other nodes in order to learn what's been
> committed and what can be committed.
> The reason people are enjoying Raft is that it's targeted at system
> implementors instead of theoreticians, so the logical components are
> called out a little more clearly and the phases are separated the way
> you would split them when implementing the algorithms. That said, I'm
> not sure it's *actually* more understandable (even their own test
> results don't really support that assertion); I think you should just
> read both papers and then use whichever one is more understandable as
> the basis for further discussion until you really grok these
> consistency algorithms.
> 
> On Sat, Sep 14, 2013 at 8:16 AM, Noah Watkins <noah.watkins@xxxxxxxxxxx> wrote:
>> I'm curious about what exactly the consensus requirement and
>> assumptions are for the monitors. For instance, in the discussion
>> between Loic and Joao, this statement:
>>
>>   Joao: the recovery logic in our implementation tries to alleviate
>> the burden of recovering multiple versions at the same time. We
>> propose a version, let the peons accept it, then move on to the next
>> version. On Ceph, we only provide one value at a time.
>>
>> seems to indicate that the leader is proposing changes sequentially.
>> However, that makes Ceph's use of Paxos sound a lot like the reason
>> for the development of the Zab protocol used in Zookeeper:
>>
>>   https://cwiki.apache.org/confluence/display/ZOOKEEPER/Zab+vs.+Paxos
> 
> Yes. Our throughput expectations/requirements are significantly lower
> than Zookeeper's. We could extend them to create a pipeline if we
> really wanted to; the one-at-a-time isn't fundamental to the
> algorithms we're using that I can recall. (I am somewhat irked by the
> claim that Zab is a significantly different algorithm from Paxos. It
> certainly fits into the Paxos family of algorithms, although it might
> not be explicitly called out as a variation implementers could use in
> the original paper like most others are.)
> -Greg
> Software Engineer #42 @ http://inktank.com | http://ceph.com
> 
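
To make the one-value-at-a-time flow quoted above concrete (a leader proposes a version, the peons accept it, and only then does the next version start), here is a minimal Python sketch. It is not Ceph's monitor code: the names Peon, Leader, accept and propose are hypothetical, and it assumes an already-elected leader while leaving out elections, proposal numbers and log recovery entirely.

```python
from dataclasses import dataclass, field


@dataclass
class Peon:
    """Hypothetical follower that accepts values from the current leader."""
    name: str
    up: bool = True
    log: dict = field(default_factory=dict)  # version -> accepted value

    def accept(self, version, value):
        # An unreachable peon never acknowledges.
        if not self.up:
            return False
        self.log[version] = value
        return True


class Leader:
    """Hypothetical leader that proposes exactly one version at a time."""

    def __init__(self, peons):
        self.peons = peons
        self.log = {}            # version -> committed value
        self.last_committed = 0

    def propose(self, value):
        """Propose the next version; commit only if a majority acknowledges."""
        version = self.last_committed + 1
        acks = 1  # the leader counts as one acknowledgement
        for peon in self.peons:
            if peon.accept(version, value):
                acks += 1
        cluster_size = len(self.peons) + 1
        if acks * 2 > cluster_size:  # strict majority of the whole cluster
            self.log[version] = value
            self.last_committed = version
            return True
        return False  # no majority: the version stays uncommitted


if __name__ == "__main__":
    peons = [Peon("b"), Peon("c")]
    leader = Leader(peons)
    print(leader.propose("osdmap-1"))  # all three nodes reachable
    peons[0].up = False
    print(leader.propose("osdmap-2"))  # two of three still form a majority
    peons[1].up = False
    print(leader.propose("osdmap-3"))  # leader alone cannot commit
```

Because propose() returns only after the acknowledgements are counted, version N+1 never starts before version N is decided. A pipelined variant would have to track several in-flight versions at once, which this sketch deliberately does not attempt.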

-- 
Loïc Dachary, Artisan Logiciel Libre
All that is necessary for the triumph of evil is that good people do nothing.
