On Tue, 3 Mar 2015, Li Wang wrote:
> Hi Sage,
> We are pretty interested in the multi-object transaction support;
> we think it is potentially very useful. We have read your implementation
> description and summarized it below. Please check whether our
> understanding is correct:
>
> 1 client selects a master, and sends the full txn to the master
> 2 master holds txn in memory, sends PREPAREs to slaves
> 3 slaves persist PREPARE on the side, send PREPARE_ACK;
>   in the case of a compare-then-write operation whose
>   comparison fails, the slave sends PREPARE_FAIL instead
> 4 master collects all PREPARE_ACKs, applies the txn
>   and marks the txn COMMITTING; if a PREPARE_FAIL is received,
>   master sends slaves ROLL_BACK, and the slaves discard
>   the prepared txn
> 5 once persisted, master sends COMMITs to slaves
> 6 master replies COMMITTED to client, so the client can proceed
>   with other operations, except reading the committed data
> 7 slaves get COMMIT and apply, reply with COMMIT_ACK
> 8 master collects COMMIT_ACKs and replies FINISHED to client, so
>   the client can read the data
> 9 master closes out txn record

Yep!  Plus the failure path handling...

> We think this describes how to implement a single transaction; however,
> it does not take into account multiple concurrent transactions:
> how to enforce ordering and atomicity among the distributed transactions,
> and how to do locking and deadlock avoidance. It seems there is
> some further design work to do.

Yeah.  I think it would be nice if we can define a few simple flags
indicating whether the masters and/or slaves are readable during the
prepared-but-uncommitted phase, as there are different requirements for
different users.  And we need to pick a (simple!) deadlock avoidance
approach.  Maybe a simple EAGAIN is enough, leaving it to the clients to
be consistent about which object they choose as the master.
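For illustration, the nine steps above plus the "simple EAGAIN" idea can be modelled as a toy, single-process sketch of the master side. Everything in it is hypothetical: the names Obj, run_txn and Vote.EAGAIN, the (expected, new_value) op format, and the choice of -ECANCELED for a failed comparison are assumptions made for the sketch, not anything in Ceph. It ignores networking, persistence and the failure-path handling; an object that already holds a prepared txn simply refuses with EAGAIN instead of waiting, so no txn ever blocks on another.

```python
import errno
from enum import Enum, auto


class Vote(Enum):
    PREPARE_ACK = auto()
    PREPARE_FAIL = auto()  # compare-then-write comparison failed
    EAGAIN = auto()        # hypothetical: object already holds a prepared txn


class Obj:
    """Hypothetical in-memory stand-in for one object (master or slave)."""

    def __init__(self, name, value=0):
        self.name = name
        self.value = value
        self.prepared = None  # (txn_id, new_value) held until COMMIT/ROLL_BACK

    def prepare(self, txn_id, expected, new_value):
        # Deadlock avoidance by refusal: never wait on another txn.
        if self.prepared is not None:
            return Vote.EAGAIN
        # expected=None means a plain write, otherwise compare-then-write.
        if expected is not None and self.value != expected:
            return Vote.PREPARE_FAIL
        self.prepared = (txn_id, new_value)
        return Vote.PREPARE_ACK

    def commit(self, txn_id):
        assert self.prepared is not None and self.prepared[0] == txn_id
        self.value = self.prepared[1]
        self.prepared = None
        return "COMMIT_ACK"

    def roll_back(self, txn_id):
        # Only discard state this txn prepared; leave other txns alone.
        if self.prepared is not None and self.prepared[0] == txn_id:
            self.prepared = None


def run_txn(txn_id, master, slaves, ops, client_log):
    """Master side of steps 2-9; ops maps object name -> (expected, new_value).

    Returns 0 on success or a negative errno. client_log records the
    COMMITTED / FINISHED replies that would be sent to the client.
    """
    parts = [master] + slaves
    # Steps 2-3: master holds its own op in memory; slaves prepare and vote.
    votes = [p.prepare(txn_id, *ops[p.name]) for p in parts]
    if any(v is not Vote.PREPARE_ACK for v in votes):
        # Step 4 failure path: ROLL_BACK everyone, report to the client.
        for p in parts:
            p.roll_back(txn_id)
        return -errno.EAGAIN if Vote.EAGAIN in votes else -errno.ECANCELED
    # Steps 4-5: master applies the txn (COMMITTING), then sends COMMITs.
    master.commit(txn_id)
    client_log.append("COMMITTED")  # step 6: client may proceed, not read
    acks = [s.commit(txn_id) for s in slaves]  # step 7
    if all(a == "COMMIT_ACK" for a in acks):
        client_log.append("FINISHED")  # step 8: client may now read
    return 0  # step 9: master closes out the txn record
```

A client receiving -EAGAIN would simply retry later; as noted above, this only stays simple if clients agree on which object acts as master.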
> We are wondering if you can move this blueprint discussion to a
> UTC+8-friendly time, so that we can be involved.

I think Patrick is moving it!

Thanks-
sage