Re: About the blueprint OSD: Transactions

On Tue, 3 Mar 2015, Li Wang wrote:
> Hi Sage,
>   We are very interested in the multi-object transaction support;
> we think it is potentially very useful. We have read your implementation
> description and summarized it below. Please check whether our
> understanding is correct:
> 
> 1 client selects a master and sends the full txn to the master
> 2 master holds the txn in memory, sends PREPAREs to slaves
> 3 slaves persist PREPARE on the side, send PREPARE_ACK;
>   if there is a compare-then-write operation
>   and the comparison fails, the slave sends PREPARE_FAIL instead
> 4 master collects all PREPARE_ACKs, applies the txn,
>   and marks the txn COMMITTING; if a PREPARE_FAIL is received,
>   master sends ROLL_BACK to the slaves, and the slaves discard
>   the prepared txn
> 5 once persisted, master sends COMMITs to slaves
> 6 master replies COMMITTED to the client, so the client can proceed
>   with other operations except reading the committed data
> 7 slaves get COMMIT and apply, reply with COMMIT_ACK
> 8 master collects COMMIT_ACKs and replies FINISHED to the client, so
>   the client can read the data
> 9 master closes out txn record

Yep!   Plus the failure path handling...
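
For illustration, the master side of the quoted steps can be sketched as
a small state machine in Python. This is a minimal sketch under stated
assumptions, not Ceph code: the names MasterTxn, TxnState and outbox are
hypothetical, applying and persisting the txn on the master is treated
as instantaneous (so the COMMITTING state is folded into the transition),
and only the PREPARE_FAIL path is covered, not node failures.

```python
from enum import Enum, auto


class TxnState(Enum):
    PREPARING = auto()    # PREPAREs sent, waiting for PREPARE_ACKs
    COMMITTED = auto()    # applied and persisted on master, COMMITs sent
    FINISHED = auto()     # all COMMIT_ACKs received, txn record closed
    ROLLED_BACK = auto()  # a PREPARE_FAIL arrived, txn discarded


class MasterTxn:
    """Hypothetical master-side record for one multi-object txn."""

    def __init__(self, slaves):
        self.slaves = set(slaves)
        self.prepare_acks = set()
        self.commit_acks = set()
        self.state = TxnState.PREPARING
        # Outgoing (destination, message) pairs; step 2 sends PREPAREs.
        self.outbox = [(s, "PREPARE") for s in sorted(self.slaves)]

    def _broadcast(self, msg):
        self.outbox.extend((s, msg) for s in sorted(self.slaves))

    def on_prepare_ack(self, slave):
        if self.state is not TxnState.PREPARING or slave not in self.slaves:
            return
        self.prepare_acks.add(slave)
        if self.prepare_acks == self.slaves:
            # Steps 4-6: apply and persist (assumed instantaneous here),
            # send COMMITs, then tell the client the txn is COMMITTED.
            self._broadcast("COMMIT")
            self.outbox.append(("client", "COMMITTED"))
            self.state = TxnState.COMMITTED

    def on_prepare_fail(self, slave):
        if self.state is not TxnState.PREPARING or slave not in self.slaves:
            return
        # Step 4, failure branch: slaves discard the prepared txn.
        # What the client is told on failure is not specified in the
        # thread, so no client message is queued here.
        self._broadcast("ROLL_BACK")
        self.state = TxnState.ROLLED_BACK

    def on_commit_ack(self, slave):
        if self.state is not TxnState.COMMITTED or slave not in self.slaves:
            return
        self.commit_acks.add(slave)
        if self.commit_acks == self.slaves:
            # Steps 8-9: the data is now readable; close out the txn.
            self.outbox.append(("client", "FINISHED"))
            self.state = TxnState.FINISHED
```

Driving it with two slaves, acking both PREPAREs and then both COMMITs,
walks the record from PREPARING through COMMITTED to FINISHED.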

> We think this describes how to implement a single transaction; however,
> it does not take into account the case of multiple concurrent
> transactions: how to enforce order and atomicity among the distributed
> transactions, and how to do locking and deadlock avoidance. It seems
> there is some further design work to do.

Yeah.  I think it would be nice if we can define a few simple flags 
indicating whether the masters and/or slaves are readable during the 
prepared-but-uncommitted phase, as there are different requirements for 
different users.
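
As a rough illustration of what such flags could look like, here is a
minimal Python sketch. The flag names and the helper function are
hypothetical; nothing here is an existing Ceph interface, and the actual
set of flags is still to be decided.

```python
from enum import Flag, auto


class TxnReadFlags(Flag):
    """Hypothetical per-txn flags chosen by the client."""
    NONE = 0
    MASTER_READABLE = auto()  # master may serve reads while prepared
    SLAVE_READABLE = auto()   # slaves may serve reads while prepared


def readable_while_prepared(role, flags):
    """Return True if an object held by `role` may be read during the
    prepared-but-uncommitted phase under the given flags."""
    if role == "master":
        return bool(flags & TxnReadFlags.MASTER_READABLE)
    if role == "slave":
        return bool(flags & TxnReadFlags.SLAVE_READABLE)
    raise ValueError("unknown role: %r" % (role,))
```

A user who needs strict isolation would pass TxnReadFlags.NONE, while one
who tolerates reads on the slaves during the window would pass
TxnReadFlags.SLAVE_READABLE.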

And we need to pick a (simple!) deadlock avoidance approach.  Maybe a 
simple EAGAIN is enough, leaving it to the clients to be consistent about 
which object to choose as the master.
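
A minimal Python sketch of that idea, under stated assumptions: clients
agree on the master by always taking the smallest object name, and an
object that is already prepared for another txn refuses a second PREPARE
with EAGAIN instead of blocking. The names choose_master and PrepareTable
are hypothetical, and the "smallest name" rule is only one possible
convention.

```python
import errno


def choose_master(object_names):
    """Every client applies the same rule, so two txns touching the
    same objects pick the same master (assumed rule: smallest name)."""
    return min(object_names)


class PrepareTable:
    """Hypothetical per-OSD table of objects with a prepared txn."""

    def __init__(self):
        self.owner = {}  # object name -> txn id holding the prepare

    def try_prepare(self, txn_id, obj):
        holder = self.owner.get(obj)
        if holder is not None and holder != txn_id:
            # Never wait on another txn: fail fast so the client can
            # back off and retry, which rules out wait-for cycles.
            return -errno.EAGAIN
        self.owner[obj] = txn_id
        return 0

    def release(self, txn_id, obj):
        # Called on COMMIT or ROLL_BACK for this txn.
        if self.owner.get(obj) == txn_id:
            del self.owner[obj]
```

Because a conflicting PREPARE returns immediately, no txn ever holds one
object while waiting for another, so the retry cost falls on the client
rather than on a lock manager.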

> We are wondering if you can move this blueprint discussion to a
> UTC+8 friendly time, so that we can take part

I think Patrick is moving it!

Thanks-
sage