On Thu, Apr 9, 2015 at 11:38 PM, 池信泽 <xmdxcxz@xxxxxxxxx> wrote:
> Hi all,
>
> Currently, Ceph must receive all of the ack messages from the remote OSDs
> before it replies with an ack to the client. What about replying to the
> client directly once the primary has received some of them? Below is the
> request trace among the OSDs. The primary waits a long time for the second
> sub_op_commit_rec message.
>
> Does it make sense?

It makes sense on one level, but unfortunately it's just not feasible. It would change how peering needs to work. Right now, we need to contact at least one OSD that was active in each interval. If we allowed commits to happen without the write having hit disk on every OSD, we would need to talk to all the OSDs in every interval (or at least {num_OSDs} - {number_we_require_ack} + 1 of them), which would be pretty bad for our failure resiliency.

This comes up every so often as a suggestion, and it is a lot more feasible with erasure coding. Yahoo has already implemented the read-side version of this (http://yahooeng.tumblr.com/post/116391291701/yahoo-cloud-object-store-object-storage-at), but doing it on the write side would still take a lot of work.
-Greg
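The {num_OSDs} - {number_we_require_ack} + 1 figure follows from a pigeonhole argument. If a write is acknowledged once `w` of `n` OSDs have committed it, up to `n - w` OSDs may still lack it, so peering has to hear from `n - w + 1` of them to be sure at least one holds the write. The Python sketch below is a simplified model, not Ceph code: the function names are hypothetical, and it assumes a fixed set of `n` OSDs per interval while ignoring pool settings such as min_size. It computes the bound and cross-checks it by brute force over every possible set of committers.

```python
from itertools import combinations


def min_osds_to_contact(num_osds: int, acks_required: int) -> int:
    """Pigeonhole bound (hypothetical helper, simplified model): if a write is
    acked after `acks_required` of `num_osds` OSDs commit it, at most
    num_osds - acks_required OSDs can be missing it, so peering must hear
    from one more OSD than that to be sure of finding the write."""
    if not 1 <= acks_required <= num_osds:
        raise ValueError("acks_required must be in [1, num_osds]")
    return num_osds - acks_required + 1


def min_osds_to_contact_brute_force(num_osds: int, acks_required: int) -> int:
    """Exhaustive check: the smallest m such that every m-subset of OSDs
    overlaps every possible set of OSDs that committed the write."""
    osds = range(num_osds)
    committer_sets = [set(c) for c in combinations(osds, acks_required)]
    for m in range(1, num_osds + 1):
        if all(set(contacted) & committers
               for contacted in combinations(osds, m)
               for committers in committer_sets):
            return m
    raise AssertionError("unreachable for valid inputs: contacting every OSD suffices")


if __name__ == "__main__":
    # Hypothetical 3-replica pool: ack after all 3 commits (today), after 2, or after 1.
    for acks in (3, 2, 1):
        print(f"ack after {acks}/3 commits -> contact {min_osds_to_contact(3, acks)} OSD(s)")
```

When acks_required equals num_osds the bound is 1, which matches the current rule of contacting one OSD that was active in each interval; every acknowledgement the primary stops waiting for adds one more OSD that peering must be able to reach.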