Re: partial acks when send reply to client to reduce write latency

Gregory Farnum <greg@xxxxxxxxxxx> · Thu, 16 Apr 2015 15:59:42 -0700

On Thu, Apr 9, 2015 at 11:38 PM, 池信泽 <xmdxcxz@xxxxxxxxx> wrote:
> hi, all:
>
>     Currently, Ceph has to receive the ack messages from all remote
> replicas before it replies with an ack to the client. What about
> replying to the client directly once the primary has received some of
> them? Below is the request trace among the OSDs. The primary waits a
> long time for the second sub_op_commit_rec message.
>
>     Does it make sense?

It makes sense on one level, but unfortunately it's just not feasible.
It would change how peering needs to work: right now, we need to
contact at least one OSD that was active in each interval. If we
allowed commits to happen without having hit disk on every OSD, we
would need to talk to all the OSDs in every interval (or at least
{num_OSDs} - {number_we_require_ack} + 1 of them), which would be
pretty bad for our failure resiliency.
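
To make the count above concrete, here is a minimal Python sketch of the
pigeonhole argument. It is not Ceph code, and the function names are
hypothetical. It assumes a write is acknowledged once acks_required of
the num_osds replicas have committed it, and asks how many OSDs from an
interval peering would have to contact to be sure of reaching at least
one that holds the write.

```python
from itertools import combinations


def osds_to_contact(num_osds: int, acks_required: int) -> int:
    """Worst-case number of OSDs to query in one interval.

    Up to (num_osds - acks_required) OSDs may have missed the write,
    so one more than that guarantees reaching an OSD that has it.
    """
    if not 1 <= acks_required <= num_osds:
        raise ValueError("acks_required must be between 1 and num_osds")
    return num_osds - acks_required + 1


def always_finds_write(num_osds: int, acks_required: int, contacted: int) -> bool:
    """Brute-force check: does every choice of `contacted` OSDs overlap
    every possible set of `acks_required` OSDs that hold the write?"""
    osds = range(num_osds)
    for holders in combinations(osds, acks_required):
        for probed in combinations(osds, contacted):
            if not set(holders) & set(probed):
                return False  # this probe set could miss the write entirely
    return True
```

With 3 replicas and all 3 acks required (today's behaviour),
osds_to_contact(3, 3) is 1, so any single OSD from the interval is
enough. Acking after 2 of 3 raises that to 2, and acking after 1 of 3
means contacting all 3, so a single unavailable OSD could block peering.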

This comes up every so often as a suggestion and is a lot more
feasible with erasure coding — Yahoo has already implemented the
read-side version of this
(http://yahooeng.tumblr.com/post/116391291701/yahoo-cloud-object-store-object-storage-at),
but doing it on the write side would still take a lot of work.
-Greg