Re: A question about io consistency in osd down case

Jason Dillaman <jdillama@xxxxxxxxxx> · Sat, 10 Dec 2016 09:00:20 -0500

I should clarify that if the OSD has silently failed (e.g. the TCP
connection wasn't reset and packets are just being dropped / not
being acked), IO will pause for up to "osd_heartbeat_grace" seconds
before it can proceed again.
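
To make that pause concrete, here is a minimal Python sketch of
grace-period failure detection. It is not Ceph's heartbeat code: the
function name is hypothetical and the 20-second value is only an assumed
example, so check the "osd_heartbeat_grace" setting on your own cluster.

```python
# Hypothetical sketch of grace-period failure detection; not Ceph source code.

HEARTBEAT_GRACE_SECONDS = 20.0  # assumed value, for illustration only


def peer_looks_failed(last_reply_time: float, now: float,
                      grace: float = HEARTBEAT_GRACE_SECONDS) -> bool:
    """Return True once a peer has been silent for longer than the grace period."""
    return (now - last_reply_time) > grace


# A peer that last answered at t=100 is still trusted at t=115,
# and is only reported as failed once the grace period has elapsed.
print(peer_looks_failed(100.0, 115.0))
print(peer_looks_failed(100.0, 121.0))
```

Until the check flips to True, nothing tells the cluster that the silent
OSD is gone, which is why IO against it appears to hang for that window.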

On Sat, Dec 10, 2016 at 8:46 AM, Jason Dillaman <jdillama@xxxxxxxxxx> wrote:
> On Sat, Dec 10, 2016 at 6:11 AM, zhong-yan.gu <guzy126@xxxxxxx> wrote:
>> Hi Jason,
>> Sorry to bother you. I have a question about IO consistency when an
>> OSD goes down:
>> 1. A write op arrives at primary OSD A.
>> 2. OSD A does the local write and sends replica writes to OSDs B and C.
>> 3. B finishes its write and returns an ACK to A. However, C is down and
>> has no chance to send an ACK.
>>
>> In this case A will not reply with an ACK to the client. After a while
>> the cluster detects that C is down and enters peering. After peering,
>> how will the previous write op be processed?
>
> AFAIK, assuming you have a replica size of 3 and a minimum replica
> size of 2, losing one OSD within the PG set won't be enough to stall
> the write operation, as long as it wasn't the PG's primary OSD that
> went offline. Quorum was still available, so both online OSDs were
> able to log the transaction to help recover the offline OSD when it
> becomes available again. Once the offline OSD comes back, it can
> replay the log received from its peers to get back in sync. This
> process is documented in detail in [1].
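
As a rough illustration of that catch-up step, here is a minimal Python
sketch. It is only a toy model under my own assumptions (a flat list of
versioned entries and a dict as the object store); the real mechanism is
the one described in [1], and every name below is hypothetical.

```python
# Hypothetical sketch of log-based catch-up; names and structures are illustrative.
from typing import Dict, List, Tuple

LogEntry = Tuple[int, str, bytes]  # (version, object name, new contents)


def replay_missing(store: Dict[str, bytes], last_applied: int,
                   peer_log: List[LogEntry]) -> int:
    """Apply every peer log entry newer than last_applied, in version order."""
    for version, name, data in sorted(peer_log, key=lambda entry: entry[0]):
        if version > last_applied:
            store[name] = data
            last_applied = version
    return last_applied


# OSD C went offline after version 2; A and B logged versions 3 and 4 meanwhile.
peer_log = [(1, "obj1", b"a"), (2, "obj2", b"b"),
            (3, "obj1", b"c"), (4, "obj3", b"d")]
c_store = {"obj1": b"a", "obj2": b"b"}
c_version = replay_missing(c_store, 2, peer_log)
print(c_version, c_store)
```

Entries C already applied are skipped, so only the writes it missed while
offline are replayed.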
>
>> Does the client still have a chance to receive the ACK?
>
> Yup, the client will receive the ACK as soon as <min_size> OSDs have
> safely committed the IO.
>
>> The next time the client reads the corresponding data, is it updated or not?
>
> If an IO has been ACKed back to the client, future reads to that
> extent will return the committed data (we don't want to go backwards
> in time).
>
>> For the case where both B and C go down before replying with an ACK to A,
>> is there any difference?
>
> Assuming you have <min_size> = 2 (the default), your IO would be
> blocked until those OSDs come back online or until the mons have
> detected that those OSDs are dead and have remapped the affected PGs
> to new (online) OSDs.
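
The difference between the one-OSD-down and two-OSDs-down cases can be
sketched with a simple availability check in Python. This is a simplified
model of the <min_size> rule as described above, not Ceph's peering logic;
the function and the OSD names D and E are hypothetical, and it ignores the
time new OSDs need to backfill.

```python
# Hypothetical model of the min_size rule described above; not Ceph source code.
from typing import Set


def pg_can_serve_io(acting: Set[str], up: Set[str], min_size: int) -> bool:
    """A PG serves IO only while at least min_size of its acting OSDs are up."""
    return len(acting & up) >= min_size


acting = {"A", "B", "C"}
one_down = pg_can_serve_io(acting, {"A", "B"}, min_size=2)   # only C is down
two_down = pg_can_serve_io(acting, {"A"}, min_size=2)        # B and C are down
# After the mons remap the PG to online OSDs D and E (assumed names):
remapped = pg_can_serve_io({"A", "D", "E"}, {"A", "D", "E"}, min_size=2)
print(one_down, two_down, remapped)
```

With only C down the PG keeps serving IO; with B and C both down it blocks
until enough OSDs are back or the PG has been remapped.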
>
>>
>> Is there any case in which Ceph finishes a write silently but sends no
>> ACK to the client?
>
> Sure, if your client dies before it receives the ACK from the OSDs.
> However, your data is still crash consistent.
>
>> Zhongyan
>>
>
> [1] https://github.com/ceph/ceph/blob/master/doc/dev/osd_internals/log_based_pg.rst
>
> --
> Jason

-- 
Jason
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com