I should clarify that if the OSD has failed silently (e.g. the TCP
connection wasn't reset and packets are just being dropped or never
acked), IO will pause for up to "osd_heartbeat_grace" before it can
proceed again.

On Sat, Dec 10, 2016 at 8:46 AM, Jason Dillaman <jdillama@xxxxxxxxxx> wrote:
> On Sat, Dec 10, 2016 at 6:11 AM, zhong-yan.gu <guzy126@xxxxxxx> wrote:
>> Hi Jason,
>> Sorry to bother you. A question about IO consistency in the OSD-down case:
>> 1. A write op arrives at primary OSD A.
>> 2. OSD A does its local write and sends out replica writes to OSDs B and C.
>> 3. B finishes the write and returns an ACK to A. However, C is down and
>> has no chance to send out an ACK.
>>
>> In this case A will not reply with an ACK to the client. After a while the
>> cluster detects that C is down and enters peering. After peering, how will
>> the previous write op be processed?
>
> AFAIK, assuming you have a replica size of 3 and a minimum replica
> size of 2, losing one OSD within the PG set won't be enough to stall
> the write operation, assuming it wasn't the primary PG OSD that went
> offline. Quorum was available, so both online OSDs were able to log the
> transaction to help recover the offline OSD when it becomes available
> again. Once the offline OSD comes back, it can replay the log received
> from its peers to get back in sync. There is actually lots of
> available documentation on this process [1].
>
>> Does the client still have a chance to receive the ACK?
>
> Yup, the client will receive the ACK as soon as <min size> OSDs have
> safely committed the IO.
>
>> The next time the client reads the corresponding data, is it updated or not?
>
> If an IO has been ACKed back to the client, future reads to that
> extent will return the committed data (we don't want to go backwards
> in time).
>
>> For the case that both B and C are down before the ACK is returned to A,
>> is there any difference?
>
> Assuming you have <min_size> = 2 (the default), your IO would be
> blocked until those OSDs come back online or until the mons have
> detected that those OSDs are dead and have remapped the affected PGs
> to new (online) OSDs.
>
>>
>> Is there any case in which Ceph finishes writes silently but sends no ACK
>> to clients?
>
> Sure, if your client dies before it receives the ACK from the OSDs.
> However, your data is still crash consistent.
>
>> Zhongyan
>>
>
> [1] https://github.com/ceph/ceph/blob/master/doc/dev/osd_internals/log_based_pg.rst
>
> --
> Jason

--
Jason
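
P.S. To make the size / min_size cases in this thread concrete, here is a
toy model in Python. It is only a sketch of the rule as described above,
not Ceph code: the names PoolPolicy and io_state are made up for
illustration, and it ignores peering, the heartbeat grace window, and
remapping to new OSDs.

```python
from dataclasses import dataclass


@dataclass
class PoolPolicy:
    """Hypothetical stand-in for a pool's replication settings."""
    size: int      # number of replicas the pool tries to keep
    min_size: int  # replicas that must be up for the PG to serve IO


def io_state(policy: PoolPolicy, up_osds: int) -> str:
    """Classify a PG by how many of its replica OSDs are currently up.

    Toy rule from the thread: IO proceeds while at least min_size
    replicas are up, and blocks otherwise.
    """
    if up_osds >= policy.size:
        return "active"    # all replicas up, IO proceeds
    if up_osds >= policy.min_size:
        return "degraded"  # IO proceeds, missing replica recovers later
    return "blocked"       # IO waits for OSDs to return or PGs to remap


policy = PoolPolicy(size=3, min_size=2)
for up in (3, 2, 1):
    print(f"{up} of {policy.size} OSDs up: {io_state(policy, up)}")
```

With size 3 and min_size 2, losing only C gives "degraded" (the write can
still be ACKed), while losing both B and C gives "blocked".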