On Sat, Dec 10, 2016 at 6:11 AM, zhong-yan.gu <guzy126@xxxxxxx> wrote:
> Hi Jason,
> sorry to bother you. A question about IO consistency in the OSD-down case:
> 1. a write op arrives at primary OSD A
> 2. OSD A does its local write and sends out replica writes to OSDs B and C
> 3. B finishes the write and returns an ACK to A. However, C is down and has
> no chance to send out an ACK.
>
> In this case A will not reply with an ACK to the client. After a while the
> cluster detects that C is down and enters peering. After peering, how will
> the previous write op be processed?

AFAIK, assuming you have a replica size of 3 and a minimum replica size
(min_size) of 2, losing one OSD within the PG's acting set won't be enough
to stall the write operation, assuming it wasn't the primary PG OSD that
went offline. Quorum was available, so both online OSDs were able to log the
transaction to help recover the offline OSD when it becomes available again.
Once the offline OSD comes back, it can replay the log received from its
peers to get back in sync. There is actually a lot of documentation
available on this process [1].

> Does the client still have a chance to receive the ACK?

Yup, the client will receive the ACK as soon as <min_size> OSDs have safely
committed the IO (two short python-rados sketches illustrating this are
appended below).

> The next time the client reads the corresponding data, is it updated or not?

If an IO has been ACKed back to the client, future reads of that extent will
return the committed data (we don't want to go backwards in time).

> For the case that both B and C are down before the ack is replied to A, is
> there any difference?

Assuming you have <min_size> = 2 (the default), your IO would be blocked
until those OSDs come back online or until the mons have detected that those
OSDs are dead and have remapped the affected PGs to new (online) OSDs.

> Is there any case in which Ceph finishes writes silently but sends no ack
> to clients?

Sure, if your client dies before it receives the ACK from the OSDs. However,
your data is still crash consistent.

> Zhongyan

[1] https://github.com/ceph/ceph/blob/master/doc/dev/osd_internals/log_based_pg.rst

--
Jason
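
For anyone who wants to watch the ack/commit distinction from the client
side, here is a minimal sketch using the python-rados bindings. The conffile
path, the pool name ('rbd'), and the object name are assumptions for
illustration only, not anything taken from the thread. As I understand the
FileStore-era bindings, the oncomplete callback fires when the write has
been acknowledged by the acting set, and the onsafe callback fires once it
has been committed to disk, which is the point the thread refers to as the
client receiving its ACK.

    import rados

    # Assumed names for illustration: adjust conffile, pool and object.
    cluster = rados.Rados(conffile='/etc/ceph/ceph.conf')
    cluster.connect()
    ioctx = cluster.open_ioctx('rbd')

    def on_ack(completion):
        # Called when the write has been acknowledged by the acting set.
        print('write acked, rc=%d' % completion.get_return_value())

    def on_commit(completion):
        # Called once the write has been durably committed.
        print('write committed')

    comp = ioctx.aio_write_full('consistency-demo', b'hello world',
                                oncomplete=on_ack, onsafe=on_commit)
    comp.wait_for_safe()  # block until the commit notification arrives

    ioctx.close()
    cluster.shutdown()

Against a healthy pool this should print the ack line followed by the commit
line; if enough OSDs are down to drop the PG below min_size, wait_for_safe()
simply blocks, which matches the behaviour described above.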
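
Similarly, if you want to double-check which size/min_size a pool is
actually running with, the same bindings expose the monitor command
interface. This is only a sketch (again assuming a pool named 'rbd') and is
intended to be equivalent to running 'ceph osd pool get rbd size' and
'ceph osd pool get rbd min_size' from the CLI:

    import json
    import rados

    cluster = rados.Rados(conffile='/etc/ceph/ceph.conf')
    cluster.connect()

    # Ask the monitors for the pool's replication settings.
    for var in ('size', 'min_size'):
        cmd = json.dumps({'prefix': 'osd pool get', 'pool': 'rbd',
                          'var': var, 'format': 'json'})
        ret, outbuf, outs = cluster.mon_command(cmd, b'')
        if ret == 0:
            print(json.loads(outbuf.decode('utf-8')))
        else:
            print('mon_command failed: %s' % outs)

    cluster.shutdown()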