On Sat, Dec 10, 2016 at 11:00 PM, Jason Dillaman <jdillama@xxxxxxxxxx> wrote:
> I should clarify that if the OSD has silently failed (e.g. the TCP
> connection wasn't reset and packets are just silently being dropped /
> not being acked), IO will pause for up to "osd_heartbeat_grace" before

That number is how long an OSD will wait for a response from another
OSD before telling the MONs that it's not responding.

> IO can proceed again.
>
> On Sat, Dec 10, 2016 at 8:46 AM, Jason Dillaman <jdillama@xxxxxxxxxx> wrote:
>> On Sat, Dec 10, 2016 at 6:11 AM, zhong-yan.gu <guzy126@xxxxxxx> wrote:
>>> Hi Jason,
>>> sorry to bother you. A question about IO consistency in the OSD down case:
>>> 1. a write op arrives at primary OSD A
>>> 2. OSD A does a local write and sends out replica writes to OSDs B and C
>>> 3. B finishes the write and returns an ACK to A. However, C is down and
>>> has no chance to send out an ACK.
>>>
>>> In this case A will not reply with an ACK to the client. After a while the
>>> cluster detects that C is down and enters peering. After peering, how will
>>> the previous write op be processed?
>>
>> AFAIK, assuming you have a replica size of 3 and a minimum replica
>> size of 2, losing one OSD within the PG set won't be enough to stall
>> the write operation assuming it wasn't the primary PG OSD that went
>> offline. Quorum was available so both online OSDs were able to log the
>> transaction to help recover the offline OSD when it becomes available
>> again. Once the offline OSD comes back, it can replay the log received
>> from its peers to get back in sync. There is actually lots of
>> available documentation on this process [1].
>>
>>> Does the client still have a chance to receive the ACK?
>>
>> Yup, the client will receive the ACK as soon as <min_size> PGs have
>> safely committed the IO.
>>
>>> The next time the client reads the corresponding data, is it updated or not?
>>
>> If an IO has been ACKed back to the client, future reads to that
>> extent will return the committed data (we don't want to go backwards
>> in time).
>>
>>> For the case that both B and C are down before the ACK is replied to A, is
>>> there any difference?
>>
>> Assuming you have <min_size> = 2 (the default), your IO would be
>> blocked until those OSDs come back online or until the mons have
>> detected those OSDs are dead and have remapped the affected PGs to new
>> (online) OSDs.
>>
>>>
>>> Is there any case in which Ceph finishes writes silently but sends no ACK
>>> to clients?
>>
>> Sure, if your client dies before it receives the ACK from the OSDs.
>> However, your data is still crash consistent.
>>
>>> Zhongyan
>>>
>>
>> [1] https://github.com/ceph/ceph/blob/master/doc/dev/osd_internals/log_based_pg.rst
>>
>> --
>> Jason
>
>
>
> --
> Jason
> _______________________________________________
> ceph-users mailing list
> ceph-users@xxxxxxxxxxxxxx
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
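
For anyone skimming the thread, the two rules discussed above (the heartbeat grace period and the min_size check) can be sketched roughly as below. This is a toy Python model only, not Ceph code: the function names are hypothetical, and the 20-second grace value and min_size of 2 are assumptions for illustration, so check your own cluster's settings.

```python
# Toy model of two rules from this thread. Hypothetical names, not Ceph's code.


def peer_should_be_reported(now, last_reply, grace=20.0):
    """Return True once a peer has been silent for longer than the grace period.

    `grace` stands in for osd_heartbeat_grace in seconds (20.0 is an assumed
    value here). Until it elapses, a silently failed peer is still treated as
    up, which is why IO can pause for up to that long.
    """
    return (now - last_reply) > grace


def pg_accepts_io(up_replicas, min_size=2):
    """Return True if enough replicas are up for the PG to serve IO.

    `min_size` is assumed to be 2, matching the size=3 / min_size=2 example
    in the thread.
    """
    return up_replicas >= min_size


if __name__ == "__main__":
    # size=3, min_size=2: losing only C leaves A and B, so IO can continue.
    print(pg_accepts_io(up_replicas=2))
    # Losing both B and C leaves only the primary, so IO is blocked.
    print(pg_accepts_io(up_replicas=1))
    # A silent peer is reported only after the grace period has passed.
    print(peer_should_be_reported(now=15.0, last_reply=0.0))
    print(peer_should_be_reported(now=25.0, last_reply=0.0))
```

The sketch deliberately leaves out peering, log replay, and PG remapping, which the document linked as [1] covers in detail.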