> 1. Write object A from client.
> 2. Fsync to primary device completes.
> 3. Ack to client.
> 4. Writes sent to replicas.
[...]

As mentioned in the discussion, this proposal is the opposite of the current policy, which is to wait for all replicas to be written before writes are acknowledged to the client:

https://github.com/ceph/ceph/blob/main/doc/architecture.rst

  "After identifying the target placement group, the client writes the object to the identified placement group's primary OSD. The primary OSD then [...] confirms that the object was stored successfully in the secondary and tertiary OSDs, and reports to the client that the object was stored successfully."

A more revolutionary option would be for 'librados' to write in parallel to all the "active set" OSDs and report this to the primary, but that would greatly increase client-Ceph traffic, while the current logic increases traffic only among OSDs.

> So I think that to maintain any semblance of reliability,
> you'd need to at least wait for a commit ack from the first
> replica (i.e. min_size=2).

Perhaps it could be similar to 'k'+'m' for EC, that is 'k' synchronous (the write completes to the client only when at least 'k' replicas, including the primary, have been committed) and 'm' asynchronous, instead of 'k' being just 1 or 2.
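
To make the 'k' synchronous + 'm' asynchronous idea concrete, here is a toy model of when the client would get its ack. This is only a sketch, not Ceph code: the function name `ack_time` and its parameters are hypothetical, and it assumes every commit time is measured from the moment the client issued the write.

```python
def ack_time(primary_commit, replica_commits, k):
    """Return the time at which the client is acked.

    Toy model (hypothetical, not a Ceph API): the ack is sent once the
    primary and the fastest k-1 replicas have committed. The remaining
    replicas (the 'm' part) commit asynchronously after the ack.
    """
    total = 1 + len(replica_commits)
    if not 1 <= k <= total:
        raise ValueError(f"k must be between 1 and {total}")
    if k == 1:
        # Primary-only ack, as in the quoted four-step proposal.
        return primary_commit
    # Commit time of the (k-1)-th fastest replica.
    kth_replica = sorted(replica_commits)[k - 2]
    # The primary is always one of the k synchronous copies.
    return max(primary_commit, kth_replica)


# Primary commits at t=2.0, replicas at t=5.0 and t=9.0.
for k in (1, 2, 3):
    print(k, ack_time(2.0, [5.0, 9.0], k))
```

With three copies, k=1 corresponds to the quoted proposal (ack after the primary's fsync), k=2 to waiting for the first replica as well, and k=3 to the current policy of waiting for all replicas.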