Our downstream QE team recently observed an md5 mismatch of replicated objects while testing rgw's server-side encryption in multisite. This corruption is specific to s3 multipart uploads, and only affects the replicated copy - the original object remains intact. The bug likely affects all Ceph releases back to Luminous, where server-side encryption was first introduced.

To expand on the cause of this corruption: encryption of multipart uploads requires special handling around the part boundaries, because each part is uploaded and encrypted separately. In multisite, objects are replicated in their encrypted form, and multipart uploads are replicated as a single part. As a result, the replicated copy loses its knowledge of the original part boundaries required to decrypt the data correctly.

We don't have a fix yet, but we're tracking it in https://tracker.ceph.com/issues/46062. The fix will only modify the replication logic, so it won't repair any objects that have already replicated incorrectly. We'll need to develop a radosgw-admin command to search for affected objects and reschedule their replication. In the meantime, I can only advise multisite users to avoid using encryption for multipart uploads.

If you'd like to scan your cluster for existing encrypted multipart uploads, you can identify them with an s3 HeadObject request: the response will include an x-amz-server-side-encryption header, and the ETag header value (with the surrounding "s removed) will be longer than 32 characters, because multipart ETags take the special form "<md5sum>-<num parts>".

Take care not to delete the corrupted replicas, because an active-active multisite configuration would go on to delete the original copy as well.
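To illustrate, the HeadObject check above can be sketched as a small Python predicate. This is only a sketch: the function name is made up, and it takes a plain dict of response headers rather than calling an s3 client, so you'd feed it the headers returned by your SDK's HeadObject call (e.g. boto3's head_object).

```python
def is_encrypted_multipart(headers):
    """Return True if HeadObject response headers indicate an
    encrypted multipart upload, per the two signals described
    above (hypothetical helper; takes a dict of raw headers)."""
    # signal 1: the object was stored with server-side encryption
    if 'x-amz-server-side-encryption' not in headers:
        return False
    # signal 2: multipart ETags have the form "<md5sum>-<num parts>",
    # so after stripping the quotes they exceed the 32 hex characters
    # of a plain md5 digest
    etag = headers.get('ETag', '').strip('"')
    return len(etag) > 32

# example header dicts:
plain_object = {'ETag': '"d41d8cd98f00b204e9800998ecf8427e"'}
encrypted_single_part = {
    'x-amz-server-side-encryption': 'AES256',
    'ETag': '"d41d8cd98f00b204e9800998ecf8427e"',
}
encrypted_multipart = {
    'x-amz-server-side-encryption': 'AES256',
    'ETag': '"d41d8cd98f00b204e9800998ecf8427e-3"',
}
```

Only objects matching both signals (like encrypted_multipart above) are candidates for the corruption described here; unencrypted objects and encrypted single-part uploads are unaffected.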