Re: Important: RGW multisite bug may silently corrupt encrypted objects on replication

Hello Casey,

Understood, thanks!

That means that the original copy in the site that it was uploaded to is still
safe as long as that copy is not removed, and no underlying changes below
RadosGW in the Ceph storage could corrupt the original copy?

Best regards
Tobias

On 30 May 2023, at 14:48, Casey Bodley <cbodley@xxxxxxxxxx> wrote:

On Tue, May 30, 2023 at 8:22 AM Tobias Urdin <tobias.urdin@xxxxxxxxxx> wrote:

Hello Casey,

Thanks for the information!

Can you please confirm that this is only an issue when encryption is enabled via the
“rgw_crypt_default_encryption_key” config option, which the documentation [1] marks as
“testing only”, and not when using Barbican or Vault as KMS or using SSE-C with the S3 API?

Unfortunately, all flavors of server-side encryption (SSE-C, SSE-KMS,
SSE-S3, and rgw_crypt_default_encryption_key) are affected by this
bug, as they share the same encryption logic. The main difference is
where they get the key.


[1] https://docs.ceph.com/en/quincy/radosgw/encryption/#automatic-encryption-for-testing-only

On 26 May 2023, at 22:45, Casey Bodley <cbodley@xxxxxxxxxx> wrote:

Our downstream QE team recently observed an md5 mismatch of replicated
objects when testing rgw's server-side encryption in multisite. This
corruption is specific to s3 multipart uploads, and only affects the
replicated copy - the original object remains intact. The bug likely
affects Ceph releases all the way back to Luminous where server-side
encryption was first introduced.

To expand on the cause of this corruption: Encryption of multipart
uploads requires special handling around the part boundaries, because
each part is uploaded and encrypted separately. In multisite, objects
are replicated in their encrypted form, and multipart uploads are
replicated as a single part. As a result, the replicated copy loses
its knowledge about the original part boundaries required to decrypt
the data correctly.

We don't have a fix yet, but we're tracking it in
https://tracker.ceph.com/issues/46062. The fix will only modify the
replication logic, so it won't repair any objects that have already
replicated incorrectly. We'll need to develop a radosgw-admin command
to search for affected objects and reschedule their replication.

In the meantime, I can only advise multisite users to avoid using
encryption for multipart uploads. If you'd like to scan your cluster
for existing encrypted multipart uploads, you can identify them with an
S3 HeadObject request. The response would include an
x-amz-server-side-encryption header, and the ETag header value (with
the double quotes removed) would be longer than 32 characters (multipart
ETags are in the special form "<md5sum>-<num parts>"). Take care not to delete the
corrupted replicas, because an active-active multisite configuration
would go on to delete the original copy.
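
The HeadObject check described above could be sketched roughly as follows.
This is only an illustration, not an official tool: the function names are
made up for this example, and it operates on a plain dict of HTTP response
headers rather than on any particular client library. It also assumes that
any header whose name starts with x-amz-server-side-encryption (which would
include the SSE-C variant x-amz-server-side-encryption-customer-algorithm)
indicates an encrypted object; adjust that if your gateway reports
encryption differently.

```python
import re

# Multipart ETags have the form "<md5sum>-<num parts>":
# 32 hex characters, a dash, then the number of parts.
MULTIPART_ETAG = re.compile(r"^[0-9a-fA-F]{32}-(\d+)$")


def multipart_part_count(etag):
    """Return the part count from a multipart ETag, or None for a plain md5 ETag."""
    match = MULTIPART_ETAG.match(etag.strip().strip('"'))
    return int(match.group(1)) if match else None


def is_encrypted(headers):
    """Assumption: any x-amz-server-side-encryption* header means SSE was used."""
    return any(
        name.lower().startswith("x-amz-server-side-encryption")
        for name in headers
    )


def is_encrypted_multipart(headers):
    """True if HeadObject response headers describe an encrypted multipart upload."""
    etag = next((v for k, v in headers.items() if k.lower() == "etag"), "")
    return is_encrypted(headers) and multipart_part_count(etag) is not None


if __name__ == "__main__":
    # Hypothetical headers, as a HeadObject response might return them.
    sample = {
        "ETag": '"9b2cf535f27731c974343645a3985328-3"',
        "x-amz-server-side-encryption": "AES256",
    }
    print(is_encrypted_multipart(sample))
```

With boto3, for instance, the raw headers of a head_object() call should be
available under response["ResponseMetadata"]["HTTPHeaders"], so that dict
could be passed straight to is_encrypted_multipart(). Note that this only
reports candidates; as said above, do not delete the replicas it finds.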
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx




