Greetings,
We are running a number of Ceph clusters in production to provide object storage services. We have stumbled upon an issue where objects of certain sizes are irretrievable. The symptoms are very similar to those addressed by the fix referenced here: https://www.redhat.com/archives/rhsa-announce/2015-November/msg00060.html. We can put objects into the cluster via s3/radosgw, but we cannot retrieve them (the cluster closes the connection without delivering all bytes). Unfortunately, this fix does not apply to us, as we are and have always been running Hammer. We've stumbled on a brand-new edge case.
Objects of exactly 4.5 MiB (4718592 bytes) can be placed into the cluster but not retrieved. At every interval of `rgw object stripe size` thereafter (in our case, 4 MiB), the objects are similarly irretrievable. We have tested this from 4.5 to 24.5 MiB, then spot-checked much larger values to confirm the pattern holds. There is also a small range of sizes just below this boundary that is irretrievable. After much testing, we have found the size of this range to be strongly correlated with the k value in our erasure coded pool. We have observed that the m value in the erasure coding has no effect on the window size. We have tested erasure coded values of k from 2 to 9, and we've observed the following ranges:
k = 4, m = 2 -> No error
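For anyone who wants to test the same sizes, the boundaries described above (4718592 bytes, then every `rgw object stripe size` after that) can be enumerated with a few lines of Python. This is only an illustrative sketch: the function name and the default of six boundaries (covering 4.5 to 24.5 MiB) are hypothetical, and the 4 MiB stripe size is simply the value from our configuration.

```python
MIB = 1024 * 1024


def boundary_sizes(first_bad=4718592, stripe_size=4 * MIB, count=6):
    """Return the object sizes (in bytes) at which retrieval fails.

    first_bad   -- first irretrievable size reported above (4.5 MiB)
    stripe_size -- value of `rgw object stripe size` (4 MiB in our case)
    count       -- how many boundaries to list (hypothetical default)
    """
    return [first_bad + n * stripe_size for n in range(count)]


if __name__ == "__main__":
    for size in boundary_sizes():
        print(f"{size} bytes = {size / MIB} MiB")
```

With the defaults this lists 4.5, 8.5, 12.5, 16.5, 20.5 and 24.5 MiB, the range tested exhaustively above.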
This issue cannot be reproduced using rados to place objects directly into EC pools. The issue has only been observed when using RadosGW's S3 interface.
The issue can be reproduced with any S3 client (s3cmd, s3curl, CyberDuck, CloudBerry Backup, and many others have been tested).
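A minimal sketch of a round-trip check follows, in case it helps others reproduce this. It assumes a boto3-style S3 client (for example one created with `boto3.client("s3", endpoint_url=...)` pointed at the RadosGW endpoint); the function name, bucket and key are hypothetical, and this is not the exact procedure used in the tests described here.

```python
import os


def round_trip_ok(client, bucket, key, size):
    """PUT `size` random bytes, GET them back, and report whether they match.

    `client` is assumed to expose boto3-style put_object/get_object methods.
    """
    payload = os.urandom(size)
    client.put_object(Bucket=bucket, Key=key, Body=payload)
    try:
        body = client.get_object(Bucket=bucket, Key=key)["Body"].read()
    except Exception:
        # A connection closed before all bytes arrive may surface as a
        # client-side exception rather than as a short read.
        return False
    return body == payload
```

Calling this with a size of 4718592 against an affected cluster would be expected to return False, while sizes well away from the boundaries should return True.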
Furthermore, we believe the objects are corrupted at the point they are placed into the cluster. We have tested copying the .rgw.buckets pool to a non-erasure coded pool, then swapping names, and we have found that objects copied from the EC pool to the non-EC pool are irretrievable once RGW is pointed to the non-EC pool. If we overwrite the object in the non-EC pool with the original, it becomes retrievable again. This has not been tested as exhaustively, but we felt it important enough to mention.
I'm sure I've omitted some details here that would aid in an investigation, so please let me know what other information I can provide. My team will be filing an issue shortly.
Many thanks,
Brian Felton