On a large Hammer-based cluster (> 1 Gobjects) we are seeing a small number of objects being truncated. All of these objects are between 512kB and 4MB in size and were not uploaded as multipart, so the first 512kB is stored in the head object and the following chunks should be in tail objects named <bucket_id>__shadow_<tag>_N, but the latter sometimes seem to go missing. The PUT operation for these objects is logged as successful (HTTP code 200), so I currently have two hypotheses as to what might be happening:

1. The object is received by the radosgw process, the head object is written successfully, and then the write for the tail object somehow fails. The question here is whether this is possible at all, or whether radosgw always waits until all operations have completed successfully before returning the 200. This blog post [1] at least mentions some asynchronous operations.

2. The full object is written correctly, but the tail objects are deleted afterwards. This might happen during garbage collection if there was a collision between the tail object names of two objects, but again I'm not sure whether this is possible.

So my questions are whether anyone else has seen this issue, and whether it may have been fixed in Jewel or later.

The second issue is what happens when a client tries to access such a truncated object. radosgw first answers with the full headers and a content-length of e.g. 600000, then sends the first chunk of data (524288 bytes) from the head object. After that it tries to read the first tail object, but receives error -2 (file not found). radosgw then tries to send a 404 status with a NoSuchKey error as the XML body, but of course this is too late: the client sees it as part of the object data. The connection stays open, the client waits for the rest of the object to be sent, and eventually times out with an error. Or, if the original object was only slightly larger than 512kB, the client will take the appended 404 response as the final part of the object and continue with corrupted data, hopefully checking the MD5 sum and noticing the issue.

This behaviour is still unchanged at least in Jewel, and you can easily reproduce it by manually deleting the shadow object from the bucket pool after uploading an object of a suitable size; see the sketch below.

I have created a bug report for the first issue [2]; please let me know whether you would like a separate ticket for the second one.

[1] http://www.ksingh.co.in/blog/2017/01/15/ceph-object-storage-performance-improvement-using-indexless-buckets/
[2] http://tracker.ceph.com/issues/20107
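For anyone who wants to reproduce the client-side behaviour, here is roughly what I mean. Treat it as a sketch: the bucket, pool and host names are placeholders, and the data pool is often .rgw.buckets with Hammer/Jewel defaults but may differ on your deployment.

  # Upload a non-multipart object slightly larger than the 512kB head chunk.
  dd if=/dev/urandom of=obj.bin bs=1000 count=600
  s3cmd put --acl-public obj.bin s3://testbucket/obj.bin

  # Locate the tail object in the bucket data pool.
  rados -p .rgw.buckets ls | grep __shadow_

  # Delete the shadow object to simulate the missing tail.
  rados -p .rgw.buckets rm '<bucket_id>__shadow_<tag>_1'

  # Fetch the object again. Expect a 200 with Content-Length: 600000, the
  # first 524288 bytes of data, then the NoSuchKey XML, then a stalled
  # connection until the timeout fires (curl should exit with code 28).
  curl -sv --max-time 30 http://<rgw-host>/testbucket/obj.bin -o body.bin

  # The received body is short and carries the XML error at offset 524288.
  ls -l body.bin
  dd if=body.bin bs=1 skip=524288 count=200 2>/dev/null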
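As a follow-up to the sketch above: for non-multipart objects the S3 ETag is simply the MD5 of the content, so comparing it against what was actually received makes the corruption obvious even in the "slightly larger than 512k" case where the client ends up with a seemingly complete body.

  # ETag as stored by radosgw vs. MD5 of the bytes the client received.
  s3cmd info s3://testbucket/obj.bin | grep -i md5
  md5sum body.bin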
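And for finding affected objects server-side without fetching them, something along these lines should work (again hedged, since the exact JSON layout of the manifest differs between releases): dump the object's manifest with radosgw-admin, then check that every tail object it references still exists in the pool.

  # List the rados objects the tail is striped over.
  radosgw-admin object stat --bucket=testbucket --object=obj.bin

  # Verify each referenced tail object; a missing one returns error -2.
  rados -p .rgw.buckets stat '<bucket_id>__shadow_<tag>_1'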