Hello Yehuda and the rest of the mailing list.

My main question currently is: why are the bucket index and the object manifest ever different? Based on how we are uploading data, I do not think the Rados gateway should ever know the full file size without having had all of the objects within ceph at one point in time. So after the multipart upload is marked as completed, the Rados gateway should cat through all of the objects and make a complete part, correct?

Secondly, I think I am not understanding the process to grab all of the parts correctly. To continue with my example file "86b6fad8-3c53-465f-8758-2009d6df01e9/TCGA-A2-A0T7-01A-21D-A099-09_IlluminaGA-DNASeq_exome.bam" in bucket tcga_cghub_protected, I would use the following to grab the prefix:

    prefix=$(radosgw-admin object stat --bucket=tcga_cghub_protected --object=86b6fad8-3c53-465f-8758-2009d6df01e9/TCGA-A2-A0T7-01A-21D-A099-09_IlluminaGA-DNASeq_exome.bam | grep -iE '"prefix"' | awk -F"\"" '{print $4}')

This should take everything between quotes for the prefix key and give me the value. In this case:

    "prefix": "86b6fad8-3c53-465f-8758-2009d6df01e9\/TCGA-A2-A0T7-01A-21D-A099-09_IlluminaGA-DNASeq_exome.bam.2\/YAROhWaAm9LPwCHeP55cD4CKlLC0B4S",

So:

    lacadmin@kh10-9:~$ echo ${prefix}
    86b6fad8-3c53-465f-8758-2009d6df01e9\/TCGA-A2-A0T7-01A-21D-A099-09_IlluminaGA-DNASeq_exome.bam.2\/YAROhWaAm9LPwCHeP55cD4CKlLC0B4S

From here I list all of the objects in the .rgw.buckets pool and grep for that prefix, which yields 1335 objects. If I cat all of these objects together I only end up with a 5468160 byte file, which is 2G short of what the object manifest says it should be. If I grab the file and tail the Rados gateway log I end up with 1849 objects, and when I sum them all up I end up with 7744771642, which is the same size that the manifest reports. I understand that this does nothing other than verify the manifest's accuracy, but I still find it interesting.
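
For reference, the prefix lookup described above could be sketched roughly as follows. This is only an illustration: the function names extract_prefix and count_matching are made up here, and it assumes the pool listing comes from something like `rados -p .rgw.buckets ls`. Unlike the grep/awk pipeline, it turns the JSON-escaped "\/" back into "/" and matches the prefix as a fixed string, so the "." characters in the prefix are not treated as regex wildcards.

```shell
#!/bin/sh
# Sketch only: helper names are hypothetical, not part of any ceph tool.

# Read `radosgw-admin object stat` JSON on stdin and print the first
# "prefix" value, with JSON-escaped "\/" converted back to "/".
extract_prefix() {
    sed -n 's/.*"prefix": *"\([^"]*\)".*/\1/p' | head -n 1 | sed 's#\\/#/#g'
}

# Read a pool listing on stdin (one object name per line) and print how
# many names contain the prefix given as $1, matched literally (grep -F).
count_matching() {
    grep -cF -- "$1"
}

# Intended use against a live cluster (assumed, not run here):
#   prefix=$(radosgw-admin object stat --bucket=tcga_cghub_protected \
#            --object="$object" | extract_prefix)
#   rados -p .rgw.buckets ls | count_matching "$prefix"
```

If the fixed-string count differs from the 1335 objects found with the plain grep, that would at least show whether the escaping or the regex matching is affecting the listing.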
The missing chunks may still exist in ceph outside of the object manifest and tagged with the same prefix, correct? Or am I misunderstanding something?

We have over 40384 files in the tcga_cghub_protected bucket and only 66 of these files are suffering from this truncation issue. What I need to know is: is this happening on the gateway side or on the client side? Next, I need to know what possible actions could leave the bucket index and the object manifest mismatched like this, as 40318 out of 40384 files are working without issue.

The truncated files are of all different sizes (5 megabytes to 980 gigabytes) and the truncation seems to be all over. By "all over" I mean some files are missing the first few bytes that should read "bam" and some are missing parts in the middle.

Our upload code uses mmap to stream chunks of the file to the Rados gateway via a multipart upload, but nowhere on the client side do we have a direct reference to the files we are using, nor do we specify the size in any way. So where is the gateway getting the correct complete file size from, and how is the bucket index showing the intended file size? This implies that, at some point in time, ceph was able to see all of the parts of the file and calculate the correct total size.

This to me seems like a Rados gateway bug regardless of how the file is being uploaded. I think that the RGW should be able to be fuzzed and still store the data correctly.

Why is the bucket list not matching the bucket index, and how can I verify that the data is not being corrupted by the RGW or, worse, after it is committed to ceph?
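
One way to narrow this down might be to compare the object names the gateway reads during a GET with the names found by the prefix listing. The following is a minimal sketch, assuming both lists have already been saved to files with one object name per line; the function name and both file names are hypothetical.

```shell
#!/bin/sh
# Sketch only: print names that appear in the second list but not the first.
#   $1: file of names found by the prefix listing of the pool
#   $2: file of names the gateway read while serving the object
missing_from_listing() {
    tmp_a=$(mktemp) && tmp_b=$(mktemp) || return 1
    sort -u "$1" > "$tmp_a"
    sort -u "$2" > "$tmp_b"
    # comm -13 drops lines unique to the first file and lines common to
    # both, leaving only lines unique to the second file.
    comm -13 "$tmp_a" "$tmp_b"
    rm -f "$tmp_a" "$tmp_b"
}

# Intended use (file names are placeholders):
#   missing_from_listing listed_by_prefix.txt read_by_gateway.txt
```

For the example file, if the 1335 listed names are a subset of the 1849 names in the gateway log, this would print the remaining 514, which should show whether the unlisted objects carry a different prefix.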