Re: Civet RadosGW S3 not storing complete objects; civetweb logs stop after rotation

Sean <seapasulli@xxxxxxxxxxxx> · Tue, 05 May 2015 14:14:19 -0500



Hello Yehuda and the rest of the mailing list,

My main question at the moment is why the bucket index and the object manifest are ever different. Based on how we upload data, I do not think the RADOS gateway should ever know the full file size without having had all of the objects in Ceph at some point. So once the multipart upload is marked as complete, the RADOS gateway should go through all of the part objects and assemble them into a complete object, correct?

Secondly, I think I may not be understanding the process for gathering all of the parts correctly. To continue with my example file "86b6fad8-3c53-465f-8758-2009d6df01e9/TCGA-A2-A0T7-01A-21D-A099-09_IlluminaGA-DNASeq_exome.bam" in bucket tcga_cghub_protected, I would use the following to grab the prefix:
      

    prefix=$(radosgw-admin object stat --bucket=tcga_cghub_protected \
        --object=86b6fad8-3c53-465f-8758-2009d6df01e9/TCGA-A2-A0T7-01A-21D-A099-09_IlluminaGA-DNASeq_exome.bam \
        | grep -iE '"prefix"' | awk -F"\"" '{print $4}')
      

This takes the matching line, splits it on double quotes, and prints the fourth field, which is the value of the "prefix" key. In this case:

    "prefix": "86b6fad8-3c53-465f-8758-2009d6df01e9\/TCGA-A2-A0T7-01A-21D-A099-09_IlluminaGA-DNASeq_exome.bam.2\/YAROhWaAm9LPwCHeP55cD4CKlLC0B4S",
      

So:

    lacadmin@kh10-9:~$ echo ${prefix}
    86b6fad8-3c53-465f-8758-2009d6df01e9\/TCGA-A2-A0T7-01A-21D-A099-09_IlluminaGA-DNASeq_exome.bam.2\/YAROhWaAm9LPwCHeP55cD4CKlLC0B4S
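
The `\/` sequences in that value appear to be JSON escaping of a plain `/`, and the `.` characters in the prefix are regex metacharacters when handed to grep. A minimal sketch, assuming the `radosgw-admin object stat` JSON arrives on stdin and that the first "prefix" key is the one wanted, that unescapes the slashes so the prefix can be matched literally with `grep -F`. The function name `extract_prefix` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print the first "prefix" value found in
# `radosgw-admin object stat` JSON read from stdin, turning the JSON
# escape sequence \/ back into a plain /.
extract_prefix() {
    sed -n 's/.*"prefix": *"\([^"]*\)".*/\1/p' | head -n 1 | sed 's#\\/#/#g'
}

# Usage (not run here; bucket, object and pool names as in this message):
# prefix=$(radosgw-admin object stat --bucket=tcga_cghub_protected \
#     --object="$obj" | extract_prefix)
# rados -p .rgw.buckets ls | grep -F -- "$prefix" | wc -l
```

Matching with `grep -F` treats every character of the prefix literally, so the count does not depend on how a particular grep handles `\/` or `.` in a regular expression.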
      

From here, I list all of the objects in the .rgw.buckets pool and grep for that prefix, which yields 1335 objects. If I cat all of these objects together, I end up with only a 5468160 KiB (5599395840-byte) file, which is about 2 GB short of what the object manifest says it should be. If I download the file while tailing the RADOS gateway log, the log shows 1849 objects, and their sizes sum to 7744771642 bytes, which is the same size the manifest reports. I understand that this does nothing more than verify the manifest's accuracy, but I still find it interesting. Could the missing chunks still exist in Ceph, outside of the object manifest and tagged with the same prefix? Or am I misunderstanding something?
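
The figures in this paragraph can be cross-checked with shell arithmetic. Assuming each of the 1335 matched objects is a full 4 MiB stripe (4 MiB is RGW's default stripe size, and the exact match below suggests it applies here), they total 5599395840 bytes, which is exactly 5468160 KiB and 2145375802 bytes (about 2 GiB) short of the 7744771642 bytes in the manifest. The `sum_stat_sizes` helper is hypothetical; it totals the real object sizes instead of relying on the stripe-size assumption, and it assumes each `rados stat` line ends in `size N`.

```shell
#!/bin/sh
# Arithmetic check, ASSUMING every matched RADOS object is a full
# 4 MiB stripe.
stripe=4194304
found=$((1335 * stripe))     # bytes present under the prefix
expected=7744771642          # size reported by the object manifest
echo "found:   $found bytes ($((found / 1024)) KiB)"
echo "missing: $((expected - found)) bytes"

# sum_stat_sizes: total the sizes on `rados stat` lines read from stdin.
# ASSUMPTION: each line ends in "size N", e.g.
#   .rgw.buckets/<object> mtime <time>, size 4194304
# %.0f keeps totals above 2^31 exact (awk numbers are doubles).
sum_stat_sizes() {
    awk '$(NF-1) == "size" { total += $NF } END { printf "%.0f\n", total }'
}

# Usage (not run here; $prefix must hold the prefix with \/ unescaped to /):
# rados -p .rgw.buckets ls | grep -F -- "$prefix" |
#     while IFS= read -r obj; do
#         rados -p .rgw.buckets stat "$obj" </dev/null
#     done | sum_stat_sizes
```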
      

We have 40384 files in the tcga_cghub_protected bucket, and only 66 of them suffer from this truncation issue. What I need to know is whether this is happening on the gateway side or on the client side. Next, I need to know what actions could leave the bucket index and the object manifest mismatched like this, given that 40318 of the 40384 files work without issue.
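
Identifying the affected files across the whole bucket comes down to comparing two size lists. A minimal sketch, assuming (hypothetically) that the sizes the bucket index reports and the byte counts actually received on download have each been collected into a file of `name<TAB>size` lines; how those lists are gathered is left open, and `find_mismatches` is a hypothetical name.

```shell
#!/bin/sh
# find_mismatches EXPECTED ACTUAL
# Both files hold "name<TAB>size" lines (a hypothetical format):
#   EXPECTED - sizes the bucket index reports for each object
#   ACTUAL   - bytes actually received when each object was downloaded
# Prints "name<TAB>expected<TAB>actual" for each object whose sizes
# differ, or "missing" when the object has no entry in ACTUAL.
find_mismatches() {
    awk -F'\t' -v OFS='\t' '
        FILENAME == ARGV[1] { got[$1] = $2; next }   # first file read: ACTUAL
        !($1 in got)        { print $1, $2, "missing"; next }
        got[$1] != $2       { print $1, $2, got[$1] }
    ' "$2" "$1"
}
```

Testing `FILENAME == ARGV[1]` instead of the common `NR == FNR` idiom keeps the first pass correct even when the ACTUAL file is empty.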
      

The truncated files vary widely in size (5 MB to 980 GB), and the truncation occurs in different places: some files are missing the first few bytes, which should read "bam", and others are missing parts in the middle.
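
For spotting the files whose beginning is damaged, note that BAM files are BGZF-compressed, which is a gzip-compatible format: on disk a BAM file starts with the gzip magic bytes `1f 8b`, and the `BAM\1` magic appears only after decompression. A minimal sketch that checks both, assuming `gzip`, `od` and a `head` that supports `-c` are available; `check_bam_header` is a hypothetical name.

```shell
#!/bin/sh
# check_bam_header FILE
# Returns 0 if FILE starts with the gzip magic (1f 8b) used by BGZF and
# its decompressed stream starts with the BAM magic "BAM\1".
check_bam_header() {
    magic=$(head -c 2 "$1" | od -An -tx1 | tr -d ' \n')
    [ "$magic" = "1f8b" ] || return 1
    # Only the first 4 decompressed bytes are needed.
    bam=$(gzip -dc < "$1" 2>/dev/null | head -c 4 | od -An -tx1 | tr -d ' \n')
    [ "$bam" = "42414d01" ]    # "B" "A" "M" 0x01
}
```

This only catches damage at the start of a file; gaps in the middle need a full-content comparison against the original.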
      

Our upload code uses mmap to stream chunks of the file to the RADOS gateway via a multipart upload, but nowhere on the client side do we hold a direct reference to the files being uploaded, nor do we specify the size in any way. So where is the gateway getting the correct, complete file size from, and how does the bucket index show the intended file size?
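
For reference, in the S3 multipart protocol each UploadPart request normally carries its own Content-Length header (set by the HTTP client from the chunk being sent), while CompleteMultipartUpload lists only part numbers and ETags. The final object size can therefore be derived from the part lengths the gateway accepted, even though the client never states a total. A minimal sketch of that arithmetic; `plan_parts` and the 64 MiB part size are hypothetical, not the actual chunk size of the upload code described here.

```shell
#!/bin/sh
# plan_parts TOTAL_BYTES PART_BYTES
# Prints "part_number offset length" for a multipart upload that splits
# TOTAL_BYTES into PART_BYTES-sized parts (the last part may be shorter).
# Each length is what would travel as that part's Content-Length.
plan_parts() {
    total=$1 part=$2 n=1 off=0
    while [ "$off" -lt "$total" ]; do
        len=$part
        if [ $((total - off)) -lt "$len" ]; then
            len=$((total - off))
        fi
        echo "$n $off $len"
        off=$((off + len))
        n=$((n + 1))
    done
}

# The resulting object size is just the sum of the part lengths.
# 64 MiB is a HYPOTHETICAL part size; substitute the real chunk size.
plan_parts 7744771642 67108864 |
    awk '{ sum += $3 } END { printf "%d parts, %.0f bytes\n", NR, sum }'
```

With these assumed values the pipeline reports 116 parts totalling 7744771642 bytes, the last part being 27252282 bytes.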
      

This implies that, at some point, Ceph was able to see all of the parts of the file and calculate the correct total size. To me this looks like a RADOS gateway bug regardless of how the file is uploaded; RGW should be able to withstand fuzzing and still store data correctly.
      

Why does the bucket listing not match the bucket index, and how can I verify that the data is not being corrupted by RGW or, worse, after it has been committed to Ceph?
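
One end-to-end check is to download each object again and compare it with the original source file. A minimal sketch using the POSIX `cksum` utility, whose output combines a CRC and the byte count, so truncation shows up directly in the reported sizes; it assumes both the original and a freshly downloaded copy are available locally. `verify_roundtrip` is a hypothetical name, and a cryptographic digest such as `sha256sum` could be swapped in.

```shell
#!/bin/sh
# verify_roundtrip ORIGINAL DOWNLOADED
# Compares CRC and size (via POSIX cksum) of the source file and a copy
# fetched back through the gateway. Returns 0 on match, 1 on mismatch,
# 2 if either file cannot be read.
verify_roundtrip() {
    orig=$(cksum < "$1") || return 2
    copy=$(cksum < "$2") || return 2
    if [ "$orig" = "$copy" ]; then
        echo "OK: $1"
        return 0
    fi
    # cksum on stdin prints "CRC BYTES"; ${var#* } keeps the byte count.
    echo "MISMATCH: $1 (original ${orig#* } bytes, downloaded ${copy#* } bytes)"
    return 1
}
```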
      

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com