Re: Civet RadosGW S3 not storing complete objects; civetweb logs stop after rotation

Sean Sullivan <seapasulli@xxxxxxxxxxxx> · Tue, 28 Apr 2015 18:03:17 -0500

Will do.  The reason for the partial request is that the total size of the 
file is close to 1TB, so attempting a full download would take quite some 
time on our 10Gb connection.  What is odd is that if I request from the last 
byte received to the end of the file, we get a 416 (requested range not 
satisfiable) response, while if I request from one byte earlier to the end 
of the file, we are given only 1 byte, not the rest of the file.
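
For reference, here is a minimal sketch of how HTTP (RFC 7233) resolves a 
single byte range against the size the server holds.  It is not RGW's code; 
the function name resolve_range and the 1000-byte size are made up for 
illustration.  It suggests that an unsatisfiable-range error for bytes=N- 
together with a one-byte reply for bytes=(N-1)- is what any server would 
return if the object it holds is exactly N bytes long.

```python
import re

def resolve_range(range_header, object_size):
    """Return (status, first_byte, last_byte) for one 'bytes=' range.

    Sketch of RFC 7233 single-range handling; multi-range and malformed
    headers are treated as if no Range header had been sent.
    """
    m = re.fullmatch(r"bytes=(\d*)-(\d*)", range_header.strip())
    if not m or (m.group(1) == "" and m.group(2) == ""):
        return (200, 0, object_size - 1)
    first, last = m.group(1), m.group(2)
    if first == "":
        # Suffix form 'bytes=-N': the final N bytes of the object.
        n = int(last)
        if n == 0 or object_size == 0:
            return (416, None, None)
        return (206, max(object_size - n, 0), object_size - 1)
    start = int(first)
    if start >= object_size:
        # Range starts at or past the end of what the server holds.
        return (416, None, None)
    end = object_size - 1 if last == "" else min(int(last), object_size - 1)
    if end < start:
        return (200, 0, object_size - 1)  # invalid spec, ignored
    return (206, start, end)

stored = 1000  # hypothetical number of bytes the gateway actually holds
print(resolve_range("bytes=1000-", stored))  # (416, None, None)
print(resolve_range("bytes=999-", stored))   # (206, 999, 999)
```

If that reading is right, the offset at which the error starts would be the 
size of the object as stored, which can be compared with the size of the 
original file.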

I will bump the log level up and attempt a partial, then a full, download.  
Thanks for the reply!!

On April 28, 2015 5:03:12 PM Yehuda Sadeh-Weinraub <yehuda@xxxxxxxxxx> wrote:

----- Original Message -----
> From: "Sean" <seapasulli@xxxxxxxxxxxx>
> To: ceph-users@xxxxxxxxxxxxxx
> Sent: Tuesday, April 28, 2015 2:52:35 PM
> Subject: Civet RadosGW S3 not storing complete objects; civetweb logs stop after rotation
>
> Hey yall!
>
> I have a weird issue and I am not sure where to look so any help would
> be appreciated. I have a large Ceph Giant cluster that has been stable
> and healthy almost entirely since its inception. We have stored over
> 1.5PB into the cluster currently through RGW and everything seems to be
> functioning great. We have downloaded smaller objects without issue but
> last night we did a test on our largest file (almost 1 terabyte) and it
> continuously times out at almost the exact same place. Investigating
> further it looks like Civetweb/RGW is returning that the uploads
> completed even though the objects are truncated. At least when we
> download the objects they seem to be truncated.
>
> I have tried searching through the mailing list archives to see what may
> be going on but it looks like the mailing list DB may be going through
> some maintenance:
>
> ----
> Unable to read word database file
> '/dh/mailman/dap/archives/private/ceph-users-ceph.com/htdig/db.words.db'
> ----
>
> After checking through the gzipped logs I see that civetweb just stops
> logging after a rotation for some reason as well, and my last log is from
> the 28th of March. I tried manually running /etc/init.d/radosgw reload
> but this didn't seem to work. As running the download again could take
> all day to error out, we instead use a range request to try and pull
> the missing bytes.
>
> https://gist.github.com/MurphyMarkW/8e356823cfe00de86a48 -- here is the
> code we are using to download via S3 / boto, as well as the returned size
> report and an overview of our issue.
> http://pastebin.com/cVLdQBMF -- here is some of the log from the civetweb
> server they are hitting.
>
> Here is our current config ::
> http://pastebin.com/2SGfSDYG
>
> Current output of ceph health::
> http://pastebin.com/3f6iJEbu
>
> I am thinking that this must be a civetweb/radosgw bug of some kind. My
> questions are: 1.) Is there a way to download the object via rados
> directly? I am guessing I will need to find the prefix and then just cat
> all of the pieces together and hope I get it right. 2.) Why would ceph say
> the upload went fine but then return a smaller object?
>
>

Note that the HTTP response status is 206 (partial content):
/var/log/radosgw/client.radosgw.log:2015-04-28 16:08:26.525268 7f6e93fff700  2 req 0:1.067030:s3:GET /tcga_cghub_protected/ff9b730c-d303-4d49-b28f-e0bf9d8f1c84/759366461d2bf8bb0583d5b9566ce947.bam:get_obj:http status=206

It'll only return that if partial content is requested (through the HTTP 
Range header). It's really hard to tell from these logs whether there's any 
actual problem. I suggest bumping up the log level (debug ms = 1, debug rgw 
= 20) and taking a look at an entire request (one that includes all the 
request HTTP headers).
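
To pick the status lines out of a larger log once the level is raised, 
something like the following rough sketch may help.  The pattern is inferred 
only from the single line quoted above, so treat the field layout as an 
assumption; parse_status_line is a made-up helper name and lines at other 
log levels may well be formatted differently.

```python
import re

# Pattern inferred from the one log line quoted in this thread (assumption).
LINE_RE = re.compile(
    r"req (?P<req>\d+):(?P<elapsed>[\d.]+):s3:(?P<method>[A-Z]+) "
    r"(?P<path>\S+?):(?P<op>\w+):http status=(?P<status>\d{3})"
)

def parse_status_line(line):
    """Return method, path, op and HTTP status from one log line, or None."""
    flat = " ".join(line.split())  # undo mail wrapping and double spaces
    m = LINE_RE.search(flat)
    if m is None:
        return None
    fields = m.groupdict()
    fields["status"] = int(fields["status"])
    return fields

sample = """2015-04-28 16:08:26.525268 7f6e93fff700
 2 req 0:1.067030:s3:GET
/tcga_cghub_protected/ff9b730c-d303-4d49-b28f-e0bf9d8f1c84/759366461d2bf8bb0583d5b9566ce947.bam:get_obj:http
status=206"""
print(parse_status_line(sample)["status"])  # 206
```

Filtering on op == "get_obj" and comparing the status with whether the client 
sent a Range header is one way to line the requests up with what was 
downloaded.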

Yehuda

>
>
> _______________________________________________
> ceph-users mailing list
> ceph-users@xxxxxxxxxxxxxx
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
