Re: Civet RadosGW S3 not storing complete objects; civetweb logs stop after rotation

----- Original Message -----
> From: "Sean" <seapasulli@xxxxxxxxxxxx>
> To: ceph-users@xxxxxxxxxxxxxx
> Sent: Tuesday, April 28, 2015 2:52:35 PM
> Subject:  Civet RadosGW S3 not storing complete objects; civetweb logs stop after rotation
> 
> Hey yall!
> 
> I have a weird issue and I am not sure where to look so any help would
> be appreciated. I have a large ceph giant cluster that has been stable
> and healthy almost entirely since its inception. We have stored over
> 1.5PB into the cluster currently through RGW and everything seems to be
> functioning great. We have downloaded smaller objects without issue, but
> last night we tested a download of our largest file (almost 1 terabyte)
> and it consistently times out at almost exactly the same place. Investigating
> further it looks like Civetweb/RGW is returning that the uploads
> completed even though the objects are truncated. At least when we
> download the objects they seem to be truncated.
> 
> I have tried searching through the mailing list archives to see what may
> be going on but it looks like the mailing list DB may be going through
> some maintenance:
> 
> ----
> Unable to read word database file
> '/dh/mailman/dap/archives/private/ceph-users-ceph.com/htdig/db.words.db'
> ----
> 
> After checking through the gzipped logs I see that civetweb just stops
> logging after a rotation for some reason as well, and my last log is from
> the 28th of March. I tried manually running /etc/init.d/radosgw reload
> but this didn't seem to work. As running the download again could take
> all day to error out, we are instead using range requests to try to pull
> the missing bytes.
> 
> https://gist.github.com/MurphyMarkW/8e356823cfe00de86a48 -- there is the
> code we are using to download via S3 / boto as well as the returned size
> report and overview of our issue.
> http://pastebin.com/cVLdQBMF -- Here is some of the log from the civetweb
> server they are hitting.
> 
> Here is our current config ::
> http://pastebin.com/2SGfSDYG
> 
> Current output of ceph health::
> http://pastebin.com/3f6iJEbu
> 
> I am thinking that this must be a civetweb/radosgw bug of some kind. My
> questions are: 1.) Is there a way to try to download the object via rados
> directly? I am guessing I will need to find the prefix and then just cat
> all of the pieces together and hope I get it right. 2.) Why would ceph say
> the upload went fine but then return a smaller object?
> 
> 


Note that the returned HTTP response has status 206 (Partial Content):
/var/log/radosgw/client.radosgw.log:2015-04-28 16:08:26.525268 7f6e93fff700  2 req 0:1.067030:s3:GET /tcga_cghub_protected/ff9b730c-d303-4d49-b28f-e0bf9d8f1c84/759366461d2bf8bb0583d5b9566ce947.bam:get_obj:http status=206

It'll only return that if partial content is requested (through the HTTP Range header). It's really hard to tell from these logs whether there's any actual problem. I suggest bumping up the log level (debug ms = 1, debug rgw = 20) and taking a look at an entire request (one that includes all the request HTTP headers).
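
For reference, the relationship between a Range request and a 206 reply can be checked on the client side with a short sketch like the one below. It is not taken from the gist or from RGW itself; parse_content_range and missing_range_header are hypothetical helper names, and it assumes the gateway answers a ranged GET with a standard "Content-Range: bytes first-last/total" header as described in RFC 7233.

```python
import re

# Matches a Content-Range value such as "bytes 0-1023/4096" (RFC 7233).
# The total may be "*" when the server does not know the full length.
_CONTENT_RANGE = re.compile(r"^bytes (\d+)-(\d+)/(\d+|\*)$")


def parse_content_range(value):
    """Return (first, last, total); total is None when the server sent '*'."""
    match = _CONTENT_RANGE.match(value.strip())
    if match is None:
        raise ValueError("unrecognised Content-Range: %r" % value)
    first, last = int(match.group(1)), int(match.group(2))
    total = None if match.group(3) == "*" else int(match.group(3))
    return first, last, total


def missing_range_header(bytes_on_disk, total_size):
    """Build the Range header value for the tail that is still missing.

    Returns None when the local copy is already at least total_size bytes.
    """
    if bytes_on_disk >= total_size:
        return None
    # An open-ended range asks for everything from this offset to the end.
    return "bytes=%d-" % bytes_on_disk
```

Comparing the total reported in Content-Range with the size recorded at upload time would help separate the two cases: if the totals match, the 206 is just the expected answer to a ranged GET; if the reported total is smaller, that would suggest the stored object itself is short.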

Yehuda



> 
> 
> _______________________________________________
> ceph-users mailing list
> ceph-users@xxxxxxxxxxxxxx
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> 