Civet RadosGW S3 not storing complete objects; civetweb logs stop after rotation

Hey y'all!

I have a weird issue and I am not sure where to look, so any help would be appreciated. I have a large Ceph Giant cluster that has been stable and healthy almost entirely since its inception. We have stored over 1.5PB in the cluster through RGW so far, and everything seemed to be functioning well. We have downloaded smaller objects without issue, but last night we tested a download of our largest file (almost 1 terabyte) and it consistently times out at almost exactly the same place. Investigating further, it looks like Civetweb/RGW reports that the uploads completed even though the stored objects are truncated. At least, the objects appear truncated when we download them.
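To narrow down where the truncation happens, the three sizes involved can be compared directly. This is only a minimal sketch: the function name and the three inputs (the size of the local source file, the size RGW reports for the object, and the number of bytes actually received) are hypothetical names for illustration, not something taken from our script.

```python
def classify_object(expected_size, reported_size, downloaded_size):
    """Guess where an object lost data by comparing three byte counts.

    expected_size:   size of the original local file (assumed known)
    reported_size:   size the gateway reports for the stored object
    downloaded_size: number of bytes actually received on download
    """
    if reported_size < expected_size:
        # The gateway itself believes the object is smaller than the source.
        return "truncated on upload"
    if downloaded_size < reported_size:
        # The gateway claims the full size but delivered fewer bytes.
        return "truncated on download"
    return "ok"
```

If the reported size already falls short of the source file, the problem is on the upload side; if only the received byte count falls short, the download path is the more likely suspect.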

I have tried searching through the mailing list archives to see what may be going on, but it looks like the mailing list DB may be going through some maintenance:

----
Unable to read word database file '/dh/mailman/dap/archives/private/ceph-users-ceph.com/htdig/db.words.db'
----

After checking through the gzipped logs, I see that civetweb simply stops logging after a rotation for some reason; my last log is from the 28th of March. I tried manually running /etc/init.d/radosgw reload, but this didn't seem to work. Since running the download again could take all day to error out, we are instead using range requests to try to pull the missing bytes.
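The range approach can be sketched roughly as follows. This is a minimal illustration, not the code from our gist: the function name, the starting offset, and the chunk size are assumptions, and the helper only builds the HTTP Range header values without talking to the gateway.

```python
def plan_ranges(start, total_size, chunk_size):
    """Yield HTTP Range header values covering bytes start..total_size-1."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    offset = start
    while offset < total_size:
        # Range ends are inclusive, so the last byte of a chunk is end - 1.
        end = min(offset + chunk_size, total_size) - 1
        yield "bytes=%d-%d" % (offset, end)
        offset = end + 1


# Example: resume at byte 10 of a 25-byte object in 10-byte chunks.
for value in plan_ranges(10, 25, 10):
    print(value)
```

Each value would then be sent as the Range header of a separate GET, for example through the headers argument of a boto key download call (assuming the boto version in use accepts one), and the chunks appended to the partial file in order.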

https://gist.github.com/MurphyMarkW/8e356823cfe00de86a48 -- here is the code we are using to download via S3 / boto, along with the returned size report and an overview of our issue. http://pastebin.com/cVLdQBMF -- here is some of the log from the civetweb server that the downloads are hitting.

Here is our current config ::
http://pastebin.com/2SGfSDYG

Current output of ceph health::
http://pastebin.com/3f6iJEbu

I am thinking that this must be a civetweb/radosgw bug of some kind. My questions are: 1.) Is there a way to download the object via rados directly? I am guessing I will need to find the prefix and then just cat all of the pieces together and hope I get it right. 2.) Why would Ceph say the upload went fine but then return a smaller object?




_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



