Hey y'all!
I have a weird issue and I am not sure where to look, so any help would
be appreciated. I have a large Ceph Giant cluster that has been stable
and healthy almost entirely since its inception. We have stored over
1.5PB in the cluster so far through RGW, and everything seemed to be
functioning great. We have downloaded smaller objects without issue, but
last night we ran a test download of our largest file (almost 1
terabyte) and it consistently times out at almost exactly the same
place. Investigating further, it looks like Civetweb/RGW reported that
the uploads completed even though the objects are truncated. At least,
when we download the objects they appear to be truncated.
I have tried searching the mailing list archives to see what may be
going on, but it looks like the mailing list DB may be going through
some maintenance:
----
Unable to read word database file
'/dh/mailman/dap/archives/private/ceph-users-ceph.com/htdig/db.words.db'
----
After checking through the gzipped logs, I see that civetweb also just
stops logging after a rotation for some reason; my last log entry is
from the 28th of March. I tried manually running
/etc/init.d/radosgw reload, but this didn't seem to work. Since running
the download again could take all day to error out, we are instead using
range requests to try to pull the missing bytes.
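To illustrate the range-request approach, here is a minimal sketch (not
the code from the gist) that works out which byte ranges are still
missing. The function name, the chunk size, and the example sizes are
made up for illustration; each returned string is meant to be sent as
the HTTP Range header of a separate S3 GET.

```python
def missing_ranges(received, total, chunk=64 * 1024 * 1024):
    """Yield HTTP Range header values covering bytes [received, total).

    received: number of bytes already on disk (hypothetical input)
    total:    size the object is supposed to have (hypothetical input)
    chunk:    maximum bytes per request (arbitrary choice)
    """
    start = received
    while start < total:
        # The end position of an HTTP byte range is inclusive.
        end = min(start + chunk, total) - 1
        yield "bytes=%d-%d" % (start, end)
        start = end + 1


if __name__ == "__main__":
    # Toy numbers: 10 bytes received out of 25, fetched 10 bytes at a time.
    for header in missing_ranges(received=10, total=25, chunk=10):
        print(header)
```

With boto, a value like this could be passed along as
headers={'Range': ...} on the key download call, appending each response
to the partial file.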
https://gist.github.com/MurphyMarkW/8e356823cfe00de86a48 -- here is the
code we are using to download via S3 / boto, as well as the returned
size report and an overview of our issue.
http://pastebin.com/cVLdQBMF -- here is some of the log from the
civetweb server these downloads are hitting.
Here is our current config:
http://pastebin.com/2SGfSDYG
Current output of ceph health:
http://pastebin.com/3f6iJEbu
I am thinking that this must be a civetweb/radosgw bug of some kind. My
questions are:
1.) Is there a way to download the object via rados directly? I am
guessing I will need to find the prefix and then just cat all of the
pieces together and hope I get it right.
2.) Why would Ceph say the upload went fine but then return a smaller
object?
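For question 1, this is roughly what I have in mind, as a rough sketch
only. It assumes I already have the RADOS object names for the pieces in
the correct order; working out that order from the RGW prefix/manifest
is the part I am unsure about, and it is not handled here. The function
name is made up. The ioctx argument is expected to behave like an Ioctx
from the python rados bindings (stat() returning size and mtime, read()
taking a name, length, and offset).

```python
def concat_rados_objects(ioctx, names, out, chunk=4 * 1024 * 1024):
    """Append each named RADOS object to `out`, in the order given.

    ioctx: object with stat(name) -> (size, mtime) and
           read(name, length, offset) -> bytes, like rados.Ioctx
    names: piece names, ALREADY in the right order (assumption)
    out:   writable binary file object
    Returns the total number of bytes written.
    """
    written = 0
    for name in names:
        size, _mtime = ioctx.stat(name)
        offset = 0
        while offset < size:
            data = ioctx.read(name, min(chunk, size - offset), offset)
            if not data:
                # Fail loudly instead of silently producing a short file.
                raise IOError("short read on %r at offset %d" % (name, offset))
            out.write(data)
            offset += len(data)
            written += len(data)
    return written
```

Against a real cluster, the ioctx would come from something like
rados.Rados(conffile=...) followed by connect() and open_ioctx() on the
RGW data pool. Comparing the returned byte count with the size RGW
reports for the object would show whether the data in RADOS is itself
truncated.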
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com