----- Original Message -----
> From: "Sean" <seapasulli@xxxxxxxxxxxx>
> To: ceph-users@xxxxxxxxxxxxxx
> Sent: Tuesday, April 28, 2015 2:52:35 PM
> Subject: Civet RadosGW S3 not storing complete objects; civetweb logs stop after rotation
>
> Hey y'all!
>
> I have a weird issue and I am not sure where to look, so any help would
> be appreciated. I have a large Ceph Giant cluster that has been stable
> and healthy almost entirely since its inception. We have stored over
> 1.5PB into the cluster through RGW so far, and everything seems to be
> functioning great. We have downloaded smaller objects without issue, but
> last night we did a test on our largest file (almost 1 terabyte) and it
> consistently times out at almost the exact same place. Investigating
> further, it looks like Civetweb/RGW reports that the uploads
> completed even though the objects are truncated. At least, when we
> download the objects they seem to be truncated.
>
> I have tried searching through the mailing list archives to see what may
> be going on, but it looks like the mailing list DB may be going through
> some maintenance:
>
> ----
> Unable to read word database file
> '/dh/mailman/dap/archives/private/ceph-users-ceph.com/htdig/db.words.db'
> ----
>
> After checking through the gzipped logs I see that civetweb just stops
> logging after a rotation for some reason as well, and my last log is from
> the 28th of March. I tried manually running /etc/init.d/radosgw reload
> but this didn't seem to work. As running the download again could take
> all day to error out, we instead use a range request to try and pull
> the missing bytes.
>
> https://gist.github.com/MurphyMarkW/8e356823cfe00de86a48 -- there is the
> code we are using to download via S3 / boto as well as the returned size
> report and overview of our issue.
> http://pastebin.com/cVLdQBMF -- Here is some of the log from the civetweb
> server they are hitting.
>
> Here is our current config ::
> http://pastebin.com/2SGfSDYG
>
> Current output of ceph health::
> http://pastebin.com/3f6iJEbu
>
> I am thinking that this must be a civetweb/radosgw bug of some kind. My
> questions are: 1.) Is there a way to try and download the object via
> rados directly? I am guessing I will need to find the prefix and then
> just cat all of them together and hope I get it right? 2.) Why would
> ceph say the upload went fine but then return a smaller object?
>
>

Note that the returned HTTP response is 206 (partial content):

/var/log/radosgw/client.radosgw.log:2015-04-28 16:08:26.525268 7f6e93fff700 2 req 0:1.067030:s3:GET /tcga_cghub_protected/ff9b730c-d303-4d49-b28f-e0bf9d8f1c84/759366461d2bf8bb0583d5b9566ce947.bam:get_obj:http status=206

It'll only return that if partial content is requested (through the HTTP
Range header). It's really hard to tell from these logs whether there's
any actual problem. I suggest bumping up the log level (debug ms = 1,
debug rgw = 20) and taking a look at an entire request (one that
includes all the request HTTP headers).

Yehuda

>
>
> _______________________________________________
> ceph-users mailing list
> ceph-users@xxxxxxxxxxxxxx
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
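
To make the point about 206 and the Range header concrete, here is a
minimal Python sketch. It is not taken from the gist or from RGW itself.
The function names (range_headers, expected_status) and the sizes in the
usage note are hypothetical, and it assumes a server that honors Range
requests. It only builds the header values a client would send to fetch
the missing tail of an object, and shows why such requests are logged
with status 206 rather than 200.

```python
def range_headers(total_size, received, chunk_size):
    """Yield HTTP Range header values covering bytes [received, total_size).

    Byte positions in a Range header are zero-based and inclusive on
    both ends, so the last byte of the object is total_size - 1.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    start = received
    while start < total_size:
        end = min(start + chunk_size, total_size) - 1
        yield "bytes=%d-%d" % (start, end)
        start = end + 1


def expected_status(request_headers):
    """Status a Range-honoring server returns for a successful GET.

    Simplification (an assumption of this sketch): 206 whenever a Range
    header is present, 200 otherwise. A server may also ignore Range
    and answer 200 with the full body.
    """
    has_range = any(name.lower() == "range" for name in request_headers)
    return 206 if has_range else 200
```

For example, with a hypothetical 1000-byte object of which 700 bytes
have already arrived, list(range_headers(1000, 700, 128)) gives
"bytes=700-827", "bytes=828-955" and "bytes=956-999". Every one of those
requests would show up in the log as status=206, which by itself says
nothing about whether the stored object is truncated.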