Re: "set_req_state_err err_no=27 resorting to 500" with multipart large file upload

On 10/28/2013 06:31 PM, Yehuda Sadeh wrote:
On Mon, Oct 28, 2013 at 9:24 AM, Wido den Hollander <wido@xxxxxxxx> wrote:
Hi,

I'm testing with some multipart uploads to RGW and I'm hitting a problem
when trying to upload files larger than 1159MB.

The tool I'm using is s3cmd 1.5.1

Ceph version: 0.67.4

It's very specific; this is what I tried (after a lot of narrowing down):

$ dd if=/dev/zero of=1159MB.bin bs=1024k count=1159
$ dd if=/dev/zero of=1160MB.bin bs=1024k count=1160
$ s3cmd put -P 1159MB.bin s3://widodh/1159MB.bin
$ s3cmd put -P 1160MB.bin s3://widodh/1160MB.bin

The 1159MB file works; the 1160MB file fails with:

Reading through the source I wasn't able to figure out what err_no=27 means.

Error code 27 is EFBIG, which is "File too large", but searching for '27' or
'EFBIG' in the source code doesn't show me anything.
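
For what it's worth, errno 27 maps to EFBIG straight from the Linux headers
(the path below is the usual one, though it may differ per distro), so the
number most likely comes back from the OSDs as a raw errno rather than
something RGW defines itself:

$ grep -w EFBIG /usr/include/asm-generic/errno-base.h
#define EFBIG           27      /* File too large */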

At the end of the logs it shows:

2013-10-28 17:11:17.009020 7fcab57ba700 10 calculated etag:
c28248dbb69472d7c7fcf27564374fc5-78
2013-10-28 17:11:17.009552 7fcab57ba700 20 get_obj_state:
rctx=0x7fcb28003490 obj=widodh:1160MB.bin state=0x7fcb280680c8
s->prefetch_data=0
2013-10-28 17:11:17.011244 7fcab57ba700  0 setting object
write_tag=default.28902.198197
2013-10-28 17:11:17.020787 7fcab57ba700  0 WARNING: set_req_state_err
err_no=27 resorting to 500
2013-10-28 17:11:17.020835 7fcab57ba700  2 req 198197:0.021338:s3:POST
/1160MB.bin:complete_multipart:http status=500
2013-10-28 17:11:17.021017 7fcab57ba700  1 ====== req done req=0x1c5c680
http_status=500 ======

I tried a lot of sizes, but the tipping point seems to be 1159MB; anything
larger won't work.

Playing with the multipart chunk size doesn't make a difference.

Without multipart (--disable-multipart) the upload succeeds, but a 2500MB
non-multipart upload fails again with the same error code.
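
For illustration, the variants were roughly along these lines (the 50MB
chunk-size value is just an example):

$ s3cmd put --multipart-chunk-size-mb=50 -P 1160MB.bin s3://widodh/1160MB.bin
$ s3cmd put --disable-multipart -P 1160MB.bin s3://widodh/1160MB.bin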

Any ideas?


Hmm, that sounds like the xattr limit issue (the 'osd max attr size'
configurable). Are you sure your OSDs are running 0.67.4 (did you
restart them after the upgrade)?

Yehuda
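
In case it helps later, what a running OSD actually reports can be checked
through its admin socket; osd.0 and the default socket path below are only
examples, adjust them to your deployment:

$ ceph --admin-daemon /var/run/ceph/ceph-osd.0.asok version
$ ceph --admin-daemon /var/run/ceph/ceph-osd.0.asok config show | grep osd_max_attr_size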


That was indeed the issue. The nodes were upgraded to 0.67.4 recently, but one of them hadn't been restarted yet.

Restarting the last node resolved it and it's now working.
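
For the record, a quick way to confirm that every OSD picked up the new
version after restarting (the OSD ids are just an example range):

$ for id in 0 1 2; do ceph tell osd.$id version; done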




--
Wido den Hollander
42on B.V.

Phone: +31 (0)20 700 9902
Skype: contact42on
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



