Re: Multipart uploads with partsizes larger than 16MiB failing on Nautilus

I had been looking into this issue all day and during testing found
that a specific configuration option we had been setting for years was
the culprit. Not setting this value and letting it fall back to the
default seems to have fixed our issue with multipart uploads.

If you are curious, the configuration option is rgw_obj_stripe_size,
which we had been setting to 67108864 bytes (64MiB). The default is
4194304 bytes (4MiB). This is a documented option
(https://docs.ceph.com/docs/nautilus/radosgw/config-ref/), and from my
testing it seems that using anything other than the default (I only
tried larger values) breaks multipart uploads.
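
In case it is useful to anyone checking their own cluster, something
along these lines should show where the value is coming from and clear
the override (a rough sketch only; the <id> placeholder and the systemd
unit name depend on your deployment, and on whether the option was set
in ceph.conf or in the mon config database):

  # On the rgw host, ask the running daemon what it is actually using
  # (the default should be 4194304)
  ceph daemon client.rgw.<id> config get rgw_obj_stripe_size

  # If the override lives in the centralized config database, drop it
  ceph config rm client.rgw.<id> rgw_obj_stripe_size

  # If it was set in ceph.conf instead, remove the rgw_obj_stripe_size
  # line there and restart the gateway so it picks up the default
  systemctl restart ceph-radosgw@rgw.<id>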

On Tue, Sep 8, 2020 at 12:12 PM shubjero <shubjero@xxxxxxxxx> wrote:
>
> Hey all,
>
> I'm creating a new post for this issue as we've narrowed the problem
> down to a part-size limitation on multipart uploads. We have discovered
> that in our production Nautilus (14.2.11) cluster and our lab Nautilus
> (14.2.10) cluster, multipart uploads with a configured part size
> greater than 16777216 bytes (16MiB) return a status 500 /
> internal server error from radosgw.
>
> So far I have increased the following rgw settings that looked
> suspect, without any improvement in the part-size limit:
>     "rgw_get_obj_window_size": "16777216",
>     "rgw_put_obj_min_window_size": "16777216",
>
> I am trying to determine whether this is because of a conservative
> default setting somewhere that I don't know about, or whether this is
> perhaps a bug.
>
> I would appreciate it if someone on Nautilus with rgw could also test
> and provide feedback. It's very easy to reproduce; configuring your
> part size with aws2cli just requires putting the following in your aws
> 'config' file, under your profile section (e.g. [default]):
> s3 =
>   multipart_chunksize = 32MB
>
> rgw server logs during a failed multipart upload (32MB chunk/partsize):
> 2020-09-08 15:59:36.054 7f2d32fa6700  1 ====== starting new request
> req=0x55953dc36930 =====
> 2020-09-08 15:59:36.082 7f2d32fa6700 -1 res_query() failed
> 2020-09-08 15:59:36.138 7f2d32fa6700  1 ====== req done
> req=0x55953dc36930 op status=0 http_status=200 latency=0.0839988s
> ======
> 2020-09-08 16:00:07.285 7f2d3dfbc700  1 ====== starting new request
> req=0x55953dc36930 =====
> 2020-09-08 16:00:07.285 7f2d3dfbc700 -1 res_query() failed
> 2020-09-08 16:00:07.353 7f2d00741700  1 ====== starting new request
> req=0x55954dd5e930 =====
> 2020-09-08 16:00:07.357 7f2d00741700 -1 res_query() failed
> 2020-09-08 16:00:07.413 7f2cc56cb700  1 ====== starting new request
> req=0x55953dc02930 =====
> 2020-09-08 16:00:07.417 7f2cc56cb700 -1 res_query() failed
> 2020-09-08 16:00:07.473 7f2cb26a5700  1 ====== starting new request
> req=0x5595426f6930 =====
> 2020-09-08 16:00:07.473 7f2cb26a5700 -1 res_query() failed
> 2020-09-08 16:00:09.465 7f2d3dfbc700  0 WARNING: set_req_state_err
> err_no=35 resorting to 500
> 2020-09-08 16:00:09.465 7f2d3dfbc700  1 ====== req done
> req=0x55953dc36930 op status=-35 http_status=500 latency=2.17997s
> ======
> 2020-09-08 16:00:09.549 7f2d00741700  0 WARNING: set_req_state_err
> err_no=35 resorting to 500
> 2020-09-08 16:00:09.549 7f2d00741700  1 ====== req done
> req=0x55954dd5e930 op status=-35 http_status=500 latency=2.19597s
> ======
> 2020-09-08 16:00:09.605 7f2cc56cb700  0 WARNING: set_req_state_err
> err_no=35 resorting to 500
> 2020-09-08 16:00:09.609 7f2cc56cb700  1 ====== req done
> req=0x55953dc02930 op status=-35 http_status=500 latency=2.19597s
> ======
> 2020-09-08 16:00:09.641 7f2cb26a5700  0 WARNING: set_req_state_err
> err_no=35 resorting to 500
> 2020-09-08 16:00:09.641 7f2cb26a5700  1 ====== req done
> req=0x5595426f6930 op status=-35 http_status=500 latency=2.16797s
> ======
>
> awscli client side output during a failed multipart upload:
> root@jump:~# aws --no-verify-ssl --endpoint-url
> http://lab-object.cancercollaboratory.org:7480 s3 cp 4GBfile
> s3://troubleshooting
> upload failed: ./4GBfile to s3://troubleshooting/4GBfile An error
> occurred (UnknownError) when calling the UploadPart operation (reached
> max retries: 2): Unknown
>
> Thanks,
>
> Jared Baker
> Cloud Architect for the Cancer Genome Collaboratory
> Ontario Institute for Cancer Research
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx


