Re: Multipart uploads with partsizes larger than 16MiB failing on Nautilus

Thanks, Shubjero.

Would you consider creating a Ceph tracker issue for this?

Regards,

Matt

On Tue, Sep 8, 2020 at 4:13 PM shubjero <shubjero@xxxxxxxxx> wrote:
>
> I had been looking into this issue all day and during testing found
> that a specific configuration option we had been setting for years was
> the culprit. Not setting this value and letting it fall back to the
> default seems to have fixed our issue with multipart uploads.
>
> If you are curious, the configuration option is rgw_obj_stripe_size,
> which was being set to 67108864 bytes (64MiB). The default is 4194304
> bytes (4MiB). This is a documented option
> (https://docs.ceph.com/docs/nautilus/radosgw/config-ref/), and from my
> testing it seems that using anything but the default (I only tried
> larger values) breaks multipart uploads.
>
> On Tue, Sep 8, 2020 at 12:12 PM shubjero <shubjero@xxxxxxxxx> wrote:
> >
> > Hey all,
> >
> > I'm creating a new post for this issue as we've narrowed the problem
> > down to a part size limitation on multipart upload. We have discovered
> > that in both our production Nautilus (14.2.11) cluster and our lab
> > Nautilus (14.2.10) cluster, multipart uploads with a configured part
> > size greater than 16777216 bytes (16MiB) return a status 500 /
> > internal server error from radosgw.
> >
> > So far I have increased the following rgw settings that looked
> > suspect, without any improvement for larger part sizes:
> >     "rgw_get_obj_window_size": "16777216",
> >     "rgw_put_obj_min_window_size": "16777216",
> >
> > I am trying to determine if this is because of a conservative default
> > setting somewhere that I don't know about or if this is perhaps a bug?
> >
> > I would appreciate it if someone on Nautilus with rgw could also test
> > and provide feedback. It's very easy to reproduce: to configure the
> > part size with aws2cli, put the following in your aws 'config' file:
> > s3 =
> >   multipart_chunksize = 32MB
> >
> > rgw server logs during a failed multipart upload (32MB chunk/partsize):
> > 2020-09-08 15:59:36.054 7f2d32fa6700  1 ====== starting new request req=0x55953dc36930 =====
> > 2020-09-08 15:59:36.082 7f2d32fa6700 -1 res_query() failed
> > 2020-09-08 15:59:36.138 7f2d32fa6700  1 ====== req done req=0x55953dc36930 op status=0 http_status=200 latency=0.0839988s ======
> > 2020-09-08 16:00:07.285 7f2d3dfbc700  1 ====== starting new request req=0x55953dc36930 =====
> > 2020-09-08 16:00:07.285 7f2d3dfbc700 -1 res_query() failed
> > 2020-09-08 16:00:07.353 7f2d00741700  1 ====== starting new request req=0x55954dd5e930 =====
> > 2020-09-08 16:00:07.357 7f2d00741700 -1 res_query() failed
> > 2020-09-08 16:00:07.413 7f2cc56cb700  1 ====== starting new request req=0x55953dc02930 =====
> > 2020-09-08 16:00:07.417 7f2cc56cb700 -1 res_query() failed
> > 2020-09-08 16:00:07.473 7f2cb26a5700  1 ====== starting new request req=0x5595426f6930 =====
> > 2020-09-08 16:00:07.473 7f2cb26a5700 -1 res_query() failed
> > 2020-09-08 16:00:09.465 7f2d3dfbc700  0 WARNING: set_req_state_err err_no=35 resorting to 500
> > 2020-09-08 16:00:09.465 7f2d3dfbc700  1 ====== req done req=0x55953dc36930 op status=-35 http_status=500 latency=2.17997s ======
> > 2020-09-08 16:00:09.549 7f2d00741700  0 WARNING: set_req_state_err err_no=35 resorting to 500
> > 2020-09-08 16:00:09.549 7f2d00741700  1 ====== req done req=0x55954dd5e930 op status=-35 http_status=500 latency=2.19597s ======
> > 2020-09-08 16:00:09.605 7f2cc56cb700  0 WARNING: set_req_state_err err_no=35 resorting to 500
> > 2020-09-08 16:00:09.609 7f2cc56cb700  1 ====== req done req=0x55953dc02930 op status=-35 http_status=500 latency=2.19597s ======
> > 2020-09-08 16:00:09.641 7f2cb26a5700  0 WARNING: set_req_state_err err_no=35 resorting to 500
> > 2020-09-08 16:00:09.641 7f2cb26a5700  1 ====== req done req=0x5595426f6930 op status=-35 http_status=500 latency=2.16797s ======
> >
> > awscli client side output during a failed multipart upload:
> > root@jump:~# aws --no-verify-ssl --endpoint-url http://lab-object.cancercollaboratory.org:7480 s3 cp 4GBfile s3://troubleshooting
> > upload failed: ./4GBfile to s3://troubleshooting/4GBfile An error occurred (UnknownError) when calling the UploadPart operation (reached max retries: 2): Unknown
> >
> > Thanks,
> >
> > Jared Baker
> > Cloud Architect for the Cancer Genome Collaboratory
> > Ontario Institute for Cancer Research
> _______________________________________________
> ceph-users mailing list -- ceph-users@xxxxxxx
> To unsubscribe send an email to ceph-users-leave@xxxxxxx
>


-- 

Matt Benjamin
Red Hat, Inc.
315 West Huron Street, Suite 140A
Ann Arbor, Michigan 48103

http://www.redhat.com/en/technologies/storage

tel.  734-821-5101
fax.  734-769-8938
cel.  734-216-5309
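
For anyone triaging similar failures, the radosgw log excerpt quoted
above could be scanned with something like the following. This is only
a minimal sketch: the regular expression is inferred from the "req done"
lines quoted in this thread (with the mail line-wrapping undone) and is
an assumption about the format, not a documented radosgw log grammar.
The function name failed_requests is hypothetical.

```python
import re

# Assumed shape of the per-request summary line, based only on the
# "req done" lines quoted in this thread. \s+ also tolerates a line
# wrap between fields once the mail quote prefixes are removed.
REQ_DONE = re.compile(
    r"req done\s+req=(?P<req>0x[0-9a-f]+)\s+op status=(?P<op>-?\d+)\s+"
    r"http_status=(?P<http>\d+)\s+latency=(?P<latency>[\d.]+)s"
)

def failed_requests(log_text):
    """Return (req, op_status, http_status) for each request with HTTP >= 500."""
    results = []
    for m in REQ_DONE.finditer(log_text):
        http = int(m.group("http"))
        if http >= 500:
            results.append((m.group("req"), int(m.group("op")), http))
    return results

# Two summary lines copied from the quoted log, one success and one failure.
sample = (
    "2020-09-08 15:59:36.138 7f2d32fa6700  1 ====== req done "
    "req=0x55953dc36930 op status=0 http_status=200 latency=0.0839988s ======\n"
    "2020-09-08 16:00:09.465 7f2d3dfbc700  1 ====== req done "
    "req=0x55953dc36930 op status=-35 http_status=500 latency=2.17997s ======\n"
)

print(failed_requests(sample))
```

On the full excerpt this would list the four UploadPart requests that
ended with op status=-35 and http_status=500.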

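To put the sizes mentioned in the thread side by side, here is a small
arithmetic sketch. It assumes the test file is exactly 4 GiB (the thread
only gives the name "4GBfile") and that the "32MB" chunk size means
32 MiB; both are assumptions. The helper names are hypothetical, and the
sketch does not attempt to explain why the larger stripe size fails.

```python
MiB = 1024 * 1024

def part_count(file_size, part_size):
    """Number of multipart parts needed for a file (ceiling division)."""
    return -(-file_size // part_size)

def stripes_per_part(part_size, stripe_size):
    """Number of stripe-sized pieces one part spans (ceiling division)."""
    return -(-part_size // stripe_size)

FILE_SIZE = 4 * 1024 * MiB      # assumed size of "4GBfile"
PART_SIZE = 32 * MiB            # multipart_chunksize from the thread, read as MiB
DEFAULT_STRIPE = 4194304        # rgw_obj_stripe_size default quoted in the thread
CUSTOM_STRIPE = 67108864        # value that was being set on the affected clusters

print(part_count(FILE_SIZE, PART_SIZE))             # parts in the upload
print(stripes_per_part(PART_SIZE, DEFAULT_STRIPE))  # with the 4MiB default
print(stripes_per_part(PART_SIZE, CUSTOM_STRIPE))   # with the 64MiB setting
```

Under these assumptions a 32 MiB part spans several default-sized
stripes but fits inside a single 64 MiB stripe.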