On Thu, Sep 10, 2020 at 10:19 AM shubjero <shubjero@xxxxxxxxx> wrote:
>
> Hi Casey,
>
> I was never setting rgw_max_chunk_size in my ceph.conf, so it must have
> been the default? Funny enough, I don't even see this configuration
> parameter in the documentation
> https://docs.ceph.com/docs/nautilus/radosgw/config-ref/ .
>
> Armed with your information I tried setting the following in my ceph.conf:
>
> root@ceph-1:~# ceph --admin-daemon /var/run/ceph/ceph-client.rgw.ceph-1.28726.94406486979736.asok \
>     config show | egrep "rgw_max_chunk_size|rgw_put_obj_min|rgw_obj_stripe_size"
>     "rgw_max_chunk_size": "67108864",
>     "rgw_obj_stripe_size": "67108864",
>     "rgw_put_obj_min_window_size": "67108864",
>
> And with this configuration I was able to upload with large part sizes
> (2GB) using the aws client without error.

Are you sure there's a benefit to using such large part sizes? A smaller
part size should allow the client to stream more uploads at a time. It also
makes recovery much cheaper: if a 2GB PUT request times out, the client will
retry and send the entire 2GB again, whereas with a smaller part size the
server can commit the data more frequently and limit the amount of bandwidth
wasted on retries.

>
> Do you know if there is any expected performance improvement with
> larger chunk/stripe/window sizes? As I said previously, our use case
> involves very large genomic files being uploaded and downloaded
> (the average is probably 100GB per file).

rgw_max_chunk_size specifies how much data we'll send in a single osd request.
rgw_obj_stripe_size specifies how much data we'll write to a single rados object before creating a new stripe/object.
rgw_put_obj_min_window_size specifies how much object data we'll buffer in memory as we stream chunks out to their osds.

I don't think we saw any benefit from chunk sizes over 4M, but you're welcome
to experiment and measure that in your environment. Generally you want
rgw_obj_stripe_size == rgw_max_chunk_size so that each of your writes goes to
a different rados object; if, for example, your stripe size were 2x the chunk
size, we would write two chunks to each rados object - but the osd has to
apply those writes sequentially, so you lose some parallelism that way.

Regarding rgw_put_obj_min_window_size, the number of parallel writes we can
do is equal to (rgw_put_obj_min_window_size / rgw_max_chunk_size). In a
default configuration, this is 16M/4M = 4. You can experiment with a larger
multiplier here, but do take overall memory usage into account! If
rgw_max_concurrent_requests is 1024 and all of those are large PUT requests,
we'd use up to (rgw_max_concurrent_requests * rgw_put_obj_min_window_size),
or 16G, of memory.

In general, I think the default tunings should perform well here. If you have
a lot of memory to work with on the rgw nodes, you can experiment with larger
values of rgw_put_obj_min_window_size.
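For illustration, here is a rough ceph.conf-style sketch of how those three
options relate under the default tuning. The [client.rgw] section name is an
assumption (use whatever section your rgw instances actually read), and the
numbers are just the defaults discussed above - verify them against your own
admin-socket "config show" output as shown earlier:

  [client.rgw]
  # largest write sent in a single osd request (4 MiB)
  rgw_max_chunk_size = 4194304
  # bytes written to one rados object before starting a new stripe;
  # keep this equal to the chunk size (4 MiB)
  rgw_obj_stripe_size = 4194304
  # object data buffered in memory per PUT while chunks are in flight (16 MiB)
  rgw_put_obj_min_window_size = 16777216

  # parallel writes per PUT  = window / chunk = 16 MiB / 4 MiB = 4
  # worst-case buffer memory = rgw_max_concurrent_requests * window,
  #                            e.g. 1024 * 16 MiB = 16 GiB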
>
> On Wed, Sep 9, 2020 at 11:29 AM Casey Bodley <cbodley@xxxxxxxxxx> wrote:
> >
> > What is your rgw_max_chunk_size? It looks like you'll get these
> > EDEADLK errors when rgw_max_chunk_size > rgw_put_obj_min_window_size,
> > because we try to write in units of the chunk size but the window is
> > too small to write a single chunk.
> >
> > On Wed, Sep 9, 2020 at 8:51 AM shubjero <shubjero@xxxxxxxxx> wrote:
> > >
> > > Will do, Matt.
> > >
> > > On Tue, Sep 8, 2020 at 5:36 PM Matt Benjamin <mbenjami@xxxxxxxxxx> wrote:
> > > >
> > > > Thanks, Shubjero.
> > > >
> > > > Would you consider creating a ceph tracker issue for this?
> > > >
> > > > regards,
> > > >
> > > > Matt
> > > >
> > > > On Tue, Sep 8, 2020 at 4:13 PM shubjero <shubjero@xxxxxxxxx> wrote:
> > > > >
> > > > > I had been looking into this issue all day, and during testing found
> > > > > that a specific configuration option we had been setting for years
> > > > > was the culprit. Not setting this value and letting it fall back to
> > > > > the default seems to have fixed our issue with multipart uploads.
> > > > >
> > > > > If you are curious, the configuration option is rgw_obj_stripe_size,
> > > > > which was being set to 67108864 bytes (64MiB). The default is
> > > > > 4194304 bytes (4MiB). This is a documented option
> > > > > (https://docs.ceph.com/docs/nautilus/radosgw/config-ref/), and from
> > > > > my testing it seems like using anything but the default (I only
> > > > > tried larger values) breaks multipart uploads.
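To make that concrete, here is a minimal before/after ceph.conf sketch of the
override described above. The [client.rgw] section name is an assumption; the
byte values are the ones quoted in this thread:

  # before: the long-standing override (64 MiB stripes) that broke
  # multipart uploads with larger part sizes
  [client.rgw]
  rgw_obj_stripe_size = 67108864

  # after: drop the override so rgw falls back to the 4 MiB default
  # (rgw_obj_stripe_size = 4194304), or - per the discussion at the top of
  # the thread - raise rgw_max_chunk_size, rgw_obj_stripe_size and
  # rgw_put_obj_min_window_size together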
> > > > >
> > > > > On Tue, Sep 8, 2020 at 12:12 PM shubjero <shubjero@xxxxxxxxx> wrote:
> > > > > >
> > > > > > Hey all,
> > > > > >
> > > > > > I'm creating a new post for this issue as we've narrowed the
> > > > > > problem down to a part-size limitation on multipart upload. We
> > > > > > have discovered that in our production Nautilus (14.2.11) cluster
> > > > > > and our lab Nautilus (14.2.10) cluster, multipart uploads with a
> > > > > > configured part size greater than 16777216 bytes (16MiB) will
> > > > > > return a status 500 / internal server error from radosgw.
> > > > > >
> > > > > > So far I have increased the following rgw settings/values that
> > > > > > looked suspect, without any success/improvement with part sizes,
> > > > > > such as:
> > > > > > "rgw_get_obj_window_size": "16777216",
> > > > > > "rgw_put_obj_min_window_size": "16777216",
> > > > > >
> > > > > > I am trying to determine if this is because of a conservative
> > > > > > default setting somewhere that I don't know about, or if this is
> > > > > > perhaps a bug?
> > > > > >
> > > > > > I would appreciate it if someone on Nautilus with rgw could also
> > > > > > test and provide feedback. It's very easy to reproduce, and
> > > > > > configuring your part size with aws2cli requires you to put the
> > > > > > following in your aws 'config':
> > > > > >
> > > > > > s3 =
> > > > > >   multipart_chunksize = 32MB
> > > > > >
> > > > > > rgw server logs during a failed multipart upload (32MB chunk/part size):
> > > > > >
> > > > > > 2020-09-08 15:59:36.054 7f2d32fa6700  1 ====== starting new request req=0x55953dc36930 =====
> > > > > > 2020-09-08 15:59:36.082 7f2d32fa6700 -1 res_query() failed
> > > > > > 2020-09-08 15:59:36.138 7f2d32fa6700  1 ====== req done req=0x55953dc36930 op status=0 http_status=200 latency=0.0839988s ======
> > > > > > 2020-09-08 16:00:07.285 7f2d3dfbc700  1 ====== starting new request req=0x55953dc36930 =====
> > > > > > 2020-09-08 16:00:07.285 7f2d3dfbc700 -1 res_query() failed
> > > > > > 2020-09-08 16:00:07.353 7f2d00741700  1 ====== starting new request req=0x55954dd5e930 =====
> > > > > > 2020-09-08 16:00:07.357 7f2d00741700 -1 res_query() failed
> > > > > > 2020-09-08 16:00:07.413 7f2cc56cb700  1 ====== starting new request req=0x55953dc02930 =====
> > > > > > 2020-09-08 16:00:07.417 7f2cc56cb700 -1 res_query() failed
> > > > > > 2020-09-08 16:00:07.473 7f2cb26a5700  1 ====== starting new request req=0x5595426f6930 =====
> > > > > > 2020-09-08 16:00:07.473 7f2cb26a5700 -1 res_query() failed
> > > > > > 2020-09-08 16:00:09.465 7f2d3dfbc700  0 WARNING: set_req_state_err err_no=35 resorting to 500
> > > > > > 2020-09-08 16:00:09.465 7f2d3dfbc700  1 ====== req done req=0x55953dc36930 op status=-35 http_status=500 latency=2.17997s ======
> > > > > > 2020-09-08 16:00:09.549 7f2d00741700  0 WARNING: set_req_state_err err_no=35 resorting to 500
> > > > > > 2020-09-08 16:00:09.549 7f2d00741700  1 ====== req done req=0x55954dd5e930 op status=-35 http_status=500 latency=2.19597s ======
> > > > > > 2020-09-08 16:00:09.605 7f2cc56cb700  0 WARNING: set_req_state_err err_no=35 resorting to 500
> > > > > > 2020-09-08 16:00:09.609 7f2cc56cb700  1 ====== req done req=0x55953dc02930 op status=-35 http_status=500 latency=2.19597s ======
> > > > > > 2020-09-08 16:00:09.641 7f2cb26a5700  0 WARNING: set_req_state_err err_no=35 resorting to 500
> > > > > > 2020-09-08 16:00:09.641 7f2cb26a5700  1 ====== req done req=0x5595426f6930 op status=-35 http_status=500 latency=2.16797s ======
> > > > > >
> > > > > > awscli client-side output during a failed multipart upload:
> > > > > >
> > > > > > root@jump:~# aws --no-verify-ssl --endpoint-url http://lab-object.cancercollaboratory.org:7480 s3 cp 4GBfile s3://troubleshooting
> > > > > > upload failed: ./4GBfile to s3://troubleshooting/4GBfile An error occurred (UnknownError) when calling the UploadPart operation (reached max retries: 2): Unknown
> > > > > >
> > > > > > Thanks,
> > > > > >
> > > > > > Jared Baker
> > > > > > Cloud Architect for the Cancer Genome Collaboratory
> > > > > > Ontario Institute for Cancer Research
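For anyone who wants to try the reproduction quoted above, a minimal sketch of
the steps (the file name, bucket name and endpoint are the ones from the
thread; creating the test file with fallocate and setting the part size via
"aws configure set" rather than editing ~/.aws/config by hand are assumptions,
not something spelled out in the quoted message):

  # create a throwaway 4 GiB file to upload
  fallocate -l 4G 4GBfile

  # ask awscli to use 32MB multipart parts (should be equivalent to the
  # 'multipart_chunksize = 32MB' entry under 's3 =' shown above)
  aws configure set default.s3.multipart_chunksize 32MB

  # upload; on an affected cluster the UploadPart requests fail with 500s
  aws --no-verify-ssl --endpoint-url http://lab-object.cancercollaboratory.org:7480 \
      s3 cp 4GBfile s3://troubleshooting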
> > > > --
> > > > Matt Benjamin
> > > > Red Hat, Inc.
> > > > 315 West Huron Street, Suite 140A
> > > > Ann Arbor, Michigan 48103
> > > >
> > > > http://www.redhat.com/en/technologies/storage
> > > >
> > > > tel.  734-821-5101
> > > > fax.  734-769-8938
> > > > cel.  734-216-5309

_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx