multipart uploads in reef 18.2.2


 



We have a reef 18.2.2 cluster with 6 radosgw servers on Rocky 8.9. The radosgw servers are not fronted by anything like HAProxy; the clients connect directly to a DNS name via round-robin DNS. Each of the radosgw servers has a certificate with SAN entries for all 6 radosgw servers as well as the primary DNS name. This has worked wonderfully for four years in distributing our load. This is an RPM install.

Since our update to 18.2.2, we have had issues with a specific set of clients (Spark). They *always* create multipart uploads, both before and after the Ceph update from 17.2.6 to 18.2.2, even when the upload consists of a single part smaller than what would otherwise be the multipart threshold. A single-part multipart upload is the norm.

This works fine for a time after restarting the radosgws. This is what happens:

1. A single PUT of one multipart part with a given uploadId
2. A single POST to complete the multipart upload, using the same uploadId as the PUT
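
For reference, this is essentially the two-step sequence the clients drive. Below is a minimal boto3 sketch of the same flow (the endpoint, bucket, and key names are hypothetical, and this is not the actual Spark code):

    import boto3

    s3 = boto3.client("s3", endpoint_url="https://rgw.example.com")  # hypothetical endpoint

    # Initiate the multipart upload and obtain an uploadId
    resp = s3.create_multipart_upload(Bucket="mybucket", Key="mykey")
    upload_id = resp["UploadId"]

    # Step 1: the single PUT of one part, tagged with that uploadId
    part = s3.upload_part(Bucket="mybucket", Key="mykey", UploadId=upload_id,
                          PartNumber=1, Body=b"payload smaller than the multipart threshold")

    # Step 2: the single POST that completes the upload, reusing the same uploadId;
    # this is the request that fails when the problem occurs
    s3.complete_multipart_upload(
        Bucket="mybucket", Key="mykey", UploadId=upload_id,
        MultipartUpload={"Parts": [{"ETag": part["ETag"], "PartNumber": 1}]},
    )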

But when the problem occurs, the PUT works and the POST fails with a "500 302" error. The client keeps retrying, eventually reporting: "Status code; 500 Error-Code: Internal error, ... The multipart completion is already in progress." While this error makes sense when multiple POSTs are issued to complete the same multipart upload, the first one should not fail.
Sometimes the PUT and the POST come from different clients or go to different radosgw servers, even when things are working. But when things begin to fail, I see this in the radosgw logs just before the failed POST:
s3:complete_multipart failed to acquire lock
Then the multiple "500 302" errors happen. Note that after the repeated "500 302" errors, the Spark client DELETEs the object, aborting the multipart upload (so there are no 'leftover' multipart uploads).

There are about 50-60 multipart uploads in flight when this occurs, and as time goes on I get fewer and fewer successful multipart uploads, until eventually I have to restart the rados gateways. There are some other minor GETs and PUTs happening as well, unrelated to Spark.
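
In case it helps, a minimal boto3 sketch of one way to count the in-flight multipart uploads on a bucket (the endpoint and bucket name are hypothetical):

    import boto3

    s3 = boto3.client("s3", endpoint_url="https://rgw.example.com")  # hypothetical endpoint

    # List multipart uploads that have been initiated but not yet completed or aborted,
    # paginating in case there are more than 1000 of them.
    paginator = s3.get_paginator("list_multipart_uploads")
    count = 0
    for page in paginator.paginate(Bucket="mybucket"):
        count += len(page.get("Uploads", []))
    print(f"in-flight multipart uploads: {count}")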

When this happens, we get multiple sockets on the radosgw servers stuck in the CLOSE-WAIT state. While I cannot prove it, these appear to be related to the issue at hand, as the CLOSE-WAIT sockets only involve IPs associated with the Spark jobs. After I restart the radosgw servers, things are good for a time, and all CLOSE-WAITs disappear until the problem starts again.
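
A minimal sketch of how the CLOSE-WAIT sockets can be tallied per remote IP on a radosgw host, assuming `ss` from iproute2 is available and that, with a state filter, the peer address:port is the last column of its output:

    import subprocess
    from collections import Counter

    # Ask ss for TCP sockets currently in CLOSE-WAIT (numeric, no header).
    out = subprocess.run(["ss", "-Htn", "state", "close-wait"],
                         capture_output=True, text=True, check=True).stdout

    peers = Counter()
    for line in out.splitlines():
        fields = line.split()
        if fields:
            peer = fields[-1].rsplit(":", 1)[0]  # strip the port from "addr:port"
            peers[peer] += 1

    for ip, n in peers.most_common():
        print(f"{ip}\t{n}")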

I have set:
rgw thread pool size = 1024
rgw max concurrent requests = 2048
and restarted both the mons and the radosgw servers, to no avail. The problem takes maybe 6 hours to start.

No object versioning is in effect. Any ideas would be appreciated, thanks.
-Chris








