Thanks for your reply. I believe this is exactly our case.

Best Regards,
Mahnoosh

On Tue, Feb 14, 2023 at 9:25 PM J. Eric Ivancich <ivancich@xxxxxxxxxx> wrote:

> A bug was reported recently where, if a put object occurs while bucket
> resharding is finishing up, it would write to the old bucket shard rather
> than the new one. From your logs there is evidence that resharding is
> underway alongside the put object.
>
> A fix for that bug is on main and pacific, and the quincy version is not
> yet merged. See:
>
> https://tracker.ceph.com/issues/58034
>
> Octopus was EOLed back in August, so it won't receive the fix. But it
> seems the next pacific and quincy releases will have the fix, as will reef.
>
> Eric
> (he/him)
>
> On Feb 13, 2023, at 11:41 AM, mahnoosh shahidi <mahnooosh.shd@xxxxxxxxx> wrote:
>
> Hi all,
>
> We have a cluster on 15.2.12. We are experiencing an unusual scenario in
> S3. A user sends a PUT request to upload an object and RGW returns 200 as
> the response status code. The object has been uploaded and can be
> downloaded, but it does not appear in the bucket listing. We also tried to
> get the bucket index entry for that object, but it does not exist. Below
> is the RGW log for the request:
>
> 1 ====== starting new request req=0x7f246c4426b0 =====
> 2 req 44161 0s initializing for trans_id = tx00000000000000000ac81-0063e36653-17e18f0-default
> 10 rgw api priority: s3=3 s3website=2
> 10 host=192.168.0.201
> 10 meta>> HTTP_X_AMZ_CONTENT_SHA256
> 10 meta>> HTTP_X_AMZ_DATE
> 10 x>> x-amz-content-sha256:30e14955ebf1352266dc2ff8067e68104607e750abb9d3b36582b8af909fcb58
> 10 x>> x-amz-date:20230208T090731Z
> 10 handler=22RGWHandler_REST_Obj_S3
> 2 req 44161 0s getting op 1
> 10 req 44161 0s s3:put_obj scheduling with dmclock client=2 cost=1
> 10 op=21RGWPutObj_ObjStore_S3
> 2 req 44161 0s s3:put_obj verifying requester
> 10 v4 signature format = 7daee7e343e08d8121e843c6c77da3cc827bd4f4f179548e1c729c130a3e7745
> 10 v4 credential format = 85ZYESW8HS34DC95MZBT/20230208/us-east-1/s3/aws4_request
> 10 access key id = 85ZYESW8HS34DC95MZBT
> 10 credential scope = 20230208/us-east-1/s3/aws4_request
> 10 req 44161 0s canonical headers format =
> content-md5:ttgbNgpWctgMJ0MPORU+LA==
> host:192.168.0.201
>
> x-amz-content-sha256:30e14955ebf1352266dc2ff8067e68104607e750abb9d3b36582b8af909fcb58
> x-amz-date:20230208T090731Z
>
> 10 payload request hash = 30e14955ebf1352266dc2ff8067e68104607e750abb9d3b36582b8af909fcb58
> 10 canonical request = PUT
> /test7/file508294
>
> content-md5:ttgbNgpWctgMJ0MPORU+LA==
> host:192.168.0.201
>
> x-amz-content-sha256:30e14955ebf1352266dc2ff8067e68104607e750abb9d3b36582b8af909fcb58
> x-amz-date:20230208T090731Z
>
> content-md5;host;x-amz-content-sha256;x-amz-date
> 30e14955ebf1352266dc2ff8067e68104607e750abb9d3b36582b8af909fcb58
> 10 canonical request hash = 2ab4fe4f0fa402435c3237382bdc77e86203406e90a2768a70410f58754bb6ba
> 10 string to sign = AWS4-HMAC-SHA256
> 20230208T090731Z
> 20230208/us-east-1/s3/aws4_request
> 2ab4fe4f0fa402435c3237382bdc77e86203406e90a2768a70410f58754bb6ba
> 10 req 44161 0s delaying v4 auth
> 10 date_k = a9dc6afa32600995d313f1b6a4fa40be3a3cd574d25db8789ac966a8e7f43356
> 10 region_k = b9193e8e261f702b88549da7e81e6a4a7672725996ea8a86269fed665b39670d
> 10 service_k = 34214c91aec1192bcc413e02044e346b31ed4f13df8c15830bdb1d7bd3565126
> 10 signing_k = 7656d62334d92c982f8c21e0200e760054b214eebab6dbeab577fb655c00a6f4
> 10 generated signature = 7daee7e343e08d8121e843c6c77da3cc827bd4f4f179548e1c729c130a3e7745
> 2 req 44161 0s s3:put_obj normalizing buckets and tenants
> 10 s->object=file508294 s->bucket=test7
> 2 req 44161 0s s3:put_obj init permissions
> 10 cache get: name=default.rgw.meta+root+test7 : expiry miss
> 10 cache put: name=default.rgw.meta+root+test7 info.flags=0x16
> 10 adding default.rgw.meta+root+test7 to cache LRU end
> 10 updating xattr: name=ceph.objclass.version bl.length()=42
> 10 cache get: name=default.rgw.meta+root+test7 : type miss (requested=0x11, cached=0x16)
> 10 cache put: name=default.rgw.meta+root+test7 info.flags=0x11
> 10 moving default.rgw.meta+root+test7 to cache LRU end
> 10 cache get: name=default.rgw.meta+users.uid+storage : hit (requested=0x6, cached=0x17)
> 10 cache get: name=default.rgw.meta+users.uid+storage : hit (requested=0x3, cached=0x17)
> 2 req 44161 0.003999945s s3:put_obj recalculating target
> 2 req 44161 0.003999945s s3:put_obj reading permissions
> 2 req 44161 0.003999945s s3:put_obj init op
> 2 req 44161 0.003999945s s3:put_obj verifying op mask
> 2 req 44161 0.003999945s s3:put_obj verifying op permissions
> 5 req 44161 0.003999945s s3:put_obj Searching permissions for identity=rgw::auth::SysReqApplier -> rgw::auth::LocalApplier(acct_user=storage, acct_name=storage, subuser=, perm_mask=15, is_admin=0) mask=50
> 5 Searching permissions for uid=storage
> 5 Found permission: 15
> 5 Searching permissions for group=1 mask=50
> 5 Permissions for group not found
> 5 Searching permissions for group=2 mask=50
> 5 Permissions for group not found
> 5 req 44161 0.003999945s s3:put_obj -- Getting permissions done for identity=rgw::auth::SysReqApplier -> rgw::auth::LocalApplier(acct_user=storage, acct_name=storage, subuser=, perm_mask=15, is_admin=0), owner=storage, perm=2
> 37:31.066+0330 7f2479c5e700 10 req 44161 0.003999945s s3:put_obj identity=rgw::auth::SysReqApplier -> rgw::auth::LocalApplier(acct_user=storage, acct_name=storage, subuser=, perm_mask=15, is_admin=0) requested perm (type)=2, policy perm=2, user_perm_mask=2, acl perm=2
> 2 req 44161 0.003999945s s3:put_obj verifying op params
> 2 req 44161 0.003999945s s3:put_obj pre-executing
> 2 req 44161 0.003999945s s3:put_obj executing
> 5 req 44161 0.023999668s s3:put_obj NOTICE: call to do_aws4_auth_completion
> 10 req 44161 0.023999668s s3:put_obj v4 auth ok -- do_aws4_auth_completion
> 5 req 44161 0.023999668s s3:put_obj NOTICE: call to do_aws4_auth_completion
> --
> 0 RGWReshardLock::lock failed to acquire lock on test7:c3b354de-c79f-444b-a647-5b272f8148d7.25388922.1 ret=-16
> 0 lifecycle: RGWLC::process() failed to acquire lock on lc.30, sleep 5, try again
> 0 lifecycle: RGWLC::process() failed to acquire lock on lc.30, sleep 5, try again
> 0 lifecycle: RGWLC::process() failed to acquire lock on lc.30, sleep 5, try again
> 0 RGWReshardLock::lock failed to acquire lock on test7:c3b354de-c79f-444b-a647-5b272f8148d7.25388922.1 ret=-16
> 10 cache get: name=default.rgw.meta+root+test7 : hit (requested=0x6, cached=0x17)
> 10 cache get: name=default.rgw.meta+root+test7 : hit (requested=0x1, cached=0x17)
> -1 WARNING: The bucket info cache is inconsistent. This is a failure that should be debugged. I am a nice machine, so I will try to recover.
> 10 cache get: name=default.rgw.meta+root+.bucket.meta.test7:c3b354de-c79f-444b-a647-5b272f8148d7.25388922.1 : hit (requested=0x16, cached=0x17)
> 10 cache get: name=default.rgw.meta+root+.bucket.meta.test7:c3b354de-c79f-444b-a647-5b272f8148d7.25388922.1 : hit (requested=0x13, cached=0x17)
> 10 cache put: name=default.rgw.meta+root+.bucket.meta.test7:c3b354de-c79f-444b-a647-5b272f8148d7.25388922.1 info.flags=0x13
> 10 moving default.rgw.meta+root+.bucket.meta.test7:c3b354de-c79f-444b-a647-5b272f8148d7.25388922.1 to cache LRU end
> 10 updating xattr: name=ceph.objclass.version bl.length()=42
> 10 updating xattr: name=user.rgw.acl bl.length()=147
> 10 chain_cache_entry: cache_locator=default.rgw.meta+root+.bucket.meta.test7:c3b354de-c79f-444b-a647-5b272f8148d7.25388922.1
> -1 WARNING: The OSD has the same version I have. Something may have gone squirrelly. An administrator may have forced a change; otherwise there is a problem somewhere.
> 0 lifecycle: RGWLC::process() failed to acquire lock on lc.30, sleep 5, try again
> 0 lifecycle: RGWLC::process() failed to acquire lock on lc.30, sleep 5, try again
> 10 manifest: total_size = 1048576
> 10 setting object write_tag=c3b354de-c79f-444b-a647-5b272f8148d7.25041136.44161
> 10 cache get: name=default.rgw.log++bucket.sync-source-hints.test7 : hit (negative entry)
> 10 cache get: name=default.rgw.log++bucket.sync-target-hints.test7 : hit (negative entry)
> 10 chain_cache_entry: cache_locator=
> 10 cache get: name=default.rgw.log++pubsub.user.storage.bucket.test7/c3b354de-c79f-444b-a647-5b272f8148d7.25388922.1 : hit (negative entry)
> 2 req 44161 363.150981592s s3:put_obj completing
> 4 write_data failed: Connection reset by peer
> 0 ERROR: RESTFUL_IO(s)->complete_header() returned err=Connection reset by peer
> 2 req 44161 363.150981592s s3:put_obj op status=0
> 2 req 44161 363.150981592s s3:put_obj http status=200
> 1 ====== req done req=0x7f246c4426b0 op status=0 http_status=200 latency=363.150981592s ======
>
> Does anybody have any idea about the reason for this behaviour?
>
> Best Regards,
> Mahnoosh
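
For anyone else who hits the same symptom (PUT returns 200 but the object never appears in the bucket listing), below is a rough, untested sketch of how it might be confirmed and cleaned up with radosgw-admin. The bucket and object names are taken from the log above; exact flags can differ between releases, so please check them against your own version before running anything, especially the --fix step.

# Is a reshard still pending or recorded for the bucket?
radosgw-admin reshard status --bucket test7

# The object should be missing from the bucket index...
radosgw-admin bi get --bucket test7 --object file508294
radosgw-admin bucket list --bucket test7 | grep file508294

# ...while its data is still reachable in the data pool via the manifest
radosgw-admin bucket radoslist --bucket test7 | grep file508294

# Rebuild the bucket index from the actual RADOS objects; probably safest
# once resharding has finished and the RGWs run a release with the fix
# from https://tracker.ceph.com/issues/58034
radosgw-admin bucket check --bucket test7 --check-objects --fix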