Re: Missing object in bucket list

A bug was reported recently in which a put object that occurs while bucket resharding is finishing up writes to the old bucket shard rather than the new one. Your logs show evidence that resharding was underway alongside the put object.
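
If you want to look for the same overlap in other RGW logs, here is a minimal sketch in Python. It assumes log lines shaped like the ones quoted below (a "s3:put_obj executing" line, a "s3:put_obj completing" line, and "RGWReshardLock::lock failed" messages in between). The function name and the patterns are my own illustration, not anything shipped with Ceph, and a hit is only a hint that resharding activity was logged while the put was in flight, not proof that the index entry was misplaced.

```python
import re

# Patterns modelled on the quoted log excerpt; all names here are hypothetical.
PUT_START = re.compile(r"req (\d+) \S+ s3:put_obj executing")
PUT_DONE = re.compile(r"req (\d+) \S+ s3:put_obj completing")
RESHARD_LOCK = re.compile(r"RGWReshardLock::lock failed")


def puts_overlapping_reshard(lines):
    """Map request id -> number of reshard-lock messages logged between that
    request's 'executing' and 'completing' lines. Requests with no such
    message in between are left out of the result."""
    in_flight = {}    # request id -> reshard-lock messages seen so far
    overlapping = {}
    for line in lines:
        started = PUT_START.search(line)
        if started:
            in_flight[started.group(1)] = 0
            continue
        done = PUT_DONE.search(line)
        if done:
            count = in_flight.pop(done.group(1), 0)
            if count:
                overlapping[done.group(1)] = count
            continue
        if RESHARD_LOCK.search(line):
            # Attribute the message to every put that is currently in flight.
            for req_id in in_flight:
                in_flight[req_id] += 1
    return overlapping
```

Calling puts_overlapping_reshard(open(path)) on a log file returns a dict keyed by request id; any request that shows up there is worth checking against the bucket listing.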

A fix for that bug is on main and pacific; the quincy version of the fix is not yet merged. See:

	https://tracker.ceph.com/issues/58034

Octopus reached EOL back in August, so it won’t receive the fix. But it seems the next releases of pacific and quincy will have the fix, as will reef.

Eric
(he/him)

> On Feb 13, 2023, at 11:41 AM, mahnoosh shahidi <mahnooosh.shd@xxxxxxxxx> wrote:
> 
> Hi all,
> 
> We have a cluster on 15.2.12. We are experiencing an unusual scenario in
> S3. A user sends a PUT request to upload an object and RGW returns 200 as a
> response status code. The object has been uploaded and can be downloaded
> but it does not exist in the bucket list. We also tried to get the bucket
> index entry for that object but it does not exist. Below is the log of the
> RGW for the request.
> 
> 1 ====== starting new request req=0x7f246c4426b0 =====
>> 2 req 44161 0s initializing for trans_id =
>> tx00000000000000000ac81-0063e36653-17e18f0-default
>> 10 rgw api priority: s3=3 s3website=2
>> 10 host=192.168.0.201
>> 10 meta>> HTTP_X_AMZ_CONTENT_SHA256
>> 10 meta>> HTTP_X_AMZ_DATE
>> 10 x>>
>> x-amz-content-sha256:30e14955ebf1352266dc2ff8067e68104607e750abb9d3b36582b8af909fcb58
>> 10 x>> x-amz-date:20230208T090731Z
>> 10 handler=22RGWHandler_REST_Obj_S3
>> 2 req 44161 0s getting op 1
>> 10 req 44161 0s s3:put_obj scheduling with dmclock client=2 cost=1
>> 10 op=21RGWPutObj_ObjStore_S3
>> 2 req 44161 0s s3:put_obj verifying requester
>> 10 v4 signature format =
>> 7daee7e343e08d8121e843c6c77da3cc827bd4f4f179548e1c729c130a3e7745
>> 10 v4 credential format =
>> 85ZYESW8HS34DC95MZBT/20230208/us-east-1/s3/aws4_request
>> 10 access key id = 85ZYESW8HS34DC95MZBT
>> 10 credential scope = 20230208/us-east-1/s3/aws4_request
>> 10 req 44161 0s canonical headers format =
>> content-md5:ttgbNgpWctgMJ0MPORU+LA==
>> host:192.168.0.201
>> 
>> x-amz-content-sha256:30e14955ebf1352266dc2ff8067e68104607e750abb9d3b36582b8af909fcb58
>> x-amz-date:20230208T090731Z
>> 
>> 10 payload request hash =
>> 30e14955ebf1352266dc2ff8067e68104607e750abb9d3b36582b8af909fcb58
>> 10 canonical request = PUT
>> /test7/file508294
>> 
>> content-md5:ttgbNgpWctgMJ0MPORU+LA==
>> host:192.168.0.201
>> 
>> x-amz-content-sha256:30e14955ebf1352266dc2ff8067e68104607e750abb9d3b36582b8af909fcb58
>> x-amz-date:20230208T090731Z
>> 
>> content-md5;host;x-amz-content-sha256;x-amz-date
>> 30e14955ebf1352266dc2ff8067e68104607e750abb9d3b36582b8af909fcb58
>> 10 canonical request hash =
>> 2ab4fe4f0fa402435c3237382bdc77e86203406e90a2768a70410f58754bb6ba
>> 10 string to sign = AWS4-HMAC-SHA256
>> 20230208T090731Z
>> 20230208/us-east-1/s3/aws4_request
>> 2ab4fe4f0fa402435c3237382bdc77e86203406e90a2768a70410f58754bb6ba
>> 10 req 44161 0s delaying v4 auth
>> 10 date_k    =
>> a9dc6afa32600995d313f1b6a4fa40be3a3cd574d25db8789ac966a8e7f43356
>> 10 region_k  =
>> b9193e8e261f702b88549da7e81e6a4a7672725996ea8a86269fed665b39670d
>> 10 service_k =
>> 34214c91aec1192bcc413e02044e346b31ed4f13df8c15830bdb1d7bd3565126
>> 10 signing_k =
>> 7656d62334d92c982f8c21e0200e760054b214eebab6dbeab577fb655c00a6f4
>> 10 generated signature =
>> 7daee7e343e08d8121e843c6c77da3cc827bd4f4f179548e1c729c130a3e7745
>> 2 req 44161 0s s3:put_obj normalizing buckets and tenants
>> 10 s->object=file508294 s->bucket=test7
>> 2 req 44161 0s s3:put_obj init permissions
>> 10 cache get: name=default.rgw.meta+root+test7 : expiry miss
>> 10 cache put: name=default.rgw.meta+root+test7 info.flags=0x16
>> 10 adding default.rgw.meta+root+test7 to cache LRU end
>> 10 updating xattr: name=ceph.objclass.version bl.length()=42
>> 10 cache get: name=default.rgw.meta+root+test7 : type miss
>> (requested=0x11, cached=0x16)
>> 10 cache put: name=default.rgw.meta+root+test7 info.flags=0x11
>> 10 moving default.rgw.meta+root+test7 to cache LRU end
>> 10 cache get: name=default.rgw.meta+users.uid+storage : hit
>> (requested=0x6, cached=0x17)
>> 10 cache get: name=default.rgw.meta+users.uid+storage : hit
>> (requested=0x3, cached=0x17)
>> 2 req 44161 0.003999945s s3:put_obj recalculating target
>> 2 req 44161 0.003999945s s3:put_obj reading permissions
>> 2 req 44161 0.003999945s s3:put_obj init op
>> 2 req 44161 0.003999945s s3:put_obj verifying op mask
>> 2 req 44161 0.003999945s s3:put_obj verifying op permissions
>> 5 req 44161 0.003999945s s3:put_obj Searching permissions for
>> identity=rgw::auth::SysReqApplier ->
>> rgw::auth::LocalApplier(acct_user=storage, acct_name=storage, subuser=,
>> perm_mask=15, is_admin=0) mask=50
>> 5 Searching permissions for uid=storage
>> 5 Found permission: 15
>> 5 Searching permissions for group=1 mask=50
>> 5 Permissions for group not found
>> 5 Searching permissions for group=2 mask=50
>> 5 Permissions for group not found
>> 5 req 44161 0.003999945s s3:put_obj -- Getting permissions done for
>> identity=rgw::auth::SysReqApplier ->
>> rgw::auth::LocalApplier(acct_user=storage, acct_name=storage, subuser=,
>> perm_mask=15, is_admin=0), owner=storage, perm=2
>> 37:31.066+0330 7f2479c5e700 10 req 44161 0.003999945s s3:put_obj
>> identity=rgw::auth::SysReqApplier ->
>> rgw::auth::LocalApplier(acct_user=storage, acct_name=storage, subuser=,
>> perm_mask=15, is_admin=0) requested perm (type)=2, policy perm=2,
>> user_perm_mask=2, acl perm=2
>> 2 req 44161 0.003999945s s3:put_obj verifying op params
>> 2 req 44161 0.003999945s s3:put_obj pre-executing
>> 2 req 44161 0.003999945s s3:put_obj executing
>> 5 req 44161 0.023999668s s3:put_obj NOTICE: call to
>> do_aws4_auth_completion
>> 10 req 44161 0.023999668s s3:put_obj v4 auth ok -- do_aws4_auth_completion
>> 5 req 44161 0.023999668s s3:put_obj NOTICE: call to
>> do_aws4_auth_completion
>> --
>> 0 RGWReshardLock::lock failed to acquire lock on
>> test7:c3b354de-c79f-444b-a647-5b272f8148d7.25388922.1 ret=-16
>> 0 lifecycle: RGWLC::process() failed to acquire lock on lc.30, sleep 5,
>> try again
>> 0 lifecycle: RGWLC::process() failed to acquire lock on lc.30, sleep 5,
>> try again
>> 0 lifecycle: RGWLC::process() failed to acquire lock on lc.30, sleep 5,
>> try again
>> 0 RGWReshardLock::lock failed to acquire lock on
>> test7:c3b354de-c79f-444b-a647-5b272f8148d7.25388922.1 ret=-16
>> 10 cache get: name=default.rgw.meta+root+test7 : hit (requested=0x6,
>> cached=0x17)
>> 10 cache get: name=default.rgw.meta+root+test7 : hit (requested=0x1,
>> cached=0x17)
>> -1 WARNING: The bucket info cache is inconsistent. This is a failure that
>> should be debugged. I am a nice machine, so I will try to recover.
>> 10 cache get:
>> name=default.rgw.meta+root+.bucket.meta.test7:c3b354de-c79f-444b-a647-5b272f8148d7.25388922.1
>> : hit (requested=0x16, cached=0x17)
>> 10 cache get:
>> name=default.rgw.meta+root+.bucket.meta.test7:c3b354de-c79f-444b-a647-5b272f8148d7.25388922.1
>> : hit (requested=0x13, cached=0x17)
>> 10 cache put:
>> name=default.rgw.meta+root+.bucket.meta.test7:c3b354de-c79f-444b-a647-5b272f8148d7.25388922.1
>> info.flags=0x13
>> 10 moving
>> default.rgw.meta+root+.bucket.meta.test7:c3b354de-c79f-444b-a647-5b272f8148d7.25388922.1
>> to cache LRU end
>> 10 updating xattr: name=ceph.objclass.version bl.length()=42
>> 10 updating xattr: name=user.rgw.acl bl.length()=147
>> 10 chain_cache_entry:
>> cache_locator=default.rgw.meta+root+.bucket.meta.test7:c3b354de-c79f-444b-a647-5b272f8148d7.25388922.1
>> -1 WARNING: The OSD has the same version I have. Something may have gone
>> squirrelly. An administrator may have forced a change; otherwise there is a
>> problem somewhere.
>> 0 lifecycle: RGWLC::process() failed to acquire lock on lc.30, sleep 5,
>> try again
>> 0 lifecycle: RGWLC::process() failed to acquire lock on lc.30, sleep 5,
>> try again
>> 10 manifest: total_size = 1048576
>> 10 setting object
>> write_tag=c3b354de-c79f-444b-a647-5b272f8148d7.25041136.44161
>> 10 cache get: name=default.rgw.log++bucket.sync-source-hints.test7 : hit
>> (negative entry)
>> 10 cache get: name=default.rgw.log++bucket.sync-target-hints.test7 : hit
>> (negative entry)
>> 10 chain_cache_entry: cache_locator=
>> 10 cache get:
>> name=default.rgw.log++pubsub.user.storage.bucket.test7/c3b354de-c79f-444b-a647-5b272f8148d7.25388922.1
>> : hit (negative entry)
>> 2 req 44161 363.150981592s s3:put_obj completing
>> 4 write_data failed: Connection reset by peer
>> 0 ERROR: RESTFUL_IO(s)->complete_header() returned err=Connection reset
>> by peer
>> 2 req 44161 363.150981592s s3:put_obj op status=0
>> 2 req 44161 363.150981592s s3:put_obj http status=200
>> 1 ====== req done req=0x7f246c4426b0 op status=0 http_status=200
>> latency=363.150981592s ======
> 
> 
> Does anybody have any idea about the reason for this behaviour?
> 
> Best Regards,
> Mahnoosh
> _______________________________________________
> ceph-users mailing list -- ceph-users@xxxxxxx
> To unsubscribe send an email to ceph-users-leave@xxxxxxx
> 
