Re: Missing object in bucket list

 Thanks for your reply. I believe this is exactly our case.

Best Regards,
Mahnoosh

On Tue, Feb 14, 2023 at 9:25 PM J. Eric Ivancich <ivancich@xxxxxxxxxx>
wrote:

> A bug was reported recently where if a put object occurs when bucket
> resharding is finishing up, it would write to the old bucket shard rather
> than the new one. From your logs there is evidence that resharding is
> underway alongside the put object.
>
> A fix for that bug is on main and pacific, and the quincy version is not
> yet merged. See:
>
> https://tracker.ceph.com/issues/58034
>
> Octopus was EOLed back in August, so it won’t receive the fix. But it seems
> the next releases of pacific and quincy will have the fix, as will reef.
>
> Eric
> (he/him)
>
> On Feb 13, 2023, at 11:41 AM, mahnoosh shahidi <mahnooosh.shd@xxxxxxxxx>
> wrote:
>
> Hi all,
>
> We have a cluster on 15.2.12. We are experiencing an unusual scenario in
> S3. A user sends a PUT request to upload an object and RGW returns 200 as
> the response status code. The object has been uploaded and can be
> downloaded, but it does not appear in the bucket listing. We also tried to
> get the bucket index entry for that object, but it does not exist. Below is
> the RGW log for the request.
>
> 1 ====== starting new request req=0x7f246c4426b0 =====
>
> 2 req 44161 0s initializing for trans_id =
> tx00000000000000000ac81-0063e36653-17e18f0-default
> 10 rgw api priority: s3=3 s3website=2
> 10 host=192.168.0.201
> 10 meta>> HTTP_X_AMZ_CONTENT_SHA256
> 10 meta>> HTTP_X_AMZ_DATE
> 10 x>> x-amz-content-sha256:30e14955ebf1352266dc2ff8067e68104607e750abb9d3b36582b8af909fcb58
> 10 x>> x-amz-date:20230208T090731Z
> 10 handler=22RGWHandler_REST_Obj_S3
> 2 req 44161 0s getting op 1
> 10 req 44161 0s s3:put_obj scheduling with dmclock client=2 cost=1
> 10 op=21RGWPutObj_ObjStore_S3
> 2 req 44161 0s s3:put_obj verifying requester
> 10 v4 signature format =
> 7daee7e343e08d8121e843c6c77da3cc827bd4f4f179548e1c729c130a3e7745
> 10 v4 credential format =
> 85ZYESW8HS34DC95MZBT/20230208/us-east-1/s3/aws4_request
> 10 access key id = 85ZYESW8HS34DC95MZBT
> 10 credential scope = 20230208/us-east-1/s3/aws4_request
> 10 req 44161 0s canonical headers format =
> content-md5:ttgbNgpWctgMJ0MPORU+LA==
> host:192.168.0.201
>
>
> x-amz-content-sha256:30e14955ebf1352266dc2ff8067e68104607e750abb9d3b36582b8af909fcb58
> x-amz-date:20230208T090731Z
>
> 10 payload request hash =
> 30e14955ebf1352266dc2ff8067e68104607e750abb9d3b36582b8af909fcb58
> 10 canonical request = PUT
> /test7/file508294
>
> content-md5:ttgbNgpWctgMJ0MPORU+LA==
> host:192.168.0.201
>
>
> x-amz-content-sha256:30e14955ebf1352266dc2ff8067e68104607e750abb9d3b36582b8af909fcb58
> x-amz-date:20230208T090731Z
>
> content-md5;host;x-amz-content-sha256;x-amz-date
> 30e14955ebf1352266dc2ff8067e68104607e750abb9d3b36582b8af909fcb58
> 10 canonical request hash =
> 2ab4fe4f0fa402435c3237382bdc77e86203406e90a2768a70410f58754bb6ba
> 10 string to sign = AWS4-HMAC-SHA256
> 20230208T090731Z
> 20230208/us-east-1/s3/aws4_request
> 2ab4fe4f0fa402435c3237382bdc77e86203406e90a2768a70410f58754bb6ba
> 10 req 44161 0s delaying v4 auth
> 10 date_k    =
> a9dc6afa32600995d313f1b6a4fa40be3a3cd574d25db8789ac966a8e7f43356
> 10 region_k  =
> b9193e8e261f702b88549da7e81e6a4a7672725996ea8a86269fed665b39670d
> 10 service_k =
> 34214c91aec1192bcc413e02044e346b31ed4f13df8c15830bdb1d7bd3565126
> 10 signing_k =
> 7656d62334d92c982f8c21e0200e760054b214eebab6dbeab577fb655c00a6f4
> 10 generated signature =
> 7daee7e343e08d8121e843c6c77da3cc827bd4f4f179548e1c729c130a3e7745
> 2 req 44161 0s s3:put_obj normalizing buckets and tenants
> 10 s->object=file508294 s->bucket=test7
> 2 req 44161 0s s3:put_obj init permissions
> 10 cache get: name=default.rgw.meta+root+test7 : expiry miss
> 10 cache put: name=default.rgw.meta+root+test7 info.flags=0x16
> 10 adding default.rgw.meta+root+test7 to cache LRU end
> 10 updating xattr: name=ceph.objclass.version bl.length()=42
> 10 cache get: name=default.rgw.meta+root+test7 : type miss
> (requested=0x11, cached=0x16)
> 10 cache put: name=default.rgw.meta+root+test7 info.flags=0x11
> 10 moving default.rgw.meta+root+test7 to cache LRU end
> 10 cache get: name=default.rgw.meta+users.uid+storage : hit
> (requested=0x6, cached=0x17)
> 10 cache get: name=default.rgw.meta+users.uid+storage : hit
> (requested=0x3, cached=0x17)
> 2 req 44161 0.003999945s s3:put_obj recalculating target
> 2 req 44161 0.003999945s s3:put_obj reading permissions
> 2 req 44161 0.003999945s s3:put_obj init op
> 2 req 44161 0.003999945s s3:put_obj verifying op mask
> 2 req 44161 0.003999945s s3:put_obj verifying op permissions
> 5 req 44161 0.003999945s s3:put_obj Searching permissions for
> identity=rgw::auth::SysReqApplier ->
> rgw::auth::LocalApplier(acct_user=storage, acct_name=storage, subuser=,
> perm_mask=15, is_admin=0) mask=50
> 5 Searching permissions for uid=storage
> 5 Found permission: 15
> 5 Searching permissions for group=1 mask=50
> 5 Permissions for group not found
> 5 Searching permissions for group=2 mask=50
> 5 Permissions for group not found
> 5 req 44161 0.003999945s s3:put_obj -- Getting permissions done for
> identity=rgw::auth::SysReqApplier ->
> rgw::auth::LocalApplier(acct_user=storage, acct_name=storage, subuser=,
> perm_mask=15, is_admin=0), owner=storage, perm=2
> 37:31.066+0330 7f2479c5e700 10 req 44161 0.003999945s s3:put_obj
> identity=rgw::auth::SysReqApplier ->
> rgw::auth::LocalApplier(acct_user=storage, acct_name=storage, subuser=,
> perm_mask=15, is_admin=0) requested perm (type)=2, policy perm=2,
> user_perm_mask=2, acl perm=2
> 2 req 44161 0.003999945s s3:put_obj verifying op params
> 2 req 44161 0.003999945s s3:put_obj pre-executing
> 2 req 44161 0.003999945s s3:put_obj executing
> 5 req 44161 0.023999668s s3:put_obj NOTICE: call to
> do_aws4_auth_completion
> 10 req 44161 0.023999668s s3:put_obj v4 auth ok -- do_aws4_auth_completion
> 5 req 44161 0.023999668s s3:put_obj NOTICE: call to
> do_aws4_auth_completion
> --
> 0 RGWReshardLock::lock failed to acquire lock on
> test7:c3b354de-c79f-444b-a647-5b272f8148d7.25388922.1 ret=-16
> 0 lifecycle: RGWLC::process() failed to acquire lock on lc.30, sleep 5,
> try again
> 0 lifecycle: RGWLC::process() failed to acquire lock on lc.30, sleep 5,
> try again
> 0 lifecycle: RGWLC::process() failed to acquire lock on lc.30, sleep 5,
> try again
> 0 RGWReshardLock::lock failed to acquire lock on
> test7:c3b354de-c79f-444b-a647-5b272f8148d7.25388922.1 ret=-16
> 10 cache get: name=default.rgw.meta+root+test7 : hit (requested=0x6,
> cached=0x17)
> 10 cache get: name=default.rgw.meta+root+test7 : hit (requested=0x1,
> cached=0x17)
> -1 WARNING: The bucket info cache is inconsistent. This is a failure that
> should be debugged. I am a nice machine, so I will try to recover.
> 10 cache get: name=default.rgw.meta+root+.bucket.meta.test7:c3b354de-c79f-444b-a647-5b272f8148d7.25388922.1 : hit (requested=0x16, cached=0x17)
> 10 cache get: name=default.rgw.meta+root+.bucket.meta.test7:c3b354de-c79f-444b-a647-5b272f8148d7.25388922.1 : hit (requested=0x13, cached=0x17)
> 10 cache put: name=default.rgw.meta+root+.bucket.meta.test7:c3b354de-c79f-444b-a647-5b272f8148d7.25388922.1 info.flags=0x13
> 10 moving default.rgw.meta+root+.bucket.meta.test7:c3b354de-c79f-444b-a647-5b272f8148d7.25388922.1 to cache LRU end
> 10 updating xattr: name=ceph.objclass.version bl.length()=42
> 10 updating xattr: name=user.rgw.acl bl.length()=147
> 10 chain_cache_entry: cache_locator=default.rgw.meta+root+.bucket.meta.test7:c3b354de-c79f-444b-a647-5b272f8148d7.25388922.1
> -1 WARNING: The OSD has the same version I have. Something may have gone
> squirrelly. An administrator may have forced a change; otherwise there is a
> problem somewhere.
> 0 lifecycle: RGWLC::process() failed to acquire lock on lc.30, sleep 5,
> try again
> 0 lifecycle: RGWLC::process() failed to acquire lock on lc.30, sleep 5,
> try again
> 10 manifest: total_size = 1048576
> 10 setting object
> write_tag=c3b354de-c79f-444b-a647-5b272f8148d7.25041136.44161
> 10 cache get: name=default.rgw.log++bucket.sync-source-hints.test7 : hit
> (negative entry)
> 10 cache get: name=default.rgw.log++bucket.sync-target-hints.test7 : hit
> (negative entry)
> 10 chain_cache_entry: cache_locator=
> 10 cache get: name=default.rgw.log++pubsub.user.storage.bucket.test7/c3b354de-c79f-444b-a647-5b272f8148d7.25388922.1 : hit (negative entry)
> 2 req 44161 363.150981592s s3:put_obj completing
> 4 write_data failed: Connection reset by peer
> 0 ERROR: RESTFUL_IO(s)->complete_header() returned err=Connection reset
> by peer
> 2 req 44161 363.150981592s s3:put_obj op status=0
> 2 req 44161 363.150981592s s3:put_obj http status=200
> 1 ====== req done req=0x7f246c4426b0 op status=0 http_status=200
> latency=363.150981592s ======
>
>
>
> Does anybody have any idea about the reason for this behaviour?
>
> Best Regards,
> Mahnoosh
> _______________________________________________
> ceph-users mailing list -- ceph-users@xxxxxxx
> To unsubscribe send an email to ceph-users-leave@xxxxxxx
>
>
>
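The observation in this thread, that the log shows resharding activity alongside the put, can be checked mechanically. Below is a minimal Python sketch, not part of any Ceph tooling. The function name puts_overlapping_reshard is hypothetical, and the patterns are copied from the log lines quoted above. It assumes each log entry sits on one line (unwrapped, without the mail quote prefix) and that the excerpt contains one request at a time; real RGW logs interleave requests from many threads, so a production version would have to key on the request id. A ret=-16 lock failure is only a hint that something else holds the reshard lock for the bucket, not proof that the tracker bug was hit.

```python
import re

# Patterns copied from the RGW log lines quoted in this thread.
START_RE = re.compile(r"====== starting new request req=(\S+)")
DONE_RE = re.compile(r"====== req done req=(\S+)")
TARGET_RE = re.compile(r"s->object=(\S+) s->bucket=(\S+)")
RESHARD_RE = re.compile(
    r"RGWReshardLock::lock failed to acquire lock on ([^:\s]+):"
)


def puts_overlapping_reshard(lines):
    """Return (bucket, object) pairs for put_obj requests during which a
    reshard lock failure was logged for the same bucket.

    Assumption: the excerpt holds one request at a time, as in the log
    quoted in this thread. Interleaved requests are not handled.
    """
    hits = []
    current = None  # state of the request being scanned, if any
    for line in lines:
        if START_RE.search(line):
            current = {"bucket": None, "object": None,
                       "is_put": False, "reshard": set()}
            continue
        if current is None:
            continue  # ignore lines outside a request

        target = TARGET_RE.search(line)
        if target:
            current["object"] = target.group(1)
            current["bucket"] = target.group(2)

        if "s3:put_obj" in line:
            current["is_put"] = True

        reshard = RESHARD_RE.search(line)
        if reshard:
            current["reshard"].add(reshard.group(1))

        if DONE_RE.search(line):
            if current["is_put"] and current["bucket"] in current["reshard"]:
                hits.append((current["bucket"], current["object"]))
            current = None
    return hits
```

Feeding it the unwrapped lines of the request above should flag the put of file508294 to test7, because the RGWReshardLock failures for test7 fall between the "starting new request" and "req done" markers. Objects flagged this way are candidates to compare against the bucket listing.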