Re: [EXTERNAL] S3 Object Returns Days after Deletion

+Casey — who might have some insight

> On Aug 31, 2022, at 5:45 AM, Alex Hussein-Kershaw (HE/HIM) <alexhus@xxxxxxxxxxxxx> wrote:
> 
> Hi Eric, 
> 
> Thanks for your response. Answers below.
> 
> Is it the case that the object does not appear when you list the RGW bucket it was in?
> - The reason our client finds it again is that we list all the objects in the bucket and cache the keys nightly. 
> - I think it is returned as part of the API call to list all objects in the bucket.
> - I can see the "list objects" operation happening in HAProxy logs.    " cephs3/S3:10.245.0.20 72/0/34/361/503 200 33819 - - ---- 23/13/0/1/0 0/0 {lusrebuild} "GET /edin2z6-scsdata/ HTTP/1.1""
> - That's the mechanism that results in us doing a GET on the object directly days after we should have forgotten about it. 
> - It's strange that it takes a few days for this to reproduce - the object doesn't show up in the next nightly list operation, but it does in a subsequent one (the equivalent list call is sketched below). 
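> 
> For reference, I believe the nightly list boils down to something like this (a sketch assuming the aws CLI against our endpoint; the real client pages through the full listing):
> 
> $ aws s3api --endpoint=http://127.3.3.3:7480 list-objects-v2 \
>     --bucket edin2z6-scsdata --query 'Contents[].Key'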
> 
> You referred to "one side of my cluster". Does that imply you're using multisite?
> - Yes (I think you missed that in the first line of my last email) - we are using multisite 😊 
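> 
> In case it helps, I can also grab the replication state on each side (assuming radosgw-admin is available on the gateway nodes):
> 
> $ radosgw-admin sync status
> $ radosgw-admin bucket sync status --bucket=edin2z6-scsdata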
> 
> We are using bucket versioning for this bucket (note also that the object has a VersionId in my initial email - I believe that would show as 'null' if versioning were disabled - although I appreciate that sharing output from an unfamiliar tool maybe wasn't that helpful or clear):
> [root@edin2z6 edin2z6_sdc] ~> aws s3api --endpoint=http://127.3.3.3:7480 get-bucket-versioning --bucket edin2z6-scsdata
> {
>    "Status": "Enabled",
>    "MFADelete": "Disabled"
> }
> 
> We don't actually have a use for bucket versioning (and have turned it off for our deployments since this system was deployed - this is just a remnant). If you think this might be the cause of the problem, I can disable it and see if the issue still reproduces. 
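> 
> If so, my understanding is that versioning can only be suspended rather than fully disabled once it has been enabled, so I'd run something like:
> 
> $ aws s3api --endpoint=http://127.3.3.3:7480 put-bucket-versioning \
>     --bucket edin2z6-scsdata --versioning-configuration Status=Suspended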
> 
> I'm using the aws s3api CLI tool to get the info I've shared.   
> 
> Thanks for the info regarding the multi-part and tail objects; good to know that this won't be the cause.  
> 
> Kindest regards,
> Alex
> 
> -----Original Message-----
> From: J. Eric Ivancich <ivancich@xxxxxxxxxx> 
> Sent: Tuesday, August 30, 2022 5:19 PM
> To: Alex Hussein-Kershaw (HE/HIM) <alexhus@xxxxxxxxxxxxx>
> Cc: Ceph Users <ceph-users@xxxxxxx>
> Subject: [EXTERNAL] Re:  S3 Object Returns Days after Deletion
> 
> A couple of questions, Alex.
> 
> Is it the case that the object does not appear when you list the RGW bucket it was in?
> 
> You referred to "one side of my cluster". Does that imply you're using multisite?
> 
> And just for completeness, this is not a versioned bucket?
> 
> With a size of 6252 bytes, it wouldn’t be a multi-part upload or require tail objects.
> 
> So during a delete, the bucket index shard is modified to remove the entry, and the head object (which in your case is the only rados object) is deleted from rados. If there were tail objects, they'd generally get cleaned up over time via RGW's garbage collection mechanism.
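> 
> For completeness, pending tail-object cleanup can be inspected with (assuming admin access on a gateway node):
> 
> $ radosgw-admin gc list
> 
> Though with a 6252-byte object there should be nothing there to collect.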
> 
> It should also be noted that the bucket index does not need to be consulted during a GET operation.
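> 
> If you want to compare the two paths, something along these lines should work (a sketch assuming radosgw-admin access; names taken from your logs):
> 
> $ radosgw-admin bucket list --bucket=edin2z6-scsdata | grep 20220815042412F3DB2300000018
> $ radosgw-admin object stat --bucket=edin2z6-scsdata --object=84/40/20220815042412F3DB2300000018-Subscriber
> 
> The first goes through the bucket index; the second reads the head object directly. If the second succeeds while the first finds nothing, the rados object has outlived its index entry.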
> 
> I looked for the string “SSECustomerAlgorithm” in the ceph source code and couldn’t find it. Which tool is generating your “details about the object”?
> 
> Eric
> (he/him)
> 
>> On Aug 30, 2022, at 4:35 AM, Alex Hussein-Kershaw (HE/HIM) <alexhus@xxxxxxxxxxxxx> wrote:
>> 
>> Hi Ceph-Users,
>> 
>> I'm running Ceph 15.2.13 with RGWs and multisite. I've got some odd S3 object behaviour I'm really baffled by. Hoping to get some debugging advice.
>> 
>> The problem I have is that I delete an object, attempt some reads of the deleted object, and get hit with 404s (totally reasonable - it no longer exists, right?). However, a few days later the object seems to be magically recreated, and a subsequent GET request that I'd expect to return a 404 returns a 200. Looking on the cluster, I can see the object genuinely does still exist.
>> 
>> I have a single client on my storage cluster. It contacts the cluster via HAProxy; I've pasted some HAProxy logs below showing the described behaviour.
>> 
>> [15/Aug/2022:05:24:40.612] s3proxy cephs3/S3:10.245.0.23 49/0/0/55/104 204 234 - - ---- 22/15/0/1/0 0/0 {WSD} "DELETE /edin2z6-scsdata/84/40/20220815042412F3DB2300000018-Subscriber HTTP/1.1"
>> [15/Aug/2022:05:24:40.650] s3proxy cephs3/S3:10.245.0.21 69/0/0/22/107 404 455 - - ---- 22/15/0/1/0 0/0 {ISSMgr} "GET /edin2z6-scsdata/84/40/20220815042412F3DB2300000018-Subscriber HTTP/1.1"
>> [15/Aug/2022:05:24:53.549] s3proxy cephs3/S3:10.245.0.20 12/0/0/20/90 404 455 - - ---- 37/16/2/1/0 0/0 {S3Mgr} "GET /edin2z6-scsdata/84/40/20220815042412F3DB2300000018-Subscriber HTTP/1.1"
>> [15/Aug/2022:05:24:53.635] s3proxy cephs3/S3:10.245.0.21 0/0/31/17/63 404 455 - - ---- 37/16/0/1/0 0/0 {S3Mgr} "GET /edin2z6-scsdata/84/40/20220815042412F3DB2300000018-Subscriber HTTP/1.1"
>> [15/Aug/2022:05:24:53.699] s3proxy cephs3/S3:10.245.0.21 1/0/0/19/35 404 455 - - ---- 37/16/0/1/0 0/0 {S3Mgr} "GET /edin2z6-scsdata/84/40/20220815042412F3DB2300000018-Subscriber HTTP/1.1"
>> [15/Aug/2022:05:24:53.733] s3proxy cephs3/S3:10.245.0.23 4/0/0/19/39 404 455 - - ---- 38/16/1/1/0 0/0 {ISSMgr} "GET /edin2z6-scsdata/84/40/20220815042412F3DB2300000018-Subscriber HTTP/1.1"
>> [15/Aug/2022:05:24:53.772] s3proxy cephs3/S3:10.245.0.23 62/0/0/19/98 404 455 - - ---- 39/16/1/1/0 0/0 {S3Mgr} "GET /edin2z6-scsdata/84/40/20220815042412F3DB2300000018-Subscriber HTTP/1.1"
>> [15/Aug/2022:05:24:53.871] s3proxy cephs3/S3:10.245.0.23 1/0/0/22/39 404 455 - - ---- 39/16/0/1/0 0/0 {S3Mgr} "GET /edin2z6-scsdata/84/40/20220815042412F3DB2300000018-Subscriber HTTP/1.1"
>> [15/Aug/2022:05:24:53.899] s3proxy cephs3/S3:10.245.0.20 53/0/0/30/100 404 455 - - ---- 40/16/0/1/0 0/0 {ISSMgr} "GET /edin2z6-scsdata/84/40/20220815042412F3DB2300000018-Subscriber HTTP/1.1"
>> [17/Aug/2022:13:43:20.861] s3proxy cephs3/S3:10.245.0.23 699/0/0/33/749 200 6815 - - ---- 17/12/0/1/0 0/0 {other} "GET /edin2z6-scsdata/84/40/20220815042412F3DB2300000018-Subscriber HTTP/1.1"
>> 
>> Some details about the object:
>> 
>> {
>>   "AcceptRanges": "bytes",
>>   "ContentType": "binary/octet-stream",
>>   "LastModified": "Mon, 15 Aug 2022 04:24:40 GMT",
>>   "ContentLength": 6252,
>>   "Expires": "Thu, 01 Jan 1970 00:00:00 UTC",
>>   "SSECustomerAlgorithm": "AES256",
>>   "VersionId": "muroZd4apIM6RIkNpSEQfh8ZrYfFJWs",
>>   "ETag": "\"0803e745fbeac3be88e82adf2ef6240b\"",
>>   "SSECustomerKeyMD5": "2b6tFaOW0qSq1FOhX+WgZw==",
>>   "Metadata": {}
>> }
>> 
>> The only logical explanation I could come up with for the object still existing is that it was recreated (my client shouldn't have done so - but that's obviously not impossible). However, the LastModified date above rules this out, so I think it must be a Ceph thing.
>> 
>> How can the delete succeed, the object be temporarily deleted, and then pop back into existence?
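>> 
>> Since the bucket is versioned, I can also dump the version history to see whether the DELETE wrote a delete marker and whether the surviving object is the same version (assuming the aws CLI, with the key from the logs above):
>> 
>> $ aws s3api --endpoint=http://127.3.3.3:7480 list-object-versions \
>>     --bucket edin2z6-scsdata --prefix 84/40/20220815042412F3DB2300000018-Subscriber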
>> 
>> Not sure if it's relevant, but one side of my cluster reports healthy while the other side reports some large omap objects:
>> 
>> $ ceph health detail
>> HEALTH_WARN 5 large omap objects
>> [WRN] LARGE_OMAP_OBJECTS: 5 large omap objects
>>   5 large objects found in pool 'siteB.rgw.buckets.index'
>>   Search the cluster log for 'Large omap object found' for more details.
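>> 
>> In case it's useful, I believe the offending index shards can be located with something like this (the cluster log path may differ per deployment):
>> 
>> $ grep 'Large omap object found' /var/log/ceph/ceph.log
>> $ radosgw-admin bucket limit check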
>> 
>> Thanks,
>> 
>> Alex Kershaw
>> Software Engineer
>> Office: 01316 500883
>> alexhus@xxxxxxxxxxxxx<mailto:alexhus@xxxxxxxxxxxxx>
>> 
> 

_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx



