We are doing that as well. But we need to be able to check specific buckets additionally. For that we use this second approach.
Since we double-check all output from our script anyway (to see if NoSuchKey actually happens), we can rule out false positives.
So far all the files detected this way actually have the issue (they show up in the list of S3 Objects, but return a 404 on GET).
For our purposes, I wrote a Python script that enumerates all objects:
    import boto3
    from tqdm import tqdm

    s3 = boto3.resource('s3', endpoint_url=endpoint,
                        aws_access_key_id=access_key,
                        aws_secret_access_key=secret_key)
    file_list = []
    for f in tqdm(s3.Bucket(bucket).objects.all(), desc='Listing objects...',
                  unit=' obj'):
        file_list.append(f.key)
And then runs a Spark job that reads 1 byte from every object, recording
any botocore.exceptions.ClientError:
    from botocore.exceptions import ClientError

    def test_object_absent(s3, obj_name):
        try:
            # Reading a single byte is enough to trigger the 404.
            s3.Object(bucket, obj_name).get()['Body'].read(1)
            return False
        except ClientError:
            return True
With 600 executors on 130 hosts, it takes about 30 seconds for a 300k
object bucket.
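
For anyone without a Spark cluster at hand, the same check can be sketched with a plain thread pool. This is only an illustration, not the script described above: find_absent and the dict standing in for the bucket are made-up names, and the probe is passed in as a callable so the sketch runs without a live endpoint.

```python
from concurrent.futures import ThreadPoolExecutor


def find_absent(keys, is_absent, workers=32):
    """Return the keys for which is_absent(key) is true, probing in parallel."""
    keys = list(keys)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map preserves input order, so flags line up with keys.
        flags = list(pool.map(is_absent, keys))
    return [key for key, absent in zip(keys, flags) if absent]


# Stand-in for the real probe: a dict plays the role of the bucket.
fake_store = {"a": b"x", "b": b"y"}
listed = ["a", "b", "ghost"]  # "ghost" is listed but cannot be read
missing = find_absent(listed, lambda key: key not in fake_store)
print(missing)
```

Against a real bucket, the probe could be something like lambda key: test_object_absent(s3, key). Note that boto3 documents resource objects as not thread-safe, so creating one per worker thread is probably the safer choice.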
On 19/11/2020 09:21, Janek Bevendorff wrote:
I would recommend you get a dump with rados ls -p poolname (can be
several GB, mine is 61GB) and grep (or ack, which is faster) for the
names there to get an overview of what is there and what isn't.
Looking up the names directly can easily give you the wrong picture,
because it is kinda complicated to derive the correct RADOS name from
an S3 object name and a single typo will give you a not-found error,
even if the object is there.
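
If grep is not convenient, the lookup in the dump can be sketched in Python as well. This is a rough illustration under assumptions: names_in_dump is a made-up helper, the sample lines are invented and only loosely shaped like RADOS names, and for a dump of many GB grep or ack will likely be much faster than this loop.

```python
def names_in_dump(dump_lines, needles):
    """Map each needle to the dump lines that contain it as a substring."""
    hits = {needle: [] for needle in needles}
    for line in dump_lines:
        name = line.rstrip("\n")
        for needle in needles:
            if needle in name:
                hits[needle].append(name)
    return hits


# Invented sample lines; a real dump would come from `rados ls -p poolname`.
sample_dump = [
    "abc123.1_docker/layer1\n",
    "abc123.1__multipart_docker/layer2.2~xyz.1\n",
    "abc123.1__shadow_docker/layer2.2~xyz.1_1\n",
]
hits = names_in_dump(
    sample_dump, ["docker/layer1", "docker/layer2", "docker/layer3"]
)
for needle, lines in hits.items():
    print(needle, len(lines))
```

For a dump saved to a file, passing the open file object as dump_lines streams it line by line instead of loading it into memory. Matching on a substring of the S3 name sidesteps having to derive the exact RADOS name.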
On 19/11/2020 09:12, Denis Krienbühl wrote:
Thanks, we are currently scanning our object storage. It looks like
we can detect the missing objects that return “No Such Key” looking
at all “__multipart_” objects returned by radosgw-admin bucket
radoslist, and checking if they exist using rados stat. We are
currently not looking at shadow objects as our approach already
yields more instances of this problem.
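
The scan described above could look roughly like the following sketch. It is not the actual tooling used here: the function names, the bucket and pool names, and the fake runner are all hypothetical, and it assumes that rados stat exits non-zero when the object does not exist (other failures, such as a connection problem, would look the same and need a second look).

```python
import subprocess
from types import SimpleNamespace


def multipart_objects(radoslist_output):
    """Keep only the RADOS names that contain the __multipart_ marker."""
    return [
        line.strip()
        for line in radoslist_output.splitlines()
        if "__multipart_" in line
    ]


def rados_object_exists(pool, name, run=subprocess.run):
    """Assumption: `rados stat` returns a non-zero exit code if the object is absent."""
    result = run(["rados", "-p", pool, "stat", name], capture_output=True)
    return result.returncode == 0


def missing_multipart(bucket, pool, run=subprocess.run):
    """List the bucket's RADOS objects and return the multipart ones that fail stat."""
    listing = run(
        ["radosgw-admin", "bucket", "radoslist", f"--bucket={bucket}"],
        capture_output=True, text=True, check=True,
    )
    return [
        name
        for name in multipart_objects(listing.stdout)
        if not rados_object_exists(pool, name, run)
    ]


def fake_run(cmd, **kwargs):
    """Stand-in for subprocess.run so the sketch runs without a cluster."""
    if cmd[0] == "radosgw-admin":
        listing = "m_head\nm__multipart_a.1\nm__multipart_b.1\n"
        return SimpleNamespace(returncode=0, stdout=listing)
    # Pretend only the first multipart object still exists in the pool.
    exists = cmd[-1] == "m__multipart_a.1"
    return SimpleNamespace(returncode=0 if exists else 1, stdout="")


missing = missing_multipart("demo-bucket", "demo-pool", run=fake_run)
print(missing)
```

On a real cluster, leaving out the run argument makes the functions call the actual radosgw-admin and rados binaries; the pool would be the data pool that backs the bucket.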
On 19 Nov 2020, at 09:09, Janek Bevendorff
<janek.bevendorff@xxxxxxxxxxxxx> wrote:
> - The head object had a size of 0.
> - There was an object with a 'shadow' in its name, belonging to
>   that path.
That is normal. What is not normal is if there are NO shadow objects.
On 18/11/2020 10:06, Denis Krienbühl wrote:
It looks like a single-part object. But we did replace that object
last night from backup, so I can’t know for sure if the lost one
was like that.
Another engineer that looked at the Rados objects last night did
notice two things:
- The head object had a size of 0.
- There was an object with a 'shadow' in its name, belonging to
that path.
I’m not knowledgeable about Rados, so I’m not sure this is helpful.
On 18 Nov 2020, at 10:01, Janek Bevendorff
<janek.bevendorff@xxxxxxxxxxxxx> wrote:
Sorry, it's radosgw-admin object stat --bucket=BUCKETNAME
--object=OBJECTNAME (forgot the "object" there)
On 18/11/2020 09:58, Janek Bevendorff wrote:
> The object, a Docker layer, that went missing has not been
> touched in 2 months. It worked for a while, but then suddenly
> went missing.
Was the object a multipart object? You can check by running
radosgw-admin stat --bucket=BUCKETNAME --object=OBJECTNAME. It
should say something like "ns": "multipart" in the output. If it says
"ns": "shadow", it's a single-part object.
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx