Hi,
Background: We're having issues with our index pool: slow requests and timeouts cause an OSD to crash, which triggers a recovery and, in turn, application issues. We know we have very big buckets (e.g. a bucket of 77 million objects with only 16 shards) that need resharding, so we were looking at the resharding process.
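For a sense of scale: 77 million objects over 16 shards is roughly 4.8 million index entries per shard. A rough sizing sketch for the target shard count follows. The 100000 objects-per-shard target is an assumption (it is the default of `rgw_max_objs_per_shard` used by dynamic resharding in Luminous), and `shards_needed` is a hypothetical helper, not part of radosgw-admin.

```shell
# Hypothetical helper: smallest shard count that keeps every shard at or
# below PER_SHARD objects (ceiling division). PER_SHARD defaults to an
# assumed target of 100000 objects per shard.
shards_needed() {
  objects=$1
  per_shard=${2:-100000}
  echo $(( (objects + per_shard - 1) / per_shard ))
}
```

With the assumed default, `shards_needed 77000000` prints 770.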
The first thing we would like to do is make a backup of the bucket index, but this failed with:
# radosgw-admin -n client.radosgw.be-west-3 bi list --bucket=priv-prod-up-alex > /var/backup/priv-prod-up-alex.list.backup
2017-07-03 21:28:30.325613 7f07fb8bc9c0 0 System already converted
ERROR: bi_list(): (4) Interrupted system call
When I grep for "idx" and count the matches:
# grep idx priv-prod-up-alex.list.backup | wc -l
2294942
When I run bucket stats for that bucket, I get:
# radosgw-admin -n client.radosgw.be-west-3 bucket stats --bucket=priv-prod-up-alex | grep num_objects
2017-07-03 21:33:05.776499 7faca49b89c0 0 System already converted
"num_objects": 20148575
It looks like roughly 18 million objects are missing and the backup is incomplete (is that a correct assumption?). We're also afraid that the resharding command will run into the same issue.
Has anyone seen this behaviour before, or does anyone have thoughts on how to fix it?
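The size of the gap can be checked with a minimal sketch: count the lines containing "idx" in the dump (the same as `grep idx | wc -l`) and subtract that from `num_objects`. `count_gap` is a hypothetical helper, and it assumes each index entry in the dump contributes exactly one line matching "idx", which I have not verified.

```shell
# Hypothetical helper: print num_objects minus the number of dump lines
# that contain "idx" (grep -c counts matching lines, like grep | wc -l).
# Usage: count_gap DUMP_FILE NUM_OBJECTS
count_gap() {
  dumped=$(grep -c idx "$1")
  echo $(( $2 - dumped ))
}
```

Given the dump above (2294942 matching lines), `count_gap /var/backup/priv-prod-up-alex.list.backup 20148575` would print 17853633.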
We were also wondering whether we really need the backup. As the resharding process creates a completely new index and keeps the old one, is it possible to relink the bucket to the old one if something goes wrong? Or am I missing something important here?
Any help would be greatly appreciated, thanks!
Regards,
Maarten
_______________________________________________ ceph-users mailing list ceph-users@xxxxxxxxxxxxxx http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com