Re: Bucket resharding: "radosgw-admin bi list" ERROR

Maarten De Quick <mdequick85@xxxxxxxxx> · Wed, 5 Jul 2017 10:45:42 +0200

Hi Andreas,

Interesting as we are also on Jewel 10.2.7. We do care about the data in the bucket so we really need the reshard process to run properly :).
Could you maybe share how you linked the bucket to the new index by hand? That would already give me some extra insight.
Thanks!

Regards,
Maarten

On Wed, Jul 5, 2017 at 10:21 AM, Andreas Calminder <andreas.calminder@xxxxxxxxxx> wrote:
Hi,

I had a similar problem while resharding an oversized non-sharded

bucket in Jewel (10.2.7), the bi_list exited with ERROR: bi_list():

(4) Interrupted system call at, what seemed like the very end of the

operation. I went ahead and resharded the bucket anyway and the

reshard process ended the same way, seemingly at the end. Reshard

didn't link the bucket to new instance id though so I had to do that

by hand and then purge the index from the old instance id.

Note that I didn't care about the data in the bucket, I just wanted to

reshard the index so I could delete the bucket without my radosgw and

osds crashing due to out of memory issues.

Regards,

Andreas

On 4 July 2017 at 20:46, Maarten De Quick <mdequick85@xxxxxxxxx> wrote:

> Hi,

>

> Background: We're having issues with our index pool (slow requests / time

> outs causes crashing of an OSD and a recovery -> application issues). We

> know we have very big buckets (eg. bucket of 77 million objects with only 16

> shards) that need a reshard so we were looking at the resharding process.

>

> First thing we would like to do is making a backup of the bucket index, but

> this failed with:

>

> # radosgw-admin -n client.radosgw.be-west-3 bi list

> --bucket=priv-prod-up-alex > /var/backup/priv-prod-up-alex.list.backup

> 2017-07-03 21:28:30.325613 7f07fb8bc9c0  0 System already converted

> ERROR: bi_list(): (4) Interrupted system call

>

> When I grep for "idx" and I count these:

>  # grep idx priv-prod-up-alex.list.backup | wc -l

> 2294942

> When I do a bucket stats for that bucket I get:

> # radosgw-admin -n client.radosgw.be-west-3 bucket stats

> --bucket=priv-prod-up-alex | grep num_objects

> 2017-07-03 21:33:05.776499 7faca49b89c0  0 System already converted

>             "num_objects": 20148575

>

> It looks like there are 18 million objects missing and the backup is not

> complete (not sure if that's a correct assumption?). We're also afraid that

> the resharding command will face the same issue.

> Has anyone seen this behaviour before or any thoughts on how to fix it?

>

> We were also wondering if we really need the backup. As the resharding

> process creates a complete new index and keeps the old bucket, is there

> maybe a possibility to relink your bucket to the old bucket in case of

> issues? Or am I missing something important here?

>

> Any help would be greatly appreciated, thanks!

>

> Regards,

> Maarten

>

> _______________________________________________

> ceph-users mailing list

> ceph-users@xxxxxxxxxxxxxx

> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

>

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com