Re: Bucket resharding: "radosgw-admin bi list" ERROR

Sure thing!
I noted the new and old bucket instance IDs.
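
For reference, the old instance ID shows up in bucket stats, and both
the old and new instances show up in the bucket.instance metadata
list; a sketch, using the bucket name from the steps below:
# radosgw-admin --cluster ceph-prod bucket stats --bucket=1001/large_bucket | grep '"id"'
# radosgw-admin --cluster ceph-prod metadata list bucket.instance | grep large_bucket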

Back up the bucket metadata:
# radosgw-admin --cluster ceph-prod metadata get bucket:1001/large_bucket > large_bucket.metadata.bak.json
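
The dump holds the instance ID under data.bucket.bucket_id; abridged,
and with placeholder values, it looks roughly like this:

{
    "key": "bucket:1001/large_bucket",
    ...
    "data": {
        "bucket": {
            "name": "large_bucket",
            ...
            "bucket_id": "old_bucket_instance_id"
        },
        ...
    }
}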

# cp large_bucket.metadata.bak.json large_bucket.metadata.patched.json

Set bucket_id in large_bucket.metadata.patched.json to the new bucket
instance ID (by hand, or with the jq one-liner sketched below), then
replace the metadata in the bucket with the patched file:
# radosgw-admin --cluster ceph-prod metadata put bucket:1001/large_bucket < large_bucket.metadata.patched.json
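
For the edit itself, a jq one-liner can do the substitution instead of
a manual edit, assuming jq is installed (new_bucket_instance_id is a
placeholder for the ID noted earlier):
# jq '.data.bucket.bucket_id = "new_bucket_instance_id"' large_bucket.metadata.bak.json > large_bucket.metadata.patched.json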

Verify that bucket_id has been updated:
# radosgw-admin --cluster ceph-prod metadata get bucket:1001/large_bucket
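
Or, to eyeball just that field:
# radosgw-admin --cluster ceph-prod metadata get bucket:1001/large_bucket | grep bucket_id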

Try to read and write some objects in the updated bucket; a minimal
check is sketched below. Note that any write operations at this point
will still be slow, as the old instance ID still has a large index, at
least that's how our cluster behaved.
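
A sketch, assuming an s3cmd profile is already configured against this
gateway (the object names are placeholders):
# s3cmd get s3://large_bucket/some-existing-object /tmp/reshard-check
# s3cmd put /tmp/reshard-check s3://large_bucket/reshard-canary

Once reads and writes look sane, purge the index from the old bucket
instance ID: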
# radosgw-admin --cluster ceph-prod bi purge --bucket 1001/large_bucket --bucket-id old_bucket_instance_id
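
To confirm the old index shards are gone, the index pool can be
listed; a sketch, assuming the default Jewel index pool name (the
shard objects are named .dir.<instance_id>, plus a shard suffix), and
expecting no output after the purge:
# rados --cluster ceph-prod -p .rgw.buckets.index ls | grep old_bucket_instance_id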

After that, write operations against the index went smoothly.

As said before, I didn't care about the data in the bucket at all; the
above steps are potentially dangerous and may be flat-out wrong. But..
worksforme(tm)

/andreas

On 5 July 2017 at 10:45, Maarten De Quick <mdequick85@xxxxxxxxx> wrote:
> Hi Andreas,
>
> Interesting, as we are also on Jewel 10.2.7. We do care about the data in
> the bucket, so we really need the reshard process to run properly :).
> Could you maybe share how you linked the bucket to the new index by hand?
> That would already give me some extra insight.
> Thanks!
>
> Regards,
> Maarten
>
> On Wed, Jul 5, 2017 at 10:21 AM, Andreas Calminder
> <andreas.calminder@xxxxxxxxxx> wrote:
>>
>> Hi,
>> I had a similar problem while resharding an oversized non-sharded
>> bucket in Jewel (10.2.7): bi_list exited with "ERROR: bi_list():
>> (4) Interrupted system call" at what seemed like the very end of the
>> operation. I went ahead and resharded the bucket anyway, and the
>> reshard process ended the same way, seemingly at the end. Reshard
>> didn't link the bucket to the new instance ID though, so I had to do
>> that by hand and then purge the index from the old instance ID.
>> Note that I didn't care about the data in the bucket; I just wanted to
>> reshard the index so I could delete the bucket without my radosgw and
>> OSDs crashing due to out-of-memory issues.
>>
>> Regards,
>> Andreas
>>
>> On 4 July 2017 at 20:46, Maarten De Quick <mdequick85@xxxxxxxxx> wrote:
>> > Hi,
>> >
>> > Background: We're having issues with our index pool (slow requests /
>> > timeouts cause an OSD to crash and recover -> application issues). We
>> > know we have very big buckets (e.g. a bucket of 77 million objects with
>> > only 16 shards) that need a reshard, so we were looking at the
>> > resharding process.
>> >
>> > The first thing we would like to do is make a backup of the bucket
>> > index, but this failed with:
>> >
>> > # radosgw-admin -n client.radosgw.be-west-3 bi list --bucket=priv-prod-up-alex > /var/backup/priv-prod-up-alex.list.backup
>> > 2017-07-03 21:28:30.325613 7f07fb8bc9c0  0 System already converted
>> > ERROR: bi_list(): (4) Interrupted system call
>> >
>> > When I grep for "idx" and count the entries:
>> > # grep idx priv-prod-up-alex.list.backup | wc -l
>> > 2294942
>> > When I do a bucket stats for that bucket I get:
>> > # radosgw-admin -n client.radosgw.be-west-3 bucket stats --bucket=priv-prod-up-alex | grep num_objects
>> > 2017-07-03 21:33:05.776499 7faca49b89c0  0 System already converted
>> >             "num_objects": 20148575
>> >
>> > It looks like about 18 million objects are missing and the backup is
>> > not complete (not sure if that's a correct assumption?). We're also
>> > afraid that the resharding command will face the same issue.
>> > Has anyone seen this behaviour before, or does anyone have thoughts on
>> > how to fix it?
>> >
>> > We were also wondering if we really need the backup. As the resharding
>> > process creates a completely new index and keeps the old bucket
>> > instance, is there maybe a possibility of relinking the bucket to the
>> > old instance in case of issues? Or am I missing something important
>> > here?
>> >
>> > Any help would be greatly appreciated, thanks!
>> >
>> > Regards,
>> > Maarten
>> >
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


