Sure thing! I noted the new and old bucket instance ids.

Back up the bucket metadata:
# radosgw-admin --cluster ceph-prod metadata get bucket:1001/large_bucket > large_bucket.metadata.bak.json
# cp large_bucket.metadata.bak.json large_bucket.metadata.patched.json

Set bucket_id in large_bucket.metadata.patched.json to the new bucket instance
id (see the sketch in the P.S. below) and replace the bucket metadata with
large_bucket.metadata.patched.json:
# radosgw-admin --cluster ceph-prod metadata put bucket:1001/large_bucket < large_bucket.metadata.patched.json

Verify that bucket_id has been updated:
# radosgw-admin --cluster ceph-prod metadata get bucket:1001/large_bucket

Try to read and write some objects in the updated bucket. Note that any write
operations at this point will still be slow, as the old instance id still has
a large index; at least our cluster behaved like that.

Then purge the index from the old bucket instance id:
# radosgw-admin --cluster ceph-prod bi purge --bucket 1001/large_bucket --bucket-id old_bucket_instance_id

After that, write operations against the index went smoothly.

As said before, I didn't care about the data in the bucket at all, and the
above steps are potentially dangerous and flat out wrong. But.. worksforme(tm)

/andreas
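P.S. For the "set bucket_id" step, here is a minimal sketch of one way to
patch the file programmatically instead of editing it by hand. It is only an
illustration (the patch_bucket_id.py helper is mine, not something shipped
with Ceph): it rewrites every "bucket_id" value it finds, so it does not
depend on the exact nesting of the metadata JSON, which may differ between
versions. Diff the patched file against the backup before running metadata put.

#!/usr/bin/env python
# patch_bucket_id.py (hypothetical helper, not part of radosgw-admin)
# Usage: python patch_bucket_id.py <in.json> <out.json> <new_bucket_instance_id>
import json
import sys

src, dst, new_id = sys.argv[1], sys.argv[2], sys.argv[3]

def patch(node):
    # Walk the whole structure and rewrite any "bucket_id" value, so we do
    # not have to assume where exactly it sits in the metadata dump.
    if isinstance(node, dict):
        for key, value in node.items():
            if key == "bucket_id":
                node[key] = new_id
            else:
                patch(value)
    elif isinstance(node, list):
        for item in node:
            patch(item)

with open(src) as f:
    meta = json.load(f)
patch(meta)
with open(dst, "w") as f:
    json.dump(meta, f, indent=4)

Roughly:
# python patch_bucket_id.py large_bucket.metadata.bak.json large_bucket.metadata.patched.json <new_bucket_instance_id>
# diff large_bucket.metadata.bak.json large_bucket.metadata.patched.json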
On 5 July 2017 at 10:45, Maarten De Quick <mdequick85@xxxxxxxxx> wrote:
> Hi Andreas,
>
> Interesting, as we are also on Jewel 10.2.7. We do care about the data in
> the bucket, so we really need the reshard process to run properly :).
> Could you maybe share how you linked the bucket to the new index by hand?
> That would already give me some extra insight.
> Thanks!
>
> Regards,
> Maarten
>
> On Wed, Jul 5, 2017 at 10:21 AM, Andreas Calminder
> <andreas.calminder@xxxxxxxxxx> wrote:
>>
>> Hi,
>> I had a similar problem while resharding an oversized non-sharded bucket
>> in Jewel (10.2.7): bi list exited with ERROR: bi_list(): (4) Interrupted
>> system call at what seemed like the very end of the operation. I went
>> ahead and resharded the bucket anyway and the reshard process ended the
>> same way, seemingly at the end. Reshard didn't link the bucket to the new
>> instance id though, so I had to do that by hand and then purge the index
>> from the old instance id.
>> Note that I didn't care about the data in the bucket, I just wanted to
>> reshard the index so I could delete the bucket without my radosgw and
>> osds crashing due to out-of-memory issues.
>>
>> Regards,
>> Andreas
>>
>> On 4 July 2017 at 20:46, Maarten De Quick <mdequick85@xxxxxxxxx> wrote:
>> > Hi,
>> >
>> > Background: We're having issues with our index pool (slow requests /
>> > timeouts cause an OSD to crash and a recovery -> application issues).
>> > We know we have very big buckets (e.g. a bucket of 77 million objects
>> > with only 16 shards) that need a reshard, so we were looking at the
>> > resharding process.
>> >
>> > The first thing we would like to do is make a backup of the bucket
>> > index, but this failed with:
>> >
>> > # radosgw-admin -n client.radosgw.be-west-3 bi list --bucket=priv-prod-up-alex > /var/backup/priv-prod-up-alex.list.backup
>> > 2017-07-03 21:28:30.325613 7f07fb8bc9c0 0 System already converted
>> > ERROR: bi_list(): (4) Interrupted system call
>> >
>> > When I grep for "idx" and count these:
>> > # grep idx priv-prod-up-alex.list.backup | wc -l
>> > 2294942
>> > When I do a bucket stats for that bucket I get:
>> > # radosgw-admin -n client.radosgw.be-west-3 bucket stats --bucket=priv-prod-up-alex | grep num_objects
>> > 2017-07-03 21:33:05.776499 7faca49b89c0 0 System already converted
>> > "num_objects": 20148575
>> >
>> > It looks like 18 million objects are missing and the backup is not
>> > complete (not sure if that's a correct assumption?). We're also afraid
>> > that the resharding command will face the same issue.
>> > Has anyone seen this behaviour before, or any thoughts on how to fix it?
>> >
>> > We were also wondering if we really need the backup. As the resharding
>> > process creates a complete new index and keeps the old bucket, is there
>> > maybe a possibility to relink your bucket to the old bucket in case of
>> > issues? Or am I missing something important here?
>> >
>> > Any help would be greatly appreciated, thanks!
>> >
>> > Regards,
>> > Maarten

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com