Hi Andreas,
Interesting, as we are also on Jewel 10.2.7. We do care about the data in the bucket, so we really need the reshard process to run properly :).

On Wed, Jul 5, 2017 at 10:21 AM, Andreas Calminder <andreas.calminder@xxxxxxxxxx> wrote:
Hi,
I had a similar problem while resharding an oversized non-sharded
bucket in Jewel (10.2.7): bi list exited with "ERROR: bi_list():
(4) Interrupted system call" at what seemed like the very end of the
operation. I went ahead and resharded the bucket anyway, and the
reshard process ended the same way, seemingly at the end. The reshard
didn't link the bucket to the new instance id though, so I had to do
that by hand and then purge the index from the old instance id.
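For anyone hitting the same thing, the manual steps went roughly along
these lines (bucket name, owner uid and instance ids below are
placeholders, and the exact flags may differ between versions, so treat
this as a sketch rather than a recipe):

# find the old and new bucket instance ids
radosgw-admin metadata get bucket:mybucket
# point the bucket entrypoint at the new instance
radosgw-admin bucket link --bucket=mybucket --bucket-id=<new-instance-id> --uid=<owner-uid>
# remove the index objects of the old instance
radosgw-admin bi purge --bucket=mybucket --bucket-id=<old-instance-id>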
Note that I didn't care about the data in the bucket; I just wanted to
reshard the index so I could delete the bucket without my radosgw and
OSDs crashing due to out-of-memory issues.
Regards,
Andreas
On 4 July 2017 at 20:46, Maarten De Quick <mdequick85@xxxxxxxxx> wrote:
> Hi,
>
> Background: We're having issues with our index pool: slow requests and
> timeouts cause an OSD to crash, and the resulting recovery causes
> application issues. We know we have very big buckets (e.g. a bucket of 77
> million objects with only 16 shards) that need a reshard, so we were
> looking at the resharding process.
>
> The first thing we would like to do is make a backup of the bucket index,
> but this failed with:
>
> # radosgw-admin -n client.radosgw.be-west-3 bi list
> --bucket=priv-prod-up-alex > /var/backup/priv-prod-up-alex.list.backup
> 2017-07-03 21:28:30.325613 7f07fb8bc9c0 0 System already converted
> ERROR: bi_list(): (4) Interrupted system call
>
> When I grep for "idx" in the backup and count the matches:
> # grep idx priv-prod-up-alex.list.backup | wc -l
> 2294942
> When I run bucket stats for that bucket, I get:
> # radosgw-admin -n client.radosgw.be-west-3 bucket stats
> --bucket=priv-prod-up-alex | grep num_objects
> 2017-07-03 21:33:05.776499 7faca49b89c0 0 System already converted
> "num_objects": 20148575
>
> It looks like roughly 18 million entries are missing (20,148,575 reported
> in num_objects vs. 2,294,942 entries in the dump) and the backup is not
> complete (not sure if that's a correct assumption?). We're also afraid
> that the resharding command will run into the same issue.
> Has anyone seen this behaviour before, or any thoughts on how to fix it?
>
> We were also wondering if we really need the backup. As the resharding
> process creates a completely new index and keeps the old bucket instance,
> is there maybe a possibility to relink the bucket to the old instance in
> case of issues? Or am I missing something important here?
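> To make the question concrete, what we imagine as a rollback (untested on
> our side; the instance id and uid are placeholders) is something like:
>
> # radosgw-admin bucket link --bucket=priv-prod-up-alex --bucket-id=<old-instance-id> --uid=<owner-uid>
>
> i.e. pointing the bucket entrypoint back at the old index instance if the
> reshard goes wrong.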
>
> Any help would be greatly appreciated, thanks!
>
> Regards,
> Maarten
>
> _______________________________________________
> ceph-users mailing list
> ceph-users@xxxxxxxxxxxxxx
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>