Re: RGW: Reshard index of non-master zones in multi-site

Hi Iain,

Resharding is not supported in multisite. The issue is that the master zone needs to be authoritative for all metadata. If bucket reshard commands run on the secondary zone, they create new bucket instance metadata that the master zone never sees, so replication can't reconcile those changes.

The 'stale-instances rm' command is not safe to run in multisite because it can misidentify as 'stale' some bucket instances that were deleted on the master zone, where data sync on the secondary zone hasn't yet finished deleting all of the objects it contained. Deleting these bucket instances and their associated bucket index objects would leave any remaining objects behind as orphans and leak storage capacity.

On Thu, Apr 4, 2019 at 3:28 PM Iain Buclaw <ibuclaw@xxxxxxxxxx> wrote:
On Wed, 3 Apr 2019 at 09:41, Iain Buclaw <ibuclaw@xxxxxxxxxx> wrote:
>
> On Tue, 19 Feb 2019 at 10:11, Iain Buclaw <ibuclaw@xxxxxxxxxx> wrote:
> >
> >
> > # ./radosgw-gc-bucket-indexes.sh master.rgw.buckets.index | wc -l
> > 7511
> >
> > # ./radosgw-gc-bucket-indexes.sh secondary1.rgw.buckets.index | wc -l
> > 3509
> >
> > # ./radosgw-gc-bucket-indexes.sh secondary2.rgw.buckets.index | wc -l
> > 3801
> >
>
> The documentation on multi-site resharding is a horrid mess:
>
> http://docs.ceph.com/docs/mimic/radosgw/dynamicresharding/#manual-bucket-resharding
>
> https://www.suse.com/documentation/suse-enterprise-storage-5/book_storage_admin/data/ogw_bucket_sharding.html
> (Manual Resharding)
>
> https://access.redhat.com/documentation/en-us/red_hat_ceph_storage/3/html-single/object_gateway_guide_for_red_hat_enterprise_linux/index#manually-resharding-buckets-with-multisite-rgw
>
> All disagree with each other over the correct process to reshard
> indexes in multi-site.  Worse, none of them seem to work correctly
> anyway.
>
> Changelog of 13.2.5 looked promising up until the sentence: "These
> commands should not be used on a multisite setup as the stale
> instances may be unlikely to be from a reshard and can have
> consequences".
>
> http://docs.ceph.com/docs/master/releases/mimic/#v13-2-5-mimic
>

The stale-instances feature only correctly identifies one stale shard.

# radosgw-admin reshard stale-instances list
[
    "mybucket:0ef1a91a-4aee-427e-bdf8-30589abb2d3e.97248676.1"
]

I can confirm this is one of the orphaned index objects.

# rados -p .rgw.buckets.index ls | grep 0ef1a91a-4aee-427e-bdf8-30589abb2d3e.97248676.1
.dir.0ef1a91a-4aee-427e-bdf8-30589abb2d3e.97248676.1.0
.dir.0ef1a91a-4aee-427e-bdf8-30589abb2d3e.97248676.1.3
.dir.0ef1a91a-4aee-427e-bdf8-30589abb2d3e.97248676.1.9
.dir.0ef1a91a-4aee-427e-bdf8-30589abb2d3e.97248676.1.5
.dir.0ef1a91a-4aee-427e-bdf8-30589abb2d3e.97248676.1.2
.dir.0ef1a91a-4aee-427e-bdf8-30589abb2d3e.97248676.1.7
.dir.0ef1a91a-4aee-427e-bdf8-30589abb2d3e.97248676.1.1
.dir.0ef1a91a-4aee-427e-bdf8-30589abb2d3e.97248676.1.10
.dir.0ef1a91a-4aee-427e-bdf8-30589abb2d3e.97248676.1.4
.dir.0ef1a91a-4aee-427e-bdf8-30589abb2d3e.97248676.1.6
.dir.0ef1a91a-4aee-427e-bdf8-30589abb2d3e.97248676.1.11
.dir.0ef1a91a-4aee-427e-bdf8-30589abb2d3e.97248676.1.8

I would assume then that, contrary to what the documentation says, it's
safe to run 'reshard stale-instances rm' on a multi-site setup.

However it is quite telling if the author of this feature doesn't
trust what they have written to work correctly.

There are still thousands of stale index objects that 'stale-instances
list' didn't pick up though.  But it appears that radosgw-admin only
looks at 'metadata list bucket' data, and not what is physically
inside the pool.
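
As a rough sketch of that gap, the index objects in the pool can be
compared against the bucket instance IDs known to the metadata.  The
helper below is hypothetical (the function name and file layout are
assumptions, not part of radosgw-admin) and assumes index objects are
named '.dir.<instance_id>' or '.dir.<instance_id>.<shard>'.  It only
lists candidates and deletes nothing; on multi-site, an unreferenced
index object is not necessarily safe to remove.

```shell
#!/bin/sh
# Hypothetical helper: print index objects whose bucket instance ID does not
# appear in the metadata listing. It lists candidates only; nothing is deleted.
#   $1: file with one "bucket:instance_id" entry per line
#   $2: file with one pool object name per line
list_unreferenced_index_objects() {
    awk '
        # First file: remember each instance ID (the text after the last colon).
        FILENAME == ARGV[1] {
            sub(/^.*:/, "")
            if ($0 != "") known[$0] = 1
            next
        }
        # Second file: only consider bucket index objects.
        /^\.dir\./ {
            id = substr($0, 6)            # drop the ".dir." prefix
            if (id in known) next         # unsharded index object
            base = id
            sub(/\.[0-9]+$/, "", base)    # drop a trailing ".<shard>"
            if (base in known) next       # sharded index object
            print
        }
    ' "$1" "$2"
}
```

The two input files could be produced with something like
'radosgw-admin metadata list bucket.instance | jq -r ".[]"' (this
assumes jq is installed) and 'rados -p .rgw.buckets.index ls'.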

--
Iain Buclaw

*(p < e ? p++ : p) = (c & 0x0f) + '0';
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com