Re: RGW: Reshard index of non-master zones in multi-site

Hi Christian,

Dynamic bucket-index sharding for multi-site setups is being worked
on and is expected to land in the N release cycle.
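In the meantime, the manual procedure described in the Red Hat document
cited further down the thread amounts to roughly the sketch below. Treat
it as a paraphrase rather than a verified recipe: the bucket name and
shard count are placeholders, and the bucket is unavailable to clients
while this runs.

```shell
# Sketch of the documented manual multi-site reshard (placeholders throughout).
# 1. On the master zone: disable sync for the bucket, then reshard it.
radosgw-admin bucket sync disable --bucket=mybucket
radosgw-admin bucket reshard --bucket=mybucket --num-shards=127

# 2. On each secondary zone: drop the now-stale local copy so the bucket
#    re-syncs against the new bucket instance created by the reshard.
radosgw-admin bucket rm --purge-objects --bucket=mybucket

# 3. Back on the master zone: re-enable sync and let the data replicate.
radosgw-admin bucket sync enable --bucket=mybucket
```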

regards,

Matt

On Sun, Apr 7, 2019 at 6:59 PM Christian Balzer <chibi@xxxxxxx> wrote:
>
> On Fri, 5 Apr 2019 11:42:28 -0400 Casey Bodley wrote:
>
> > Hi Iain,
> >
> > Resharding is not supported in multisite. The issue is that the master zone
> > needs to be authoritative for all metadata. If bucket reshard commands run
> > on the secondary zone, they create new bucket instance metadata that the
> > master zone never sees, so replication can't reconcile those changes.
> >
>
> Unless the above should read "dynamic resharding...", this directly
> contradicts the Red Hat documentation that Iain cited.
>
> But given how costly manual resharding is, including the service
> interruption it requires, it's not really an option for most people either.
>
> Looks like Ceph is out of the race for multi-PB use cases here, unless
> multi-site dynamic resharding is less than six months away.
>
> Regards,
>
> Christian
>
> > The 'stale-instances rm' command is not safe to run in multisite because it
> > can misidentify as 'stale' some bucket instances that were deleted on the
> > master zone, where data sync on the secondary zone hasn't yet finished
> > deleting all of the objects it contained. Deleting these bucket instances
> > and their associated bucket index objects would leave any remaining objects
> > behind as orphans and leak storage capacity.
> >
> > On Thu, Apr 4, 2019 at 3:28 PM Iain Buclaw <ibuclaw@xxxxxxxxxx> wrote:
> >
> > > On Wed, 3 Apr 2019 at 09:41, Iain Buclaw <ibuclaw@xxxxxxxxxx> wrote:
> > > >
> > > > On Tue, 19 Feb 2019 at 10:11, Iain Buclaw <ibuclaw@xxxxxxxxxx> wrote:
> > > > >
> > > > >
> > > > > # ./radosgw-gc-bucket-indexes.sh master.rgw.buckets.index | wc -l
> > > > > 7511
> > > > >
> > > > > # ./radosgw-gc-bucket-indexes.sh secondary1.rgw.buckets.index | wc -l
> > > > > 3509
> > > > >
> > > > > # ./radosgw-gc-bucket-indexes.sh secondary2.rgw.buckets.index | wc -l
> > > > > 3801
> > > > >
> > > >
> > > > The documentation is a horrid mess on the subject of multi-site
> > > > resharding:
> > > >
> > > >
> > > http://docs.ceph.com/docs/mimic/radosgw/dynamicresharding/#manual-bucket-resharding
> > > >
> > > >
> > > https://www.suse.com/documentation/suse-enterprise-storage-5/book_storage_admin/data/ogw_bucket_sharding.html
> > > > (Manual Resharding)
> > > >
> > > >
> > > https://access.redhat.com/documentation/en-us/red_hat_ceph_storage/3/html-single/object_gateway_guide_for_red_hat_enterprise_linux/index#manually-resharding-buckets-with-multisite-rgw
> > > >
> > > > All disagree with each other over the correct process to reshard
> > > > indexes in multi-site.  Worse, none of them seem to work correctly
> > > > anyway.
> > > >
> > > > Changelog of 13.2.5 looked promising up until the sentence: "These
> > > > commands should not be used on a multisite setup as the stale
> > > > instances may be unlikely to be from a reshard and can have
> > > > consequences".
> > > >
> > > > http://docs.ceph.com/docs/master/releases/mimic/#v13-2-5-mimic
> > > >
> > >
> > > The stale-instances feature correctly identifies only one stale instance:
> > >
> > > # radosgw-admin reshard stale-instances list
> > > [
> > >     "mybucket:0ef1a91a-4aee-427e-bdf8-30589abb2d3e.97248676.1"
> > > ]
> > >
> > > I can confirm this is one of the orphaned index objects.
> > >
> > > # rados -p .rgw.buckets.index ls | grep 0ef1a91a-4aee-427e-bdf8-30589abb2d3e.97248676.1
> > > .dir.0ef1a91a-4aee-427e-bdf8-30589abb2d3e.97248676.1.0
> > > .dir.0ef1a91a-4aee-427e-bdf8-30589abb2d3e.97248676.1.3
> > > .dir.0ef1a91a-4aee-427e-bdf8-30589abb2d3e.97248676.1.9
> > > .dir.0ef1a91a-4aee-427e-bdf8-30589abb2d3e.97248676.1.5
> > > .dir.0ef1a91a-4aee-427e-bdf8-30589abb2d3e.97248676.1.2
> > > .dir.0ef1a91a-4aee-427e-bdf8-30589abb2d3e.97248676.1.7
> > > .dir.0ef1a91a-4aee-427e-bdf8-30589abb2d3e.97248676.1.1
> > > .dir.0ef1a91a-4aee-427e-bdf8-30589abb2d3e.97248676.1.10
> > > .dir.0ef1a91a-4aee-427e-bdf8-30589abb2d3e.97248676.1.4
> > > .dir.0ef1a91a-4aee-427e-bdf8-30589abb2d3e.97248676.1.6
> > > .dir.0ef1a91a-4aee-427e-bdf8-30589abb2d3e.97248676.1.11
> > > .dir.0ef1a91a-4aee-427e-bdf8-30589abb2d3e.97248676.1.8
> > >
> > > I would assume then that, contrary to what the documentation says, it's
> > > safe to run 'reshard stale-instances rm' on a multi-site setup.
> > >
> > > However, it is quite telling that the author of this feature doesn't
> > > trust what they have written to work correctly.
> > >
> > > There are still thousands of stale index objects that 'stale-instances
> > > list' didn't pick up, though.  It appears that radosgw-admin only looks
> > > at 'metadata list bucket' data, and not at what is physically inside
> > > the pool.
> > >
> > > --
> > > Iain Buclaw
> > >
> > > *(p < e ? p++ : p) = (c & 0x0f) + '0';
> > > _______________________________________________
> > > ceph-users mailing list
> > > ceph-users@xxxxxxxxxxxxxx
> > > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> > >
>
>
> --
> Christian Balzer        Network/Systems Engineer
> chibi@xxxxxxx           Rakuten Communications
>
>
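On Iain's closing point above (that the tooling consults 'metadata list
bucket' rather than the pool contents): one crude cross-check is to diff
the instance markers physically present in the index pool against the
markers the metadata layer knows about. The sketch below is hypothetical;
the file names and the assumption that every index object is named
`.dir.<marker>.<shard>` are mine, and the demo data merely stands in for
real `rados ls` / `radosgw-admin metadata list bucket.instance` output.

```shell
# Stand-in input files (in real use these come from the cluster):
#   pool_objects.txt  <- rados -p .rgw.buckets.index ls
#   known_markers.txt <- markers from radosgw-admin metadata list bucket.instance
printf '.dir.aaa.100.1.0\n.dir.aaa.100.1.1\n.dir.bbb.200.2.0\n' > pool_objects.txt
printf 'aaa.100.1\n' > known_markers.txt

# Index objects are named .dir.<marker>.<shard>; strip the ".dir." prefix and
# the trailing shard number to recover the bucket-instance marker.  (Assumes
# every index object carries a numeric shard suffix.)
sed -e 's/^\.dir\.//' -e 's/\.[0-9][0-9]*$//' pool_objects.txt | sort -u > pool_markers.txt
sort -u known_markers.txt > sorted_known.txt

# Markers present in the pool but unknown to the metadata layer are
# candidates for orphaned index shards.
comm -23 pool_markers.txt sorted_known.txt
# -> bbb.200.2
```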


-- 

Matt Benjamin
Red Hat, Inc.
315 West Huron Street, Suite 140A
Ann Arbor, Michigan 48103

http://www.redhat.com/en/technologies/storage

tel.  734-821-5101
fax.  734-769-8938
cel.  734-216-5309


