Re: RGW: Reshard index of non-master zones in multi-site

Christian Balzer <chibi@xxxxxxx> · Mon, 8 Apr 2019 07:59:20 +0900

On Fri, 5 Apr 2019 11:42:28 -0400 Casey Bodley wrote:

> Hi Iain,
> 
> Resharding is not supported in multisite. The issue is that the master zone
> needs to be authoritative for all metadata. If bucket reshard commands run
> on the secondary zone, they create new bucket instance metadata that the
> master zone never sees, so replication can't reconcile those changes.
> 

Unless the above should read "dynamic resharding...", this is in clear
contrast to the Red Hat documentation Iain cited.

But given how costly manual resharding is, including the service
interruption, that's not really an option for most people either.

Looks like Ceph is out of the race for multi-PB use cases here, unless
multi-site with dynamic resharding is less than 6 months away.

Regards,

Christian

> The 'stale-instances rm' command is not safe to run in multisite because it
> can misidentify as 'stale' some bucket instances that were deleted on the
> master zone, where data sync on the secondary zone hasn't yet finished
> deleting all of the objects it contained. Deleting these bucket instances
> and their associated bucket index objects would leave any remaining objects
> behind as orphans and leak storage capacity.
> 
> On Thu, Apr 4, 2019 at 3:28 PM Iain Buclaw <ibuclaw@xxxxxxxxxx> wrote:
> 
> > On Wed, 3 Apr 2019 at 09:41, Iain Buclaw <ibuclaw@xxxxxxxxxx> wrote:  
> > >
> > > On Tue, 19 Feb 2019 at 10:11, Iain Buclaw <ibuclaw@xxxxxxxxxx> wrote:  
> > > >
> > > >
> > > > # ./radosgw-gc-bucket-indexes.sh master.rgw.buckets.index | wc -l
> > > > 7511
> > > >
> > > > # ./radosgw-gc-bucket-indexes.sh secondary1.rgw.buckets.index | wc -l
> > > > 3509
> > > >
> > > > # ./radosgw-gc-bucket-indexes.sh secondary2.rgw.buckets.index | wc -l
> > > > 3801
> > > >  
> > >
> > > Documentation is a horrid mess around the subject on multi-site
> > > resharding
> > >
> > > http://docs.ceph.com/docs/mimic/radosgw/dynamicresharding/#manual-bucket-resharding
> > >
> > > https://www.suse.com/documentation/suse-enterprise-storage-5/book_storage_admin/data/ogw_bucket_sharding.html
> > > (Manual Resharding)
> > >
> > > https://access.redhat.com/documentation/en-us/red_hat_ceph_storage/3/html-single/object_gateway_guide_for_red_hat_enterprise_linux/index#manually-resharding-buckets-with-multisite-rgw
> > >
> > > All disagree with each other over the correct process to reshard
> > > indexes in multi-site.  Worse, none of them seem to work correctly
> > > anyway.
> > >
> > > Changelog of 13.2.5 looked promising up until the sentence: "These
> > > commands should not be used on a multisite setup as the stale
> > > instances may be unlikely to be from a reshard and can have
> > > consequences".
> > >
> > > http://docs.ceph.com/docs/master/releases/mimic/#v13-2-5-mimic
> > >  
> >
> > The stale-instances feature only correctly identifies one stale shard.
> >
> > # radosgw-admin reshard stale-instances list
> > [
> >     "mybucket:0ef1a91a-4aee-427e-bdf8-30589abb2d3e.97248676.1"
> > ]
> >
> > I can confirm this is one of the orphaned index objects.
> >
> > # rados -p .rgw.buckets.index ls | grep 0ef1a91a-4aee-427e-bdf8-30589abb2d3e.97248676.1
> > .dir.0ef1a91a-4aee-427e-bdf8-30589abb2d3e.97248676.1.0
> > .dir.0ef1a91a-4aee-427e-bdf8-30589abb2d3e.97248676.1.3
> > .dir.0ef1a91a-4aee-427e-bdf8-30589abb2d3e.97248676.1.9
> > .dir.0ef1a91a-4aee-427e-bdf8-30589abb2d3e.97248676.1.5
> > .dir.0ef1a91a-4aee-427e-bdf8-30589abb2d3e.97248676.1.2
> > .dir.0ef1a91a-4aee-427e-bdf8-30589abb2d3e.97248676.1.7
> > .dir.0ef1a91a-4aee-427e-bdf8-30589abb2d3e.97248676.1.1
> > .dir.0ef1a91a-4aee-427e-bdf8-30589abb2d3e.97248676.1.10
> > .dir.0ef1a91a-4aee-427e-bdf8-30589abb2d3e.97248676.1.4
> > .dir.0ef1a91a-4aee-427e-bdf8-30589abb2d3e.97248676.1.6
> > .dir.0ef1a91a-4aee-427e-bdf8-30589abb2d3e.97248676.1.11
> > .dir.0ef1a91a-4aee-427e-bdf8-30589abb2d3e.97248676.1.8
> >
> > I would assume then that unlike what documentation says, it's safe to
> > run 'reshard stale-instances rm' on a multi-site setup.
> >
> > However it is quite telling if the author of this feature doesn't
> > trust what they have written to work correctly.
> >
> > There are still thousands of stale index objects that 'stale-instances
> > list' didn't pick up though.  But it appears that radosgw-admin only
> > looks at 'metadata list bucket' data, and not what is physically
> > inside the pool.
> >
> > --
> > Iain Buclaw
> >
> > *(p < e ? p++ : p) = (c & 0x0f) + '0';
> > _______________________________________________
> > ceph-users mailing list
> > ceph-users@xxxxxxxxxxxxxx
> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> >  
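
For what it's worth, the gap Iain describes above (index objects sitting in
the pool that 'stale-instances list' never reports) could be measured with
something like the sketch below. It is only a sketch under assumptions: it
takes the text output of 'rados -p <pool> ls' and the JSON list printed by
'radosgw-admin metadata list bucket.instance', and it assumes index objects
are named '.dir.<instance-id>[.<shard>]' with '<instance-id>' shaped like
'<uuid>.<n>.<n>', as in the listing quoted above. The function names are my
own. It only reports; given Casey's warning, nothing it prints should be fed
to a delete on a multisite setup.

```python
import json
import re
from collections import defaultdict

# Assumed name layout, taken from the listing quoted in this thread:
#   .dir.<instance-id>[.<shard>]   with <instance-id> = <uuid>.<n>.<n>
INDEX_RE = re.compile(
    r"^\.dir\.(?P<inst>[0-9a-f-]{36}\.\d+\.\d+)(?:\.(?P<shard>\d+))?$"
)


def group_index_objects(rados_ls_text):
    """Map instance id -> sorted shard numbers found in the index pool."""
    shards = defaultdict(list)
    for line in rados_ls_text.splitlines():
        m = INDEX_RE.match(line.strip())
        if m:
            # An unsharded index has no numeric suffix; record it as shard 0.
            shards[m.group("inst")].append(int(m.group("shard") or 0))
    return {inst: sorted(nums) for inst, nums in shards.items()}


def known_instance_ids(metadata_json_text):
    """Instance ids from a JSON list of '<bucket>:<instance-id>' keys."""
    return {key.rsplit(":", 1)[1] for key in json.loads(metadata_json_text)}


def unreferenced_instances(rados_ls_text, metadata_json_text):
    """Instances with index objects in the pool but no metadata entry."""
    known = known_instance_ids(metadata_json_text)
    found = group_index_objects(rados_ls_text)
    return {inst: s for inst, s in found.items() if inst not in known}
```

Feeding it the two command outputs saved to files would give a per-instance
list of shards that exist in the pool without a matching bucket instance
entry. Names that do not match the assumed pattern are silently skipped, so
the result is a lower bound rather than a full inventory.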

-- 
Christian Balzer        Network/Systems Engineer                
chibi@xxxxxxx   	Rakuten Communications