Re: RGW Multisite metadata sync init

On Thu, Sep 7, 2017 at 11:02 PM, David Turner <drakonstein@xxxxxxxxx> wrote:
> I created a test user named 'ice' and then used it to create a bucket named
> ice.  The bucket ice can be found in the second dc, but not the user.
> `mdlog list` showed ice for the bucket, but not for the user.  I performed
> the same test in the internal realm and it showed the user and bucket both
> in `mdlog list`.
>

Maybe your radosgw-admin command is running with a ceph user that
doesn't have permissions to write to the log pool? (Probably not,
since you are able to run the sync init commands.)
Another, much less likely, explanation would be an overlapping zone
configuration that shared some of the config but not all of it, with
radosgw running against the correct one and radosgw-admin against the
bad one. I don't think it's the second option.

Yehuda
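[Editor's note: a minimal sketch of checking both possibilities Yehuda raises. The realm/zone names are taken from the thread; the keyring name is an assumption (radosgw-admin uses client.admin unless told otherwise), and this is not a definitive procedure:

```shell
# Which zone config does radosgw-admin actually resolve? Compare against
# what the radosgw daemons are running with.
radosgw-admin realm list
radosgw-admin zone get --rgw-realm=public --rgw-zone=public-dc1

# Caps of the ceph user radosgw-admin runs as (client.admin is the
# default; substitute your own keyring's user if different):
ceph auth get client.admin
```

If the zone JSON's pool names differ between what radosgw and radosgw-admin see, that would point at the overlapping-config scenario.]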

>
>
> On Thu, Sep 7, 2017 at 3:27 PM Yehuda Sadeh-Weinraub <yehuda@xxxxxxxxxx>
> wrote:
>>
>> On Thu, Sep 7, 2017 at 10:04 PM, David Turner <drakonstein@xxxxxxxxx>
>> wrote:
>> > One realm is called public, with a zonegroup called public-zg and a
>> > zone for each datacenter.  The second realm is called internal, with
>> > a zonegroup called internal-zg and a zone for each datacenter.  They
>> > each have their own rgws and load balancers.  The needs of our
>> > public-facing rgws and load balancers vs. the internal ones were
>> > different enough that we split them up completely.  We also have a
>> > local realm that does not use multisite, and a 4th realm called QA
>> > that mimics the public realm as much as possible for staging
>> > configuration changes for the rgw daemons.  All 4 realms have their
>> > own buckets, users, etc., and that is all working fine.  For all of
>> > the radosgw-admin commands I am using the proper identifiers to make
>> > sure that each datacenter and realm are running commands on exactly
>> > what I expect them to (--rgw-realm=public --rgw-zonegroup=public-zg
>> > --rgw-zone=public-dc1 --source-zone=public-dc2).
>> >
>> > The data sync issue was in the internal realm, but running a data
>> > sync init and kickstarting the rgw daemons in each datacenter fixed
>> > the data discrepancies (I'm thinking it had something to do with a
>> > power failure a few months back that I just noticed recently).  The
>> > metadata sync issue is in the public realm.  I have no idea what is
>> > causing this to not sync properly, since running a `metadata sync
>> > init` catches it back up to the primary zone, but then it doesn't
>> > receive any new users created after that.
>> >
>>
>> Sounds like an issue with the metadata log in the primary master zone.
>> Not sure what could go wrong there, but maybe the master zone doesn't
>> know that it is a master zone, or it's set to not log metadata. Or
>> maybe there's a problem when the secondary is trying to fetch the
>> metadata log. Maybe some kind of number-of-shards mismatch (though not
>> likely).
>> Try to see if the master logs any changes: you can use the
>> 'radosgw-admin mdlog list' command.
>>
>> Yehuda
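[Editor's note: a hedged sketch of the checks Yehuda suggests, using the realm/zonegroup names from the thread and 10.2.x-era commands:

```shell
# Does the master zone log metadata changes at all? Run on the master side.
radosgw-admin mdlog list --rgw-realm=public

# Is the expected zone actually the zonegroup master, and is metadata
# logging enabled for the zonegroup?
radosgw-admin zonegroup get --rgw-realm=public --rgw-zonegroup=public-zg \
    | grep -E 'master_zone|log_meta'
```

The mdlog shard count comes from the rgw_md_log_max_shards option (default 64), so a shard-count mismatch would imply differently configured or built radosgw versions on the two sides.]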
>>
>> > On Thu, Sep 7, 2017 at 2:52 PM Yehuda Sadeh-Weinraub <yehuda@xxxxxxxxxx>
>> > wrote:
>> >>
>> >> On Thu, Sep 7, 2017 at 7:44 PM, David Turner <drakonstein@xxxxxxxxx>
>> >> wrote:
>> >> > OK, I've been testing, investigating, researching, etc., for the
>> >> > last week, and I don't have any problems with data syncing.  The
>> >> > clients on one side are creating multipart objects while the
>> >> > multisite sync is creating them as whole objects, and one of the
>> >> > datacenters is slower at cleaning up the shadow files.  That's the
>> >> > big discrepancy between object counts in the pools between
>> >> > datacenters.  I created a tool that goes through each bucket in a
>> >> > realm, does a recursive listing of all objects in it for both
>> >> > datacenters, and compares the 2 lists for any differences.  The
>> >> > data is definitely in sync between the 2 datacenters, down to the
>> >> > modified time and byte of each file in s3.
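[Editor's note: the comparison tool itself isn't shown in the thread. A minimal sketch of the same idea, diffing two recursive listings captured against each zone's endpoint; the file names and "key size mtime" line format are assumptions, as is how the listings were produced (e.g. `s3cmd ls -r` against each load balancer):

```shell
# Print lines unique to either listing; no output means the bucket is in
# sync between the two datacenters.
diff_listings() {
    sort "$1" > /tmp/dl1.sorted
    sort "$2" > /tmp/dl2.sorted
    comm -3 /tmp/dl1.sorted /tmp/dl2.sorted
}

# Demo with sample listings (placeholder data):
printf 'a.txt 10 2017-08-01\nb.txt 20 2017-08-02\n' > dc1.list
printf 'a.txt 10 2017-08-01\nc.txt 30 2017-08-03\n' > dc2.list
diff_listings dc1.list dc2.list
```

Here `comm -3` suppresses lines common to both files, so only objects missing from one side (or differing in size/mtime) are printed.]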
>> >> >
>> >> > The metadata is still not syncing for the other realm, though.  If
>> >> > I run `metadata sync init` then the second datacenter will catch up
>> >> > with all of the new users, but until I do that, newly created users
>> >> > on the primary side don't exist on the secondary side.  `metadata
>> >> > sync status`, `sync status`, `metadata sync run` (only left running
>> >> > for 30 minutes before I ctrl+c it), etc. don't show any problems...
>> >> > but the new users just don't exist on the secondary side until I
>> >> > run `metadata sync init`.  I created a new bucket with the new user
>> >> > and the bucket shows up in the second datacenter, but no objects,
>> >> > because the objects don't have a valid owner.
>> >> >
>> >> > Thank you all for the help with the data sync issue.  You pointed
>> >> > me in good directions.  Does anyone have any insight as to what is
>> >> > preventing the metadata from syncing in the other realm?  I have 2
>> >> > realms being synced using multisite, and it's only 1 of them that
>> >> > isn't getting the metadata across.  As far as I can tell it is
>> >> > configured identically.
>> >>
>> >> What do you mean you have two realms? Zones and zonegroups need to
>> >> exist in the same realm in order for meta and data sync to happen
>> >> correctly. Maybe I'm misunderstanding.
>> >>
>> >> Yehuda
>> >>
>> >> >
>> >> > On Thu, Aug 31, 2017 at 12:46 PM David Turner <drakonstein@xxxxxxxxx>
>> >> > wrote:
>> >> >>
>> >> >> All of the messages from `sync error list` are listed below.  The
>> >> >> number on the left is how many times the error message is found.
>> >> >>
>> >> >>    1811 "message": "failed to sync bucket instance: (16) Device or resource busy"
>> >> >>       7 "message": "failed to sync bucket instance: (5) Input\/output error"
>> >> >>      65 "message": "failed to sync object"
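[Editor's note: counts like the above can be produced from the raw `sync error list` output with a small pipeline. A sketch, assuming the pretty-printed JSON puts each "message" field on its own line (the errors.json file name is a placeholder):

```shell
# Tally sync error messages by frequency. Works on a saved copy, e.g.:
#   radosgw-admin sync error list > errors.json
summarize_errors() {
    grep '"message"' "$1" | sed 's/^ *//' | sort | uniq -c | sort -rn
}

# Demo with a small sample file standing in for real output:
cat > errors.json <<'EOF'
        "message": "failed to sync bucket instance: (16) Device or resource busy"
        "message": "failed to sync object"
        "message": "failed to sync bucket instance: (16) Device or resource busy"
EOF
summarize_errors errors.json
```

The `sed` strips the JSON indentation so identical messages collapse into one `uniq -c` bucket regardless of nesting depth.]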
>> >> >>
>> >> >> On Tue, Aug 29, 2017 at 10:00 AM Orit Wasserman
>> >> >> <owasserm@xxxxxxxxxx>
>> >> >> wrote:
>> >> >>>
>> >> >>>
>> >> >>> Hi David,
>> >> >>>
>> >> >>> On Mon, Aug 28, 2017 at 8:33 PM, David Turner
>> >> >>> <drakonstein@xxxxxxxxx>
>> >> >>> wrote:
>> >> >>>>
>> >> >>>> The vast majority of the sync error list is "failed to sync
>> >> >>>> bucket instance: (16) Device or resource busy".  I can't find
>> >> >>>> anything on Google about this error message in relation to Ceph.
>> >> >>>> Does anyone have any idea what this means and/or how to fix it?
>> >> >>>
>> >> >>>
>> >> >>> Those are transient errors resulting from several radosgw
>> >> >>> instances trying to acquire the same sync log shard lease. They
>> >> >>> don't affect the sync progress.
>> >> >>> Are there any other errors?
>> >> >>>
>> >> >>> Orit
>> >> >>>>
>> >> >>>>
>> >> >>>> On Fri, Aug 25, 2017 at 2:48 PM Casey Bodley <cbodley@xxxxxxxxxx>
>> >> >>>> wrote:
>> >> >>>>>
>> >> >>>>> Hi David,
>> >> >>>>>
>> >> >>>>> The 'data sync init' command won't touch any actual object
>> >> >>>>> data, no.  Resetting the data sync status will just cause a
>> >> >>>>> zone to restart a full sync of the --source-zone's data changes
>> >> >>>>> log.  This log only lists which buckets/shards have changes in
>> >> >>>>> them, which causes radosgw to consider them for bucket sync.
>> >> >>>>> So while the command may silence the warnings about data shards
>> >> >>>>> being behind, it's unlikely to resolve the issue with missing
>> >> >>>>> objects in those buckets.
>> >> >>>>>
>> >> >>>>> When data sync is behind for an extended period of time, it's
>> >> >>>>> usually because it's stuck retrying previous bucket sync
>> >> >>>>> failures.  The 'sync error list' may help narrow down where
>> >> >>>>> those failures are.
>> >> >>>>>
>> >> >>>>> There is also a 'bucket sync init' command to clear the bucket
>> >> >>>>> sync status.  Following that with a 'bucket sync run' should
>> >> >>>>> restart a full sync on the bucket, pulling in any new objects
>> >> >>>>> that are present on the source zone.  I'm afraid that those
>> >> >>>>> commands haven't seen a lot of polish or testing, however.
>> >> >>>>>
>> >> >>>>> Casey
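[Editor's note: Casey's per-bucket resync procedure, as a hedged command sketch. The bucket name and source zone are placeholders, and the flags reflect 10.2.x-era radosgw-admin; check your version's help output before running:

```shell
# 1. Find which buckets are failing to sync:
radosgw-admin sync error list

# 2. Reset that bucket's sync status, then force a full resync of it
#    (run on the zone that is missing objects):
radosgw-admin bucket sync init --bucket=mybucket --source-zone=public-dc2
radosgw-admin bucket sync run  --bucket=mybucket --source-zone=public-dc2
```

As Casey notes, these commands had seen little polish or testing at the time, so treat this as a last resort before rebuilding the zone.]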
>> >> >>>>>
>> >> >>>>>
>> >> >>>>> On 08/24/2017 04:15 PM, David Turner wrote:
>> >> >>>>>
>> >> >>>>> Apparently the data shards that are behind go in both
>> >> >>>>> directions, but only one zone is aware of the problem.  Each
>> >> >>>>> cluster has objects in its data pool that the other doesn't
>> >> >>>>> have.  I'm thinking about initiating a `data sync init` on both
>> >> >>>>> sides (one at a time) to get them back on the same page.  Does
>> >> >>>>> anyone know if that command will overwrite any local data that
>> >> >>>>> the zone has that the other doesn't if you run `data sync init`
>> >> >>>>> on it?
>> >> >>>>>
>> >> >>>>> On Thu, Aug 24, 2017 at 1:51 PM David Turner
>> >> >>>>> <drakonstein@xxxxxxxxx>
>> >> >>>>> wrote:
>> >> >>>>>>
>> >> >>>>>> After restarting the 2 RGW daemons on the second site again,
>> >> >>>>>> everything caught up on the metadata sync.  Is there something
>> >> >>>>>> about having 2 RGW daemons on each side of the multisite that
>> >> >>>>>> might be causing an issue with the sync getting stale?  I have
>> >> >>>>>> another realm set up the same way that is having a hard time
>> >> >>>>>> with its data shards being behind.  I haven't told them to
>> >> >>>>>> resync, but yesterday I noticed 90 shards were behind.  It's
>> >> >>>>>> caught back up to only 17 shards behind, but the oldest change
>> >> >>>>>> not applied is 2 months old, and no order of restarting RGW
>> >> >>>>>> daemons is helping to resolve this.
>> >> >>>>>>
>> >> >>>>>> On Thu, Aug 24, 2017 at 10:59 AM David Turner
>> >> >>>>>> <drakonstein@xxxxxxxxx>
>> >> >>>>>> wrote:
>> >> >>>>>>>
>> >> >>>>>>> I have an RGW multisite 10.2.7 setup for bi-directional
>> >> >>>>>>> syncing.  This has been operational for 5 months and working
>> >> >>>>>>> fine.  I recently created a new user on the master zone, used
>> >> >>>>>>> that user to create a bucket, and put a public-acl object in
>> >> >>>>>>> there.  The bucket was created on the second site, but the
>> >> >>>>>>> user was not, and the object errors out complaining about the
>> >> >>>>>>> access_key not existing.
>> >> >>>>>>>
>> >> >>>>>>> That led me to think that the metadata isn't syncing, while
>> >> >>>>>>> bucket and data both are.  I've also confirmed that data is
>> >> >>>>>>> syncing for other buckets as well in both directions.  The
>> >> >>>>>>> sync status from the second site was this.
>> >> >>>>>>>
>> >> >>>>>>>   metadata sync syncing
>> >> >>>>>>>
>> >> >>>>>>>                 full sync: 0/64 shards
>> >> >>>>>>>
>> >> >>>>>>>                 incremental sync: 64/64 shards
>> >> >>>>>>>
>> >> >>>>>>>                 metadata is caught up with master
>> >> >>>>>>>
>> >> >>>>>>>       data sync source: f4c12327-4721-47c9-a365-86332d84c227
>> >> >>>>>>> (public-atl01)
>> >> >>>>>>>
>> >> >>>>>>>                         syncing
>> >> >>>>>>>
>> >> >>>>>>>                         full sync: 0/128 shards
>> >> >>>>>>>
>> >> >>>>>>>                         incremental sync: 128/128 shards
>> >> >>>>>>>
>> >> >>>>>>>                         data is caught up with source
>> >> >>>>>>>
>> >> >>>>>>>
>> >> >>>>>>> Sync status leads me to think that the second site believes
>> >> >>>>>>> it is up to date, even though it is missing a freshly created
>> >> >>>>>>> user.  I restarted all of the rgw daemons for the zonegroup,
>> >> >>>>>>> but it didn't trigger anything to fix the missing user in the
>> >> >>>>>>> second site.  I did some googling, found the sync init
>> >> >>>>>>> commands mentioned in a few ML posts, used `metadata sync
>> >> >>>>>>> init`, and now have this as the sync status.
>> >> >>>>>>>
>> >> >>>>>>>   metadata sync preparing for full sync
>> >> >>>>>>>
>> >> >>>>>>>                 full sync: 64/64 shards
>> >> >>>>>>>
>> >> >>>>>>>                 full sync: 0 entries to sync
>> >> >>>>>>>
>> >> >>>>>>>                 incremental sync: 0/64 shards
>> >> >>>>>>>
>> >> >>>>>>>                 metadata is behind on 70 shards
>> >> >>>>>>>
>> >> >>>>>>>                 oldest incremental change not applied:
>> >> >>>>>>> 2017-03-01
>> >> >>>>>>> 21:13:43.0.126971s
>> >> >>>>>>>
>> >> >>>>>>>       data sync source: f4c12327-4721-47c9-a365-86332d84c227
>> >> >>>>>>> (public-atl01)
>> >> >>>>>>>
>> >> >>>>>>>                         syncing
>> >> >>>>>>>
>> >> >>>>>>>                         full sync: 0/128 shards
>> >> >>>>>>>
>> >> >>>>>>>                         incremental sync: 128/128 shards
>> >> >>>>>>>
>> >> >>>>>>>                         data is caught up with source
>> >> >>>>>>>
>> >> >>>>>>>
>> >> >>>>>>> It definitely triggered a fresh sync and told it to forget
>> >> >>>>>>> what it had previously applied, since the date of the oldest
>> >> >>>>>>> change not applied is the day we initially set up multisite
>> >> >>>>>>> for this zone.  The problem is that was over 12 hours ago and
>> >> >>>>>>> the sync status hasn't caught up on any shards yet.
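[Editor's note: a sketch of how progress after the init could be watched, using 10.2.x-era commands and the realm name from the thread:

```shell
# On the secondary: per-shard full/incremental state and markers.
radosgw-admin metadata sync status --rgw-realm=public

# On the master: per-shard mdlog markers, to confirm changes are being
# logged at all for the sync to consume.
radosgw-admin mdlog status --rgw-realm=public
```

If the secondary's markers never advance while the master's mdlog markers do, the problem is on the fetch/apply side rather than the logging side.]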
>> >> >>>>>>>
>> >> >>>>>>> Does anyone have any suggestions other than blasting the
>> >> >>>>>>> second site and setting it back up with a fresh start (the
>> >> >>>>>>> only option I can think of at this point)?
>> >> >>>>>>>
>> >> >>>>>>> Thank you,
>> >> >>>>>>> David Turner
>> >> >>>>>
>> >> >>>>>
>> >> >>>>>
>> >> >>>>> _______________________________________________
>> >> >>>>> ceph-users mailing list
>> >> >>>>> ceph-users@xxxxxxxxxxxxxx
>> >> >>>>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>> >> >>>>>
>> >> >>>>>
>> >> >>>>
>> >> >>>>
>> >> >>>>
>> >> >
>> >> >


