Re: radosgw sync falling behind regularly

Hi Casey,

We're still trying to figure this sync problem out; if you could possibly tell us anything further, we would be deeply grateful!

Our errors are coming from 'data sync'. `sync status` almost always shows one shard behind, but a different shard each time we run it.
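
For what it's worth, a quick loop like this is enough to spot-check which shard turns up as behind -- the 30-second interval is arbitrary:

while true; do date; radosgw-admin sync status | grep -E 'behind|oldest'; sleep 30; done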

Here's a paste -- these commands were run in rapid succession.

root@sv3-ceph-rgw1:~# radosgw-admin sync status
          realm b3e2afe7-2254-494a-9a34-ce50358779fd (savagebucket)
      zonegroup de6af748-1a2f-44a1-9d44-30799cf1313e (us)
           zone 331d3f1e-1b72-4c56-bb5a-d1d0fcf6d0b8 (sv3-prod)
  metadata sync syncing
                full sync: 0/64 shards
                incremental sync: 64/64 shards
                metadata is caught up with master
      data sync source: 107d29a0-b732-4bf1-a26e-1f64f820e839 (dc11-prod)
                        syncing
                        full sync: 0/128 shards
                        incremental sync: 128/128 shards
                        data is caught up with source
                source: 1e27bf9c-3a2f-4845-85b6-33a24bbe1c04 (sv5-corp)
                        syncing
                        full sync: 0/128 shards
                        incremental sync: 128/128 shards
                        data is caught up with source
root@sv3-ceph-rgw1:~# radosgw-admin sync status
          realm b3e2afe7-2254-494a-9a34-ce50358779fd (savagebucket)
      zonegroup de6af748-1a2f-44a1-9d44-30799cf1313e (us)
           zone 331d3f1e-1b72-4c56-bb5a-d1d0fcf6d0b8 (sv3-prod)
  metadata sync syncing
                full sync: 0/64 shards
                incremental sync: 64/64 shards
                metadata is caught up with master
      data sync source: 107d29a0-b732-4bf1-a26e-1f64f820e839 (dc11-prod)
                        syncing
                        full sync: 0/128 shards
                        incremental sync: 128/128 shards
                        data is behind on 1 shards
                        behind shards: [30]
                        oldest incremental change not applied: 2019-01-19 22:53:23.0.16109s
                source: 1e27bf9c-3a2f-4845-85b6-33a24bbe1c04 (sv5-corp)
                        syncing
                        full sync: 0/128 shards
                        incremental sync: 128/128 shards
                        data is caught up with source
root@sv3-ceph-rgw1:~#


Below I'm pasting a small section of log.  Thanks so much for looking!
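
If it helps to see the breadth of it, piping that log through something like this should reduce the 'init sync on' errors to the distinct bucket instances involved:

grep 'init sync on' /var/log/ceph/ceph-rgw-sv3-ceph-rgw1.log | sed -e 's/.*init sync on //' -e 's/ failed.*//' | sort -u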

Trey Palmer


root@sv3-ceph-rgw1:/var/log/ceph# tail -f ceph-rgw-sv3-ceph-rgw1.log | grep -i error
2019-03-08 11:43:07.208572 7fa080cc7700  0 data sync: ERROR: failed to read remote data log info: ret=-2
2019-03-08 11:43:07.211348 7fa080cc7700  0 meta sync: ERROR: RGWBackoffControlCR called coroutine returned -2
2019-03-08 11:43:07.267117 7fa080cc7700  0 data sync: ERROR: failed to read remote data log info: ret=-2
2019-03-08 11:43:07.269631 7fa080cc7700  0 meta sync: ERROR: RGWBackoffControlCR called coroutine returned -2
2019-03-08 11:43:07.895192 7fa080cc7700  0 data sync: ERROR: init sync on dmv/dmv:1e27bf9c-3a2f-4845-85b6-33a24bbe1c04.18467.134 failed, retcode=-2
2019-03-08 11:43:08.046685 7fa080cc7700  0 data sync: ERROR: init sync on dmv/dmv:1e27bf9c-3a2f-4845-85b6-33a24bbe1c04.18467.134 failed, retcode=-2
2019-03-08 11:43:08.171277 7fa0870eb700  0 ERROR: failed to get bucket instance info for .bucket.meta.phowe_superset:phowe_superset:1e27bf9c-3a2f-4845-85b6-33a24bbe1c04.18330.233
2019-03-08 11:43:08.171748 7fa0850e7700  0 ERROR: failed to get bucket instance info for .bucket.meta.gdfp_dev:gdfp_dev:1e27bf9c-3a2f-4845-85b6-33a24bbe1c04.18330.158
2019-03-08 11:43:08.175867 7fa08a0f1700  0 meta sync: ERROR: can't remove key: bucket.instance:phowe_superset/phowe_superset:1e27bf9c-3a2f-4845-85b6-33a24bbe1c04.18330.233 ret=-2
2019-03-08 11:43:08.176755 7fa0820e1700  0 data sync: ERROR: init sync on whoiswho/whoiswho:1e27bf9c-3a2f-4845-85b6-33a24bbe1c04.18467.293 failed, retcode=-2
2019-03-08 11:43:08.176872 7fa0820e1700  0 data sync: ERROR: init sync on dmv/dmv:1e27bf9c-3a2f-4845-85b6-33a24bbe1c04.18467.134 failed, retcode=-2
2019-03-08 11:43:08.176885 7fa093103700  0 ERROR: failed to get bucket instance info for .bucket.meta.phowe_superset:phowe_superset:1e27bf9c-3a2f-4845-85b6-33a24bbe1c04.18330.233
2019-03-08 11:43:08.176925 7fa0820e1700  0 data sync: ERROR: failed to retrieve bucket info for bucket=phowe_superset/phowe_superset:1e27bf9c-3a2f-4845-85b6-33a24bbe1c04.18330.233
2019-03-08 11:43:08.177916 7fa0910ff700  0 meta sync: ERROR: can't remove key: bucket.instance:gdfp_dev/gdfp_dev:1e27bf9c-3a2f-4845-85b6-33a24bbe1c04.18330.158 ret=-2
2019-03-08 11:43:08.178815 7fa08b0f3700  0 ERROR: failed to get bucket instance info for .bucket.meta.gdfp_dev:gdfp_dev:1e27bf9c-3a2f-4845-85b6-33a24bbe1c04.18330.158
2019-03-08 11:43:08.178847 7fa0820e1700  0 data sync: ERROR: failed to retrieve bucket info for bucket=gdfp_dev/gdfp_dev:1e27bf9c-3a2f-4845-85b6-33a24bbe1c04.18330.158
2019-03-08 11:43:08.179492 7fa0820e1700  0 data sync: ERROR: init sync on adcreative/adcreative:1e27bf9c-3a2f-4845-85b6-33a24bbe1c04.18330.21 failed, retcode=-2
2019-03-08 11:43:08.179529 7fa0820e1700  0 data sync: ERROR: init sync on vulnerability_report/vulnerability-report:1e27bf9c-3a2f-4845-85b6-33a24bbe1c04.18467.421 failed, retcode=-2
2019-03-08 11:43:08.179770 7fa0820e1700  0 data sync: ERROR: init sync on early_osquery/early-osquery:1e27bf9c-3a2f-4845-85b6-33a24bbe1c04.18330.339 failed, retcode=-2
2019-03-08 11:43:08.217393 7fa0820e1700  0 data sync: ERROR: init sync on bugsnag_integration/bugsnag-integration:1e27bf9c-3a2f-4845-85b6-33a24bbe1c04.18467.328 failed, retcode=-2
2019-03-08 11:43:08.233847 7fa0820e1700  0 data sync: ERROR: init sync on vulnerability_report/vulnerability-report:1e27bf9c-3a2f-4845-85b6-33a24bbe1c04.18467.421 failed, retcode=-2
2019-03-08 11:43:08.233917 7fa0820e1700  0 data sync: ERROR: init sync on dmv/dmv:1e27bf9c-3a2f-4845-85b6-33a24bbe1c04.18467.134 failed, retcode=-2
2019-03-08 11:43:08.233998 7fa0820e1700  0 data sync: ERROR: init sync on early_osquery/early-osquery:1e27bf9c-3a2f-4845-85b6-33a24bbe1c04.18330.339 failed, retcode=-2
2019-03-08 11:43:08.273391 7fa0820e1700  0 data sync: ERROR: init sync on bugsnag_integration/bugsnag-integration:1e27bf9c-3a2f-4845-85b6-33a24bbe1c04.18467.328 failed, retcode=-2
2019-03-08 11:43:08.745150 7fa0840e5700  0 ERROR: failed to get bucket instance info for .bucket.meta.event_dashboard:event_dashboard:1e27bf9c-3a2f-4845-85b6-33a24bbe1c04.18467.148
2019-03-08 11:43:08.745408 7fa08c0f5700  0 ERROR: failed to get bucket instance info for .bucket.meta.produktizr_doc:produktizr_doc:1e27bf9c-3a2f-4845-85b6-33a24bbe1c04.18330.241
2019-03-08 11:43:08.749571 7fa0820e1700  0 data sync: ERROR: init sync on ceph/ceph:1e27bf9c-3a2f-4845-85b6-33a24bbe1c04.18467.427 failed, retcode=-2
2019-03-08 11:43:08.750472 7fa0820e1700  0 data sync: ERROR: init sync on terraform_dev/terraform-dev:1e27bf9c-3a2f-4845-85b6-33a24bbe1c04.18330.418 failed, retcode=-2
2019-03-08 11:43:08.750508 7fa08e0f9700  0 meta sync: ERROR: can't remove key: bucket.instance:event_dashboard/event_dashboard:1e27bf9c-3a2f-4845-85b6-33a24bbe1c04.18467.148 ret=-2
2019-03-08 11:43:08.751094 7fa0868ea700  0 meta sync: ERROR: can't remove key: bucket.instance:produktizr_doc/produktizr_doc:1e27bf9c-3a2f-4845-85b6-33a24bbe1c04.18330.241 ret=-2
2019-03-08 11:43:08.751331 7fa08a8f2700  0 ERROR: failed to get bucket instance info for .bucket.meta.event_dashboard:event_dashboard:1e27bf9c-3a2f-4845-85b6-33a24bbe1c04.18467.148
2019-03-08 11:43:08.751387 7fa0820e1700  0 data sync: ERROR: failed to retrieve bucket info for bucket=event_dashboard/event_dashboard:1e27bf9c-3a2f-4845-85b6-33a24bbe1c04.18467.148
2019-03-08 11:43:08.751497 7fa0820e1700  0 data sync: ERROR: init sync on pithos_doc/pithos-doc:1e27bf9c-3a2f-4845-85b6-33a24bbe1c04.18467.393 failed, retcode=-2
2019-03-08 11:43:08.751619 7fa0820e1700  0 data sync: ERROR: init sync on jmeter_sc/jmeter-sc:1e27bf9c-3a2f-4845-85b6-33a24bbe1c04.18467.360 failed, retcode=-2
2019-03-08 11:43:08.752037 7fa0900fd700  0 ERROR: failed to get bucket instance info for .bucket.meta.produktizr_doc:produktizr_doc:1e27bf9c-3a2f-4845-85b6-33a24bbe1c04.18330.241
2019-03-08 11:43:08.752063 7fa0820e1700  0 data sync: ERROR: failed to retrieve bucket info for bucket=produktizr_doc/produktizr_doc:1e27bf9c-3a2f-4845-85b6-33a24bbe1c04.18330.241
2019-03-08 11:43:08.752462 7fa0820e1700  0 data sync: ERROR: init sync on goinfosb/goinfosb:1e27bf9c-3a2f-4845-85b6-33a24bbe1c04.18467.160 failed, retcode=-2
2019-03-08 11:43:08.793707 7fa0820e1700  0 data sync: ERROR: init sync on kafkadrm/kafkadrm:1e27bf9c-3a2f-4845-85b6-33a24bbe1c04.18330.183 failed, retcode=-2
2019-03-08 11:43:08.809748 7fa0820e1700  0 data sync: ERROR: init sync on terraform_dev/terraform-dev:1e27bf9c-3a2f-4845-85b6-33a24bbe1c04.18330.418 failed, retcode=-2
2019-03-08 11:43:08.809804 7fa0820e1700  0 data sync: ERROR: init sync on pithos_doc/pithos-doc:1e27bf9c-3a2f-4845-85b6-33a24bbe1c04.18467.393 failed, retcode=-2
2019-03-08 11:43:08.809917 7fa0820e1700  0 data sync: ERROR: init sync on jmeter_sc/jmeter-sc:1e27bf9c-3a2f-4845-85b6-33a24bbe1c04.18467.360 failed, retcode=-2
2019-03-08 11:43:09.345180 7fa0840e5700  0 ERROR: failed to get bucket instance info for .bucket.meta.spins_on_the_ledger:spins_on_the_ledger:1e27bf9c-3a2f-4845-85b6-33a24bbe1c04.18467.274
2019-03-08 11:43:09.349186 7fa0820e1700  0 data sync: ERROR: init sync on steno/steno:1e27bf9c-3a2f-4845-85b6-33a24bbe1c04.18467.279 failed, retcode=-2
2019-03-08 11:43:09.349235 7fa0820e1700  0 data sync: ERROR: init sync on adjuster_kafka/adjuster-kafka:1e27bf9c-3a2f-4845-85b6-33a24bbe1c04.18467.308 failed, retcode=-2
2019-03-08 11:43:09.349809 7fa0820e1700  0 data sync: ERROR: init sync on oauth/oauth:1e27bf9c-3a2f-4845-85b6-33a24bbe1c04.18330.223 failed, retcode=-2
2019-03-08 11:43:09.351909 7fa08d0f7700  0 meta sync: ERROR: can't remove key: bucket.instance:spins_on_the_ledger/spins_on_the_ledger:1e27bf9c-3a2f-4845-85b6-33a24bbe1c04.18467.274 ret=-2
2019-03-08 11:43:09.352412 7fa0820e1700  0 data sync: ERROR: init sync on sre_jmeter/sre-jmeter:1e27bf9c-3a2f-4845-85b6-33a24bbe1c04.18330.635 failed, retcode=-2
2019-03-08 11:43:09.352609 7fa08f0fb700  0 ERROR: failed to get bucket instance info for .bucket.meta.spins_on_the_ledger:spins_on_the_ledger:1e27bf9c-3a2f-4845-85b6-33a24bbe1c04.18467.274
2019-03-08 11:43:09.352635 7fa0820e1700  0 data sync: ERROR: failed to retrieve bucket info for bucket=spins_on_the_ledger/spins_on_the_ledger:1e27bf9c-3a2f-4845-85b6-33a24bbe1c04.18467.274
2019-03-08 11:43:09.352831 7fa0820e1700  0 data sync: ERROR: init sync on charon_analytics/charon-analytics:1e27bf9c-3a2f-4845-85b6-33a24bbe1c04.18330.331 failed, retcode=-2
2019-03-08 11:43:09.352903 7fa0820e1700  0 data sync: ERROR: init sync on kafka_doc/kafka-doc:1e27bf9c-3a2f-4845-85b6-33a24bbe1c04.18467.362 failed, retcode=-2
2019-03-08 11:43:09.353337 7fa0820e1700  0 data sync: ERROR: init sync on serversidesequencing/serversidesequencing:1e27bf9c-3a2f-4845-85b6-33a24bbe1c04.18330.263 failed, retcode=-2
2019-03-08 11:43:09.389559 7fa0820e1700  0 data sync: ERROR: init sync on radio_publicapi/radio-publicapi:1e27bf9c-3a2f-4845-85b6-33a24bbe1c04.18330.401 failed, retcode=-2
2019-03-08 11:43:09.402324 7fa0820e1700  0 data sync: ERROR: init sync on adjuster_kafka/adjuster-kafka:1e27bf9c-3a2f-4845-85b6-33a24bbe1c04.18467.308 failed, retcode=-2
2019-03-08 11:43:09.405314 7fa0820e1700  0 data sync: ERROR: init sync on charon_analytics/charon-analytics:1e27bf9c-3a2f-4845-85b6-33a24bbe1c04.18330.331 failed, retcode=-2
2019-03-08 11:43:09.406046 7fa0820e1700  0 data sync: ERROR: init sync on kafka_doc/kafka-doc:1e27bf9c-3a2f-4845-85b6-33a24bbe1c04.18467.362 failed, retcode=-2
2019-03-08 11:43:09.441428 7fa0820e1700  0 data sync: ERROR: init sync on radio_publicapi/radio-publicapi:1e27bf9c-3a2f-4845-85b6-33a24bbe1c04.18330.401 failed, retcode=-2
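
If 'metadata sync init' does turn out to be the right next step (per my question further down the thread), the sequence we'd plan to run on this zone is roughly:

radosgw-admin metadata sync init
systemctl restart ceph-radosgw@rgw.sv3-ceph-rgw1   # our unit name here; repeat on each gateway in the zone
radosgw-admin sync status                          # watch the metadata full sync start and catch up

The unit name above is just what we use on these hosts, and we'd wait for the full sync to finish before judging whether the errors stop.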






On Fri, Mar 8, 2019 at 10:29 AM Casey Bodley <cbodley@xxxxxxxxxx> wrote:
(cc ceph-users)

Can you tell whether these sync errors are coming from metadata sync or
data sync? Are they blocking sync from making progress according to your
'sync status'?

On 3/8/19 10:23 AM, Trey Palmer wrote:
> Casey,
>
> Having done the 'reshard stale-instances delete' earlier on the advice
> of another list member, we have tons of sync errors on deleted
> buckets, as you mention.
>
> After 'data sync init' we're still seeing all of these errors on
> deleted buckets.
>
> It occurred to me this morning that since buckets are metadata, a
> 'data sync init' wouldn't refresh that info.  But a 'metadata sync
> init' might get rid of the stale bucket sync info and stop the sync
> errors.  Would that be the way to go?
>
> Thanks,
>
> Trey
>
>
>
> On Wed, Mar 6, 2019 at 11:47 AM Casey Bodley <cbodley@xxxxxxxxxx
> <mailto:cbodley@xxxxxxxxxx>> wrote:
>
>     Hi Trey,
>
>     I think it's more likely that these stale metadata entries are from
>     deleted buckets, rather than accidental bucket reshards. When a
>     bucket
>     is deleted in a multisite configuration, we don't delete its bucket
>     instance because other zones may still need to sync the object
>     deletes -
>     and they can't make progress on sync if the bucket metadata
>     disappears.
>     These leftover bucket instances look the same to the 'reshard
>     stale-instances' commands, but I'd be cautious about using that to
>     remove them in multisite, as it may cause more sync errors and
>     potentially leak storage if they still contain objects.
>
>     Regarding 'datalog trim', that alone isn't safe because it could trim
>     entries that hadn't been applied on other zones yet, causing them to
>     miss some updates. What you can do is run 'data sync init' on each
>     zone,
>     and restart gateways. This will restart with a data full sync (which
>     will scan all buckets for changes), and skip past any datalog entries
>     from before the full sync. I was concerned that the bug in error
>     handling (ie "ERROR: init sync on...") would also affect full
>     sync, but
>     that doesn't appear to be the case - so I do think that's worth
>     trying.
>
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
