Re: RGW: Reshard index of non-master zones in multi-site

Iain Buclaw <ibuclaw@xxxxxxxxxx> · Tue, 19 Feb 2019 09:59:39 +0100

On Wed, 6 Feb 2019 at 09:28, Iain Buclaw <ibuclaw@xxxxxxxxxx> wrote:
>
> On Tue, 5 Feb 2019 at 10:04, Iain Buclaw <ibuclaw@xxxxxxxxxx> wrote:
> >
> > On Tue, 5 Feb 2019 at 09:46, Iain Buclaw <ibuclaw@xxxxxxxxxx> wrote:
> > >
> > > Hi,
> > >
> > > Following the update of one secondary site from 12.2.8 to 12.2.11, the
> > > following warning has come up.
> > >
> > > HEALTH_WARN 1 large omap objects
> > > LARGE_OMAP_OBJECTS 1 large omap objects
> > >     1 large objects found in pool '.rgw.buckets.index'
> > >     Search the cluster log for 'Large omap object found' for more details.
> > >
> >
> > [...]
> >
> > > Is this the reason why resharding hasn't propagated?
> > >
> >
> > Furthermore, it in fact looks like the index is broken on the secondaries.
> >
> > On the master:
> >
> > # radosgw-admin bi get --bucket=mybucket --object=myobject
> > {
> >     "type": "plain",
> >     "idx": "myobject",
> >     "entry": {
> >         "name": "myobject",
> >         "instance": "",
> >         "ver": {
> >             "pool": 28,
> >             "epoch": 8848
> >         },
> >         "locator": "",
> >         "exists": "true",
> >         "meta": {
> >             "category": 1,
> >             "size": 9200,
> >             "mtime": "2018-03-27 21:12:56.612172Z",
> >             "etag": "c365c324cda944d2c3b687c0785be735",
> >             "owner": "mybucket",
> >             "owner_display_name": "Bucket User",
> >             "content_type": "application/octet-stream",
> >             "accounted_size": 9194,
> >             "user_data": ""
> >         },
> >         "tag": "0ef1a91a-4aee-427e-bdf8-30589abb2d3e.36603989.137292",
> >         "flags": 0,
> >         "pending_map": [],
> >         "versioned_epoch": 0
> >     }
> > }
> >
> >
> > On the secondaries:
> >
> > # radosgw-admin bi get --bucket=mybucket --object=myobject
> > ERROR: bi_get(): (2) No such file or directory
> >
> > How does one go about rectifying this mess?
> >
>
> A random blog post in a language I don't understand seems to allude to
> using radosgw-admin bi put to restore backed-up indexes, but doesn't say
> under what circumstances you would use such a command.
>
> https://cloud.tencent.com/developer/article/1032854
>
> Would this be safe to run on secondaries?
>

I removed the bucket on the secondaries and scheduled a new sync.  However,
this gets stuck at some point and radosgw complains about:

data sync: WARNING: skipping data log entry for missing bucket
mybucket:0ef1a91a-4aee-427e-bdf8-30589abb2d3e.92151615.1:21
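
For reference, a small sketch for picking such an entry apart.  It assumes
the entry has the form <bucket>:<instance id>:<shard>, which is my reading
of the message and not something confirmed here; split_datalog_entry is a
made-up helper name.

```shell
# Split a data log entry into its parts.
# ASSUMPTION: the entry looks like <bucket>:<instance id>:<shard>.
split_datalog_entry() {
    entry="$1"
    shard="${entry##*:}"     # text after the last colon
    rest="${entry%:*}"       # everything before the last colon
    printf 'bucket=%s instance=%s shard=%s\n' \
        "${rest%%:*}" "${rest#*:}" "$shard"
}

# With the parts in hand, one could check on each zone whether that bucket
# instance still has metadata, e.g.:
#   radosgw-admin metadata get bucket.instance:<bucket>:<instance id>
```

If the instance is missing on the zone that logs the warning, that would at
least be consistent with the "missing bucket" wording.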

Frustrated that RGW can't even do a simple job right, I removed the
problematic bucket on the master, but now there are hundreds of shard
objects inside the index pool that all look to be orphaned, and the
warnings for the missing bucket still continue to appear on the
secondaries.  In some cases there's an object on the secondary that
doesn't exist on the master.

All the while, ceph is still complaining about large omap objects.

$ ceph daemon mon.ceph-mon-1 config get \
    osd_deep_scrub_large_omap_object_value_sum_threshold
{
    "osd_deep_scrub_large_omap_object_value_sum_threshold": "1073741824"
}

It seems implausible that the cluster is still complaining about this
when the largest omap contains 71405 entries.
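
Two things may be worth checking here.  The value-sum threshold above is
only one of two limits; there is also a key-count limit,
osd_deep_scrub_large_omap_object_key_threshold.  And as far as I know the
large-omap flag is only re-evaluated when the affected PG is deep scrubbed
again, so a stale warning can linger.  A rough sketch for listing key counts
per index object follows; count_index_keys is a made-up helper name, and the
pool name is the one from the health warning.

```shell
# Print "<key count> <object>" for every object in an index pool,
# largest first.  Needs the rados CLI and a usable client keyring.
count_index_keys() {
    pool="$1"
    rados -p "$pool" ls | while IFS= read -r obj; do
        # </dev/null keeps the inner command from eating the object list
        n=$(rados -p "$pool" listomapkeys "$obj" </dev/null | wc -l | tr -d ' ')
        printf '%s %s\n' "$n" "$obj"
    done | sort -rn
}

# Example:
#   count_index_keys .rgw.buckets.index | head
```

This walks every object in the pool, so it can be slow on a large index
pool.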

I can't run bi purge or metadata rm on the unreferenced entries
because the bucket itself is no more.  Can I remove objects from the
index pool using 'rados rm'?
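
Before removing anything, a sketch like the following could at least narrow
down which index objects have no matching bucket instance.  It only prints
candidates and deletes nothing.  It assumes index objects are named
.dir.<instance id> or .dir.<instance id>.<shard>, and that an instance id
has exactly three dot-separated parts; both are assumptions on my side.
index_obj_instance and orphan_candidates are made-up helper names.

```shell
# Extract the bucket instance id from an index object name.
# ASSUMPTION: names look like .dir.<a>.<b>.<c> or .dir.<a>.<b>.<c>.<shard>.
index_obj_instance() {
    name="${1#.dir.}"
    printf '%s\n' "$name" | cut -d. -f1-3
}

# Read index object names on stdin and print those whose instance id is
# not listed (one id per line) in the file given as $1.
orphan_candidates() {
    known="$1"
    while IFS= read -r obj; do
        id=$(index_obj_instance "$obj")
        grep -qxF -- "$id" "$known" || printf '%s\n' "$obj"
    done
}

# Example (jq is assumed to be installed; known_ids.txt is a scratch file):
#   radosgw-admin metadata list bucket.instance \
#       | jq -r '.[]' | cut -d: -f2 > known_ids.txt
#   rados -p .rgw.buckets.index ls | orphan_candidates known_ids.txt
```

The output is a list to review by hand, not a list that is safe to delete
blindly, especially in a multi-site setup where zones may disagree.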

-- 
Iain Buclaw

*(p < e ? p++ : p) = (c & 0x0f) + '0';