On Wed, 6 Feb 2019 at 09:28, Iain Buclaw <ibuclaw@xxxxxxxxxx> wrote:
>
> On Tue, 5 Feb 2019 at 10:04, Iain Buclaw <ibuclaw@xxxxxxxxxx> wrote:
> >
> > On Tue, 5 Feb 2019 at 09:46, Iain Buclaw <ibuclaw@xxxxxxxxxx> wrote:
> > >
> > > Hi,
> > >
> > > Following the update of one secondary site from 12.2.8 to 12.2.11, the
> > > following warning has come up.
> > >
> > > HEALTH_WARN 1 large omap objects
> > > LARGE_OMAP_OBJECTS 1 large omap objects
> > >     1 large objects found in pool '.rgw.buckets.index'
> > >     Search the cluster log for 'Large omap object found' for more details.
> > >
> >
> > [...]
> >
> > > Is this the reason why resharding hasn't propagated?
> > >
> >
> > Furthermore, in fact it looks like the index is broken on the secondaries.
> >
> > On the master:
> >
> > # radosgw-admin bi get --bucket=mybucket --object=myobject
> > {
> >     "type": "plain",
> >     "idx": "myobject",
> >     "entry": {
> >         "name": "myobject",
> >         "instance": "",
> >         "ver": {
> >             "pool": 28,
> >             "epoch": 8848
> >         },
> >         "locator": "",
> >         "exists": "true",
> >         "meta": {
> >             "category": 1,
> >             "size": 9200,
> >             "mtime": "2018-03-27 21:12:56.612172Z",
> >             "etag": "c365c324cda944d2c3b687c0785be735",
> >             "owner": "mybucket",
> >             "owner_display_name": "Bucket User",
> >             "content_type": "application/octet-stream",
> >             "accounted_size": 9194,
> >             "user_data": ""
> >         },
> >         "tag": "0ef1a91a-4aee-427e-bdf8-30589abb2d3e.36603989.137292",
> >         "flags": 0,
> >         "pending_map": [],
> >         "versioned_epoch": 0
> >     }
> > }
> >
> >
> > On the secondaries:
> >
> > # radosgw-admin bi get --bucket=mybucket --object=myobject
> > ERROR: bi_get(): (2) No such file or directory
> >
> > How does one go about rectifying this mess?
> >
>
> A random blog post in a language I don't understand seems to allude to
> using radosgw-admin bi put to restore backed-up indexes, but does not say
> under what circumstances you would use such a command.
>
> https://cloud.tencent.com/developer/article/1032854
>
> Would this be safe to run on secondaries?
>

I removed the bucket on the secondaries and scheduled a new sync. However,
this gets stuck at some point, and radosgw complains:

data sync: WARNING: skipping data log entry for missing bucket mybucket:0ef1a91a-4aee-427e-bdf8-30589abb2d3e.92151615.1:21

Frustrated that RGW can't even do this simple job right, I removed the
problematic bucket on the master. Now there are hundreds of shard objects
inside the index pool, all of which look to be orphaned, and the
missing-bucket warnings still continue on the secondaries. In some cases
there is an object on the secondary that doesn't exist on the master.

All the while, Ceph is still complaining about large omap objects.

$ ceph daemon mon.ceph-mon-1 config get osd_deep_scrub_large_omap_object_value_sum_threshold
{
    "osd_deep_scrub_large_omap_object_value_sum_threshold": "1073741824"
}

With that threshold at 1073741824 bytes (1 GiB), it seems implausible that
the cluster is still complaining when the largest omap contains only 71405
entries.

I can't run bi purge or metadata rm on the unreferenced entries because the
bucket itself no longer exists. Can I remove objects from the index pool
using 'rados rm'?

-- 
Iain Buclaw

*(p < e ? p++ : p) = (c & 0x0f) + '0';
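For cross-referencing the "missing bucket" warnings against what is left in the index pool, here is a minimal Python sketch. It only parses log text and touches nothing on the cluster. The helper names (parse_missing_bucket, index_object_name) are made up for illustration. The ".dir.<instance id>.<shard>" naming of index shard objects is an assumption that should be checked against the actual object listing of the index pool before anything is removed.

```python
import re
from typing import NamedTuple, Optional


class BucketShard(NamedTuple):
    """One bucket shard named in a data sync warning."""
    bucket: str
    instance_id: str
    shard: int


# Matches the tail of lines such as:
#   data sync: WARNING: skipping data log entry for missing bucket
#   mybucket:0ef1a91a-...92151615.1:21
_WARNING = re.compile(
    r"skipping data log entry for missing bucket "
    r"(?P<bucket>[^:\s]+):(?P<instance>[^:\s]+):(?P<shard>\d+)"
)


def parse_missing_bucket(line: str) -> Optional[BucketShard]:
    """Return the bucket, instance id and shard from a warning line, or None."""
    match = _WARNING.search(line)
    if match is None:
        return None
    return BucketShard(match["bucket"], match["instance"], int(match["shard"]))


def index_object_name(entry: BucketShard) -> str:
    """Guess the index pool object name for a shard.

    ASSUMPTION: index shard objects are named ".dir.<instance id>.<shard>".
    Verify this against the real pool listing before relying on it.
    """
    return f".dir.{entry.instance_id}.{entry.shard}"


if __name__ == "__main__":
    line = (
        "data sync: WARNING: skipping data log entry for missing bucket "
        "mybucket:0ef1a91a-4aee-427e-bdf8-30589abb2d3e.92151615.1:21"
    )
    entry = parse_missing_bucket(line)
    if entry is not None:
        print(entry.bucket, entry.instance_id, entry.shard)
        print(index_object_name(entry))
```

Feeding the radosgw log through parse_missing_bucket and collecting the distinct instance ids gives a list to compare by hand with the shard objects that look orphaned. Whether those objects can then be safely removed is a separate question that this sketch does not answer.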