If you can wait a few weeks until the next release of luminous there will be tooling to do this safely. Abhishek Lekshmanan of SUSE contributed the PR. It adds some sub-commands to radosgw-admin:

  radosgw-admin reshard stale-instances list
  radosgw-admin reshard stale-instances rm

If you do it manually you should proceed with extreme caution, as you could do some damage that you might not be able to recover from.
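In the meantime, a rough way to see which instances are likely stale is to diff the full instance list against the instance id each bucket currently reports. This is only a sketch - it assumes jq is available, the file names are arbitrary, and the exact key format can vary with tenants/multisite - and it only identifies candidates, so don't remove anything based on it alone:

  # Every bucket instance key the cluster knows about (roughly "<bucket>:<instance-id>")
  radosgw-admin metadata list bucket.instance | jq -r '.[]' | sort > all_instances.txt

  # The instance id each bucket is currently using, in the same "<bucket>:<id>" form
  radosgw-admin bucket list | jq -r '.[]' | while read -r b; do
    radosgw-admin bucket stats --bucket="$b" | jq -r '"\(.bucket):\(.id)"'
  done | sort > live_instances.txt

  # Keys only in the first file are candidate stale instances
  comm -23 all_instances.txt live_instances.txt

Removing a confirmed stale instance would go through radosgw-admin metadata rm, but again - extreme caution, and wait for the tooling if you can.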
Eric

On 1/3/19 11:31 AM, Bryan Stillwell wrote:
> Josef,
>
> I've noticed that when dynamic resharding is on it'll reshard some of
> our bucket indices daily (sometimes more). This causes a lot of wasted
> space in the .rgw.buckets.index pool, which might be what you are seeing.
>
> You can get a listing of all the bucket instances in your cluster with
> this command:
>
>   radosgw-admin metadata list bucket.instance | jq -r '.[]' | sort
>
> Give that a try and see if you see the same problem. It seems that once
> you remove the old bucket instances, the omap dbs don't reduce in size
> until you compact them.
>
> Bryan
>
> *From:* Josef Zelenka <josef.zelenka@xxxxxxxxxxxxxxxx>
> *Date:* Thursday, January 3, 2019 at 3:49 AM
> *To:* "J. Eric Ivancich" <ivancich@xxxxxxxxxx>
> *Cc:* "ceph-users@xxxxxxxxxxxxxx" <ceph-users@xxxxxxxxxxxxxx>, Bryan Stillwell <bstillwell@xxxxxxxxxxx>
> *Subject:* Re: Omap issues - metadata creating too many
>
> Hi, I had the default - so it was on (according to the Ceph KB). I turned it
> off, but the issue persists. I noticed Bryan Stillwell (cc-ing him) had
> the same issue (reported about it yesterday) - I tried his tips about
> compacting, but it doesn't do anything. However, I have to add to his
> last point: this happens even with bluestore. Is there anything we can
> do to clean up the omap manually?
>
> Josef
>
> On 18/12/2018 23:19, J. Eric Ivancich wrote:
> On 12/17/18 9:18 AM, Josef Zelenka wrote:
> Hi everyone, I'm running a Luminous 12.2.5 cluster with 6 hosts on
> Ubuntu 16.04 - 12 HDDs for data each, plus 2 SSD metadata OSDs (three
> nodes have an additional SSD I added to have more space to rebalance
> the metadata). Currently the cluster is used mainly as radosgw storage,
> with 28 TB of data in total and 2x replication for both the metadata
> and data pools (a CephFS instance is running alongside, but I don't
> think it's the perpetrator - this likely happened before we had it).
> All pools aside from the data pool of the CephFS and the data pool of
> the radosgw are located on the SSDs. Now, the interesting thing: at
> random times the metadata OSDs fill up their entire capacity with omap
> data and go to r/o mode, and we currently have no other option than
> deleting and re-creating them. The fill-up comes at a random time; it
> doesn't seem to be triggered by anything and it isn't caused by some
> data influx. It seems like some kind of a bug to me, to be honest, but
> I'm not certain - has anyone else seen this behavior with their
> radosgw? Thanks a lot
>
> Hi Josef,
>
> Do you have rgw_dynamic_resharding turned on? Try turning it off and see
> if the behavior continues.
>
> One theory is that dynamic resharding is triggered and possibly not
> completing. This could add a lot of data to omap for the incomplete
> bucket index shards. After a delay it tries resharding again, possibly
> failing again, and adding more data to the omap. This continues.
>
> If this is the ultimate issue we have some commits on the upstream
> luminous branch that are designed to address this set of issues.
>
> But we should first see if this is the cause.
>
> Eric
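Tying together Bryan's note about compaction and the resharding theory above, a rough checklist for affected clusters might look like the following. This is a sketch only - the OSD id is a placeholder, the config change assumes the rgw settings live in ceph.conf, and behaviour can differ between Luminous point releases:

  # Check whether any resharding operations are queued or appear stuck
  radosgw-admin reshard list

  # Dynamic resharding can be disabled for the rgw instances until the
  # fixes land, e.g. in the [client.rgw.<name>] section of ceph.conf:
  #   rgw_dynamic_resharding = false
  # (restart the rgw daemons afterwards)

  # After stale index data has been cleaned up, the omap space is usually
  # only returned once the index OSDs are compacted, e.g. on the host
  # that carries osd.12:
  ceph daemon osd.12 compact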