On 12/17/18 9:18 AM, Josef Zelenka wrote:
> Hi everyone, I'm running a Luminous 12.2.5 cluster with 6 hosts on
> Ubuntu 16.04 - 12 HDDs for data each, plus 2 SSD metadata OSDs (three
> nodes have an additional SSD I added to have more space to rebalance
> the metadata). Currently, the cluster is used mainly as radosgw
> storage, with 28 TB of data in total and 2x replication for both the
> metadata and data pools (a CephFS instance is running alongside it,
> but I don't think it's the culprit - this likely happened before we
> had it). All pools aside from the CephFS data pool and the radosgw
> data pool are located on the SSDs. Now, the interesting thing: at
> random times, the metadata OSDs fill their entire capacity with OMAP
> data and go read-only, and we currently have no option other than
> deleting and re-creating them. The fill-up comes at a random time; it
> doesn't seem to be triggered by anything, and it isn't caused by an
> influx of data. It looks like some kind of bug to me, to be honest,
> but I'm not certain - has anyone else seen this behavior with their
> radosgw? Thanks a lot

Hi Josef,

Do you have rgw_dynamic_resharding turned on? Try turning it off and
see if the behavior continues.

One theory is that dynamic resharding is triggered and possibly not
completing. This could add a lot of data to omap for the incomplete
bucket index shards. After a delay it tries resharding again, possibly
failing again, and adding more data to omap. This continues.

If this is the ultimate issue, we have some commits on the upstream
luminous branch that are designed to address this set of issues. But
we should first see if this is the cause.

Eric
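
For reference, a rough sketch of how one might check the reshard queue
and turn dynamic resharding off on a Luminous cluster. The rgw instance
name "client.rgw.gateway1" is only a placeholder; adjust the section
name and service unit to match your deployment:

    # Check whether any buckets are currently queued for resharding:
    radosgw-admin reshard list

    # Check how full the bucket index shards are (objects per shard):
    radosgw-admin bucket limit check

    # Disable dynamic resharding in ceph.conf on the rgw hosts
    # (example section name, use your own rgw instance name):
    [client.rgw.gateway1]
    rgw_dynamic_resharding = false

    # Then restart the radosgw daemon, e.g. on Ubuntu 16.04:
    systemctl restart ceph-radosgw@rgw.gateway1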