Re: [ceph-large] radosgw index pool too big

Hi Varada,

Thanks for the update.

I've turned dynamic resharding off and I'm busy checking my pool for stale indexes to see whether I'm hitting the same problem. I also found some reports of this on the ceph-users mailing list:

http://lists.ceph.com/pipermail/ceph-users-ceph.com/2018-November/030983.html
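For anyone wanting to do a similar check, a rough approach is to compare the bucket instance IDs present in the index pool against the instance ID each bucket currently reports. A minimal sketch (the pool name below is the default one and is only an example; sharded buckets append a .<shard> suffix to the object names):

    # Instance ID the bucket is currently using (the "id" field)
    radosgw-admin bucket stats --bucket=<bucket> | grep '"id"'

    # Instance IDs actually present in the index pool
    # (index objects are named .dir.<instance_id>[.<shard>])
    rados -p default.rgw.buckets.index ls | sed 's/^\.dir\.//' | sort -u

    # Any instance ID that shows up in the pool but is not the current
    # "id" of any bucket is a candidate stale index.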

Cheers,
Tom

On Wed, Jan 9, 2019 at 4:10 PM Varada Kari (System Engineer) <varadaraja.kari@xxxxxxxxxxxx> wrote:
I'm not sure if this problem was reported earlier. We debugged it with the rocksdb tools and saw keys being repeated in a pattern; they turned out to be valid keys. Further observation on the RGW nodes showed resharding happening far too frequently on the same buckets, multiple times, with no I/O on them at all. Once we had identified the problem, the steps I described followed. Our setup has close to a billion objects, and we were bitten badly by dynamic resharding combined with an aggressive balancer. We later reduced the balancer frequency to once every few hours and slowly recovered the cluster to a normal state. 
Don't forget to run compaction as well, either via the admin socket or by setting the config option that compacts on mount and restarting the OSDs. We saw some lookups hit the suicide timeout while reading a few keys and while the balancer was rebalancing. 
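For reference, the online compaction can be triggered per OSD through the admin socket, roughly as below (OSD IDs and socket paths are placeholders, and the exact name of the compact-on-mount option depends on your release, so check the docs for your version). The balancer interval key below is my assumption from the Luminous-era mgr balancer module; verify it on your build:

    # Trigger a RocksDB/omap compaction on one OSD via its admin socket
    ceph daemon osd.<id> compact

    # Or loop over all OSD sockets on a host
    for sock in /var/run/ceph/ceph-osd.*.asok; do
        ceph daemon "$sock" compact
    done

    # Run the balancer at most once an hour instead of every minute
    ceph config-key set mgr/balancer/sleep_interval 3600
    ceph balancer status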

Regards,
Varada

On Wed, Jan 9, 2019 at 6:14 PM Thomas Bennett <thomas@xxxxxxxxx> wrote:
Awesome, thanks!

Is there an email thread that I can follow somewhere?

Regards,
Tom

On Wed, 09 Jan 2019 at 13:26, Varada Kari (System Engineer) <varadaraja.kari@xxxxxxxxxxxx> wrote:
Hi,

We also hit the same situation, with the index pool filling up, on the Luminous (12.2.3) release. In our case bucket resharding was enabled; it was running in a loop, filling up all the index OSDs, and never deleting the old reshard entries. 
As a resolution, we disabled bucket resharding to arrest the problem, built the latest Luminous code (12.2.11, not yet released at the time) and deleted all the old reshard entries; new command options were added there to delete/purge them. After this step we ran a manual compaction on all the OSDs to reduce the read latencies on BlueFS. 
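For reference, the pieces involved look roughly like this (the stale-instances subcommands are the ones added for 12.2.11; the section and option names are the usual defaults, so double-check against your release notes):

    # ceph.conf on the RGW nodes, then restart the gateways
    [client.rgw.<name>]
        rgw dynamic resharding = false

    # With 12.2.11+ tooling: list and then purge leftover bucket index instances
    radosgw-admin reshard stale-instances list
    radosgw-admin reshard stale-instances rm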

Regards,
Varada

On Wed, Jan 9, 2019 at 4:17 PM Thomas Bennett <thomas@xxxxxxxxx> wrote:
Hi Wido,

Thanks for your reply.
 
Are you storing a lot of (small) objects in the buckets?

No. Objects aren't < 4 MB; they're around 10 MB.
 
How much real data is there in the buckets data pool?

Only 7% used - 0.4 PB.
 
With 51 PGs on the NVMe you are on the low side; you will want to have
this hovering around 150 or even 200 on NVMe drives to get the best
performance.

Thanks. Do you think this also relates to the large omaps?
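For reference, the large omap warnings can be tracked down with something like the following (pool and object names are placeholders; the offending objects are named in the cluster log):

    # Summary of the warning
    ceph health detail | grep -i omap

    # The cluster log names the objects: 'Large omap object found. Object: ...'
    grep 'Large omap object found' /var/log/ceph/ceph.log

    # Count the keys on a suspect bucket index object
    rados -p default.rgw.buckets.index listomapkeys '.dir.<instance_id>.<shard>' | wc -l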

Cheers,
Tom
--
Thomas Bennett

SARAO
Science Data Processing


--
Thomas Bennett

SARAO
Science Data Processing
_______________________________________________
Ceph-large mailing list
Ceph-large@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-large-ceph.com
