Fast growing MON store during large recovery

Hi,

Last Friday evening I got a call from a customer who had set his tunables to 'optimal' after he saw a warning.
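For reference, that change is normally a single command; it was presumably something like:

  ceph osd crush tunables optimal

On a cluster that was originally deployed with Firefly-era tunables this changes the CRUSH placement behaviour and remaps a large part of the data.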

This 2,000 OSD (8PB) cluster was initially installed with Firefly and later upgraded to Hammer and Jewel.

His change caused an 88% degradation in the cluster, which he left running for over 5 hours, until the MON stores grew beyond 15GB and he called me.

I eventually reverted the change, since another hour later we were at 26GB of MON store and only a few percent of additional recovery had been done.

We had 50% of the space (80GB) left on the MON stores and I wasn't convinced we would make it without running out of space on the MONs (5x), so I fetched the old CRUSHMap from an OSDMap and injected it back in. A few hours later we were back to HEALTH_OK.
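For those who haven't done this before, the revert boils down to pulling the CRUSH map out of an OSDMap epoch from before the change and injecting it again. Roughly (the epoch and file names below are just placeholders):

  # Fetch an OSDMap from before the tunables change
  ceph osd getmap 123456 -o /tmp/osdmap.old

  # Extract the CRUSH map contained in that OSDMap
  osdmaptool /tmp/osdmap.old --export-crush /tmp/crushmap.old

  # Inject the old CRUSH map (including its tunables) back into the cluster
  ceph osd setcrushmap -i /tmp/crushmap.old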

What I learned is that the MON stores can grow quite fast during such a recovery, and that they are also heavy on disk I/O.

In this case the SSDs weren't the best (850 Pro, don't ask) and they couldn't keep up with all the changes. They are being swapped now for the Intel S3710 400GB and Samsung SM863 480GB (mixing vendors).

The main reasons for the large SSDs:
- Performance
- Enough space to store a very large MON database

Something to keep in mind with a large cluster: a big re-shuffle of data can lead to the MON stores growing rather large.

Just wanted to share this.

Wido


