Re: Newbie to Ceph jacked up his monitor

Hi Jarett,

On 23/3/20 at 3:52, Jarett DeAngelis wrote:
So, I thought I’d post what I learned about dealing with this problem.

This system is a 3-node Proxmox cluster, and each node had:

1 x 1TB NVMe
2 x 512GB HDD

I had maybe 100GB of data in this system total. Then I added:

2 x 256GB SSD
1 x 1TB HDD

to each system, and let it start rebalancing. When it started, the management interface showed the storage as being out of order in various ways, but it was clear that Ceph was rebalancing PGs across the 3 nodes, and the “broken” part of the graphical display shrank as data spread across the added OSDs.

In the process, however, the monitors accumulated an ENORMOUS number of files. On one machine, the boot drive has only 64GB of space in total, so the partition where /var/lib/ceph/somethingsomething.db lived was only 27GB. This filled up very, very fast and eventually killed the monitor on that node. I figured out that you can run `ceph-monstore-tool compact` or `ceph-kvstore-tool rocksdb /path compact` to get the system to truncate the files in there, but even when I scheduled those jobs to run on each monitor every minute, the space taken up by the RocksDB files kept growing until it threatened to kill the monitors on the nodes with more space as well. Other, dumber measures I tried in order to give the system more room for these files ended up breaking my Proxmox install, so now I have to reinstall.

What can be done about this problem so that I don’t have this issue when I try to implement this again?

This is rather light on details to understand well what happened. I'll suppose you added all the new disks at once, without waiting for the rebalancing to finish?

I'd suggest:

- Use disks of the same size. If you add new disks, don't add smaller ones.
- Add one disk (OSD) at a time, and wait for the cluster to return to HEALTH_OK before adding the next (see the sketch below).
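
For example, per new disk on a Proxmox node, roughly something like this (the device name /dev/sdX is just a placeholder; pveceph is the Proxmox wrapper, plain ceph-volume works as well):

    # Pause data movement while the new OSD is being created
    ceph osd set norebalance
    ceph osd set nobackfill

    # Create a single OSD on the new disk (placeholder device name)
    pveceph osd create /dev/sdX

    # Resume data movement and wait for HEALTH_OK before the next disk
    ceph osd unset nobackfill
    ceph osd unset norebalance
    ceph -s

Adding the disks one by one keeps the amount of data in flight (and the map churn the monitors have to record) much smaller than throwing nine new OSDs at the cluster at once.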

We usually use 20GB root partitions for monitors and have never had any problem with disk size (in small clusters like yours).
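
That said, during a big rebalance the monitors keep old OSD maps until all PGs are clean again, so their store can grow a lot and won't shrink much until backfill finishes. If it does balloon, you can usually compact the store while the monitors are running, instead of the offline tools you mentioned; roughly (monitor ID and hostname are placeholders):

    # Ask a running monitor to compact its RocksDB store
    ceph tell mon.<id> compact

    # Optionally compact automatically every time a monitor starts
    ceph config set mon mon_compact_on_start true

    # Keep an eye on the store size (default path on a Proxmox node)
    du -sh /var/lib/ceph/mon/ceph-<hostname>/store.db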

Cheers
Eneko

--
Technical Director
Binovo IT Human Project, S.L.
Telf. 943569206
Astigarragako bidea 2, 2º izq. oficina 11; 20180 Oiartzun (Gipuzkoa)
www.binovo.es
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx



