Hi Felix,

Where are your monitors located? Do you have one on each node?

Dale Corse
CEO/CTO
Cell: 780-504-1756
24/7 NOC: 888-965-3729
www.wolfpaw.com
https://linkedin.com/in/dale-corse-2343a

From: Felix Joussein [mailto:felix.joussein@xxxxxx]
Sent: Monday, April 4, 2022 6:54 PM
To: ceph-users@xxxxxxx
Subject: losing one node from a 3-node cluster

Hi Everyone,

I have been running a 3-node Proxmox + Ceph cluster in my home lab for 2 years now, serving as RBD storage for virtual machines. When I installed it, I did some testing to make sure that if one node failed, the remaining two nodes would keep the system up while the third node was being replaced.

Recently I had to reboot a node in that cluster and realized that the redundancy was gone.

Each of the 3 nodes has 4x 4 TB OSDs, which makes 16 TB per node, or 48 TB in total. As mentioned, I use Proxmox, so I used its interface to set up the OSDs and pools. I have two pools: one for my virtual machines and one for CephFS. Each pool has size/min_size 3/2, 256 PGs, and the autoscaler enabled.

And here is what I don't understand: I have the impression that, for whatever reason, my cluster is overprovisioned. As the command output below shows, ceph-iso_data uses 19 TB according to ceph df, yet the mounted ceph-iso filesystem is only 9.2 TB. The same goes for my ceph-vm storage, which Ceph believes is 8.3 TB but which is really only 6.3 TB (according to the Proxmox GUI).

The problem is obvious: out of my 48 TB of raw capacity I should not be using more than 16 TB, otherwise I cannot afford to lose a node. Ceph tells me that in total I am using 27 TB, but judging by the mounted volumes/storages I am not using more than 16 TB. So where have the 11 TB (27 - 16) gone? What am I not understanding?

Thank you for any hint on that.

Regards,
Felix

ceph df

--- RAW STORAGE ---
CLASS  SIZE    AVAIL   USED    RAW USED  %RAW USED
hdd    44 TiB  17 TiB  27 TiB  27 TiB    61.70
TOTAL  44 TiB  17 TiB  27 TiB  27 TiB    61.70

--- POOLS ---
POOL                   ID  PGS  STORED   OBJECTS  USED     %USED  MAX AVAIL
device_health_metrics   1    1      0 B        0      0 B      0    3.0 TiB
ceph-vm                 2  256  2.7 TiB  804.41k  8.3 TiB  47.76    3.0 TiB
ceph-iso_data           3  256  6.1 TiB    3.11M   19 TiB  67.23    3.0 TiB
ceph-iso_metadata       4   32  3.1 GiB  132.51k  9.3 GiB   0.10    3.0 TiB

rados df

POOL_NAME              USED     OBJECTS  CLONES  COPIES   MISSING_ON_PRIMARY  UNFOUND  DEGRADED  RD_OPS       RD      WR_OPS       WR       USED COMPR  UNDER COMPR
ceph-iso_data          19 TiB   3105013       0  9315039                   0        0         0        75202  97 GiB        28776  9.2 MiB         0 B          0 B
ceph-iso_metadata      9.3 GiB   132515       0   397545                   0        0         0  15856613330  13 TiB  28336539064   93 TiB         0 B          0 B
ceph-vm                8.3 TiB   804409       0  2413227                   0        0         0     94160784  40 TiB     62581002  4.4 TiB         0 B          0 B
device_health_metrics      0 B        0       0        0                   0        0         0            0     0 B            0      0 B         0 B          0 B

total_objects    4041937
total_used       27 TiB
total_avail      17 TiB
total_space      44 TiB

df -h

Size  Used  Avail  Use%  Mounted on
9,2T  6,2T  3,1T   67%   /mnt/pve/ceph-iso

_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx
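
For Dale's question about monitor placement, and to double-check the pool replication settings described above, the following is a minimal sketch of standard Ceph CLI calls that would show that information (pool names are taken from the ceph df output above; it is assumed they are run on a node with a working admin keyring):

ceph mon dump                    # monitor map: each mon's name and the address/host it listens on
ceph quorum_status               # which monitors are currently in quorum
ceph osd pool ls detail          # per-pool replicated size, min_size and pg_num
ceph osd pool get ceph-vm size   # replication factor of the ceph-vm pool
ceph osd pool autoscale-status   # PG autoscaler view of each pool

As a rough sanity check on the figures above: ceph df reports STORED as the logical data size and USED as the raw space consumed across all replicas, so with size=3 the 6.1 TiB stored in ceph-iso_data would be expected to appear as roughly 3 x 6.1 TiB ≈ 19 TiB used.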