Re: Is it normal Ceph reports "Degraded data redundancy" in normal use?

Kai Stian Olstad <ceph+list@xxxxxxxxxx> · Mon, 20 Sep 2021 10:11:38 +0200

On 17.09.2021 16:10, Eugen Block wrote:
Since I'm trying to test different erasure coding plugins and
techniques, I don't want the balancer active.
So I tried setting it to none as Eugen suggested, and to my surprise
I did not get any degraded messages at all, and the cluster was in
HEALTH_OK the whole time.

Interesting, maybe the balancer works differently now? Or it works
differently under heavy load?

It would be strange if the balancer's normal operation were to put the
cluster in a degraded state.

The only suspicious lines I see are these:

 Sep 17 06:30:01 pech-mon-1 conmon[1337]: debug
2021-09-17T06:30:01.402+0000 7f66b0329700  1 heartbeat_map
reset_timeout 'Monitor::cpu_tp thread 0x7f66b0329700' had timed out
after 0.000000000s

But I'm not sure if this is related. The out OSDs shouldn't have any
impact on this test.

Did you monitor the network saturation during these tests with iftop
or something similar?

I did not, so I reran the test this morning.

All the servers have 2x 25 Gbit/s NICs bonded with LACP (802.3ad),
using the layer3+4 transmit hash policy.

The peak on the active monitor was 27 Mbit/s, and less on the other two
monitors.
I also checked the CPU (Xeon 5222, 3.8 GHz) and none of the cores was
saturated, and the network statistics show no errors or drops.
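
For anyone who wants to repeat the network check without iftop, here is a minimal sketch that reads the Linux interface counters from /proc/net/dev and turns two samples into a rate. The interface name bond0 and the 50000 Mbit/s link capacity (2x 25 Gbit/s) are assumptions for illustration, and a layer3+4 bond will not necessarily spread a single flow over both links, so the utilization figure is only a rough upper bound on headroom.

```python
import time


def parse_proc_net_dev(text):
    """Parse /proc/net/dev content into {iface: counters}.

    The file has two header lines, then one line per interface with
    8 receive fields followed by 8 transmit fields.
    """
    stats = {}
    for line in text.splitlines()[2:]:
        if ":" not in line:
            continue
        name, data = line.split(":", 1)
        fields = data.split()
        if len(fields) < 16:
            continue
        stats[name.strip()] = {
            "rx_bytes": int(fields[0]),
            "rx_errs": int(fields[2]),
            "rx_drop": int(fields[3]),
            "tx_bytes": int(fields[8]),
            "tx_errs": int(fields[10]),
            "tx_drop": int(fields[11]),
        }
    return stats


def mbit_per_s(bytes_before, bytes_after, seconds):
    """Convert a byte-counter delta over an interval to Mbit/s."""
    return (bytes_after - bytes_before) * 8 / seconds / 1e6


def utilization_percent(mbit, link_mbit):
    """Share of the link capacity in use, in percent."""
    return 100.0 * mbit / link_mbit


def sample_rate(iface="bond0", interval=1.0, path="/proc/net/dev"):
    """Sample the counters twice and return (rx_mbit, tx_mbit, last_counters).

    iface="bond0" is an assumed interface name; adjust to the host.
    """
    with open(path) as f:
        before = parse_proc_net_dev(f.read())[iface]
    time.sleep(interval)
    with open(path) as f:
        after = parse_proc_net_dev(f.read())[iface]
    rx = mbit_per_s(before["rx_bytes"], after["rx_bytes"], interval)
    tx = mbit_per_s(before["tx_bytes"], after["tx_bytes"], interval)
    return rx, tx, after
```

Calling sample_rate() in a loop during the test and keeping the maximum gives the peak. With the numbers above, utilization_percent(27, 50000) comes out at roughly 0.054 percent, so the monitor links are nowhere near saturated.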

So perhaps there is a bug in the balancer code?
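
One way to narrow this down would be to log the balancer state next to the health checks while the test runs, and see whether the degraded warnings only show up when the balancer is active. The following is a minimal sketch, not a tested tool. It assumes that `ceph status --format json` reports health checks under `health.checks` with a `PG_DEGRADED` key, and that `ceph balancer status` returns JSON with `active` and `mode` fields; both match what I remember of the output but should be verified on the cluster. The helper names are made up for this example.

```python
import json
import subprocess


def ceph_json(*args):
    """Run a ceph CLI command and parse its JSON output.

    Assumes the ceph binary is in PATH and the caller has a usable keyring.
    """
    result = subprocess.run(
        ["ceph", *args, "--format", "json"],
        check=True,
        capture_output=True,
        text=True,
    )
    return json.loads(result.stdout)


def has_degraded_check(status):
    """True if the status document contains a PG_DEGRADED health check.

    The key layout (health -> checks -> PG_DEGRADED) is an assumption.
    """
    checks = status.get("health", {}).get("checks", {})
    return "PG_DEGRADED" in checks


def snapshot(status, balancer):
    """Combine the fields of interest into one record for logging."""
    return {
        "health": status.get("health", {}).get("status"),
        "degraded": has_degraded_check(status),
        "balancer_active": balancer.get("active"),
        "balancer_mode": balancer.get("mode"),
    }


def poll_once():
    """Take one combined sample from a live cluster."""
    return snapshot(ceph_json("status"), ceph_json("balancer", "status"))
```

Printing poll_once() every few seconds during a run with the balancer on, and again with mode none, would show whether the two really move together.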

--
Kai Stian Olstad