ceph-mon hanging when setting HDD OSDs out

Hi all,

We are in the process of upgrading one of our three containerised Ceph clusters from Luminous (12.2.12) to Nautilus (14.2.9). We have already upgraded all ceph-mons, all ceph-mgrs and all of our ceph-osd hosts that are full-flash nodes with Optane cache and NVMe data devices. Our process for upgrading OSDs is to purge an entire host and redeploy its OSDs with Nautilus on top of LVM.

This all went fine until we started touching our HDD nodes, which serve a pool that backs a CephFS with erasure coding. The problem we are facing at the moment is that when we set a single HDD OSD out, the ceph command hangs for a few tens of seconds and the quorum becomes degraded because one of the ceph-mons gets marked as out with a lease_timeout. In the logs we could see some slow ops from the failing ceph-mon. These were mon_subscribe events from OSDs (both full-flash and HDD) that hung between all_read and dispatched for around 10 seconds.

In our metrics we can see that the memory consumption of that one ceph-mon (not the leader!) rises to as much as 60 GB and that its CPU usage increases dramatically. The logs do not show any obvious problem. We can see that the cluster sets the OSD out and starts backfilling to other OSDs, but at some point the failing mon stops logging completely for around one minute. It resumes logging after rejoining the quorum and then keeps logging normal backfilling behaviour.

Here are some heap stats taken after the cluster was responsive again and the ceph-mon was behaving normally:

MALLOC:      711286936 (  678.3 MiB) Bytes in use by application 
MALLOC: +  27156045824 (25898.0 MiB) Bytes in page heap freelist
MALLOC: +     17420216 (   16.6 MiB) Bytes in central cache freelist
MALLOC: +      9370880 (    8.9 MiB) Bytes in transfer cache freelist
MALLOC: +     25277104 (   24.1 MiB) Bytes in thread cache freelists
MALLOC: +    104857600 (  100.0 MiB) Bytes in malloc metadata
MALLOC:   ------------
MALLOC: =  28024258560 (26726.0 MiB) Actual memory used (physical + swap)
MALLOC: +  75189862400 (71706.6 MiB) Bytes released to OS (aka unmapped)
MALLOC:   ------------
MALLOC: = 103214120960 (98432.7 MiB) Virtual address space used
MALLOC:
MALLOC:          33647              Spans in use
MALLOC:             23              Thread heaps in use
MALLOC:           8192              Tcmalloc page size
------------------------------------------------
Call ReleaseFreeMemory() to release freelist memory to the OS (via madvise()).
Bytes released to the OS take up virtual address space but no physical memory.
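
To make the numbers above easier to read, here is a minimal Python sketch that parses a few of the MALLOC lines and computes how much of the "Actual memory used" figure is sitting in the page heap freelist rather than being used by the application. The helper names `parse_heap_stats` and `freelist_fraction` are hypothetical and made up for this illustration; they are not part of Ceph or tcmalloc, and the regular expression only assumes the line layout shown in the dump above.

```python
import re

# A subset of the heap stats lines quoted above, copied verbatim.
HEAP_STATS = """\
MALLOC:      711286936 (  678.3 MiB) Bytes in use by application
MALLOC: +  27156045824 (25898.0 MiB) Bytes in page heap freelist
MALLOC: =  28024258560 (26726.0 MiB) Actual memory used (physical + swap)
MALLOC: +  75189862400 (71706.6 MiB) Bytes released to OS (aka unmapped)
"""

# Assumed layout: "MALLOC:", optional "+" or "=", byte count, "(x MiB)", label.
LINE_RE = re.compile(
    r"^MALLOC:\s*[+=]?\s*(\d+)\s+\(\s*[\d.]+ MiB\)\s+(.+?)\s*$"
)


def parse_heap_stats(text):
    """Return a dict mapping each label to its byte count."""
    stats = {}
    for line in text.splitlines():
        match = LINE_RE.match(line)
        if match:
            stats[match.group(2)] = int(match.group(1))
    return stats


def freelist_fraction(stats):
    """Share of 'Actual memory used' held in the page heap freelist."""
    return (
        stats["Bytes in page heap freelist"]
        / stats["Actual memory used (physical + swap)"]
    )


if __name__ == "__main__":
    stats = parse_heap_stats(HEAP_STATS)
    print(f"freelist share of memory used: {freelist_fraction(stats):.1%}")
```

Read this way, almost all of the roughly 26 GiB still accounted to the process is freelist memory that tcmalloc has not yet returned to the OS, while the application itself is using under 700 MiB.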

We tested all this in a test cluster without any problems.

Does anyone have an idea what could be going on, or where to look for further debugging?

Best Regards
Max


