Hi Marius,
Have you changed any of the default settings? You've got a huge number
of pglog entries. Do you have any other pools as well? Even though
pglog is only taking up 6-7GB of the 37GB used, that's a bit of a red
flag for me. Something we don't track via the mempools is taking up a
ton of memory and it'll take some work to track down what it is. If you
can reproduce this easily, it might be worth trying a heap dump.
Instructions are here:
https://docs.ceph.com/en/latest/rados/troubleshooting/memory-profiling/
Try to leave the profiler running while memory is in the process of
growing beyond the osd_memory_target. If that doesn't tell us anything,
valgrind or analyzing a core dump is probably the next step.
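For reference, the profiler sequence from that page could be scripted roughly like this. It is only a sketch: the helper name heap_profiler_commands is made up for illustration and osd.3 is simply the OSD from the report below; the "ceph tell osd.N heap ..." subcommands are the ones described in the memory-profiling docs. The script only prints the commands, it does not run them.

```python
def heap_profiler_commands(osd_id):
    """Return the ceph CLI invocations for one profiling session, in order.

    Hypothetical helper for illustration; only the `ceph tell ... heap`
    subcommands themselves come from the Ceph documentation.
    """
    target = f"osd.{osd_id}"
    return [
        ["ceph", "tell", target, "heap", "start_profiler"],
        # Repeat the dump several times while memory is still growing.
        ["ceph", "tell", target, "heap", "dump"],
        ["ceph", "tell", target, "heap", "stats"],
        ["ceph", "tell", target, "heap", "stop_profiler"],
    ]


# Print the commands so they can be reviewed before being run by hand.
for argv in heap_profiler_commands(3):
    print(" ".join(argv))
```

The point of starting the profiler first is that the dumps taken afterwards cover the allocations made while usage climbs past osd_memory_target.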
Mark
On 12/5/21 9:44 AM, Marius Leustean wrote:
I've got a small cluster with
- 8 hosts and 1 OSD (4TB SSD) per host.
- version: 16.2.6
- pool pg_num=256
The cluster serves as RBD backend for VMs. There is a relatively small load
on the cluster. Each VM has a few snapshots, which are saved into another
HDD pool.
Below is a capture from one of the OSDs (but they all behave pretty much
the same), during a healthy state of the cluster:
- docker stats / top reports ~37GB consumed by the ceph-osd container.
- Heap dump reports are pretty much the same as top / docker stats.
- dump_mempools reports much less: about 10GB of RAM in total (pglog about
7GB).
ceph tell osd.3 heap dump
osd.3 dumping heap profile now.
------------------------------------------------
MALLOC: 30292314848 (28889.0 MiB) Bytes in use by application
MALLOC: + 1236992 ( 1.2 MiB) Bytes in page heap freelist
MALLOC: + 1090272792 ( 1039.8 MiB) Bytes in central cache freelist
MALLOC: + 7152128 ( 6.8 MiB) Bytes in transfer cache freelist
MALLOC: + 109016328 ( 104.0 MiB) Bytes in thread cache freelists
MALLOC: + 154140672 ( 147.0 MiB) Bytes in malloc metadata
MALLOC: ------------
MALLOC: = 31654133760 (30187.7 MiB) Actual memory used (physical + swap)
MALLOC: + 12524781568 (11944.6 MiB) Bytes released to OS (aka unmapped)
MALLOC: ------------
MALLOC: = 44178915328 (42132.3 MiB) Virtual address space used
MALLOC:
MALLOC: 2119728 Spans in use
MALLOC: 52 Thread heaps in use
MALLOC: 8192 Tcmalloc page size
------------------------------------------------
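As a sanity check, the tcmalloc figures above can be added up to see how the totals are composed. The following is a minimal sketch, assuming the output format shown above; the names HEAP_STATS and parse_heap_stats are introduced here for illustration only.

```python
import re

# Byte-count lines copied from the heap stats above.
HEAP_STATS = """\
MALLOC: 30292314848 (28889.0 MiB) Bytes in use by application
MALLOC: + 1236992 ( 1.2 MiB) Bytes in page heap freelist
MALLOC: + 1090272792 ( 1039.8 MiB) Bytes in central cache freelist
MALLOC: + 7152128 ( 6.8 MiB) Bytes in transfer cache freelist
MALLOC: + 109016328 ( 104.0 MiB) Bytes in thread cache freelists
MALLOC: + 154140672 ( 147.0 MiB) Bytes in malloc metadata
MALLOC: ------------
MALLOC: = 31654133760 (30187.7 MiB) Actual memory used (physical + swap)
MALLOC: + 12524781568 (11944.6 MiB) Bytes released to OS (aka unmapped)
MALLOC: ------------
MALLOC: = 44178915328 (42132.3 MiB) Virtual address space used
"""

LINE = re.compile(r"MALLOC:\s+[+=]?\s*(\d+)\s+\(\s*[\d.]+ MiB\)\s+(.*)")


def parse_heap_stats(text):
    """Map each line's description to its byte count."""
    stats = {}
    for line in text.splitlines():
        match = LINE.match(line)
        if match:
            stats[match.group(2).strip()] = int(match.group(1))
    return stats


stats = parse_heap_stats(HEAP_STATS)

# Application bytes plus the allocator's own freelists and metadata.
resident_parts = [
    "Bytes in use by application",
    "Bytes in page heap freelist",
    "Bytes in central cache freelist",
    "Bytes in transfer cache freelist",
    "Bytes in thread cache freelists",
    "Bytes in malloc metadata",
]
resident = sum(stats[name] for name in resident_parts)
virtual = resident + stats["Bytes released to OS (aka unmapped)"]

print(f"actual memory used: {resident / 2**30:.2f} GiB")
print(f"virtual address space: {virtual / 2**30:.2f} GiB")
```

Both sums match the "=" lines in the dump, so nearly all of the resident memory is bytes the application still holds rather than allocator freelists.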
pprof:
File: ceph-osd
Type: inuse_space
Showing nodes accounting for 142.83MB, 100% of 142.83MB total
Dropped 2 nodes (cum <= 0.71MB)
      flat  flat%   sum%        cum   cum%
  142.83MB   100%   100%   142.83MB   100%  [ceph-osd]
         0     0%   100%   142.83MB   100%  [libc-2.28.so]
         0     0%   100%   142.83MB   100%  [libpthread-2.28.so]
         0     0%   100%    12.71MB  8.90%  [libstdc++.so.6.0.25]
mempools:
{
    "mempool": {
        "by_pool": {
            "bloom_filter": {
                "items": 0,
                "bytes": 0
            },
            "bluestore_alloc": {
                "items": 16162345,
                "bytes": 189696680
            },
            "bluestore_cache_data": {
                "items": 98509,
                "bytes": 1669081348
            },
            "bluestore_cache_onode": {
                "items": 144259,
                "bytes": 88863544
            },
            "bluestore_cache_meta": {
                "items": 13760035,
                "bytes": 117467523
            },
            "bluestore_cache_other": {
                "items": 14800061,
                "bytes": 554492280
            },
            "bluestore_Buffer": {
                "items": 50604,
                "bytes": 4857984
            },
            "bluestore_Extent": {
                "items": 2737534,
                "bytes": 131401632
            },
            "bluestore_Blob": {
                "items": 2713453,
                "bytes": 282199112
            },
            "bluestore_SharedBlob": {
                "items": 2709774,
                "bytes": 303494688
            },
            "bluestore_inline_bl": {
                "items": 1999,
                "bytes": 751616
            },
            "bluestore_fsck": {
                "items": 0,
                "bytes": 0
            },
            "bluestore_txc": {
                "items": 14,
                "bytes": 10976
            },
            "bluestore_writing_deferred": {
                "items": 74,
                "bytes": 284674
            },
            "bluestore_writing": {
                "items": 53,
                "bytes": 346278
            },
            "bluefs": {
                "items": 38190,
                "bytes": 530976
            },
            "bluefs_file_reader": {
                "items": 533,
                "bytes": 76146048
            },
            "bluefs_file_writer": {
                "items": 3,
                "bytes": 576
            },
            "buffer_anon": {
                "items": 151865,
                "bytes": 30323982
            },
            "buffer_meta": {
                "items": 150716,
                "bytes": 13263008
            },
            "osd": {
                "items": 107,
                "bytes": 1210384
            },
            "osd_mapbl": {
                "items": 0,
                "bytes": 0
            },
            "osd_pglog": {
                "items": 65355549,
                "bytes": 6813635624
            },
            "osdmap": {
                "items": 52714,
                "bytes": 1537680
            },
            "osdmap_mapping": {
                "items": 0,
                "bytes": 0
            },
            "pgmap": {
                "items": 0,
                "bytes": 0
            },
            "mds_co": {
                "items": 0,
                "bytes": 0
            },
            "unittest_1": {
                "items": 0,
                "bytes": 0
            },
            "unittest_2": {
                "items": 0,
                "bytes": 0
            }
        },
        "total": {
            "items": 118928391,
            "bytes": 10279596613
        }
    }
}
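To put the mempool numbers next to the heap stats, here is a rough sketch that ranks the pools by size and computes how much of the heap the mempools do not account for. The names SAMPLE, HEAP_IN_USE and summarize are made up for illustration, and SAMPLE is a truncated copy of the dump above (only the three largest pools, with the full reported total).

```python
import json

# Truncated copy of the dump above: only the three largest pools are kept,
# but "total" is the full figure reported by the OSD.
SAMPLE = json.loads("""
{"mempool": {"by_pool": {
    "bluestore_cache_data": {"items": 98509, "bytes": 1669081348},
    "bluestore_cache_other": {"items": 14800061, "bytes": 554492280},
    "osd_pglog": {"items": 65355549, "bytes": 6813635624}},
  "total": {"items": 118928391, "bytes": 10279596613}}}
""")

# "Bytes in use by application" from the tcmalloc heap stats.
HEAP_IN_USE = 30292314848


def summarize(dump, heap_in_use, top=3):
    """Rank pools by bytes and compare the mempool total with the heap."""
    pools = dump["mempool"]["by_pool"]
    total = dump["mempool"]["total"]["bytes"]
    ranked = sorted(pools.items(), key=lambda kv: kv[1]["bytes"], reverse=True)
    return {
        "top": [(name, p["bytes"], p["bytes"] / total) for name, p in ranked[:top]],
        "tracked": total,
        "untracked": heap_in_use - total,
    }


summary = summarize(SAMPLE, HEAP_IN_USE)
for name, nbytes, share in summary["top"]:
    print(f"{name:24s} {nbytes / 2**30:6.2f} GiB  {share:5.1%} of mempools")
print(f"tracked by mempools:   {summary['tracked'] / 2**30:6.2f} GiB")
print(f"untracked by mempools: {summary['untracked'] / 2**30:6.2f} GiB")
```

With these figures osd_pglog is roughly two thirds of what the mempools track, and about 20 GB of the heap is not covered by any mempool.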
Any feedback is much appreciated.
Thanks,
Marius.
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx