Re: OSD huge memory consumption

Hi Marius,

Have you changed any of the default settings?  You've got a huge number of pglog entries.  Do you have any other pools as well?  Even though pglog is only taking up 6-7GB of the 37GB used, that many entries is a bit of a red flag for me.  Something we don't track via the mempools is taking up a ton of memory, and it'll take some work to track down what it is.  If you can reproduce this easily, it might be worth trying a heap dump.  Instructions are here:

https://docs.ceph.com/en/latest/rados/troubleshooting/memory-profiling/


Try to leave the profiler running while memory is growing beyond osd_memory_target. If that doesn't tell us anything, valgrind or analyzing a core dump is probably the next step.
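A minimal Python sketch of that workflow, assuming the `ceph tell osd.N heap start_profiler`, `heap dump` and `heap stop_profiler` commands from the memory-profiling page linked above. The helper names, the number of dumps and the 10-minute wait are hypothetical choices, not anything Ceph prescribes. It only prints the commands unless `dry_run=False`, and it waits before each dump so the dumps are spread across the period in which the OSD is growing.

```python
import shlex
import subprocess
import time


def heap_profile_commands(osd_id: int, dumps: int) -> list[list[str]]:
    """Start the tcmalloc heap profiler, take `dumps` heap dumps, then stop it."""
    target = f"osd.{osd_id}"
    cmds = [["ceph", "tell", target, "heap", "start_profiler"]]
    cmds += [["ceph", "tell", target, "heap", "dump"] for _ in range(dumps)]
    cmds.append(["ceph", "tell", target, "heap", "stop_profiler"])
    return cmds


def run_heap_profile(osd_id: int, dumps: int = 3, interval_s: float = 600.0,
                     dry_run: bool = True) -> list[str]:
    """Print (dry_run) or execute the sequence; returns the command lines issued."""
    issued = []
    for cmd in heap_profile_commands(osd_id, dumps):
        issued.append(shlex.join(cmd))
        if dry_run:
            print(issued[-1])
            continue
        if cmd[-1] == "dump":
            time.sleep(interval_s)  # let memory keep growing between dumps
        subprocess.run(cmd, check=True)
    return issued
```

Per the memory-profiling page, each dump is written as a `.heap` file (in the daemon's log directory, so inside the container in a docker deployment) and can be read with `pprof --text /usr/bin/ceph-osd <dump file>`.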

Mark

On 12/5/21 9:44 AM, Marius Leustean wrote:
I've got a small cluster with
- 8 hosts and 1 OSD (4TB SSD) per host.
- version: 16.2.6
- pool pg_num=256

The cluster serves as an RBD backend for VMs and the load on it is
relatively light. Each VM has a few snapshots, which are saved into another
HDD pool.

Below is a capture from one of the OSDs (they all behave much the same),
taken while the cluster was healthy:
- docker stats / top report ~37GB consumed by the ceph-osd container.
- The heap dump reports roughly the same as top / docker stats.
- dump_mempools reports much less: ~11GB RAM total (pglog ~7GB).

ceph tell osd.3 heap dump

osd.3 dumping heap profile now.
------------------------------------------------
MALLOC:    30292314848 (28889.0 MiB) Bytes in use by application
MALLOC: +      1236992 (    1.2 MiB) Bytes in page heap freelist
MALLOC: +   1090272792 ( 1039.8 MiB) Bytes in central cache freelist
MALLOC: +      7152128 (    6.8 MiB) Bytes in transfer cache freelist
MALLOC: +    109016328 (  104.0 MiB) Bytes in thread cache freelists
MALLOC: +    154140672 (  147.0 MiB) Bytes in malloc metadata
MALLOC:   ------------
MALLOC: =  31654133760 (30187.7 MiB) Actual memory used (physical + swap)
MALLOC: +  12524781568 (11944.6 MiB) Bytes released to OS (aka unmapped)
MALLOC:   ------------
MALLOC: =  44178915328 (42132.3 MiB) Virtual address space used
MALLOC:
MALLOC:        2119728              Spans in use
MALLOC:             52              Thread heaps in use
MALLOC:           8192              Tcmalloc page size
------------------------------------------------
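The tcmalloc summary is a running sum, so it can be re-added to see where the memory sits. A small Python sketch that parses the byte-valued `MALLOC:` lines and checks the two totals; the regex and function names are hypothetical, written against the exact line format shown above.

```python
import re

# Byte lines such as
# "MALLOC: +   1090272792 ( 1039.8 MiB) Bytes in central cache freelist".
# Count-only lines ("Spans in use") and the "----" rules do not match.
MALLOC_LINE = re.compile(r"^MALLOC:\s*[+=]?\s*(\d+)\s+\(\s*[\d.]+ MiB\)\s+(.+)$")


def parse_heap_stats(text: str) -> dict[str, int]:
    """Map each labelled MALLOC byte line to its value in bytes."""
    stats = {}
    for raw in text.splitlines():
        match = MALLOC_LINE.match(raw.strip())
        if match:
            stats[match.group(2).strip()] = int(match.group(1))
    return stats


def check_heap_totals(stats: dict[str, int]) -> dict[str, int]:
    """Re-add the components tcmalloc sums and return the main buckets."""
    in_use = stats["Bytes in use by application"]
    freelists = sum(v for k, v in stats.items() if "freelist" in k)
    metadata = stats["Bytes in malloc metadata"]
    actual = stats["Actual memory used (physical + swap)"]
    released = stats["Bytes released to OS (aka unmapped)"]
    virtual = stats["Virtual address space used"]
    if in_use + freelists + metadata != actual:
        raise ValueError("in-use + freelists + metadata != actual memory used")
    if actual + released != virtual:
        raise ValueError("actual + released != virtual address space used")
    return {"in_use": in_use, "freelists": freelists,
            "metadata": metadata, "actual": actual}
```

On the stats above both checks pass: the four freelists add up to 1207678240 bytes, so 28889.0 MiB of the 30187.7 MiB actually used is held by the application itself rather than cached in tcmalloc's freelists.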


pprof:

File: ceph-osd
Type: inuse_space
Showing nodes accounting for 142.83MB, 100% of 142.83MB total
Dropped 2 nodes (cum <= 0.71MB)
       flat  flat%   sum%        cum   cum%
   142.83MB   100%   100%   142.83MB   100%  [ceph-osd]
          0     0%   100%   142.83MB   100%  [libc-2.28.so]
          0     0%   100%   142.83MB   100%  [libpthread-2.28.so]
          0     0%   100%    12.71MB  8.90%  [libstdc++.so.6.0.25]


mempools:

{
    "mempool": {
        "by_pool": {
            "bloom_filter": {
                "items": 0,
                "bytes": 0
            },
            "bluestore_alloc": {
                "items": 16162345,
                "bytes": 189696680
            },
            "bluestore_cache_data": {
                "items": 98509,
                "bytes": 1669081348
            },
            "bluestore_cache_onode": {
                "items": 144259,
                "bytes": 88863544
            },
            "bluestore_cache_meta": {
                "items": 13760035,
                "bytes": 117467523
            },
            "bluestore_cache_other": {
                "items": 14800061,
                "bytes": 554492280
            },
            "bluestore_Buffer": {
                "items": 50604,
                "bytes": 4857984
            },
            "bluestore_Extent": {
                "items": 2737534,
                "bytes": 131401632
            },
            "bluestore_Blob": {
                "items": 2713453,
                "bytes": 282199112
            },
            "bluestore_SharedBlob": {
                "items": 2709774,
                "bytes": 303494688
            },
            "bluestore_inline_bl": {
                "items": 1999,
                "bytes": 751616
            },
            "bluestore_fsck": {
                "items": 0,
                "bytes": 0
            },
            "bluestore_txc": {
                "items": 14,
                "bytes": 10976
            },
            "bluestore_writing_deferred": {
                "items": 74,
                "bytes": 284674
            },
            "bluestore_writing": {
                "items": 53,
                "bytes": 346278
            },
            "bluefs": {
                "items": 38190,
                "bytes": 530976
            },
            "bluefs_file_reader": {
                "items": 533,
                "bytes": 76146048
            },
            "bluefs_file_writer": {
                "items": 3,
                "bytes": 576
            },
            "buffer_anon": {
                "items": 151865,
                "bytes": 30323982
            },
            "buffer_meta": {
                "items": 150716,
                "bytes": 13263008
            },
            "osd": {
                "items": 107,
                "bytes": 1210384
            },
            "osd_mapbl": {
                "items": 0,
                "bytes": 0
            },
            "osd_pglog": {
                "items": 65355549,
                "bytes": 6813635624
            },
            "osdmap": {
                "items": 52714,
                "bytes": 1537680
            },
            "osdmap_mapping": {
                "items": 0,
                "bytes": 0
            },
            "pgmap": {
                "items": 0,
                "bytes": 0
            },
            "mds_co": {
                "items": 0,
                "bytes": 0
            },
            "unittest_1": {
                "items": 0,
                "bytes": 0
            },
            "unittest_2": {
                "items": 0,
                "bytes": 0
            }
        },
        "total": {
            "items": 118928391,
            "bytes": 10279596613
        }
    }
}
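To put numbers on the gap between what the mempools track and what the heap holds, here is a Python sketch that checks the `dump_mempools` JSON for internal consistency, lists the largest pools with their average bytes per item, and subtracts the mempool total from tcmalloc's "Bytes in use by application". The function names are hypothetical, and treating mempool bytes as a subset of tcmalloc's in-use figure is an approximation.

```python
import json


def summarize_mempools(dump_json: str, top: int = 5) -> dict:
    """Verify per-pool bytes add up to the reported total and list the
    largest pools as (name, bytes, average bytes per item)."""
    mempool = json.loads(dump_json)["mempool"]
    pools = mempool["by_pool"]
    summed = sum(p["bytes"] for p in pools.values())
    reported = mempool["total"]["bytes"]
    if summed != reported:
        raise ValueError(f"by_pool sums to {summed}, total says {reported}")
    largest = sorted(pools.items(), key=lambda kv: kv[1]["bytes"], reverse=True)[:top]
    return {
        "total_bytes": reported,
        "largest": [
            (name, p["bytes"], p["bytes"] / p["items"] if p["items"] else 0.0)
            for name, p in largest
        ],
    }


def untracked_bytes(heap_in_use: int, mempool_total: int) -> int:
    """Bytes the OSD holds according to tcmalloc that no mempool accounts for."""
    return heap_in_use - mempool_total
```

Fed the full dump above, the per-pool bytes add up exactly to the reported 10279596613, `osd_pglog` averages about 104 bytes per entry, and `untracked_bytes(30292314848, 10279596613)` leaves 20012718235 bytes (about 18.6 GiB, roughly two thirds of the in-use figure) that no mempool accounts for.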


Any feedback is much appreciated.

Thanks,
Marius.
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx