Re: OSD memory leak?

Looks like the image attachment got removed. Please find it here: https://imgur.com/a/3tabzCN

=================
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14

________________________________________
From: Frank Schilder <frans@xxxxxx>
Sent: 31 August 2020 14:42
To: Mark Nelson; Dan van der Ster; ceph-users
Subject:  Re: OSD memory leak?

Hi Dan and Mark,

sorry, this took a bit longer. I uploaded a new archive containing files of the following types (https://files.dtu.dk/u/jb0uS6U9LlCfvS5L/heap_profiling-2020-08-31.tgz?l - valid for 60 days):

- osd.195.profile.*.heap - raw heap dump file
- osd.195.profile.*.heap.txt - output of conversion with --text
- osd.195.profile.*.heap-base0001.txt - output of conversion with --text against first dump as base
- osd.195.*.heap_stats - output of ceph daemon osd.195 heap stats, every hour
- osd.195.*.mempools - output of ceph daemon osd.195 dump_mempools, every hour
- osd.195.*.perf - output of ceph daemon osd.195 perf dump, every hour, counters are reset

Converted files are included only for the last couple of days; converting everything simply takes too long.
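In case it helps with interpreting the files, the collection and conversion were roughly along the lines of the sketch below. This is only a sketch: the actual script, the exact timestamp format, whether "perf reset all" is run after each dump, and the google-pprof binary name/ceph-osd path may differ on your systems, and NNNN stands in for the profile number in the file names. The profiler itself was started beforehand with "ceph tell osd.195 heap start_profiler".

    # hourly collection for osd.195 (sketch; output file names as in the archive)
    ts=$(date +%F_%H%M)
    ceph daemon osd.195 heap stats     > osd.195.$ts.heap_stats
    ceph daemon osd.195 dump_mempools  > osd.195.$ts.mempools
    ceph daemon osd.195 perf dump      > osd.195.$ts.perf
    ceph daemon osd.195 perf reset all       # reset counters for the next interval
    ceph tell osd.195 heap dump              # writes osd.195.profile.NNNN.heap to the OSD's log dir

    # conversion of a raw dump: plain text, and against the first dump as base
    google-pprof --text /usr/bin/ceph-osd osd.195.profile.NNNN.heap \
        > osd.195.profile.NNNN.heap.txt
    google-pprof --text --base=osd.195.profile.0001.heap /usr/bin/ceph-osd \
        osd.195.profile.NNNN.heap > osd.195.profile.NNNN.heap-base0001.txt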

Please also find attached a recording of memory usage on one of the relevant OSD nodes; I marked restarts of all OSDs/the host with vertical red lines. What is worrying is the self-amplifying nature of the leak: it is not a linear process, it looks at least quadratic if not exponential. Given the comparably short uptime, what we are looking for is probably still in the lower percentages, but with an increasing rate. The OSDs have just started to overrun their limit:

top - 14:38:49 up 155 days, 19:17,  1 user,  load average: 5.99, 4.59, 4.59
Tasks: 684 total,   1 running, 293 sleeping,   0 stopped,   0 zombie
%Cpu(s):  1.9 us,  0.9 sy,  0.0 ni, 89.6 id,  7.6 wa,  0.0 hi,  0.1 si,  0.0 st
KiB Mem : 65727628 total,  6937548 free, 41921260 used, 16868820 buff/cache
KiB Swap: 93532160 total, 90199040 free,  3333120 used.  6740136 avail Mem

    PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
4099023 ceph      20   0 5918704   3.8g   9700 S   1.7  6.1 378:37.01 /usr/bin/ceph-osd --cluster ceph -f -i 35 --setuser cep+
4097639 ceph      20   0 5340924   3.0g  11428 S  87.1  4.7  14636:30 /usr/bin/ceph-osd --cluster ceph -f -i 195 --setuser ce+
4097974 ceph      20   0 3648188   2.3g   9628 S   8.3  3.6   1375:58 /usr/bin/ceph-osd --cluster ceph -f -i 201 --setuser ce+
4098322 ceph      20   0 3478980   2.2g   9688 S   5.3  3.6   1426:05 /usr/bin/ceph-osd --cluster ceph -f -i 223 --setuser ce+
4099374 ceph      20   0 3446784   2.2g   9252 S   4.6  3.5   1142:14 /usr/bin/ceph-osd --cluster ceph -f -i 205 --setuser ce+
4098679 ceph      20   0 3832140   2.2g   9796 S   6.6  3.5   1248:26 /usr/bin/ceph-osd --cluster ceph -f -i 132 --setuser ce+
4100782 ceph      20   0 3641608   2.2g   9652 S   7.9  3.5   1278:10 /usr/bin/ceph-osd --cluster ceph -f -i 207 --setuser ce+
4095944 ceph      20   0 3375672   2.2g   8968 S   7.3  3.5   1250:02 /usr/bin/ceph-osd --cluster ceph -f -i 108 --setuser ce+
4096956 ceph      20   0 3509376   2.2g   9456 S   7.9  3.5   1157:27 /usr/bin/ceph-osd --cluster ceph -f -i 203 --setuser ce+
4099731 ceph      20   0 3563652   2.2g   8972 S   3.6  3.5   1421:48 /usr/bin/ceph-osd --cluster ceph -f -i 61 --setuser cep+
4096262 ceph      20   0 3531988   2.2g   9040 S   9.9  3.5   1600:15 /usr/bin/ceph-osd --cluster ceph -f -i 121 --setuser ce+
4100442 ceph      20   0 3359736   2.1g   9804 S   4.3  3.4   1185:53 /usr/bin/ceph-osd --cluster ceph -f -i 226 --setuser ce+
4096617 ceph      20   0 3443060   2.1g   9432 S   5.0  3.4   1449:29 /usr/bin/ceph-osd --cluster ceph -f -i 199 --setuser ce+
4097298 ceph      20   0 3483532   2.1g   9600 S   5.6  3.3   1265:28 /usr/bin/ceph-osd --cluster ceph -f -i 97 --setuser cep+
4100093 ceph      20   0 3428348   2.0g   9568 S   3.3  3.2   1298:53 /usr/bin/ceph-osd --cluster ceph -f -i 197 --setuser ce+
4095630 ceph      20   0 3440160   2.0g   8976 S   3.6  3.2   1451:35 /usr/bin/ceph-osd --cluster ceph -f -i 62 --setuser cep+

Generally speaking, increasing the cache minimum seems to help keep important information in RAM. Unfortunately, it also means that swap usage starts much earlier.
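For reference, the cache minimum above refers to osd_memory_cache_min, and raising it is a config change along these lines (the 2 GiB value is only an illustration, not a recommendation):

    ceph config set osd osd_memory_cache_min 2147483648   # 2 GiB, illustrative value only
    ceph daemon osd.195 config get osd_memory_cache_min   # check what the running OSD actually uses

The autotuner still works towards osd_memory_target; the minimum only sets a floor below which the caches are not trimmed, which is why swap pressure shows up earlier.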

Best regards and thanks for your help,
=================
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx



