Looks like the image attachment got removed. Please find it here: https://imgur.com/a/3tabzCN

=================
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14

________________________________________
From: Frank Schilder <frans@xxxxxx>
Sent: 31 August 2020 14:42
To: Mark Nelson; Dan van der Ster; ceph-users
Subject: Re: OSD memory leak?

Hi Dan and Mark,

sorry, this took a bit longer. I uploaded a new archive (https://files.dtu.dk/u/jb0uS6U9LlCfvS5L/heap_profiling-2020-08-31.tgz?l - valid 60 days) containing files with the following formats:

- osd.195.profile.*.heap - raw heap dump file
- osd.195.profile.*.heap.txt - output of conversion with --text
- osd.195.profile.*.heap-base0001.txt - output of conversion with --text against the first dump as base
- osd.195.*.heap_stats - output of "ceph daemon osd.195 heap stats", taken every hour
- osd.195.*.mempools - output of "ceph daemon osd.195 dump_mempools", taken every hour
- osd.195.*.perf - output of "ceph daemon osd.195 perf dump", taken every hour, counters are reset

Converted files are included only for the last couple of days; converting everything simply takes too long.
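For reference, this is roughly how files like these can be collected and converted once per hour. This is only a sketch: the output directory, timestamps and the pprof binary name (google-pprof vs. pprof from gperftools) are assumptions and may differ from what was actually used.

#!/bin/bash
# Hourly collection for one OSD (e.g. from cron); osd.195 and the paths are examples.
OSD=osd.195
OUT=/var/log/ceph/heap_profiling
TS=$(date +%Y-%m-%d_%H%M)
mkdir -p "$OUT"

# Admin-socket statistics, one file per hour.
ceph daemon $OSD heap stats    > "$OUT/$OSD.$TS.heap_stats"
ceph daemon $OSD dump_mempools > "$OUT/$OSD.$TS.mempools"
ceph daemon $OSD perf dump     > "$OUT/$OSD.$TS.perf"
ceph daemon $OSD perf reset all   # reset the perf counters after each dump

# Trigger a tcmalloc heap dump; requires "ceph tell osd.195 heap start_profiler"
# to have been run once. Dumps typically appear in the OSD's log directory as
# osd.195.profile.NNNN.heap.
ceph tell $OSD heap dump

# Convert a dump to text, plain and diffed against the first dump as base.
google-pprof --text /usr/bin/ceph-osd \
    /var/log/ceph/osd.195.profile.0002.heap > osd.195.profile.0002.heap.txt
google-pprof --text --base=/var/log/ceph/osd.195.profile.0001.heap \
    /usr/bin/ceph-osd /var/log/ceph/osd.195.profile.0002.heap \
    > osd.195.profile.0002.heap-base0001.txt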
Please also find attached a recording of memory usage on one of the relevant OSD nodes. I marked restarts of all OSDs/the host with vertical red lines. What is worrying is the self-amplifying nature of the leak: it is not a linear process, it looks at least quadratic if not exponential. Given the comparably short uptime, what we are looking at is probably still in the lower percentages, but with an increasing rate. The OSDs just started to overrun their limit:

top - 14:38:49 up 155 days, 19:17,  1 user,  load average: 5.99, 4.59, 4.59
Tasks: 684 total,   1 running, 293 sleeping,   0 stopped,   0 zombie
%Cpu(s):  1.9 us,  0.9 sy,  0.0 ni, 89.6 id,  7.6 wa,  0.0 hi,  0.1 si,  0.0 st
KiB Mem : 65727628 total,  6937548 free, 41921260 used, 16868820 buff/cache
KiB Swap: 93532160 total, 90199040 free,  3333120 used.  6740136 avail Mem

    PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
4099023 ceph      20   0 5918704   3.8g   9700 S   1.7  6.1 378:37.01 /usr/bin/ceph-osd --cluster ceph -f -i 35 --setuser cep+
4097639 ceph      20   0 5340924   3.0g  11428 S  87.1  4.7  14636:30 /usr/bin/ceph-osd --cluster ceph -f -i 195 --setuser ce+
4097974 ceph      20   0 3648188   2.3g   9628 S   8.3  3.6   1375:58 /usr/bin/ceph-osd --cluster ceph -f -i 201 --setuser ce+
4098322 ceph      20   0 3478980   2.2g   9688 S   5.3  3.6   1426:05 /usr/bin/ceph-osd --cluster ceph -f -i 223 --setuser ce+
4099374 ceph      20   0 3446784   2.2g   9252 S   4.6  3.5   1142:14 /usr/bin/ceph-osd --cluster ceph -f -i 205 --setuser ce+
4098679 ceph      20   0 3832140   2.2g   9796 S   6.6  3.5   1248:26 /usr/bin/ceph-osd --cluster ceph -f -i 132 --setuser ce+
4100782 ceph      20   0 3641608   2.2g   9652 S   7.9  3.5   1278:10 /usr/bin/ceph-osd --cluster ceph -f -i 207 --setuser ce+
4095944 ceph      20   0 3375672   2.2g   8968 S   7.3  3.5   1250:02 /usr/bin/ceph-osd --cluster ceph -f -i 108 --setuser ce+
4096956 ceph      20   0 3509376   2.2g   9456 S   7.9  3.5   1157:27 /usr/bin/ceph-osd --cluster ceph -f -i 203 --setuser ce+
4099731 ceph      20   0 3563652   2.2g   8972 S   3.6  3.5   1421:48 /usr/bin/ceph-osd --cluster ceph -f -i 61 --setuser cep+
4096262 ceph      20   0 3531988   2.2g   9040 S   9.9  3.5   1600:15 /usr/bin/ceph-osd --cluster ceph -f -i 121 --setuser ce+
4100442 ceph      20   0 3359736   2.1g   9804 S   4.3  3.4   1185:53 /usr/bin/ceph-osd --cluster ceph -f -i 226 --setuser ce+
4096617 ceph      20   0 3443060   2.1g   9432 S   5.0  3.4   1449:29 /usr/bin/ceph-osd --cluster ceph -f -i 199 --setuser ce+
4097298 ceph      20   0 3483532   2.1g   9600 S   5.6  3.3   1265:28 /usr/bin/ceph-osd --cluster ceph -f -i 97 --setuser cep+
4100093 ceph      20   0 3428348   2.0g   9568 S   3.3  3.2   1298:53 /usr/bin/ceph-osd --cluster ceph -f -i 197 --setuser ce+
4095630 ceph      20   0 3440160   2.0g   8976 S   3.6  3.2   1451:35 /usr/bin/ceph-osd --cluster ceph -f -i 62 --setuser cep+

Generally speaking, increasing the cache minimum seems to help with keeping important information in RAM. Unfortunately, it also means that swap usage starts much earlier.

Best regards and thanks for your help,

=================
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14

_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx