Hi John,

Did you make any progress on investigating this?

Today I also saw huge relative buffer_anon usage on our 2 active
mds's running 14.2.8:

    "mempool": {
        "by_pool": {
            "bloom_filter": {
                "items": 2322,
                "bytes": 2322
            },
            ...
            "buffer_anon": {
                "items": 4947214,
                "bytes": 19785847411
            },
            ...
            "osdmap": {
                "items": 4036,
                "bytes": 89488
            },
            ...
            "mds_co": {
                "items": 9248718,
                "bytes": 157725128
            },
            ...
        },
        "total": {
            "items": 14202290,
            "bytes": 19943664349
        }
    }

That mds has `mds cache memory limit = 15353442304` and there was no
health warning about the mds memory usage exceeding the limit. (I only
noticed because some other crons on the mds's were going oom.)

Patrick: is there any known memory leak in nautilus mds's? Any tips to
debug this further?

Cheers, Dan

On Wed, Mar 4, 2020 at 8:38 PM John Madden <jmadden.com@xxxxxxxxx> wrote:
>
> Though it appears potentially(?) better, I'm still having issues with
> this on 14.2.8. Kick off the ~20 threads sequentially reading ~1M
> files and buffer_anon still grows apparently without bound.
>
> mds.1 tcmalloc heap stats:------------------------------------------------
> MALLOC:    53710413656 (51222.2 MiB) Bytes in use by application
> MALLOC: +            0 (    0.0 MiB) Bytes in page heap freelist
> MALLOC: +    334028128 (  318.6 MiB) Bytes in central cache freelist
> MALLOC: +     11210608 (   10.7 MiB) Bytes in transfer cache freelist
> MALLOC: +     11105240 (   10.6 MiB) Bytes in thread cache freelists
> MALLOC: +     77525152 (   73.9 MiB) Bytes in malloc metadata
> MALLOC:   ------------
> MALLOC: =  54144282784 (51636.0 MiB) Actual memory used (physical + swap)
> MALLOC: +     49963008 (   47.6 MiB) Bytes released to OS (aka unmapped)
> MALLOC:   ------------
> MALLOC: =  54194245792 (51683.7 MiB) Virtual address space used
> MALLOC:
> MALLOC:         262021              Spans in use
> MALLOC:             18              Thread heaps in use
> MALLOC:           8192              Tcmalloc page size
> ------------------------------------------------
>
> The byte count appears to grow even as the item count drops, though
> the trend is for both to increase over the life of the workload:
> ceph daemon mds.1 dump_mempools | jq .mempool.by_pool.buffer_anon:
>
> {
>   "items": 28045,
>   "bytes": 24197601109
> }
> {
>   "items": 27132,
>   "bytes": 24262495865
> }
> {
>   "items": 27105,
>   "bytes": 24262537939
> }
> {
>   "items": 33309,
>   "bytes": 29754507505
> }
> {
>   "items": 36160,
>   "bytes": 31803033733
> }
> {
>   "items": 56772,
>   "bytes": 51062350351
> }
>
> Is there further data/debug I can retrieve to help track this down?
>
>
> On Wed, Feb 19, 2020 at 4:38 PM John Madden <jmadden.com@xxxxxxxxx> wrote:
> >
> > Ah, no, I hadn't seen that. Patiently awaiting .8 then. Thanks!
> >
> > On Mon, Feb 17, 2020 at 8:52 AM Dan van der Ster <dan@xxxxxxxxxxxxxx> wrote:
> > >
> > > On Mon, Feb 10, 2020 at 8:31 PM John Madden <jmadden.com@xxxxxxxxx> wrote:
> > > >
> > > > Upgraded to 14.2.7, doesn't appear to have affected the behavior. As requested:
> > >
> > > In case it wasn't clear -- the fix that Patrick mentioned was
> > > postponed to 14.2.8.
> > >
> > > -- dan
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx
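
For anyone chasing the same buffer_anon growth, a low-impact way to gather
more data is to sample the mempool accounting and tcmalloc's own view side
by side over time, and then check whether asking the allocator to release
free pages shrinks RSS. The commands below are only a sketch, not an
official procedure: they assume the daemon is named mds.1 as in the output
above, that jq is installed, and that they run on the host holding that
MDS's admin socket.

# Sample both views once a minute: buffer_anon is the mempool being
# watched in this thread, "heap stats" is tcmalloc's own accounting.
while sleep 60; do
    date
    ceph daemon mds.1 dump_mempools | jq .mempool.by_pool.buffer_anon
    ceph tell mds.1 heap stats
done

# If RSS drops sharply after this, the growth was memory parked in
# tcmalloc freelists rather than a true leak.
ceph tell mds.1 heap release

Comparing the mempool totals against tcmalloc's "Bytes in use by
application" also shows whether the growth is inside the accounted pools
or outside of them, which narrows down where to look next.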