Re: MDS: obscene buffer_anon memory use when scanning lots of files

"Yan, Zheng" <ukernel@xxxxxxxxx> · Tue, 14 Apr 2020 23:45:51 +0800

On Tue, Apr 14, 2020 at 9:41 PM Dan van der Ster <dan@xxxxxxxxxxxxxx> wrote:
>
> On Tue, Apr 14, 2020 at 2:50 PM Dan van der Ster <dan@xxxxxxxxxxxxxx> wrote:
> >
> > On Sun, Apr 12, 2020 at 9:33 PM Dan van der Ster <dan@xxxxxxxxxxxxxx> wrote:
> > >
> > > Hi John,
> > >
> > > Did you make any progress on investigating this?
> > >
> > > > Today I also saw very high buffer_anon usage, relative to the other
> > > > pools, on our 2 active MDSs running 14.2.8:
> > >
> > >     "mempool": {
> > >         "by_pool": {
> > >             "bloom_filter": {
> > >                 "items": 2322,
> > >                 "bytes": 2322
> > >             },
> > >             ...
> > >             "buffer_anon": {
> > >                 "items": 4947214,
> > >                 "bytes": 19785847411
> > >             },
> > >             ...
> > >             "osdmap": {
> > >                 "items": 4036,
> > >                 "bytes": 89488
> > >             },
> > >             ...
> > >             "mds_co": {
> > >                 "items": 9248718,
> > >                 "bytes": 157725128
> > >             },
> > >             ...
> > >         },
> > >         "total": {
> > >             "items": 14202290,
> > >             "bytes": 19943664349
> > >         }
> > >     }
> > >
> > > That mds has `mds cache memory limit = 15353442304` and there was no
> > > health warning about the mds memory usage exceeding the limit.
> > > > (I only noticed because some other cron jobs on the MDS hosts were going OOM).
> > >
> > > > Patrick: are there any known memory leaks in Nautilus MDSs?
> >
> > I restarted one MDS with ms_type = simple and that MDS maintained a
> > normal amount of buffer_anon for several hours, while the other active
> > MDS (with async ms type) saw its buffer_anon grow by some ~10GB
> > overnight.
> > So, it seems there are still memory leaks with ms_type = async in 14.2.8.
> >
> > OTOH, the whole cluster is kinda broken now due to
> > https://tracker.ceph.com/issues/45080, which may be related to the
> > ms_type=simple .. I'm still debugging.
>
> Indeed, the combination of msgr v2 and `ms type = simple` on a
> ceph-mds leads to deadlocked mds ops as soon as any osd restarts.
> Looks like we have to find the root cause of the memory leak rather
> than working around it with ms type = simple.
>
> Dan
>

I opened https://tracker.ceph.com/issues/45090. It can explain the
buffer_anon memory use.

Regards
Yan, Zheng

>
> >
> > Cheers, Dan
> >
> > > Any tips to debug this further?
> > >
> > > Cheers, Dan
> > >
> > > On Wed, Mar 4, 2020 at 8:38 PM John Madden <jmadden.com@xxxxxxxxx> wrote:
> > > >
> > > > Though it appears potentially(?) better, I'm still having issues with
> > > > this on 14.2.8. Kick off the ~20 threads sequentially reading ~1M
> > > > files and buffer_anon still grows apparently without bound.
> > > >
> > > > > mds.1 tcmalloc heap stats:
> > > > > ------------------------------------------------
> > > > MALLOC:    53710413656 (51222.2 MiB) Bytes in use by application
> > > > MALLOC: +            0 (    0.0 MiB) Bytes in page heap freelist
> > > > MALLOC: +    334028128 (  318.6 MiB) Bytes in central cache freelist
> > > > MALLOC: +     11210608 (   10.7 MiB) Bytes in transfer cache freelist
> > > > MALLOC: +     11105240 (   10.6 MiB) Bytes in thread cache freelists
> > > > MALLOC: +     77525152 (   73.9 MiB) Bytes in malloc metadata
> > > > MALLOC:   ------------
> > > > MALLOC: =  54144282784 (51636.0 MiB) Actual memory used (physical + swap)
> > > > MALLOC: +     49963008 (   47.6 MiB) Bytes released to OS (aka unmapped)
> > > > MALLOC:   ------------
> > > > MALLOC: =  54194245792 (51683.7 MiB) Virtual address space used
> > > > MALLOC:
> > > > MALLOC:         262021              Spans in use
> > > > MALLOC:             18              Thread heaps in use
> > > > MALLOC:           8192              Tcmalloc page size
> > > > ------------------------------------------------
> > > >
> > > > The byte count appears to grow even as the item count drops, though
> > > > the trend is for both to increase over the life of the workload:
> > > > ceph daemon mds.1 dump_mempools | jq .mempool.by_pool.buffer_anon:
> > > >
> > > > {
> > > >   "items": 28045,
> > > >   "bytes": 24197601109
> > > > }
> > > > {
> > > >   "items": 27132,
> > > >   "bytes": 24262495865
> > > > }
> > > > {
> > > >   "items": 27105,
> > > >   "bytes": 24262537939
> > > > }
> > > > {
> > > >   "items": 33309,
> > > >   "bytes": 29754507505
> > > > }
> > > > {
> > > >   "items": 36160,
> > > >   "bytes": 31803033733
> > > > }
> > > > {
> > > >   "items": 56772,
> > > >   "bytes": 51062350351
> > > > }
> > > >
> > > > Is there further data/debug I can retrieve to help track this down?
> > > >
> > > >
> > > > On Wed, Feb 19, 2020 at 4:38 PM John Madden <jmadden.com@xxxxxxxxx> wrote:
> > > > >
> > > > > Ah, no, I hadn't seen that. Patiently awaiting .8 then. Thanks!
> > > > >
> > > > > On Mon, Feb 17, 2020 at 8:52 AM Dan van der Ster <dan@xxxxxxxxxxxxxx> wrote:
> > > > > >
> > > > > > On Mon, Feb 10, 2020 at 8:31 PM John Madden <jmadden.com@xxxxxxxxx> wrote:
> > > > > > >
> > > > > > > Upgraded to 14.2.7, doesn't appear to have affected the behavior. As requested:
> > > > > >
> > > > > > In case it wasn't clear -- the fix that Patrick mentioned was
> > > > > > postponed to 14.2.8.
> > > > > >
> > > > > > -- dan
> _______________________________________________
> ceph-users mailing list -- ceph-users@xxxxxxx
> To unsubscribe send an email to ceph-users-leave@xxxxxxx