Re: MDS: obscene buffer_anon memory use when scanning lots of files

On Sun, Apr 12, 2020 at 9:33 PM Dan van der Ster <dan@xxxxxxxxxxxxxx> wrote:
>
> Hi John,
>
> Did you make any progress on investigating this?
>
> Today I also saw huge buffer_anon usage, out of proportion to the rest
> of the mempools, on our two active MDSs running 14.2.8:
>
>     "mempool": {
>         "by_pool": {
>             "bloom_filter": {
>                 "items": 2322,
>                 "bytes": 2322
>             },
>             ...
>             "buffer_anon": {
>                 "items": 4947214,
>                 "bytes": 19785847411
>             },
>             ...
>             "osdmap": {
>                 "items": 4036,
>                 "bytes": 89488
>             },
>             ...
>             "mds_co": {
>                 "items": 9248718,
>                 "bytes": 157725128
>             },
>             ...
>         },
>         "total": {
>             "items": 14202290,
>             "bytes": 19943664349
>         }
>     }
>
> That MDS has `mds cache memory limit = 15353442304`, and there was no
> health warning about the MDS memory usage exceeding the limit.
> (I only noticed because some other cron jobs on the MDS hosts were
> getting OOM-killed.)
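>
> For anyone who wants to compare the two numbers themselves, a quick
> check -- assuming admin socket access on the MDS host and jq installed
> -- is something like:
>
>     ceph daemon mds.<name> dump_mempools | jq .mempool.total.bytes
>     ceph daemon mds.<name> config get mds_cache_memory_limit
>
> As far as I understand, the cache limit and its health warning only
> track the mds_co pool, so buffer_anon growth like the above would not
> trigger a warning.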
>
> Patrick: is there any known memory leak in the Nautilus MDS?

I restarted one MDS with ms_type = simple, and that MDS maintained a
normal amount of buffer_anon for several hours, while the other active
MDS (still using the async messenger) saw its buffer_anon grow by
roughly 10GB overnight.
So it seems there are still memory leaks with ms_type = async in 14.2.8.
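
For anyone who wants to try the same, something like this should do it
(untested sketch; assumes a ceph.conf-based deployment -- adjust the MDS
id and restart method to your setup):

    # ceph.conf on the MDS host
    [mds]
        ms type = simple

    # then restart that MDS
    systemctl restart ceph-mds@<id>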

OTOH, the whole cluster is somewhat broken now due to
https://tracker.ceph.com/issues/45080, which may be related to the
switch to ms_type = simple; I'm still debugging.

Cheers, Dan

> Any tips to debug this further?
>
> Cheers, Dan
>
> On Wed, Mar 4, 2020 at 8:38 PM John Madden <jmadden.com@xxxxxxxxx> wrote:
> >
> > Though it appears potentially(?) better, I'm still having issues with
> > this on 14.2.8. Kick off the ~20 threads, each sequentially reading ~1M
> > files, and buffer_anon still grows, apparently without bound.
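> >
> > Roughly the kind of load generator involved (purely illustrative --
> > the mount path and subtree layout below are made up; each reader walks
> > its own directory tree on the CephFS mount):
> >
> >     for i in $(seq 1 20); do
> >         ( find /mnt/cephfs/dir-$i -type f -print0 \
> >             | xargs -0 -n 64 cat > /dev/null ) &
> >     done
> >     wait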
> >
> > mds.1 tcmalloc heap stats:------------------------------------------------
> > MALLOC:    53710413656 (51222.2 MiB) Bytes in use by application
> > MALLOC: +            0 (    0.0 MiB) Bytes in page heap freelist
> > MALLOC: +    334028128 (  318.6 MiB) Bytes in central cache freelist
> > MALLOC: +     11210608 (   10.7 MiB) Bytes in transfer cache freelist
> > MALLOC: +     11105240 (   10.6 MiB) Bytes in thread cache freelists
> > MALLOC: +     77525152 (   73.9 MiB) Bytes in malloc metadata
> > MALLOC:   ------------
> > MALLOC: =  54144282784 (51636.0 MiB) Actual memory used (physical + swap)
> > MALLOC: +     49963008 (   47.6 MiB) Bytes released to OS (aka unmapped)
> > MALLOC:   ------------
> > MALLOC: =  54194245792 (51683.7 MiB) Virtual address space used
> > MALLOC:
> > MALLOC:         262021              Spans in use
> > MALLOC:             18              Thread heaps in use
> > MALLOC:           8192              Tcmalloc page size
> > ------------------------------------------------
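> >
> > (This is the output of the tcmalloc heap stats command, e.g.
> > `ceph tell mds.1 heap stats`. Running `ceph tell mds.1 heap release`
> > asks tcmalloc to hand freed pages back to the OS, but here nearly
> > everything is "in use by application", so releasing the freelists
> > wouldn't recover much.)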
> >
> > The byte count appears to grow even as the item count drops, though
> > the trend is for both to increase over the life of the workload.
> > Successive samples of
> > `ceph daemon mds.1 dump_mempools | jq .mempool.by_pool.buffer_anon`:
> >
> > {
> >   "items": 28045,
> >   "bytes": 24197601109
> > }
> > {
> >   "items": 27132,
> >   "bytes": 24262495865
> > }
> > {
> >   "items": 27105,
> >   "bytes": 24262537939
> > }
> > {
> >   "items": 33309,
> >   "bytes": 29754507505
> > }
> > {
> >   "items": 36160,
> >   "bytes": 31803033733
> > }
> > {
> >   "items": 56772,
> >   "bytes": 51062350351
> > }
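> >
> > A simple loop to keep sampling that over time (untested sketch,
> > assuming jq is available on the MDS host):
> >
> >     while sleep 60; do
> >         echo -n "$(date -u +%FT%TZ) "
> >         ceph daemon mds.1 dump_mempools | jq -c .mempool.by_pool.buffer_anon
> >     done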
> >
> > Is there further data/debug I can retrieve to help track this down?
> >
> >
> > On Wed, Feb 19, 2020 at 4:38 PM John Madden <jmadden.com@xxxxxxxxx> wrote:
> > >
> > > Ah, no, I hadn't seen that. Patiently awaiting .8 then. Thanks!
> > >
> > > On Mon, Feb 17, 2020 at 8:52 AM Dan van der Ster <dan@xxxxxxxxxxxxxx> wrote:
> > > >
> > > > On Mon, Feb 10, 2020 at 8:31 PM John Madden <jmadden.com@xxxxxxxxx> wrote:
> > > > >
> > > > > Upgraded to 14.2.7; it doesn't appear to have affected the behavior. As requested:
> > > >
> > > > In case it wasn't clear -- the fix that Patrick mentioned was
> > > > postponed to 14.2.8.
> > > >
> > > > -- dan
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx


