Thanks for the log. I think it's caused by
http://tracker.ceph.com/issues/36192

Regards
Yan, Zheng

On Wed, Sep 26, 2018 at 1:51 AM Andras Pataki
<apataki@xxxxxxxxxxxxxxxxxxxxx> wrote:
>
> Hi Zheng,
>
> Here is a debug dump:
> https://users.flatironinstitute.org/apataki/public_www/7f0011f676112cd4/
> I have also included some other corresponding information (cache dump,
> mempool dump, perf dump and ceph.conf). This corresponds to a 100GB
> ceph-fuse process while the client code is running. I can reproduce
> this issue at will in about 6 to 8 hours of running one of our
> scientific jobs - and I can also run a more instrumented/patched/etc.
> code to try.
>
> Andras
>
>
> On 9/24/18 10:06 PM, Yan, Zheng wrote:
> > On Tue, Sep 25, 2018 at 2:23 AM Andras Pataki
> > <apataki@xxxxxxxxxxxxxxxxxxxxx> wrote:
> >> The whole cluster, including ceph-fuse, is version 12.2.7.
> >>
> > If this issue happens again, please set the "debug_objectcacher" option of
> > ceph-fuse to 15 (for 30 seconds) and send the ceph-fuse log to us.
> >
> > Regards
> > Yan, Zheng
> >
> >
> >> Andras
> >>
> >> On 9/24/18 6:27 AM, Yan, Zheng wrote:
> >>> On Fri, Sep 21, 2018 at 5:40 AM Andras Pataki
> >>> <apataki@xxxxxxxxxxxxxxxxxxxxx> wrote:
> >>>> I've done some more experiments playing with client config parameters,
> >>>> and it seems like the client_oc_size parameter is very correlated to
> >>>> how big ceph-fuse grows. With its default value of 200MB, ceph-fuse
> >>>> gets to about 22GB of RSS; with our previous client_oc_size value of
> >>>> 2GB, the ceph-fuse process grows to 211GB. After this size is reached,
> >>>> its memory usage levels out. So it seems like there is an issue
> >>>> accounting for memory in the client cache - whatever client_oc_size is
> >>>> set to, about 100 times more memory gets used, in our case at least.
> >>>>
> >>> ceph-fuse version ?
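A quick back-of-the-envelope check of the ~100x figure reported above (an illustrative sketch only; the RSS and client_oc_size values are the ones quoted in the thread):

```python
# Amplification = observed ceph-fuse RSS divided by configured client_oc_size.
# Both observations land near 100x, which is what points at a cache
# accounting problem rather than a workload-dependent leak.

GiB = 1024 ** 3
MiB = 1024 ** 2

cases = {
    "default client_oc_size (200 MB)": (22 * GiB, 200 * MiB),
    "raised client_oc_size (2 GB)": (211 * GiB, 2 * GiB),
}

for label, (rss, oc_size) in cases.items():
    print(f"{label}: amplification ~{rss / oc_size:.0f}x")
```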
> >>>
> >>>> Andras
> >>>>
> >>>> On 9/19/18 6:06 PM, Andras Pataki wrote:
> >>>>> Hi Zheng,
> >>>>>
> >>>>> It looks like the memory growth happens even with the simple messenger:
> >>>>>
> >>>>> [root@worker1032 ~]# ceph daemon /var/run/ceph/ceph-client.admin.asok config get ms_type
> >>>>> {
> >>>>>     "ms_type": "simple"
> >>>>> }
> >>>>> [root@worker1032 ~]# ps -auxwww | grep ceph-fuse
> >>>>> root 179133 82.2 13.5 77281896 71644120 ? Sl 12:48 258:09
> >>>>> ceph-fuse --id=admin --conf=/etc/ceph/ceph.conf /mnt/ceph -o rw,fsname=ceph,dev,suid
> >>>>> [root@worker1032 ~]# ceph daemon /var/run/ceph/ceph-client.admin.asok dump_mempools
> >>>>> {
> >>>>>     ... snip ...
> >>>>>     "buffer_anon": {
> >>>>>         "items": 16753337,
> >>>>>         "bytes": 68782648777
> >>>>>     },
> >>>>>     "buffer_meta": {
> >>>>>         "items": 771,
> >>>>>         "bytes": 67848
> >>>>>     },
> >>>>>     ... snip ...
> >>>>>     "osdmap": {
> >>>>>         "items": 28582,
> >>>>>         "bytes": 431840
> >>>>>     },
> >>>>>     ... snip ...
> >>>>>     "total": {
> >>>>>         "items": 16782690,
> >>>>>         "bytes": 68783148465
> >>>>>     }
> >>>>> }
> >>>>>
> >>>>> Andras
> >>>>>
> >>>>>
> >>>>> On 9/6/18 11:58 PM, Yan, Zheng wrote:
> >>>>>> Could you please try making ceph-fuse use the simple messenger (add
> >>>>>> "ms type = simple" to the client section of ceph.conf)?
> >>>>>>
> >>>>>> Regards
> >>>>>> Yan, Zheng
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>> On Wed, Sep 5, 2018 at 10:09 PM Sage Weil <sage@xxxxxxxxxxxx> wrote:
> >>>>>>> On Wed, 5 Sep 2018, Andras Pataki wrote:
> >>>>>>>> Hi cephers,
> >>>>>>>>
> >>>>>>>> Every so often we have a ceph-fuse process that grows to a rather
> >>>>>>>> large size (up to eating up the whole memory of the machine).
> >>>>>>>> Here is an example of a 200GB RSS size ceph-fuse instance:
> >>>>>>>>
> >>>>>>>> # ceph daemon /var/run/ceph/ceph-client.admin.asok dump_mempools
> >>>>>>>> {
> >>>>>>>>     "bloom_filter": {
> >>>>>>>>         "items": 0,
> >>>>>>>>         "bytes": 0
> >>>>>>>>     },
> >>>>>>>>     "bluestore_alloc": {
> >>>>>>>>         "items": 0,
> >>>>>>>>         "bytes": 0
> >>>>>>>>     },
> >>>>>>>>     "bluestore_cache_data": {
> >>>>>>>>         "items": 0,
> >>>>>>>>         "bytes": 0
> >>>>>>>>     },
> >>>>>>>>     "bluestore_cache_onode": {
> >>>>>>>>         "items": 0,
> >>>>>>>>         "bytes": 0
> >>>>>>>>     },
> >>>>>>>>     "bluestore_cache_other": {
> >>>>>>>>         "items": 0,
> >>>>>>>>         "bytes": 0
> >>>>>>>>     },
> >>>>>>>>     "bluestore_fsck": {
> >>>>>>>>         "items": 0,
> >>>>>>>>         "bytes": 0
> >>>>>>>>     },
> >>>>>>>>     "bluestore_txc": {
> >>>>>>>>         "items": 0,
> >>>>>>>>         "bytes": 0
> >>>>>>>>     },
> >>>>>>>>     "bluestore_writing_deferred": {
> >>>>>>>>         "items": 0,
> >>>>>>>>         "bytes": 0
> >>>>>>>>     },
> >>>>>>>>     "bluestore_writing": {
> >>>>>>>>         "items": 0,
> >>>>>>>>         "bytes": 0
> >>>>>>>>     },
> >>>>>>>>     "bluefs": {
> >>>>>>>>         "items": 0,
> >>>>>>>>         "bytes": 0
> >>>>>>>>     },
> >>>>>>>>     "buffer_anon": {
> >>>>>>>>         "items": 51534897,
> >>>>>>>>         "bytes": 207321872398
> >>>>>>>>     },
> >>>>>>>>     "buffer_meta": {
> >>>>>>>>         "items": 64,
> >>>>>>>>         "bytes": 5632
> >>>>>>>>     },
> >>>>>>>>     "osd": {
> >>>>>>>>         "items": 0,
> >>>>>>>>         "bytes": 0
> >>>>>>>>     },
> >>>>>>>>     "osd_mapbl": {
> >>>>>>>>         "items": 0,
> >>>>>>>>         "bytes": 0
> >>>>>>>>     },
> >>>>>>>>     "osd_pglog": {
> >>>>>>>>         "items": 0,
> >>>>>>>>         "bytes": 0
> >>>>>>>>     },
> >>>>>>>>     "osdmap": {
> >>>>>>>>         "items": 28593,
> >>>>>>>>         "bytes": 431872
> >>>>>>>>     },
> >>>>>>>>     "osdmap_mapping": {
> >>>>>>>>         "items": 0,
> >>>>>>>>         "bytes": 0
> >>>>>>>>     },
> >>>>>>>>     "pgmap": {
> >>>>>>>>         "items": 0,
> >>>>>>>>         "bytes": 0
> >>>>>>>>     },
> >>>>>>>>     "mds_co": {
> >>>>>>>>         "items": 0,
> >>>>>>>>         "bytes": 0
> >>>>>>>>     },
> >>>>>>>>     "unittest_1": {
> >>>>>>>>         "items": 0,
> >>>>>>>>         "bytes": 0
> >>>>>>>>     },
> >>>>>>>>     "unittest_2": {
> >>>>>>>>         "items": 0,
> >>>>>>>>         "bytes": 0
> >>>>>>>>     },
> >>>>>>>>     "total": {
> >>>>>>>>         "items": 51563554,
> >>>>>>>>         "bytes": 207322309902
> >>>>>>>>     }
> >>>>>>>> }
> >>>>>>>>
> >>>>>>>> The general cache size looks like this (if it is helpful I can put
> >>>>>>>> a whole cache dump somewhere):
> >>>>>>>>
> >>>>>>>> # ceph daemon /var/run/ceph/ceph-client.admin.asok dump_cache | grep path | wc -l
> >>>>>>>> 84085
> >>>>>>>> # ceph daemon /var/run/ceph/ceph-client.admin.asok dump_cache | grep name | wc -l
> >>>>>>>> 168186
> >>>>>>>>
> >>>>>>>> Any ideas what 'buffer_anon' is and what could be eating up the
> >>>>>>>> 200GB of RAM?
> >>>>>>> buffer_anon is memory consumed by the bufferlist class that hasn't been
> >>>>>>> explicitly put into a separate mempool category. The question is
> >>>>>>> where/why buffers are getting pinned in memory. Can you dump the
> >>>>>>> perfcounters? That might give some hint.
> >>>>>>>
> >>>>>>> My guess is a leak, or a problem with the ObjectCacher code that is
> >>>>>>> preventing it from trimming older buffers.
> >>>>>>>
> >>>>>>> How reproducible is the situation? Any idea what workloads trigger it?
> >>>>>>>
> >>>>>>> Thanks!
> >>>>>>> sage
> >>>>>>>
> >>>>>>>> We are running with a few ceph-fuse specific parameters increased in
> >>>>>>>> ceph.conf:
> >>>>>>>>
> >>>>>>>>     # Description: Set the number of inodes that the client keeps
> >>>>>>>>     # in the metadata cache.
> >>>>>>>>     # Default: 16384
> >>>>>>>>     client_cache_size = 262144
> >>>>>>>>
> >>>>>>>>     # Description: Set the maximum number of dirty bytes in the
> >>>>>>>>     # object cache.
> >>>>>>>>     # Default: 104857600 (100MB)
> >>>>>>>>     client_oc_max_dirty = 536870912
> >>>>>>>>
> >>>>>>>>     # Description: Set the maximum number of objects in the object
> >>>>>>>>     # cache.
> >>>>>>>>     # Default: 1000
> >>>>>>>>     client_oc_max_objects = 8192
> >>>>>>>>
> >>>>>>>>     # Description: Set how many bytes of data the client will cache.
> >>>>>>>>     # Default: 209715200 (200 MB)
> >>>>>>>>     client_oc_size = 2147483640
> >>>>>>>>
> >>>>>>>>     # Description: Set the maximum number of bytes that the kernel
> >>>>>>>>     # reads ahead for future read operations. Overridden by the
> >>>>>>>>     # client_readahead_max_periods setting.
> >>>>>>>>     # Default: 0 (unlimited)
> >>>>>>>>     #client_readahead_max_bytes = 67108864
> >>>>>>>>
> >>>>>>>>     # Description: Set the number of file layout periods (object
> >>>>>>>>     # size * number of stripes) that the kernel reads ahead.
> >>>>>>>>     # Overrides the client_readahead_max_bytes setting.
> >>>>>>>>     # Default: 4
> >>>>>>>>     client_readahead_max_periods = 64
> >>>>>>>>
> >>>>>>>>     # Description: Set the minimum number of bytes that the kernel
> >>>>>>>>     # reads ahead.
> >>>>>>>>     # Default: 131072 (128KB)
> >>>>>>>>     client_readahead_min = 4194304
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> We are running a 12.2.7 ceph cluster, and the cluster is otherwise
> >>>>>>>> healthy.
> >>>>>>>>
> >>>>>>>> Any hints would be appreciated. Thanks,
> >>>>>>>>
> >>>>>>>> Andras
> >>>>>>>>
> >>>>>>>> _______________________________________________
> >>>>>>> ceph-users mailing list
> >>>>>>> ceph-users@xxxxxxxxxxxxxx
> >>>>>>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
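For anyone triaging a similar symptom, a small script along these lines can rank the mempools and compute the average allocation size in the dominant pool. This is a sketch, not Ceph tooling: the JSON below is embedded from the 200GB instance in this thread (abridged to the non-zero pools); in practice you would feed it the output of `ceph daemon <asok> dump_mempools` instead.

```python
import json

# Abridged dump_mempools output from the 200GB ceph-fuse instance above.
dump = json.loads("""
{
    "buffer_anon": {"items": 51534897, "bytes": 207321872398},
    "buffer_meta": {"items": 64, "bytes": 5632},
    "osdmap": {"items": 28593, "bytes": 431872},
    "total": {"items": 51563554, "bytes": 207322309902}
}
""")

total = dump.pop("total")

# Rank pools by their share of total bytes; buffer_anon dominates here.
for name, pool in sorted(dump.items(), key=lambda kv: -kv[1]["bytes"]):
    share = 100.0 * pool["bytes"] / total["bytes"]
    print(f"{name:12s} {pool['bytes']:>15d} bytes ({share:5.1f}% of total)")

# Average allocation size in the dominant pool - roughly 4 KB per item
# here, i.e. page-sized buffers rather than many tiny allocations.
anon = dump["buffer_anon"]
print(f"buffer_anon: {anon['bytes'] / anon['items']:.0f} bytes/item")
```

The ~4 KB/item figure is the kind of hint Sage asks for above: it suggests whole cached data buffers being pinned (consistent with an ObjectCacher trimming problem) rather than a small-object leak.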