Re: ceph-fuse using excessive memory

Hi Zheng,

It looks like the memory growth happens even with the simple messenger:

[root@worker1032 ~]# ceph daemon /var/run/ceph/ceph-client.admin.asok config get ms_type
{
    "ms_type": "simple"
}
[root@worker1032 ~]# ps -auxwww | grep ceph-fuse
root      179133 82.2 13.5 77281896 71644120 ?   Sl   12:48 258:09 ceph-fuse --id=admin --conf=/etc/ceph/ceph.conf /mnt/ceph -o rw,fsname=ceph,dev,suid
[root@worker1032 ~]# ceph daemon /var/run/ceph/ceph-client.admin.asok dump_mempools
{
... snip ...
    "buffer_anon": {
        "items": 16753337,
        "bytes": 68782648777
    },
    "buffer_meta": {
        "items": 771,
        "bytes": 67848
    },
... snip ...
    "osdmap": {
        "items": 28582,
        "bytes": 431840
    },
... snip ...

    "total": {
        "items": 16782690,
        "bytes": 68783148465
    }
}
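
(For scale: 68782648777 bytes across 16753337 buffer_anon items works out
to roughly 4 KB per buffer on average, much the same ratio as in the dump
from my original report below.)
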
Andras


On 9/6/18 11:58 PM, Yan, Zheng wrote:
Could you please try making ceph-fuse use the simple messenger (add "ms
type = simple" to the client section of ceph.conf)?
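
For example, the relevant piece of ceph.conf would look something like:

    [client]
        ms type = simple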

Regards
Yan, Zheng



On Wed, Sep 5, 2018 at 10:09 PM Sage Weil <sage@xxxxxxxxxxxx> wrote:
On Wed, 5 Sep 2018, Andras Pataki wrote:
Hi cephers,

Every so often we have a ceph-fuse process that grows to a rather large size
(sometimes consuming the whole memory of the machine).  Here is an example of
a ceph-fuse instance with a 200GB RSS:

# ceph daemon /var/run/ceph/ceph-client.admin.asok dump_mempools
{
     "bloom_filter": {
         "items": 0,
         "bytes": 0
     },
     "bluestore_alloc": {
         "items": 0,
         "bytes": 0
     },
     "bluestore_cache_data": {
         "items": 0,
         "bytes": 0
     },
     "bluestore_cache_onode": {
         "items": 0,
         "bytes": 0
     },
     "bluestore_cache_other": {
         "items": 0,
         "bytes": 0
     },
     "bluestore_fsck": {
         "items": 0,
         "bytes": 0
     },
     "bluestore_txc": {
         "items": 0,
         "bytes": 0
     },
     "bluestore_writing_deferred": {
         "items": 0,
         "bytes": 0
     },
     "bluestore_writing": {
         "items": 0,
         "bytes": 0
     },
     "bluefs": {
         "items": 0,
         "bytes": 0
     },
     "buffer_anon": {
         "items": 51534897,
         "bytes": 207321872398
     },
     "buffer_meta": {
         "items": 64,
         "bytes": 5632
     },
     "osd": {
         "items": 0,
         "bytes": 0
     },
     "osd_mapbl": {
         "items": 0,
         "bytes": 0
     },
     "osd_pglog": {
         "items": 0,
         "bytes": 0
     },
     "osdmap": {
         "items": 28593,
         "bytes": 431872
     },
     "osdmap_mapping": {
         "items": 0,
         "bytes": 0
     },
     "pgmap": {
         "items": 0,
         "bytes": 0
     },
     "mds_co": {
         "items": 0,
         "bytes": 0
     },
     "unittest_1": {
         "items": 0,
         "bytes": 0
     },
     "unittest_2": {
         "items": 0,
         "bytes": 0
     },
     "total": {
         "items": 51563554,
         "bytes": 207322309902
     }
}

The general cache size looks like this (if it is helpful I can put a whole
cache dump somewhere):

# ceph daemon /var/run/ceph/ceph-client.admin.asok dump_cache | grep path | wc -l
84085
# ceph daemon /var/run/ceph/ceph-client.admin.asok dump_cache | grep name | wc -l
168186

Any ideas what 'buffer_anon' is and what could be eating up the 200GB of RAM?

buffer_anon is memory consumed by the bufferlist class that hasn't been
explicitly put into a separate mempool category.  The question is where and
why buffers are getting pinned in memory.  Can you dump the perfcounters?
That might give some hint.
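
Something like this against the same admin socket should do it:

    # ceph daemon /var/run/ceph/ceph-client.admin.asok perf dump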

My guess is a leak, or a problem with the ObjectCacher code that is
preventing it from trimming older buffers.

How reproducible is the situation?  Any idea what workloads trigger it?

Thanks!
sage

We are running with a few ceph-fuse-specific parameters increased in
ceph.conf:

    # Description:  Set the number of inodes that the client keeps in the
    #               metadata cache.
    # Default:      16384
    client_cache_size = 262144

    # Description:  Set the maximum number of dirty bytes in the object cache.
    # Default:      104857600 (100MB)
    client_oc_max_dirty = 536870912

    # Description:  Set the maximum number of objects in the object cache.
    # Default:      1000
    client_oc_max_objects = 8192

    # Description:  Set how many bytes of data the client will cache.
    # Default:      209715200 (200 MB)
    client_oc_size = 2147483640

    # Description:  Set the maximum number of bytes that the kernel reads
    #               ahead for future read operations.  Overridden by the
    #               client_readahead_max_periods setting.
    # Default:      0 (unlimited)
    #client_readahead_max_bytes = 67108864

    # Description:  Set the number of file layout periods (object size *
    #               number of stripes) that the kernel reads ahead.  Overrides
    #               the client_readahead_max_bytes setting.
    # Default:      4
    client_readahead_max_periods = 64

    # Description:  Set the minimum number of bytes that the kernel reads
    #               ahead.
    # Default:      131072 (128KB)
    client_readahead_min = 4194304
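
The effective values can be double-checked at runtime through the admin
socket, e.g.:

    # ceph daemon /var/run/ceph/ceph-client.admin.asok config get client_oc_size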


We are running a 12.2.7 ceph cluster, and the cluster is otherwise healthy.

Any hints would be appreciated.  Thanks,

Andras

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com