Re: ceph-fuse using excessive memory

Hi Zheng,

Here is a debug dump: https://users.flatironinstitute.org/apataki/public_www/7f0011f676112cd4/  I have also included some other corresponding information (cache dump, mempool dump, perf dump and ceph.conf).  This corresponds to a 100GB ceph-fuse process while the client code is running.  I can reproduce this issue at will in about 6 to 8 hours of running one of our scientific jobs - and I can also run more instrumented/patched/etc. builds to try things out.
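
For reference, these dumps can be collected from the running ceph-fuse process over its admin socket along these lines (a sketch, assuming the default client socket path; the output file names are arbitrary):

# ceph daemon /var/run/ceph/ceph-client.admin.asok dump_cache > cache-dump.txt
# ceph daemon /var/run/ceph/ceph-client.admin.asok dump_mempools > mempools.json
# ceph daemon /var/run/ceph/ceph-client.admin.asok perf dump > perf-dump.json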

Andras


On 9/24/18 10:06 PM, Yan, Zheng wrote:
On Tue, Sep 25, 2018 at 2:23 AM Andras Pataki
<apataki@xxxxxxxxxxxxxxxxxxxxx> wrote:
The whole cluster, including ceph-fuse is version 12.2.7.

If this issue happens again, please set the "debug_objectcacher" option of
ceph-fuse to 15 (for 30 seconds) and send the ceph-fuse log to us.
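
For example, on a running client this can be toggled over the admin socket (a sketch, assuming the default socket path and that 0/5 was the previous debug level):

# ceph daemon /var/run/ceph/ceph-client.admin.asok config set debug_objectcacher 15
#   ... let the workload run for about 30 seconds ...
# ceph daemon /var/run/ceph/ceph-client.admin.asok config set debug_objectcacher 0/5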

Regards
Yan, Zheng


Andras

On 9/24/18 6:27 AM, Yan, Zheng wrote:
On Fri, Sep 21, 2018 at 5:40 AM Andras Pataki
<apataki@xxxxxxxxxxxxxxxxxxxxx> wrote:
I've done some more experiments playing with client config parameters,
and it seems like the client_oc_size parameter is very correlated with
how big ceph-fuse grows.  With its default value of 200MB, ceph-fuse
gets to about 22GB of RSS; with our previous client_oc_size value of
2GB, the ceph-fuse process grows to 211GB.  After this size is reached,
its memory usage levels out.  So it seems like there is an issue with
accounting for memory in the client cache - whatever client_oc_size is
set to, about 100 times more memory gets used, in our case at least.
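
As a quick check, the value a running ceph-fuse is actually using can be read back over the admin socket, the same way as the ms_type query elsewhere in this thread (assuming the default socket path):

# ceph daemon /var/run/ceph/ceph-client.admin.asok config get client_oc_size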

ceph-fuse version ?

Andras

On 9/19/18 6:06 PM, Andras Pataki wrote:
Hi Zheng,

It looks like the memory growth happens even with the simple messenger:

[root@worker1032 ~]# ceph daemon /var/run/ceph/ceph-client.admin.asok config get ms_type
{
      "ms_type": "simple"
}
[root@worker1032 ~]# ps -auxwww | grep ceph-fuse
root      179133 82.2 13.5 77281896 71644120 ?   Sl   12:48 258:09 ceph-fuse --id=admin --conf=/etc/ceph/ceph.conf /mnt/ceph -o rw,fsname=ceph,dev,suid
[root@worker1032 ~]# ceph daemon /var/run/ceph/ceph-client.admin.asok dump_mempools
{
... snip ...
      "buffer_anon": {
          "items": 16753337,
          "bytes": 68782648777
      },
      "buffer_meta": {
          "items": 771,
          "bytes": 67848
      },
... snip ...
      "osdmap": {
          "items": 28582,
          "bytes": 431840
      },
... snip ...

      "total": {
          "items": 16782690,
          "bytes": 68783148465
      }
}
Andras


On 9/6/18 11:58 PM, Yan, Zheng wrote:
Could you please try making ceph-fuse use the simple messenger (add "ms type
= simple" to the client section of ceph.conf)?

Regards
Yan, Zheng



On Wed, Sep 5, 2018 at 10:09 PM Sage Weil <sage@xxxxxxxxxxxx> wrote:
On Wed, 5 Sep 2018, Andras Pataki wrote:
Hi cephers,

Every so often we have a ceph-fuse process that grows to a rather large
size (up to eating up the whole memory of the machine).  Here is an
example of a 200GB RSS ceph-fuse instance:

# ceph daemon /var/run/ceph/ceph-client.admin.asok dump_mempools
{
       "bloom_filter": {
           "items": 0,
           "bytes": 0
       },
       "bluestore_alloc": {
           "items": 0,
           "bytes": 0
       },
       "bluestore_cache_data": {
           "items": 0,
           "bytes": 0
       },
       "bluestore_cache_onode": {
           "items": 0,
           "bytes": 0
       },
       "bluestore_cache_other": {
           "items": 0,
           "bytes": 0
       },
       "bluestore_fsck": {
           "items": 0,
           "bytes": 0
       },
       "bluestore_txc": {
           "items": 0,
           "bytes": 0
       },
       "bluestore_writing_deferred": {
           "items": 0,
           "bytes": 0
       },
       "bluestore_writing": {
           "items": 0,
           "bytes": 0
       },
       "bluefs": {
           "items": 0,
           "bytes": 0
       },
       "buffer_anon": {
           "items": 51534897,
           "bytes": 207321872398
       },
       "buffer_meta": {
           "items": 64,
           "bytes": 5632
       },
       "osd": {
           "items": 0,
           "bytes": 0
       },
       "osd_mapbl": {
           "items": 0,
           "bytes": 0
       },
       "osd_pglog": {
           "items": 0,
           "bytes": 0
       },
       "osdmap": {
           "items": 28593,
           "bytes": 431872
       },
       "osdmap_mapping": {
           "items": 0,
           "bytes": 0
       },
       "pgmap": {
           "items": 0,
           "bytes": 0
       },
       "mds_co": {
           "items": 0,
           "bytes": 0
       },
       "unittest_1": {
           "items": 0,
           "bytes": 0
       },
       "unittest_2": {
           "items": 0,
           "bytes": 0
       },
       "total": {
           "items": 51563554,
           "bytes": 207322309902
       }
}

The general cache size looks like this (if it is helpful I can put a
whole cache dump somewhere):

# ceph daemon /var/run/ceph/ceph-client.admin.asok dump_cache | grep path | wc -l
84085
# ceph daemon /var/run/ceph/ceph-client.admin.asok dump_cache | grep name | wc -l
168186

Any ideas what 'buffer_anon' is and what could be eating up the 200GB
of RAM?
buffer_anon is memory consumed by the bufferlist class that hasn't been
explicitly put into a separate mempool category.  The question is
where/why buffers are getting pinned in memory.  Can you dump the
perfcounters?  That might give some hint.

My guess is a leak, or a problem with the ObjectCacher code that is
preventing it from trimming older buffers.

How reproducible is the situation?  Any idea what workloads trigger it?

Thanks!
sage

We are running with a few ceph-fuse specific parameters increased in
ceph.conf:

      # Description:  Set the number of inodes that the client keeps in
      the metadata cache.
      # Default:      16384
      client_cache_size = 262144

      # Description:  Set the maximum number of dirty bytes in the
      object cache.
      # Default:      104857600 (100MB)
      client_oc_max_dirty = 536870912

      # Description:  Set the maximum number of objects in the object
      cache.
      # Default:      1000
      client_oc_max_objects = 8192

      # Description:  Set how many bytes of data will the client cache.
      # Default:      209715200 (200 MB)
      client_oc_size = 2147483640

      # Description:  Set the maximum number of bytes that the kernel
      reads ahead for future read operations. Overridden by the
      client_readahead_max_periods setting.
      # Default:      0 (unlimited)
      #client_readahead_max_bytes = 67108864

      # Description:  Set the number of file layout periods (object size *
      number of stripes) that the kernel reads ahead. Overrides the
      client_readahead_max_bytes setting.
      # Default:      4
      client_readahead_max_periods = 64

      # Description:  Set the minimum number of bytes that the kernel
      reads ahead.
      # Default:      131072 (128KB)
      client_readahead_min = 4194304
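
(For scale, assuming the default CephFS file layout of 4 MB objects and a single stripe: one layout period is 4 MB, so client_readahead_max_periods = 64 allows up to about 64 * 4 MB = 256 MB of readahead per file, and client_oc_size = 2147483640 is just under 2 GiB.)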


We are running a 12.2.7 ceph cluster, and the cluster is otherwise
healthy.

Any hints would be appreciated.  Thanks,

Andras

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



