Reducing RAM usage on production MDS

Hi all,

The single active MDS on one of our Ceph clusters is close to running out of RAM.

MDS total system RAM = 528GB
MDS current free system RAM = 4GB
mds_cache_memory_limit = 451GB
current mds cache usage = 426GB

Presumably we need to reduce our mds_cache_memory_limit and/or mds_max_caps_per_client, but we would like some guidance on whether it's possible to do that safely on a live production cluster while the MDS is already this close to running out of RAM.
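
If stepping the cache limit down gradually is the recommended approach, I assume the mechanics would look something like the following, repeated in small steps while watching the cache (the 410 GiB target is only a placeholder, not a tested value):

# step the limit down by ~10 GiB at a time, e.g. to 440234147840 bytes (410 GiB);
# a runtime change via the admin socket only, not persisted to ceph.conf
ceph daemon mds.$(hostname -s) config set mds_cache_memory_limit 440234147840
# check how far the cache actually shrinks before taking the next step
ceph daemon mds.$(hostname -s) cache status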

The cluster is Luminous 12.2.12.
Running a single active MDS with two standbys.
890 clients.
Mix of kernel clients (4.19.86) and ceph-fuse.
The ceph-fuse clients are 12.2.12 (398) and 12.2.13 (3).

The kernel clients have stayed under "mds_max_caps_per_client": "1048576", but the ceph-fuse clients appear to hold very large num_caps values according to their asok output.
e.g.
"num_caps": 1007144398,
"num_caps": 1150184586,
"num_caps": 1502231153,
"num_caps": 1714655840,
"num_caps": 2022826512,
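
In case it helps, I assume the per-session cap counts can also be cross-checked from the MDS side with something like this (session ls reports a num_caps field per client session):

# show the ten sessions holding the most caps, as seen by the MDS
ceph daemon mds.$(hostname -s) session ls | jq 'sort_by(.num_caps) | reverse | .[0:10][] | {id, num_caps}'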

Dropping caches on the clients appears to reduce their cap usage but does not free up RAM on the MDS.
What is the safest method to free cache and reduce RAM usage on the MDS in this situation (without having to evict or remount clients)?
I’m concerned that reducing mds_cache_memory_limit even in very small increments may trigger a large recall of caps and overwhelm the MDS.
We have also considered setting a reduced mds_cache_memory_limit on both standby MDS daemons. Would a subsequent failover to an MDS with a lower cache limit be safe?
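Concretely, for that option I was picturing something like the following in ceph.conf on the standby hosts (restarting the standbys so it takes effect) before triggering the failover; the 400 GiB value is only a placeholder:

# ceph.conf on the standby MDS hosts
[mds]
    # lower cache limit, e.g. 400 GiB (429496729600 bytes); placeholder, not a recommendation
    mds_cache_memory_limit = 429496729600
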
Some more details are below, and I'd be more than happy to provide additional logs.

Thanks,
Dylan


# free -b
              total        used        free      shared  buff/cache   available
Mem:    540954992640 535268749312  4924698624   438284288   761544704  3893182464
Swap:             0           0           0

# ceph daemon mds.$(hostname -s) config get mds_cache_memory_limit
{
    "mds_cache_memory_limit": "450971566080"
}

# ceph daemon mds.$(hostname -s) cache status
{
    "pool": {
        "items": 10593257843,
        "bytes": 425176150288
    }
}

# ceph daemon mds.$(hostname -s) dump_mempools | grep -A2 "mds_co\|anon"
    "buffer_anon": {
        "items": 3935,
        "bytes": 4537932
--
    "mds_co": {
        "items": 10595391186,
        "bytes": 425255456209

# ceph daemon mds.$(hostname -s) perf dump | jq '.mds_mem.rss'
520100552

# ceph tell mds.$(hostname) heap stats
tcmalloc heap stats:------------------------------------------------
MALLOC:   496040753720 (473061.3 MiB) Bytes in use by application
MALLOC: +  11085479936 (10571.9 MiB) Bytes in page heap freelist
MALLOC: +  22568895888 (21523.4 MiB) Bytes in central cache freelist
MALLOC: +        31744 (    0.0 MiB) Bytes in transfer cache freelist
MALLOC: +     34186296 (   32.6 MiB) Bytes in thread cache freelists
MALLOC: +   2802057216 ( 2672.2 MiB) Bytes in malloc metadata
MALLOC:   ------------
MALLOC: = 532531404800 (507861.5 MiB) Actual memory used (physical + swap)
MALLOC: +   1315700736 ( 1254.8 MiB) Bytes released to OS (aka unmapped)
MALLOC:   ------------
MALLOC: = 533847105536 (509116.3 MiB) Virtual address space used
MALLOC:
MALLOC:       44496459              Spans in use
MALLOC:             22              Thread heaps in use
MALLOC:           8192              Tcmalloc page size
------------------------------------------------
Call ReleaseFreeMemory() to release freelist memory to the OS (via madvise()).
Bytes released to the OS take up virtual address space but no physical memory.
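
Given that hint, I assume a heap release would at least hand the freelist pages back to the OS (roughly the ~10 GiB sitting in the page heap freelist above), though presumably not the memory actually held by the cache:

# ask tcmalloc to return free memory to the OS (the ReleaseFreeMemory() the stats output refers to)
ceph tell mds.$(hostname) heap release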


# ceph fs status
hpc_projects - 890 clients
============
+------+--------+----------------+---------------+-------+-------+
| Rank | State  |      MDS       |    Activity   |  dns  |  inos |
+------+--------+----------------+---------------+-------+-------+
|  0   | active | mds1-ceph2-qh2 | Reqs:  304 /s |  167M |  167M |
+------+--------+----------------+---------------+-------+-------+
+--------------------+----------+-------+-------+
|        Pool        |   type   |  used | avail |
+--------------------+----------+-------+-------+
|   hpcfs_metadata   | metadata | 17.4G | 1893G |
|     hpcfs_data     |   data   | 1014T |  379T |
|   test_nvmemeta    |   data   |    0  | 1893G |
| hpcfs_data_sandisk |   data   |  312T |  184T |
+--------------------+----------+-------+-------+

+----------------+
|  Standby MDS   |
+----------------+
| mds3-ceph2-qh2 |
| mds2-ceph2-qh2 |
+----------------+
MDS version: ceph version 12.2.12 (1436006594665279fe734b4c15d7e08c13ebd777) luminous (stable)




