Re: 1 clients failing to respond to cache pressure (quincy:17.2.6)

Xiubo Li <xiubli@xxxxxxxxxx> · Wed, 17 Jan 2024 16:18:13 +0800

On 1/17/24 15:57, Eugen Block wrote:
Hi,

this is not an easy topic and there is no formula that can be applied 
to all clusters. From my experience, it is exactly how the discussion 
went in the thread you mentioned, trial & error.
Looking at your session ls output, this reminds of a debug session we 
had a few years ago:

        "recall_caps": {
            "value": 2577232.0049106553,
            "halflife": 60

It appears to be what we observed as well, and reducing 
mds_recall_max_caps helped us so the recall_caps value wouldn't pile up.

I'm not trying to solve cache pressure warning.

Your first mail proves the opposite ;-)

I have 2 questions:
1- What should I do to prevent cache pressue warning ?

I'm trying to increase the speed by creating multiple MDS even maybe 
binding subvolumes to specific MDS servers and decrease the latency.

And what is your result? Did it help? If you have multiple active MDS 
you might encounter a performance degredation if you don't use pinning 
because the mds balancers impact each other. We were successful with 
multi-active MDS and directory pinning.

Also when I check MDS CPU usage I see %120++ usage time to time. But 
when I check the server CPU load at MDS location, I see MDS only uses 
2-4 cores and other CPU cores are almost at idle.
I think MDS has a CPU core limitation and I need to increase the 
value to decrease the latency. How can I do that?

Yes, MDS is single-threaded. That's why we deployed multiple daemons 
per server (in the above mentioned cluster) and used multi-active MDS 
with pinning.

IMO also could disable the balancer by setting 'mds_bal_interval' to 0. 
Sometimes the balancer will introduce the cache pressure for 
multi-active MDS.

Thanks

- Xiubo

Regards,
Eugen

Zitat von Özkan Göksu <ozkangksu@xxxxxxxxx>:

Let me share some outputs about my cluster.

root@ud-01:~# ceph fs status
ud-data - 84 clients
=======
RANK  STATE           MDS              ACTIVITY     DNS INOS   DIRS
CAPS
 0    active  ud-data.ud-02.xcoojt  Reqs:   31 /s  3022k  3021k 52.6k
385k
        POOL           TYPE     USED  AVAIL
cephfs.ud-data.meta  metadata   136G  44.4T
cephfs.ud-data.data    data    45.2T  44.4T
    STANDBY MDS
ud-data.ud-03.lhwkml
ud-data.ud-05.rnhcfe
ud-data.ud-01.uatjle
ud-data.ud-04.seggyv

-------------------------------------------------------------------------- 

This is "ceph tell mds.ud-data.ud-02.xcoojt session ls" output for the
reported client for cache pressure warning.

    {
        "id": 1282205,
        "entity": {
            "name": {
                "type": "client",
                "num": 1282205
            },
            "addr": {
                "type": "v1",
                "addr": "172.16.3.48:0",
                "nonce": 2169935642
            }
        },
        "state": "open",
        "num_leases": 0,
        "num_caps": 52092,
        "request_load_avg": 1,
        "uptime": 75754.745608647994,
        "requests_in_flight": 0,
        "num_completed_requests": 0,
        "num_completed_flushes": 1,
        "reconnecting": false,
        "recall_caps": {
            "value": 2577232.0049106553,
            "halflife": 60
        },
        "release_caps": {
            "value": 1.4093491463510395,
            "halflife": 60
        },
        "recall_caps_throttle": {
            "value": 63733.985544098425,
            "halflife": 1.5
        },
        "recall_caps_throttle2o": {
            "value": 19452.428409271757,
            "halflife": 0.5
        },
        "session_cache_liveness": {
            "value": 14.100272208890081,
            "halflife": 300
        },
        "cap_acquisition": {
            "value": 0,
            "halflife": 10
        },
        "delegated_inos": [
            {
                "start": "0x10004a1c031",
                "length": 282
            },
            {
                "start": "0x10004a1c33f",
                "length": 207
            },
            {
                "start": "0x10004a1cdda",
                "length": 6
            },
            {
                "start": "0x10004a3c12e",
                "length": 3
            },
            {
                "start": "0x1000f9831fe",
                "length": 2
            }
        ],
        "inst": "client.1282205 v1:172.16.3.48:0/2169935642",
        "completed_requests": [],
        "prealloc_inos": [
            {
                "start": "0x10004a1c031",
                "length": 282
            },
            {
                "start": "0x10004a1c33f",
                "length": 207
            },
            {
                "start": "0x10004a1cdda",
                "length": 6
            },
            {
                "start": "0x10004a3c12e",
                "length": 3
            },
            {
                "start": "0x1000f9831fe",
                "length": 2
            },
            {
                "start": "0x1000fa86e5f",
                "length": 54
            },
            {
                "start": "0x1000faa069c",
                "length": 501
            }
        ],
        "client_metadata": {
            "client_features": {
                "feature_bits": "0x0000000000007bff"
            },
            "metric_spec": {
                "metric_flags": {
                    "feature_bits": "0x00000000000003ff"
                }
            },
            "entity_id": "admin",
            "hostname": "bennevis-2",
            "kernel_version": "5.15.0-91-generic",
            "root": "/volumes/babblians"
        }
    }

Özkan Göksu <ozkangksu@xxxxxxxxx>, 17 Oca 2024 Çar, 07:22 tarihinde şunu
yazdı:

Hello Eugen.

Thank you for the answer.
According to knowledge and test results at this issue:
https://github.com/ceph/ceph/pull/38574
I've tried their advice and I've applied the following changes.

max_mds = 4
standby_mds = 1
mds_cache_memory_limit = 16GB
mds_recall_max_caps = 40000

When I set these parameters, 1 day later I saw this log:
[8531248.982954] Out of memory: Killed process 1580586 (ceph-mds)
total-vm:70577592kB, anon-rss:70244236kB, file-rss:0kB, shmem-rss:0kB,
UID:167 pgtables:137832kB oom_score_adj:0

All the MDS services created memory leak and killed by kernel.
Because of this I changed it as below and it is stable now but 
performance
is very poor and I still get cache pressure alerts.

max_mds = 1
standby_mds = 5
mds_cache_memory_limit = 8GB
mds_recall_max_caps = 30000

I'm very surprised that you are advising to decrease 
"mds_recall_max_caps"
because it is the opposite of what developers advised in the issue I've
sended.
It is very hard to play around with MDS parameters without expert 
level of
understanding what these parameters stands for and how it will 
effect the
behavior.
Because of this I'm trying to understand the MDS code flow and I'm very
interested with learning more and tuning my system by debugging and
understanding my own data flow and MDS usage.

I have a very unique data flow and I think I need to configure the 
system
for this case.
I have 80+ clients and via all of these clients my users are requesting
Read a range of objects and compare them in GPU, they generate new 
data and
Write the new data back in the cluster.
So it means my clients usually reads objects only one time and do 
not read
the same object again. Sometimes same user runs multiple service in
multiple clients and these services can read the same data from 
different
clients.

So having a large cache is useless for my use case. I need to setup MDS
and Cephfs Client for this data flow.
When I debug the MDS ram usage, I see high allocation all the time 
and I
wonder why? If any of my client does not read any object why MDS 
does not
remove that data from ram allocation?
I need to configure MDS for reading the data and removing it very 
fast if
the data is constantly requested from clients. In this case ofc I 
want a
ram cache tier.

I'm little confused and I need to learn more about how MDS works and 
how
should I make multiple active MDS faster for my subvolumes and 
client data
flow.

Best regards.

Eugen Block <eblock@xxxxxx>, 16 Oca 2024 Sal, 11:36 tarihinde şunu 
yazdı:

Hi,

I have dealt with this topic multiple times, the SUSE team helped
understanding what's going on under the hood. The summary can be found
in this thread [1].

What helped in our case was to reduce the mds_recall_max_caps from 30k
(default) to 3k. We tried it in steps of 1k IIRC. So I suggest to
reduce that value step by step (maybe start with 20k or something) to
find the optimal value.

Regards,
Eugen

[1] https://www.spinics.net/lists/ceph-users/msg73188.html

Zitat von Özkan Göksu <ozkangksu@xxxxxxxxx>:

> Hello.
>
> I have 5 node ceph cluster and I'm constantly having "clients 
failing to
> respond to cache pressure" warning.
>
> I have 84 cephfs kernel clients (servers) and my users are accessing
their
> personal subvolumes  located on one pool.
>
> My users are software developers and the data is home and user data.
(Git,
> python projects, sample data and generated new data)
>
>
--------------------------------------------------------------------------------- 

> --- RAW STORAGE ---
> CLASS     SIZE    AVAIL    USED  RAW USED  %RAW USED
> ssd    146 TiB  101 TiB  45 TiB    45 TiB      30.71
> TOTAL  146 TiB  101 TiB  45 TiB    45 TiB      30.71
>
> --- POOLS ---
> POOL                 ID   PGS   STORED  OBJECTS USED  %USED  MAX
AVAIL
> .mgr                  1     1  356 MiB       90  1.0 GiB      
0     30
TiB
> cephfs.ud-data.meta   9   256   69 GiB    3.09M  137 GiB   
0.15     45
TiB
> cephfs.ud-data.data  10  2048   26 TiB  100.83M   44 TiB  
32.97     45
TiB
>
--------------------------------------------------------------------------------- 

> root@ud-01:~# ceph fs status
> ud-data - 84 clients
> =======
> RANK  STATE           MDS              ACTIVITY DNS    INOS   DIRS
> CAPS
>  0    active  ud-data.ud-04.seggyv  Reqs:  142 /s 2844k  2798k   
303k
> 720k
>         POOL           TYPE     USED  AVAIL
> cephfs.ud-data.meta  metadata   137G  44.9T
> cephfs.ud-data.data    data    44.2T  44.9T
>     STANDBY MDS
> ud-data.ud-02.xcoojt
> ud-data.ud-05.rnhcfe
> ud-data.ud-03.lhwkml
> ud-data.ud-01.uatjle
> MDS version: ceph version 17.2.6
(d7ff0d10654d2280e08f1ab989c7cdf3064446a5)
> quincy (stable)
>
>
----------------------------------------------------------------------------------- 

> My MDS settings are below:
>
> mds_cache_memory_limit                | 8589934592
> mds_cache_trim_threshold              | 524288
> mds_recall_global_max_decay_threshold | 131072
> mds_recall_max_caps                       | 30000
> mds_recall_max_decay_rate             | 1.500000
> mds_recall_max_decay_threshold    | 131072
> mds_recall_warning_threshold          | 262144
>
>
> I have 2 questions:
> 1- What should I do to prevent cache pressue warning ?
> 2- What can I do to increase speed ?
>
> - Thanks
> _______________________________________________
> ceph-users mailing list -- ceph-users@xxxxxxx
> To unsubscribe send an email to ceph-users-leave@xxxxxxx

_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx

_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx