Let me share some outputs about my cluster.

root@ud-01:~# ceph fs status
ud-data - 84 clients
=======
RANK  STATE           MDS              ACTIVITY     DNS    INOS   DIRS   CAPS
 0    active  ud-data.ud-02.xcoojt  Reqs:   31 /s  3022k  3021k  52.6k   385k
        POOL           TYPE     USED  AVAIL
cephfs.ud-data.meta  metadata   136G  44.4T
cephfs.ud-data.data    data    45.2T  44.4T
     STANDBY MDS
ud-data.ud-03.lhwkml
ud-data.ud-05.rnhcfe
ud-data.ud-01.uatjle
ud-data.ud-04.seggyv
--------------------------------------------------------------------------

This is the "ceph tell mds.ud-data.ud-02.xcoojt session ls" output for the
client reported in the cache pressure warning:

{
    "id": 1282205,
    "entity": {
        "name": {
            "type": "client",
            "num": 1282205
        },
        "addr": {
            "type": "v1",
            "addr": "172.16.3.48:0",
            "nonce": 2169935642
        }
    },
    "state": "open",
    "num_leases": 0,
    "num_caps": 52092,
    "request_load_avg": 1,
    "uptime": 75754.745608647994,
    "requests_in_flight": 0,
    "num_completed_requests": 0,
    "num_completed_flushes": 1,
    "reconnecting": false,
    "recall_caps": {
        "value": 2577232.0049106553,
        "halflife": 60
    },
    "release_caps": {
        "value": 1.4093491463510395,
        "halflife": 60
    },
    "recall_caps_throttle": {
        "value": 63733.985544098425,
        "halflife": 1.5
    },
    "recall_caps_throttle2o": {
        "value": 19452.428409271757,
        "halflife": 0.5
    },
    "session_cache_liveness": {
        "value": 14.100272208890081,
        "halflife": 300
    },
    "cap_acquisition": {
        "value": 0,
        "halflife": 10
    },
    "delegated_inos": [
        {
            "start": "0x10004a1c031",
            "length": 282
        },
        {
            "start": "0x10004a1c33f",
            "length": 207
        },
        {
            "start": "0x10004a1cdda",
            "length": 6
        },
        {
            "start": "0x10004a3c12e",
            "length": 3
        },
        {
            "start": "0x1000f9831fe",
            "length": 2
        }
    ],
    "inst": "client.1282205 v1:172.16.3.48:0/2169935642",
    "completed_requests": [],
    "prealloc_inos": [
        {
            "start": "0x10004a1c031",
            "length": 282
        },
        {
            "start": "0x10004a1c33f",
            "length": 207
        },
        {
            "start": "0x10004a1cdda",
            "length": 6
        },
        {
            "start": "0x10004a3c12e",
            "length": 3
        },
        {
            "start": "0x1000f9831fe",
            "length": 2
        },
        {
            "start": "0x1000fa86e5f",
            "length": 54
        },
        {
            "start": "0x1000faa069c",
            "length": 501
        }
    ],
    "client_metadata": {
        "client_features": {
            "feature_bits": "0x0000000000007bff"
        },
        "metric_spec": {
            "metric_flags": {
                "feature_bits": "0x00000000000003ff"
            }
        },
        "entity_id": "admin",
        "hostname": "bennevis-2",
        "kernel_version": "5.15.0-91-generic",
        "root": "/volumes/babblians"
    }
}

On Wed, 17 Jan 2024 at 07:22, Özkan Göksu <ozkangksu@xxxxxxxxx> wrote:
> Hello Eugen.
>
> Thank you for the answer.
> Based on the findings and test results in this issue:
> https://github.com/ceph/ceph/pull/38574
> I tried their advice and applied the following changes:
>
> max_mds = 4
> standby_mds = 1
> mds_cache_memory_limit = 16GB
> mds_recall_max_caps = 40000
>
> One day after I set these parameters, I saw this log:
> [8531248.982954] Out of memory: Killed process 1580586 (ceph-mds)
> total-vm:70577592kB, anon-rss:70244236kB, file-rss:0kB, shmem-rss:0kB,
> UID:167 pgtables:137832kB oom_score_adj:0
>
> All the MDS services leaked memory and were killed by the kernel.
> Because of this I changed the settings as below; it is stable now, but
> performance is very poor and I still get cache pressure alerts.
>
> max_mds = 1
> standby_mds = 5
> mds_cache_memory_limit = 8GB
> mds_recall_max_caps = 30000
>
> I'm very surprised that you advise decreasing "mds_recall_max_caps",
> because it is the opposite of what the developers advised in the issue I
> sent.
> It is very hard to play around with MDS parameters without an expert-level
> understanding of what these parameters stand for and how they affect the
> behavior.
> Because of this I'm trying to understand the MDS code flow, and I'm very
> interested in learning more and tuning my system by debugging and
> understanding my own data flow and MDS usage.
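>
> For reference, this is roughly how I apply and inspect the settings listed
> above (a sketch from my notes, not a full recipe; the daemon name is just
> my currently active MDS, and the memory limit is given in bytes):
>
>     ceph fs set ud-data max_mds 1
>     ceph config set mds mds_cache_memory_limit 8589934592   # 8 GiB
>     ceph config set mds mds_recall_max_caps 30000
>     # verify what the running MDS actually picked up
>     ceph config show mds.ud-data.ud-02.xcoojt | grep -E 'mds_cache_memory_limit|mds_recall'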
>
> I have a very unusual data flow and I think I need to configure the system
> for this case.
> I have 80+ clients, and through all of these clients my users read a range
> of objects, compare them on the GPU, generate new data, and write the new
> data back to the cluster.
> This means my clients usually read an object only once and do not read the
> same object again. Sometimes the same user runs multiple services on
> multiple clients, and these services can read the same data from different
> clients.
>
> So having a large cache is useless for my use case. I need to set up the
> MDS and the CephFS clients for this data flow.
> When I look at MDS RAM usage, I see high allocation all the time and I
> wonder why. If none of my clients reads an object anymore, why doesn't the
> MDS drop that data from its cache?
> I need to configure the MDS to read data and drop it again very quickly.
> If the data is constantly requested by clients, then of course I do want a
> RAM cache tier for it.
>
> I'm a little confused, and I need to learn more about how the MDS works and
> how I should make multiple active MDS daemons faster for my subvolumes and
> client data flow.
>
> Best regards.
>
>
> On Tue, 16 Jan 2024 at 11:36, Eugen Block <eblock@xxxxxx> wrote:
>
>> Hi,
>>
>> I have dealt with this topic multiple times; the SUSE team helped me
>> understand what's going on under the hood. The summary can be found
>> in this thread [1].
>>
>> What helped in our case was to reduce mds_recall_max_caps from 30k
>> (the default) to 3k. We tried it in steps of 1k, IIRC. So I suggest
>> reducing that value step by step (maybe start with 20k or so) to
>> find the optimal value.
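>>
>> For illustration only (a rough sketch, not the exact commands we ran;
>> observe cluster health and the client's cap counts between each step):
>>
>>     ceph config set mds mds_recall_max_caps 20000
>>     ceph health detail
>>     ceph tell mds.ud-data.ud-02.xcoojt session ls   # watch num_caps / recall_caps
>>     ceph config set mds mds_recall_max_caps 19000
>>     # ...and so on, in steps of roughly 1k, down towards 3000,
>>     # stopping at the value where the warnings disappear.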
>>
>> Regards,
>> Eugen
>>
>> [1] https://www.spinics.net/lists/ceph-users/msg73188.html
>>
>> Quoting Özkan Göksu <ozkangksu@xxxxxxxxx>:
>>
>> > Hello.
>> >
>> > I have a 5-node Ceph cluster and I'm constantly getting the "clients
>> > failing to respond to cache pressure" warning.
>> >
>> > I have 84 CephFS kernel clients (servers) and my users are accessing
>> > their personal subvolumes located on one pool.
>> >
>> > My users are software developers and the data is home and user data
>> > (Git, Python projects, sample data and newly generated data).
>> >
>> > ---------------------------------------------------------------------------------
>> > --- RAW STORAGE ---
>> > CLASS     SIZE    AVAIL     USED  RAW USED  %RAW USED
>> > ssd    146 TiB  101 TiB   45 TiB    45 TiB      30.71
>> > TOTAL  146 TiB  101 TiB   45 TiB    45 TiB      30.71
>> >
>> > --- POOLS ---
>> > POOL                 ID   PGS   STORED  OBJECTS     USED  %USED  MAX AVAIL
>> > .mgr                  1     1  356 MiB       90  1.0 GiB      0     30 TiB
>> > cephfs.ud-data.meta   9   256   69 GiB    3.09M  137 GiB   0.15     45 TiB
>> > cephfs.ud-data.data  10  2048   26 TiB  100.83M   44 TiB  32.97     45 TiB
>> > ---------------------------------------------------------------------------------
>> > root@ud-01:~# ceph fs status
>> > ud-data - 84 clients
>> > =======
>> > RANK  STATE           MDS              ACTIVITY     DNS    INOS   DIRS   CAPS
>> >  0    active  ud-data.ud-04.seggyv  Reqs:  142 /s  2844k  2798k   303k   720k
>> >         POOL           TYPE     USED  AVAIL
>> > cephfs.ud-data.meta  metadata   137G  44.9T
>> > cephfs.ud-data.data    data    44.2T  44.9T
>> >     STANDBY MDS
>> > ud-data.ud-02.xcoojt
>> > ud-data.ud-05.rnhcfe
>> > ud-data.ud-03.lhwkml
>> > ud-data.ud-01.uatjle
>> > MDS version: ceph version 17.2.6 (d7ff0d10654d2280e08f1ab989c7cdf3064446a5) quincy (stable)
>> >
>> > -----------------------------------------------------------------------------------
>> > My MDS settings are below:
>> >
>> > mds_cache_memory_limit                 | 8589934592
>> > mds_cache_trim_threshold               | 524288
>> > mds_recall_global_max_decay_threshold  | 131072
>> > mds_recall_max_caps                    | 30000
>> > mds_recall_max_decay_rate              | 1.500000
>> > mds_recall_max_decay_threshold         | 131072
>> > mds_recall_warning_threshold           | 262144
>> >
>> > I have 2 questions:
>> > 1- What should I do to prevent the cache pressure warning?
>> > 2- What can I do to increase speed?
>> >
>> > - Thanks
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx