Let me share some outputs about my cluster.

root@ud-01:~# ceph fs status
ud-data - 84 clients
=======
RANK  STATE           MDS              ACTIVITY     DNS    INOS   DIRS   CAPS
 0    active  ud-data.ud-02.xcoojt  Reqs:   31 /s  3022k  3021k  52.6k   385k
        POOL           TYPE     USED  AVAIL
cephfs.ud-data.meta  metadata   136G  44.4T
cephfs.ud-data.data    data    45.2T  44.4T
     STANDBY MDS
ud-data.ud-03.lhwkml
ud-data.ud-05.rnhcfe
ud-data.ud-01.uatjle
ud-data.ud-04.seggyv
--------------------------------------------------------------------------

This is the "ceph tell mds.ud-data.ud-02.xcoojt session ls" output for the
client reported in the cache pressure warning:

{
    "id": 1282205,
    "entity": {
        "name": {
            "type": "client",
            "num": 1282205
        },
        "addr": {
            "type": "v1",
            "addr": "172.16.3.48:0",
            "nonce": 2169935642
        }
    },
    "state": "open",
    "num_leases": 0,
    "num_caps": 52092,
    "request_load_avg": 1,
    "uptime": 75754.745608647994,
    "requests_in_flight": 0,
    "num_completed_requests": 0,
    "num_completed_flushes": 1,
    "reconnecting": false,
    "recall_caps": {
        "value": 2577232.0049106553,
        "halflife": 60
    },
    "release_caps": {
        "value": 1.4093491463510395,
        "halflife": 60
    },
    "recall_caps_throttle": {
        "value": 63733.985544098425,
        "halflife": 1.5
    },
    "recall_caps_throttle2o": {
        "value": 19452.428409271757,
        "halflife": 0.5
    },
    "session_cache_liveness": {
        "value": 14.100272208890081,
        "halflife": 300
    },
    "cap_acquisition": {
        "value": 0,
        "halflife": 10
    },
    "delegated_inos": [
        {
            "start": "0x10004a1c031",
            "length": 282
        },
        {
            "start": "0x10004a1c33f",
            "length": 207
        },
        {
            "start": "0x10004a1cdda",
            "length": 6
        },
        {
            "start": "0x10004a3c12e",
            "length": 3
        },
        {
            "start": "0x1000f9831fe",
            "length": 2
        }
    ],
    "inst": "client.1282205 v1:172.16.3.48:0/2169935642",
    "completed_requests": [],
    "prealloc_inos": [
        {
            "start": "0x10004a1c031",
            "length": 282
        },
        {
            "start": "0x10004a1c33f",
            "length": 207
        },
        {
            "start": "0x10004a1cdda",
            "length": 6
        },
        {
            "start": "0x10004a3c12e",
            "length": 3
        },
        {
            "start": "0x1000f9831fe",
            "length": 2
        },
        {
            "start": "0x1000fa86e5f",
            "length": 54
        },
        {
            "start": "0x1000faa069c",
            "length": 501
        }
    ],
    "client_metadata": {
        "client_features": {
            "feature_bits": "0x0000000000007bff"
        },
        "metric_spec": {
            "metric_flags": {
                "feature_bits": "0x00000000000003ff"
            }
        },
        "entity_id": "admin",
        "hostname": "bennevis-2",
        "kernel_version": "5.15.0-91-generic",
        "root": "/volumes/babblians"
    }
}

On Wed, 17 Jan 2024 at 07:22, Özkan Göksu <ozkangksu@xxxxxxxxx> wrote:
> Hello Eugen.
>
> Thank you for the answer.
> Based on the findings and test results in this issue:
> https://github.com/ceph/ceph/pull/38574
> I tried their advice and applied the following changes:
>
> max_mds = 4
> standby_mds = 1
> mds_cache_memory_limit = 16GB
> mds_recall_max_caps = 40000
>
> One day after I set these parameters, I saw this log:
> [8531248.982954] Out of memory: Killed process 1580586 (ceph-mds)
> total-vm:70577592kB, anon-rss:70244236kB, file-rss:0kB, shmem-rss:0kB,
> UID:167 pgtables:137832kB oom_score_adj:0
>
> All the MDS services leaked memory and were killed by the kernel.
> Because of this I changed the settings as below; it is stable now, but
> performance is very poor and I still get cache pressure alerts.
>
> max_mds = 1
> standby_mds = 5
> mds_cache_memory_limit = 8GB
> mds_recall_max_caps = 30000
>
> I'm very surprised that you advise decreasing "mds_recall_max_caps",
> because it is the opposite of what the developers advised in the issue I
> sent.
> It is very hard to play around with MDS parameters without an expert-level
> understanding of what these parameters stand for and how they affect the
> behavior.
> Because of this I'm trying to understand the MDS code flow, and I'm very
> interested in learning more and tuning my system by debugging and
> understanding my own data flow and MDS usage.
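>
> For reference, this is roughly how I apply and inspect the settings listed
> above (a sketch from my notes, not a full recipe; the daemon name is just
> my currently active MDS, and the memory limit is given in bytes):
>
>     ceph fs set ud-data max_mds 1
>     ceph config set mds mds_cache_memory_limit 8589934592   # 8 GiB
>     ceph config set mds mds_recall_max_caps 30000
>     # verify what the running MDS actually picked up
>     ceph config show mds.ud-data.ud-02.xcoojt | grep -E 'mds_cache_memory_limit|mds_recall'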
>
> I have a very unusual data flow and I think I need to configure the system
> for this case.
> I have 80+ clients, and through all of these clients my users read a range
> of objects, compare them on the GPU, generate new data, and write the new
> data back to the cluster.
> This means my clients usually read an object only once and do not read the
> same object again. Sometimes the same user runs multiple services on
> multiple clients, and these services can read the same data from different
> clients.
>
> So having a large cache is useless for my use case. I need to set up the
> MDS and the CephFS clients for this data flow.
> When I look at MDS RAM usage, I see high allocation all the time and I
> wonder why. If none of my clients reads an object anymore, why doesn't the
> MDS drop that data from its cache?
> I need to configure the MDS to read data and drop it again very quickly.
> If the data is constantly requested by clients, then of course I do want a
> RAM cache tier for it.
>
> I'm a little confused, and I need to learn more about how the MDS works and
> how I should make multiple active MDS daemons faster for my subvolumes and
> client data flow.
>
> Best regards.
>
>
> On Tue, 16 Jan 2024 at 11:36, Eugen Block <eblock@xxxxxx> wrote:
>
>> Hi,
>>
>> I have dealt with this topic multiple times; the SUSE team helped me
>> understand what's going on under the hood. The summary can be found
>> in this thread [1].
>>
>> What helped in our case was to reduce mds_recall_max_caps from 30k
>> (the default) to 3k. We tried it in steps of 1k, IIRC. So I suggest
>> reducing that value step by step (maybe start with 20k or so) to
>> find the optimal value.
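>>
>> For illustration only (a rough sketch, not the exact commands we ran;
>> observe cluster health and the client's cap counts between each step):
>>
>>     ceph config set mds mds_recall_max_caps 20000
>>     ceph health detail
>>     ceph tell mds.ud-data.ud-02.xcoojt session ls   # watch num_caps / recall_caps
>>     ceph config set mds mds_recall_max_caps 19000
>>     # ...and so on, in steps of roughly 1k, down towards 3000,
>>     # stopping at the value where the warnings disappear.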
>>
>> Regards,
>> Eugen
>>
>> [1] https://www.spinics.net/lists/ceph-users/msg73188.html
>>
>> Quoting Özkan Göksu <ozkangksu@xxxxxxxxx>:
>>
>> > Hello.
>> >
>> > I have a 5-node Ceph cluster and I'm constantly getting the "clients
>> > failing to respond to cache pressure" warning.
>> >
>> > I have 84 CephFS kernel clients (servers) and my users are accessing
>> > their personal subvolumes located on one pool.
>> >
>> > My users are software developers and the data is home and user data
>> > (Git, Python projects, sample data and newly generated data).
>> >
>> > ---------------------------------------------------------------------------------
>> > --- RAW STORAGE ---
>> > CLASS     SIZE    AVAIL     USED  RAW USED  %RAW USED
>> > ssd    146 TiB  101 TiB   45 TiB    45 TiB      30.71
>> > TOTAL  146 TiB  101 TiB   45 TiB    45 TiB      30.71
>> >
>> > --- POOLS ---
>> > POOL                 ID   PGS   STORED  OBJECTS     USED  %USED  MAX AVAIL
>> > .mgr                  1     1  356 MiB       90  1.0 GiB      0     30 TiB
>> > cephfs.ud-data.meta   9   256   69 GiB    3.09M  137 GiB   0.15     45 TiB
>> > cephfs.ud-data.data  10  2048   26 TiB  100.83M   44 TiB  32.97     45 TiB
>> > ---------------------------------------------------------------------------------
>> > root@ud-01:~# ceph fs status
>> > ud-data - 84 clients
>> > =======
>> > RANK  STATE           MDS              ACTIVITY     DNS    INOS   DIRS   CAPS
>> >  0    active  ud-data.ud-04.seggyv  Reqs:  142 /s  2844k  2798k   303k   720k
>> >         POOL           TYPE     USED  AVAIL
>> > cephfs.ud-data.meta  metadata   137G  44.9T
>> > cephfs.ud-data.data    data    44.2T  44.9T
>> >     STANDBY MDS
>> > ud-data.ud-02.xcoojt
>> > ud-data.ud-05.rnhcfe
>> > ud-data.ud-03.lhwkml
>> > ud-data.ud-01.uatjle
>> > MDS version: ceph version 17.2.6 (d7ff0d10654d2280e08f1ab989c7cdf3064446a5) quincy (stable)
>> >
>> > -----------------------------------------------------------------------------------
>> > My MDS settings are below:
>> >
>> > mds_cache_memory_limit                 | 8589934592
>> > mds_cache_trim_threshold               | 524288
>> > mds_recall_global_max_decay_threshold  | 131072
>> > mds_recall_max_caps                    | 30000
>> > mds_recall_max_decay_rate              | 1.500000
>> > mds_recall_max_decay_threshold         | 131072
>> > mds_recall_warning_threshold           | 262144
>> >
>> > I have 2 questions:
>> > 1- What should I do to prevent the cache pressure warning?
>> > 2- What can I do to increase speed?
>> >
>> > - Thanks
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx