Hello everyone! How are you doing? I wasn't around for two years, but I'm back and working on a new deployment.

I deployed 2x Ceph clusters:
1- user_data: 5x nodes [8x 4TB SATA SSD, 2x 25Gbit network]
2- data-gen: 3x nodes [8x 4TB SATA SSD, 2x 25Gbit network]
Note: the hardware is not my choice. I know I have a TRIM issue, and I couldn't use any PCIe NVMe for WAL+DB because these are 1U servers with no empty slots.

---------------------

In the test phase everything was good: I reached 1 GB/s with 18 clients at the same time. But when I migrated to production (60 GPU server clients + 40 CPU server clients), the speed issues began, as usual because of the default parameters. Now I'm working on adapting the cluster by debugging my current data workflow, and I'm researching how I can improve my environment. So far I couldn't find a useful guide or all the information in one place, so I just wanted to share my findings, benchmarks and ideas with the community, and if I'm lucky enough, maybe I will get some great recommendations from old friends and enjoy getting back in touch after a while. :)

Starting from here, I will only share technical information about my environment:

1- Cluster user_data: 5x nodes [8x 4TB SATA SSD, 2x 25Gbit network] = replication 2

- A: I only have 1 pool (one CephFS filesystem) in this cluster; the details are below:

- ceph df
  --- RAW STORAGE ---
  CLASS    SIZE     AVAIL    USED    RAW USED  %RAW USED
  ssd      146 TiB  106 TiB  40 TiB    40 TiB      27.50
  TOTAL    146 TiB  106 TiB  40 TiB    40 TiB      27.50

  --- POOLS ---
  POOL                 ID   PGS   STORED   OBJECTS  USED     %USED  MAX AVAIL
  .mgr                  1     1   286 MiB       73  859 MiB      0     32 TiB
  cephfs.ud-data.meta   9   512    65 GiB    2.87M  131 GiB   0.13     48 TiB
  cephfs.ud-data.data  10  2048    23 TiB   95.34M   40 TiB  29.39     48 TiB

- B: In this cluster, every user (50 of them) has a subvolume, and the quota is 1 TB for each user.
- C: In each subvolume, users have "home" and "data" directories.
- D: The home directory is 5-10 GB, and the client uses it as the Docker home directory at each login.
- E: I'm also storing users' personal or development data, around 2 TB per user.
- F: I only have 1x active MDS server and 4x standby, as below.

- ceph fs status
> ud-data - 84 clients
> =======
> RANK  STATE            MDS              ACTIVITY      DNS    INOS   DIRS   CAPS
>  0    active  ud-data.ud-04.seggyv  Reqs: 372 /s    4343k  4326k  69.7k  2055k
>         POOL          TYPE     USED   AVAIL
> cephfs.ud-data.meta  metadata   130G  47.5T
> cephfs.ud-data.data    data    39.5T  47.5T
>       STANDBY MDS
> ud-data.ud-01.uatjle
> ud-data.ud-02.xcoojt
> ud-data.ud-05.rnhcfe
> ud-data.ud-03.lhwkml
> MDS version: ceph version 17.2.6 (d7ff0d10654d2280e08f1ab989c7cdf3064446a5) quincy (stable)

- What is my issue?

2023-12-15T21:07:47.175542+0000 mon.ud-01 [WRN] Health check failed: 1 clients failing to respond to cache pressure (MDS_CLIENT_RECALL)
2023-12-15T21:09:35.002112+0000 mon.ud-01 [INF] MDS health message cleared (mds.?): Client gpu-server-11 failing to respond to cache pressure
2023-12-15T21:09:35.391235+0000 mon.ud-01 [INF] Health check cleared: MDS_CLIENT_RECALL (was: 1 clients failing to respond to cache pressure)
2023-12-15T21:09:35.391304+0000 mon.ud-01 [INF] Cluster is now healthy
2023-12-15T21:10:00.000169+0000 mon.ud-01 [INF] overall HEALTH_OK

For every read and write, the clients go to the MDS server and request some data:

issue 1: The home data is around 5-10 GB and users need it all the time. I want it fetched once and cached on the client, so it does not generate new requests.
issue 2: The users' jobs generate new data by reading some input data only once and writing the generated data only once. There is no need to cache this data at all.
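To see which clients hold the most caps and how full the active MDS cache really is, something like the following should work (just a sketch; if "ceph tell" refuses a command, the same ones are available through "ceph daemon mds.<name>" on the node running the active MDS):

  # cache usage vs. the configured limit on the active rank 0 MDS
  ceph tell mds.ud-data:0 cache status

  # per-client sessions: check "num_caps" and "client_metadata.hostname"
  # to see which clients (e.g. gpu-server-11) hold the most caps
  ceph tell mds.ud-data:0 session ls

  # the current cache limit (default 4 GiB)
  ceph config get mds mds_cache_memory_limit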
What do I want to do?

1- I want to deploy 2x active MDS servers for only the "home" directory in each subvolume:
   - These 2x "home" MDS servers must send the data to the client, and the client must cache it, to reduce new requests even for a simple "ls" command.
2- I want to deploy 2x active MDS servers for only the "data" directory in each subvolume:
   - These 2x MDS servers must be configured to not hold any cache that is not constantly required. The cache lifetime must be short and must be independent.
   - Data constantly requested by one client must be cached locally on that client, to reduce the requests and the load on the MDS server.

------------------------------------------------------------

I believe you understand my data flow and my needs. Let's talk about what we can do about it.

Note: I'm still researching; these are my findings and my plan so far. It is not complete, and this is the main reason why I'm writing this mail.

ceph fs set $MYFS max_mds 4

mds_cache_memory_limit                               | default 4GiB   --> 16GiB
mds_cache_reservation                                | default 0.05   --> ??
mds_health_cache_threshold                           | default 1.5    --> ??
mds_cache_trim_threshold                             | default 256KiB --> ??
mds_cache_trim_decay_rate                            | default 1.0    --> ??
mds_cache_mid
mds_decay_halflife
mds_client_prealloc_inos
mds_dirstat_min_interval
mds_session_cache_liveness_magnitude
mds_session_cache_liveness_decay_rate
mds_max_caps_per_client
mds_recall_max_caps
mds_recall_max_decay_threshold
mds_recall_max_decay_rate
mds_recall_global_max_decay_threshold
mds_session_cap_acquisition_throttle
mds_session_cap_acquisition_decay_rate
mds_session_max_caps_throttle_ratio
mds_cap_acquisition_throttle_retry_request_timeout

- Manually pinning directory trees to a particular rank (a rough sketch of this is in the P.S. below)

As you can see, I'm at the beginning of this journey, and I will be grateful if you can help me and share your knowledge. As always, I'm even ready to let the developers use my system as a test bench to improve Ceph!

Best regards folks!
- Özkan
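P.S. For the manual pinning idea at the end of my list, this is a rough sketch of what I have in mind (the rank numbers and the mount/subvolume paths below are only examples, not my real layout):

  # 4 active ranks: 0-1 for the "home" trees, 2-3 for the "data" trees
  ceph fs set ud-data max_mds 4

  # raise the MDS cache limit from the 4 GiB default to 16 GiB
  ceph config set mds mds_cache_memory_limit 17179869184

  # pin one user's trees from a client mount (needs the attr package installed);
  # repeat per user and spread the users across the two rank pairs
  setfattr -n ceph.dir.pin -v 0 /mnt/ud-data/<user-subvolume>/home
  setfattr -n ceph.dir.pin -v 2 /mnt/ud-data/<user-subvolume>/data

If maintaining a pin per user gets tedious, setting ceph.dir.pin.distributed to 1 on the parent directory is supposed to spread its immediate children across the active ranks automatically, but then "home" and "data" would no longer be separated by rank.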