These are client-side metrics from a client warned as "failing to respond to cache pressure".

root@datagen-27:/sys/kernel/debug/ceph/e42fd4b0-313b-11ee-9a00-31da71873773.client1282187# cat bdi/stats
BdiWriteback:             0 kB
BdiReclaimable:           0 kB
BdiDirtyThresh:           0 kB
DirtyThresh:       35979376 kB
BackgroundThresh:  17967720 kB
BdiDirtied:         3071616 kB
BdiWritten:         3036864 kB
BdiWriteBandwidth:       20 kBps
b_dirty:                  0
b_io:                     0
b_more_io:                0
b_dirty_time:             0
bdi_list:                 1
state:                    1
------------------------------------------------
root@d27:/sys/kernel/debug/ceph/e42fd4b0-313b-11ee-9a00-31da71873773.client1282187# cat metrics
item                          total
------------------------------------------
opened files  / total inodes  4 / 14129
pinned i_caps / total inodes  14129 / 14129
opened inodes / total inodes  2 / 14129

item      total    avg_lat(us)  min_lat(us)  max_lat(us)  stdev(us)
-----------------------------------------------------------------------------------
read      1218753  3116         208          8741271      2154
write     34945    24003        3017         2191493      16156
metadata  1703642  8395         127          17936115     1497

item      total    avg_sz(bytes)  min_sz(bytes)  max_sz(bytes)  total_sz(bytes)
----------------------------------------------------------------------------------------
read      1218753  227009         1              4194304        276668475618
write     34945    85860          1              4194304        3000382055

item     total  miss    hit
-------------------------------------------------
d_lease  306    19110   3317071969
caps     14129  145404  3761682333

On Thu, 25 Jan 2024 at 20:25, Özkan Göksu <ozkangksu@xxxxxxxxx> wrote:

> Every user has a 1x subvolume and I only have 1 pool.
> At the beginning we were using each subvolume for the LDAP home directory
> plus user data. When a user logged in to any Docker container on any host,
> it used the cluster for home, and for the user-related data we had a
> second directory in the same subvolume.
> From time to time users experienced a very slow home environment, and
> after a month home became almost impossible to use. VNC sessions became
> unresponsive and slow, etc.
>
> 2 weeks ago I had to migrate home to ZFS storage, and now the overall
> performance is better with only user_data and without home.
> But the performance is still not as good as I expected, because of the
> problems related to the MDS.
> The usage is low but allocation is high, and CPU usage is high. You saw
> the IO op/s: it's nothing, yet allocation is high.
>
> I developed a fio benchmark script and ran it on 4x test servers at the
> same time; the results are below.
> Script:
> https://github.com/ozkangoksu/benchmark/blob/8f5df87997864c25ef32447e02fcd41fda0d2a67/iobench.sh
>
> https://github.com/ozkangoksu/benchmark/blob/main/benchmark-results/iobench-client-01.txt
> https://github.com/ozkangoksu/benchmark/blob/main/benchmark-results/iobench-client-02.txt
> https://github.com/ozkangoksu/benchmark/blob/main/benchmark-results/iobench-client-03.txt
> https://github.com/ozkangoksu/benchmark/blob/main/benchmark-results/iobench-client-04.txt
>
> While running the benchmark, I took sample values for each type of
> iobench run:
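[Editor's sketch: the linked iobench.sh is the authoritative benchmark. One leg of such a run could be approximated with a plain fio call using the parameters quoted in the samples below (size=1G, direct=1, numjobs=3, iodepth=32). The mount point, target directory, block size, and ioengine here are assumptions, not taken from the script:

  # sequential-write leg; the target directory must already exist
  fio --name=seqwrite --rw=write --bs=1M --size=1G --direct=1 \
      --numjobs=3 --iodepth=32 --ioengine=libaio --group_reporting \
      --directory=/mnt/cephfs/fiotest

Substituting --rw=read, randwrite, or randread, and dropping --bs to 4k/8k/16k, would approximate the other legs discussed below.]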
>
> Seq Write benchmarking: size=1G,direct=1,numjobs=3,iodepth=32
> client: 70 MiB/s rd, 762 MiB/s wr, 337 op/s rd, 24.41k op/s wr
> client: 60 MiB/s rd, 551 MiB/s wr, 303 op/s rd, 35.12k op/s wr
> client: 13 MiB/s rd, 161 MiB/s wr, 101 op/s rd, 41.30k op/s wr
>
> Seq Read benchmarking: size=1G,direct=1,numjobs=3,iodepth=32
> client: 1.6 GiB/s rd, 219 KiB/s wr, 28.76k op/s rd, 89 op/s wr
> client: 370 MiB/s rd, 475 KiB/s wr, 90.38k op/s rd, 89 op/s wr
>
> Rand Write benchmarking: size=1G,direct=1,numjobs=3,iodepth=32
> client: 63 MiB/s rd, 1.5 GiB/s wr, 8.77k op/s rd, 5.50k op/s wr
> client: 14 MiB/s rd, 1.8 GiB/s wr, 81 op/s rd, 13.86k op/s wr
> client: 6.6 MiB/s rd, 1.2 GiB/s wr, 61 op/s rd, 30.13k op/s wr
>
> Rand Read benchmarking: size=1G,direct=1,numjobs=3,iodepth=32
> client: 317 MiB/s rd, 841 MiB/s wr, 426 op/s rd, 10.98k op/s wr
> client: 2.8 GiB/s rd, 882 MiB/s wr, 25.68k op/s rd, 291 op/s wr
> client: 4.0 GiB/s rd, 226 MiB/s wr, 89.63k op/s rd, 124 op/s wr
> client: 2.4 GiB/s rd, 295 KiB/s wr, 197.86k op/s rd, 20 op/s wr
>
> It seems I only have problems with the 4K, 8K, and 16K block sizes, not
> with the other sizes.
>
>
> On Thu, 25 Jan 2024 at 19:06, Eugen Block <eblock@xxxxxx> wrote:
>
>> I understand that your MDS shows a high CPU usage, but other than that,
>> what is your performance issue? Do users complain? Do some operations
>> take longer than expected? Are OSDs saturated during those phases?
>> Because the cache pressure messages don't necessarily mean that users
>> will notice.
>> MDS daemons are single-threaded, so that might be a bottleneck. In that
>> case multi-active MDS might help, which you already tried and
>> experienced OOM killers. But you might have to disable the MDS
>> balancer as someone else mentioned. And then you could think about
>> pinning: is it possible to split the CephFS into multiple
>> subdirectories and pin them to different ranks?
>> But first I'd still like to know what the performance issue really is.
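[Editor's sketch: subtree pinning, as suggested above, is set through a virtual extended attribute on a directory. A minimal sketch, assuming a CephFS mounted at /mnt/cephfs; the subdirectory names are placeholders:

  # Pin team-a to MDS rank 0 and team-b to rank 1
  setfattr -n ceph.dir.pin -v 0 /mnt/cephfs/team-a
  setfattr -n ceph.dir.pin -v 1 /mnt/cephfs/team-b
  # A value of -1 removes the pin again

Pinning only has an effect with more than one active MDS (max_mds > 1), which in this thread previously ended in OOM kills, so the MDS cache limits would need to be sized accordingly first.]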
>>
>> Quoting Özkan Göksu <ozkangksu@xxxxxxxxx>:
>>
>> > I will try my best to explain my situation.
>> >
>> > I don't have a separate MDS server. I have 5 identical nodes; 3 of
>> > them are mons, and I use the other 2 as active and standby MDS
>> > (currently I have leftovers from max_mds 4).
>> >
>> > root@ud-01:~# ceph -s
>> >   cluster:
>> >     id:     e42fd4b0-313b-11ee-9a00-31da71873773
>> >     health: HEALTH_WARN
>> >             1 clients failing to respond to cache pressure
>> >
>> >   services:
>> >     mon: 3 daemons, quorum ud-01,ud-02,ud-03 (age 9d)
>> >     mgr: ud-01.qycnol(active, since 8d), standbys: ud-02.tfhqfd
>> >     mds: 1/1 daemons up, 4 standby
>> >     osd: 80 osds: 80 up (since 9d), 80 in (since 5M)
>> >
>> >   data:
>> >     volumes: 1/1 healthy
>> >     pools:   3 pools, 2305 pgs
>> >     objects: 106.58M objects, 25 TiB
>> >     usage:   45 TiB used, 101 TiB / 146 TiB avail
>> >     pgs:     2303 active+clean
>> >              2    active+clean+scrubbing+deep
>> >
>> >   io:
>> >     client: 16 MiB/s rd, 3.4 MiB/s wr, 77 op/s rd, 23 op/s wr
>> >
>> > ------------------------------
>> > root@ud-01:~# ceph fs status
>> > ud-data - 84 clients
>> > =======
>> > RANK  STATE           MDS            ACTIVITY     DNS    INOS   DIRS   CAPS
>> >  0    active  ud-data.ud-02.xcoojt  Reqs: 40 /s  2579k  2578k   169k  3048k
>> >         POOL           TYPE     USED  AVAIL
>> > cephfs.ud-data.meta  metadata   136G  44.9T
>> > cephfs.ud-data.data    data    44.3T  44.9T
>> >
>> > ------------------------------
>> > root@ud-01:~# ceph health detail
>> > HEALTH_WARN 1 clients failing to respond to cache pressure
>> > [WRN] MDS_CLIENT_RECALL: 1 clients failing to respond to cache pressure
>> >     mds.ud-data.ud-02.xcoojt(mds.0): Client bmw-m4 failing to respond
>> >     to cache pressure client_id: 1275577
>> >
>> > ------------------------------
>> > When I check the failing client with session ls, I see only
>> > "num_caps: 12298":
>> >
>> > ceph tell mds.ud-data.ud-02.xcoojt session ls | jq -r '.[] | "clientid:
>> > \(.id)= num_caps: \(.num_caps), num_leases: \(.num_leases),
>> > request_load_avg: \(.request_load_avg), num_completed_requests:
>> > \(.num_completed_requests), num_completed_flushes:
>> > \(.num_completed_flushes)"' | sort -n -t: -k3
>> >
>> > clientid: 1275577= num_caps: 12298, num_leases: 0, request_load_avg: 0, num_completed_requests: 0, num_completed_flushes: 1
>> > clientid: 1294542= num_caps: 13000, num_leases: 12, request_load_avg: 105, num_completed_requests: 0, num_completed_flushes: 6
>> > clientid: 1282187= num_caps: 16869, num_leases: 1, request_load_avg: 0, num_completed_requests: 0, num_completed_flushes: 1
>> > clientid: 1275589= num_caps: 18943, num_leases: 0, request_load_avg: 52, num_completed_requests: 0, num_completed_flushes: 1
>> > clientid: 1282154= num_caps: 24747, num_leases: 1, request_load_avg: 57, num_completed_requests: 2, num_completed_flushes: 2
>> > clientid: 1275553= num_caps: 25120, num_leases: 2, request_load_avg: 116, num_completed_requests: 2, num_completed_flushes: 8
>> > clientid: 1282142= num_caps: 27185, num_leases: 6, request_load_avg: 128, num_completed_requests: 0, num_completed_flushes: 8
>> > clientid: 1275535= num_caps: 40364, num_leases: 6, request_load_avg: 111, num_completed_requests: 2, num_completed_flushes: 8
>> > clientid: 1282130= num_caps: 41483, num_leases: 0, request_load_avg: 135, num_completed_requests: 0, num_completed_flushes: 1
>> > clientid: 1275547= num_caps: 42953, num_leases: 4, request_load_avg: 119, num_completed_requests: 2, num_completed_flushes: 6
>> > clientid: 1282139= num_caps: 45435, num_leases: 27, request_load_avg: 84, num_completed_requests: 2, num_completed_flushes: 34
>> > clientid: 1282136= num_caps: 48374, num_leases: 8, request_load_avg: 0, num_completed_requests: 1, num_completed_flushes: 1
>> > clientid: 1275532= num_caps: 48664, num_leases: 7, request_load_avg: 115, num_completed_requests: 2, num_completed_flushes: 8
>> > clientid: 1191789= num_caps: 130319, num_leases: 0, request_load_avg: 1753, num_completed_requests: 0, num_completed_flushes: 0
>> > clientid: 1275571= num_caps: 139488, num_leases: 0, request_load_avg: 2, num_completed_requests: 0, num_completed_flushes: 1
>> > clientid: 1282133= num_caps: 145487, num_leases: 0, request_load_avg: 8, num_completed_requests: 1, num_completed_flushes: 1
>> > clientid: 1534496= num_caps: 1041316, num_leases: 0, request_load_avg: 0, num_completed_requests: 0, num_completed_flushes: 1
>> >
>> > ------------------------------
>> > When I check the dashboard under service/mds I see 120%+ CPU usage on
>> > the active MDS, but on the host everything is almost idle and disk
>> > waits are very low:
>> >
>> > avg-cpu:  %user   %nice %system %iowait  %steal   %idle
>> >            0.61    0.00    0.38    0.41    0.00   98.60
>> >
>> > Device  r/s   rMB/s  rrqm/s  %rrqm  r_await  rareq-sz  w/s     wMB/s  wrqm/s  %wrqm  w_await  wareq-sz  d/s   dMB/s  drqm/s  %drqm  d_await  dareq-sz  f/s     f_await  aqu-sz  %util
>> > sdc     2.00  0.01   0.00    0.00   0.50     6.00      20.00   0.04   0.00    0.00   0.50     2.00      0.00  0.00   0.00    0.00   0.00     0.00      10.00   0.60     0.02    1.20
>> > sdd     3.00  0.02   0.00    0.00   0.67     8.00      285.00  1.84   77.00   21.27  0.44     6.61      0.00  0.00   0.00    0.00   0.00     0.00      114.00  0.83     0.22    22.40
>> > sde     1.00  0.01   0.00    0.00   1.00     8.00      36.00   0.08   3.00    7.69   0.64     2.33      0.00  0.00   0.00    0.00   0.00     0.00      18.00   0.67     0.04    1.60
>> > sdf     5.00  0.04   0.00    0.00   0.40     7.20      40.00   0.09   3.00    6.98   0.53     2.30      0.00  0.00   0.00    0.00   0.00     0.00      20.00   0.70     0.04    2.00
>> > sdg     11.00 0.08   0.00    0.00   0.73     7.27      36.00   0.09   4.00    10.00  0.50     2.44      0.00  0.00   0.00    0.00   0.00     0.00      18.00   0.72     0.04    3.20
>> > sdh     5.00  0.03   0.00    0.00   0.60     5.60      46.00   0.10   2.00    4.17   0.59     2.17      0.00  0.00   0.00    0.00   0.00     0.00      23.00   0.83     0.05    2.80
>> > sdi     7.00  0.04   0.00    0.00   0.43     6.29      36.00   0.07   1.00    2.70   0.47     2.11      0.00  0.00   0.00    0.00   0.00     0.00      18.00   0.61     0.03    2.40
>> > sdj     5.00  0.04   0.00    0.00   0.80     7.20      42.00   0.09   1.00    2.33   0.67     2.10      0.00  0.00   0.00    0.00   0.00     0.00      21.00   0.81     0.05    3.20
>> >
>> > ------------------------------
>> > Other than this 5x node cluster, I also have a 3x node cluster with
>> > identical hardware, but it serves a different purpose and data
>> > workload. On that cluster I don't have any problems, and the MDS
>> > default settings seem to be enough.
>> > The only difference between the two clusters is that the 5x node
>> > cluster is used directly by users, while the 3x node cluster is used
>> > heavily to read and write data via projects, not by users, so
>> > allocation and de-allocation behave better.
>> >
>> > I guess I just have a problematic use case on the 5x node cluster,
>> > and as I mentioned above, I might have a similar problem but I don't
>> > know how to debug it.
>> >
>> > https://lists.ceph.io/hyperkitty/list/ceph-users@xxxxxxx/thread/YO4SGL4DJQ6EKUBUIHKTFSW72ZJ3XLZS/
>> > quote: "A user running VSCodium, keeping 15k caps open.. the
>> > opportunistic caps recall eventually starts recalling those but the
>> > (el7 kernel) client won't release them. Stopping Codium seems to be
>> > the only way to release."
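[Editor's sketch: when a client holds caps it will not release, the MDS recall settings are the usual knobs to experiment with. The option names below exist in recent Ceph releases, but defaults differ between versions; the values are placeholders to test, not recommendations:

  # Allow the MDS to recall more caps per client
  ceph config set mds mds_recall_max_caps 30000
  ceph config set mds mds_recall_max_decay_rate 1.5
  # One-off: ask the active MDS to trim its cache and recall client
  # state, with a 60-second timeout
  ceph tell mds.ud-data.ud-02.xcoojt cache drop 60

As the quoted thread notes, a client that genuinely keeps files open (the VSCodium case) will still refuse to drop those caps.]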
>> >
>> > ------------------------------
>> > Before reading the osd df output, you should know that I created 2x
>> > OSDs per "CT4000MX500SSD1" drive.
>> > # ceph osd df tree
>> > ID   CLASS  WEIGHT     REWEIGHT  SIZE     RAW USE   DATA      OMAP      META     AVAIL    %USE   VAR   PGS  STATUS  TYPE NAME
>> >  -1         145.54321         -  146 TiB   45 TiB    44 TiB   119 GiB  333 GiB  101 TiB  30.81  1.00    -          root default
>> >  -3          29.10864         -   29 TiB  8.9 TiB   8.8 TiB    25 GiB   66 GiB   20 TiB  30.54  0.99    -              host ud-01
>> >   0    ssd    1.81929   1.00000  1.8 TiB  616 GiB   610 GiB   1.4 GiB  4.5 GiB  1.2 TiB  33.04  1.07   61      up          osd.0
>> >   1    ssd    1.81929   1.00000  1.8 TiB  527 GiB   521 GiB   1.5 GiB  4.0 GiB  1.3 TiB  28.28  0.92   53      up          osd.1
>> >   2    ssd    1.81929   1.00000  1.8 TiB  595 GiB   589 GiB   2.3 GiB  4.0 GiB  1.2 TiB  31.96  1.04   63      up          osd.2
>> >   3    ssd    1.81929   1.00000  1.8 TiB  527 GiB   521 GiB   1.8 GiB  4.2 GiB  1.3 TiB  28.30  0.92   55      up          osd.3
>> >   4    ssd    1.81929   1.00000  1.8 TiB  525 GiB   520 GiB   1.3 GiB  3.9 GiB  1.3 TiB  28.21  0.92   52      up          osd.4
>> >   5    ssd    1.81929   1.00000  1.8 TiB  592 GiB   586 GiB   1.8 GiB  3.8 GiB  1.2 TiB  31.76  1.03   61      up          osd.5
>> >   6    ssd    1.81929   1.00000  1.8 TiB  559 GiB   553 GiB   1.8 GiB  4.3 GiB  1.3 TiB  30.03  0.97   57      up          osd.6
>> >   7    ssd    1.81929   1.00000  1.8 TiB  602 GiB   597 GiB   836 MiB  4.4 GiB  1.2 TiB  32.32  1.05   58      up          osd.7
>> >   8    ssd    1.81929   1.00000  1.8 TiB  614 GiB   609 GiB   1.2 GiB  4.5 GiB  1.2 TiB  32.98  1.07   60      up          osd.8
>> >   9    ssd    1.81929   1.00000  1.8 TiB  571 GiB   565 GiB   2.2 GiB  4.2 GiB  1.3 TiB  30.67  1.00   61      up          osd.9
>> >  10    ssd    1.81929   1.00000  1.8 TiB  528 GiB   522 GiB   1.3 GiB  4.1 GiB  1.3 TiB  28.33  0.92   52      up          osd.10
>> >  11    ssd    1.81929   1.00000  1.8 TiB  551 GiB   546 GiB   1.5 GiB  3.6 GiB  1.3 TiB  29.57  0.96   56      up          osd.11
>> >  12    ssd    1.81929   1.00000  1.8 TiB  594 GiB   588 GiB   1.8 GiB  4.4 GiB  1.2 TiB  31.91  1.04   61      up          osd.12
>> >  13    ssd    1.81929   1.00000  1.8 TiB  561 GiB   555 GiB   1.1 GiB  4.3 GiB  1.3 TiB  30.10  0.98   55      up          osd.13
>> >  14    ssd    1.81929   1.00000  1.8 TiB  616 GiB   609 GiB   1.9 GiB  4.2 GiB  1.2 TiB  33.04  1.07   64      up          osd.14
>> >  15    ssd    1.81929   1.00000  1.8 TiB  525 GiB   520 GiB   1.1 GiB  4.0 GiB  1.3 TiB  28.20  0.92   51      up          osd.15
>> >  -5          29.10864         -   29 TiB  9.0 TiB   8.9 TiB    22 GiB   67 GiB   20 TiB  30.89  1.00    -              host ud-02
>> >  16    ssd    1.81929   1.00000  1.8 TiB  617 GiB   611 GiB   1.7 GiB  4.7 GiB  1.2 TiB  33.12  1.08   63      up          osd.16
>> >  17    ssd    1.81929   1.00000  1.8 TiB  582 GiB   577 GiB   1.6 GiB  4.0 GiB  1.3 TiB  31.26  1.01   59      up          osd.17
>> >  18    ssd    1.81929   1.00000  1.8 TiB  583 GiB   578 GiB   418 MiB  4.0 GiB  1.3 TiB  31.29  1.02   54      up          osd.18
>> >  19    ssd    1.81929   1.00000  1.8 TiB  550 GiB   544 GiB   1.5 GiB  4.0 GiB  1.3 TiB  29.50  0.96   56      up          osd.19
>> >  20    ssd    1.81929   1.00000  1.8 TiB  551 GiB   546 GiB   1.1 GiB  4.1 GiB  1.3 TiB  29.57  0.96   54      up          osd.20
>> >  21    ssd    1.81929   1.00000  1.8 TiB  616 GiB   610 GiB   1.3 GiB  4.4 GiB  1.2 TiB  33.04  1.07   60      up          osd.21
>> >  22    ssd    1.81929   1.00000  1.8 TiB  573 GiB   567 GiB   1.6 GiB  4.1 GiB  1.3 TiB  30.75  1.00   58      up          osd.22
>> >  23    ssd    1.81929   1.00000  1.8 TiB  616 GiB   610 GiB   1.3 GiB  4.3 GiB  1.2 TiB  33.06  1.07   60      up          osd.23
>> >  24    ssd    1.81929   1.00000  1.8 TiB  539 GiB   534 GiB   844 MiB  3.8 GiB  1.3 TiB  28.92  0.94   51      up          osd.24
>> >  25    ssd    1.81929   1.00000  1.8 TiB  583 GiB   576 GiB   2.1 GiB  4.1 GiB  1.3 TiB  31.27  1.02   61      up          osd.25
>> >  26    ssd    1.81929   1.00000  1.8 TiB  617 GiB   611 GiB   1.3 GiB  4.6 GiB  1.2 TiB  33.12  1.08   61      up          osd.26
>> >  27    ssd    1.81929   1.00000  1.8 TiB  537 GiB   532 GiB   1.2 GiB  4.1 GiB  1.3 TiB  28.84  0.94   53      up          osd.27
>> >  28    ssd    1.81929   1.00000  1.8 TiB  527 GiB   522 GiB   1.3 GiB  4.2 GiB  1.3 TiB  28.29  0.92   53      up          osd.28
>> >  29    ssd    1.81929   1.00000  1.8 TiB  594 GiB   588 GiB   1.5 GiB  4.6 GiB  1.2 TiB  31.91  1.04   59      up          osd.29
>> >  30    ssd    1.81929   1.00000  1.8 TiB  528 GiB   523 GiB   1.4 GiB  4.1 GiB  1.3 TiB  28.35  0.92   53      up          osd.30
>> >  31    ssd    1.81929   1.00000  1.8 TiB  594 GiB   589 GiB   1.6 GiB  3.8 GiB  1.2 TiB  31.89  1.03   61      up          osd.31
>> >  -7          29.10864         -   29 TiB  8.9 TiB   8.8 TiB    23 GiB   67 GiB   20 TiB  30.66  1.00    -              host ud-03
>> >  32    ssd    1.81929   1.00000  1.8 TiB  593 GiB   588 GiB   1.1 GiB  4.3 GiB  1.2 TiB  31.84  1.03   57      up          osd.32
>> >  33    ssd    1.81929   1.00000  1.8 TiB  617 GiB   611 GiB   1.8 GiB  4.4 GiB  1.2 TiB  33.13  1.08   63      up          osd.33
>> >  34    ssd    1.81929   1.00000  1.8 TiB  537 GiB   532 GiB   2.0 GiB  3.8 GiB  1.3 TiB  28.84  0.94   59      up          osd.34
>> >  35    ssd    1.81929   1.00000  1.8 TiB  562 GiB   556 GiB   1.7 GiB  4.2 GiB  1.3 TiB  30.16  0.98   58      up          osd.35
>> >  36    ssd    1.81929   1.00000  1.8 TiB  529 GiB   523 GiB   1.3 GiB  3.9 GiB  1.3 TiB  28.38  0.92   52      up          osd.36
>> >  37    ssd    1.81929   1.00000  1.8 TiB  527 GiB   521 GiB   1.7 GiB  4.2 GiB  1.3 TiB  28.28  0.92   55      up          osd.37
>> >  38    ssd    1.81929   1.00000  1.8 TiB  574 GiB   568 GiB   1.2 GiB  4.3 GiB  1.3 TiB  30.79  1.00   55      up          osd.38
>> >  39    ssd    1.81929   1.00000  1.8 TiB  605 GiB   599 GiB   1.6 GiB  4.2 GiB  1.2 TiB  32.48  1.05   61      up          osd.39
>> >  40    ssd    1.81929   1.00000  1.8 TiB  573 GiB   567 GiB   1.2 GiB  4.4 GiB  1.3 TiB  30.76  1.00   56      up          osd.40
>> >  41    ssd    1.81929   1.00000  1.8 TiB  526 GiB   520 GiB   1.7 GiB  3.9 GiB  1.3 TiB  28.21  0.92   54      up          osd.41
>> >  42    ssd    1.81929   1.00000  1.8 TiB  613 GiB   608 GiB  1010 MiB  4.4 GiB  1.2 TiB  32.91  1.07   58      up          osd.42
>> >  43    ssd    1.81929   1.00000  1.8 TiB  606 GiB   600 GiB   1.7 GiB  4.3 GiB  1.2 TiB  32.51  1.06   61      up          osd.43
>> >  44    ssd    1.81929   1.00000  1.8 TiB  583 GiB   577 GiB   1.6 GiB  4.2 GiB  1.3 TiB  31.29  1.02   60      up          osd.44
>> >  45    ssd    1.81929   1.00000  1.8 TiB  618 GiB   613 GiB   1.4 GiB  4.3 GiB  1.2 TiB  33.18  1.08   62      up          osd.45
>> >  46    ssd    1.81929   1.00000  1.8 TiB  550 GiB   544 GiB   1.5 GiB  4.2 GiB  1.3 TiB  29.50  0.96   54      up          osd.46
>> >  47    ssd    1.81929   1.00000  1.8 TiB  526 GiB   522 GiB   692 MiB  3.7 GiB  1.3 TiB  28.25  0.92   50      up          osd.47
>> >  -9          29.10864         -   29 TiB  9.0 TiB   8.9 TiB    26 GiB   68 GiB   20 TiB  31.04  1.01    -              host ud-04
>> >  48    ssd    1.81929   1.00000  1.8 TiB  540 GiB   534 GiB   2.2 GiB  3.6 GiB  1.3 TiB  28.96  0.94   58      up          osd.48
>> >  49    ssd    1.81929   1.00000  1.8 TiB  617 GiB   611 GiB   1.4 GiB  4.5 GiB  1.2 TiB  33.11  1.07   61      up          osd.49
>> >  50    ssd    1.81929   1.00000  1.8 TiB  618 GiB   612 GiB   1.2 GiB  4.8 GiB  1.2 TiB  33.17  1.08   61      up          osd.50
>> >  51    ssd    1.81929   1.00000  1.8 TiB  618 GiB   612 GiB   1.5 GiB  4.5 GiB  1.2 TiB  33.19  1.08   61      up          osd.51
>> >  52    ssd    1.81929   1.00000  1.8 TiB  526 GiB   521 GiB   1.4 GiB  4.1 GiB  1.3 TiB  28.25  0.92   53      up          osd.52
>> >  53    ssd    1.81929   1.00000  1.8 TiB  618 GiB   611 GiB   2.4 GiB  4.3 GiB  1.2 TiB  33.17  1.08   66      up          osd.53
>> >  54    ssd    1.81929   1.00000  1.8 TiB  550 GiB   544 GiB   1.5 GiB  4.3 GiB  1.3 TiB  29.54  0.96   55      up          osd.54
>> >  55    ssd    1.81929   1.00000  1.8 TiB  527 GiB   522 GiB   1.3 GiB  4.0 GiB  1.3 TiB  28.29  0.92   52      up          osd.55
>> >  56    ssd    1.81929   1.00000  1.8 TiB  525 GiB   519 GiB   1.2 GiB  4.1 GiB  1.3 TiB  28.16  0.91   52      up          osd.56
>> >  57    ssd    1.81929   1.00000  1.8 TiB  615 GiB   609 GiB   2.3 GiB  4.2 GiB  1.2 TiB  33.03  1.07   65      up          osd.57
>> >  58    ssd    1.81929   1.00000  1.8 TiB  527 GiB   522 GiB   1.6 GiB  3.7 GiB  1.3 TiB  28.31  0.92   55      up          osd.58
>> >  59    ssd    1.81929   1.00000  1.8 TiB  615 GiB   609 GiB   1.2 GiB  4.6 GiB  1.2 TiB  33.01  1.07   60      up          osd.59
>> >  60    ssd    1.81929   1.00000  1.8 TiB  594 GiB   588 GiB   1.2 GiB  4.4 GiB  1.2 TiB  31.88  1.03   59      up          osd.60
>> >  61    ssd    1.81929   1.00000  1.8 TiB  616 GiB   610 GiB   1.9 GiB  4.1 GiB  1.2 TiB  33.04  1.07   64      up          osd.61
>> >  62    ssd    1.81929   1.00000  1.8 TiB  620 GiB   614 GiB   1.9 GiB  4.4 GiB  1.2 TiB  33.27  1.08   63      up          osd.62
>> >  63    ssd    1.81929   1.00000  1.8 TiB  527 GiB   522 GiB   1.5 GiB  4.0 GiB  1.3 TiB  28.30  0.92   53      up          osd.63
>> > -11          29.10864         -   29 TiB  9.0 TiB   8.9 TiB    23 GiB   65 GiB   20 TiB  30.91  1.00    -              host ud-05
>> >  64    ssd    1.81929   1.00000  1.8 TiB  608 GiB   601 GiB   2.3 GiB  4.5 GiB  1.2 TiB  32.62  1.06   65      up          osd.64
>> >  65    ssd    1.81929   1.00000  1.8 TiB  606 GiB   601 GiB   628 MiB  4.2 GiB  1.2 TiB  32.53  1.06   57      up          osd.65
>> >  66    ssd    1.81929   1.00000  1.8 TiB  583 GiB   578 GiB   1.3 GiB  4.3 GiB  1.2 TiB  31.31  1.02   57      up          osd.66
>> >  67    ssd    1.81929   1.00000  1.8 TiB  537 GiB   533 GiB   436 MiB  3.6 GiB  1.3 TiB  28.82  0.94   50      up          osd.67
>> >  68    ssd    1.81929   1.00000  1.8 TiB  541 GiB   535 GiB   2.5 GiB  3.8 GiB  1.3 TiB  29.04  0.94   59      up          osd.68
>> >  69    ssd    1.81929   1.00000  1.8 TiB  606 GiB   601 GiB   1.1 GiB  4.4 GiB  1.2 TiB  32.55  1.06   59      up          osd.69
>> >  70    ssd    1.81929   1.00000  1.8 TiB  604 GiB   598 GiB   1.8 GiB  4.1 GiB  1.2 TiB  32.44  1.05   63      up          osd.70
>> >  71    ssd    1.81929   1.00000  1.8 TiB  606 GiB   600 GiB   1.9 GiB  4.5 GiB  1.2 TiB  32.53  1.06   62      up          osd.71
>> >  72    ssd    1.81929   1.00000  1.8 TiB  602 GiB   598 GiB   612 MiB  4.1 GiB  1.2 TiB  32.33  1.05   57      up          osd.72
>> >  73    ssd    1.81929   1.00000  1.8 TiB  571 GiB   565 GiB   1.8 GiB  4.5 GiB  1.3 TiB  30.65  0.99   58      up          osd.73
>> >  74    ssd    1.81929   1.00000  1.8 TiB  608 GiB   602 GiB   1.8 GiB  4.2 GiB  1.2 TiB  32.62  1.06   61      up          osd.74
>> >  75    ssd    1.81929   1.00000  1.8 TiB  536 GiB   531 GiB   1.9 GiB  3.5 GiB  1.3 TiB  28.80  0.93   57      up          osd.75
>> >  76    ssd    1.81929   1.00000  1.8 TiB  605 GiB   599 GiB   1.4 GiB  4.5 GiB  1.2 TiB  32.48  1.05   60      up          osd.76
>> >  77    ssd    1.81929   1.00000  1.8 TiB  537 GiB   532 GiB   1.2 GiB  3.9 GiB  1.3 TiB  28.84  0.94   52      up          osd.77
>> >  78    ssd    1.81929   1.00000  1.8 TiB  525 GiB   520 GiB   1.3 GiB  3.8 GiB  1.3 TiB  28.20  0.92   52      up          osd.78
>> >  79    ssd    1.81929   1.00000  1.8 TiB  536 GiB   531 GiB   1.1 GiB  3.3 GiB  1.3 TiB  28.76  0.93   53      up          osd.79
>> >                          TOTAL   146 TiB   45 TiB    44 TiB   119 GiB  333 GiB  101 TiB  30.81
>> > MIN/MAX VAR: 0.91/1.08  STDDEV: 1.90
>> >
>> >
>> > On Thu, 25 Jan 2024 at 16:52, Eugen Block <eblock@xxxxxx> wrote:
>> >
>> >> There is no definitive answer wrt MDS tuning. As is mentioned
>> >> everywhere, it's about finding the right setup for your specific
>> >> workload. If you can synthesize your workload (maybe scaled down a
>> >> bit), try optimizing it in a test cluster without interrupting your
>> >> developers too much.
>> >> But what you haven't explained yet is what you are experiencing as a
>> >> performance issue. Do you have numbers or a detailed description?
>> >> From the fs status output you didn't seem to have too much activity
>> >> going on (around 140 requests per second), but that's probably not
>> >> the usual traffic? What does ceph report in its client IO output?
>> >> Can you paste the 'ceph osd df' output as well?
>> >> Do you have dedicated MDS servers, or are they colocated with other
>> >> services?
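[Editor's sketch: on the "client IO output" question above, besides the io: line in ceph status, per-client CephFS metrics can be sampled from the MGR stats module, assuming a release that ships it (Pacific or later):

  ceph mgr module enable stats
  ceph fs perf stats

The JSON output includes per-client read/write IOPS and latency, which can help attribute load to the client IDs listed earlier in the thread.]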
>> >>
>> >> Quoting Özkan Göksu <ozkangksu@xxxxxxxxx>:
>> >>
>> >> > Hello Eugen.
>> >> >
>> >> > I have read all of your MDS-related topics; thank you so much for
>> >> > your effort on this.
>> >> > There is not much information out there, and I couldn't find an MDS
>> >> > tuning guide at all. It seems that you are the right person to
>> >> > discuss MDS debugging and tuning with.
>> >> >
>> >> > Do you have any documents, or may I ask what the proper way is to
>> >> > debug the MDS and the clients?
>> >> > Which debug logs will guide me to understand the limitations and
>> >> > help me tune according to the data flow?
>> >> >
>> >> > While searching, I found this:
>> >> > https://lists.ceph.io/hyperkitty/list/ceph-users@xxxxxxx/thread/YO4SGL4DJQ6EKUBUIHKTFSW72ZJ3XLZS/
>> >> > quote: "A user running VSCodium, keeping 15k caps open.. the
>> >> > opportunistic caps recall eventually starts recalling those but the
>> >> > (el7 kernel) client won't release them. Stopping Codium seems to be
>> >> > the only way to release."
>> >> >
>> >> > Because of this, I think I also need to play around with the client
>> >> > side.
>> >> >
>> >> > My main goal is increasing the speed and reducing the latency, and
>> >> > I wonder whether these ideas are correct:
>> >> > - Maybe I need to increase the client-side cache size, because
>> >> >   through each client multiple users request a lot of objects, and
>> >> >   clearly the client_cache_size=16 default is not enough.
>> >> > - Maybe I need to increase the client-side maximum cache limits for
>> >> >   objects ("client_oc_max_objects=1000 to 10000") and for data
>> >> >   ("client_oc_size=200mi to 400mi").
>> >> > - The client cache cleaning threshold is not aggressive enough to
>> >> >   keep the free cache size in the desired range. I need to make it
>> >> >   aggressive, but this should not reduce speed or increase latency.
>> >> >
>> >> > mds_cache_memory_limit=4gi to 16gi
>> >> > client_oc_max_objects=1000 to 10000
>> >> > client_oc_size=200mi to 400mi
>> >> > client_permissions=false  # to reduce latency
>> >> > client_cache_size=16 to 128
>> >> >
>> >> > What do you think?

_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx