Re: 1 clients failing to respond to cache pressure (quincy:17.2.6)

I started to investigate my clients.

For example:

root@ud-01:~# ceph health detail
HEALTH_WARN 1 clients failing to respond to cache pressure
[WRN] MDS_CLIENT_RECALL: 1 clients failing to respond to cache pressure
    mds.ud-data.ud-02.xcoojt(mds.0): Client bmw-m4 failing to respond to cache pressure client_id: 1275577

root@ud-01:~# ceph fs status
ud-data - 86 clients
=======
RANK  STATE           MDS              ACTIVITY     DNS    INOS   DIRS   CAPS
 0    active  ud-data.ud-02.xcoojt  Reqs:   34 /s  2926k  2827k   155k  1157k


ceph tell mds.ud-data.ud-02.xcoojt session ls | jq -r '.[] | "clientid:
\(.id)= num_caps: \(.num_caps), num_leases: \(.num_leases),
request_load_avg: \(.request_load_avg), num_completed_requests:
\(.num_completed_requests), num_completed_flushes:
\(.num_completed_flushes)"' | sort -n -t: -k3

clientid: *1275577*= num_caps: 12312, num_leases: 0, request_load_avg: 0, num_completed_requests: 0, num_completed_flushes: 1
clientid: 1275571= num_caps: 16307, num_leases: 1, request_load_avg: 2101, num_completed_requests: 0, num_completed_flushes: 3
clientid: 1282130= num_caps: 26337, num_leases: 3, request_load_avg: 116, num_completed_requests: 0, num_completed_flushes: 1
clientid: 1191789= num_caps: 32784, num_leases: 0, request_load_avg: 1846, num_completed_requests: 0, num_completed_flushes: 0
clientid: 1275535= num_caps: 79825, num_leases: 2, request_load_avg: 133, num_completed_requests: 8, num_completed_flushes: 8
clientid: 1282142= num_caps: 80581, num_leases: 6, request_load_avg: 125, num_completed_requests: 2, num_completed_flushes: 6
clientid: 1275532= num_caps: 87836, num_leases: 3, request_load_avg: 190, num_completed_requests: 2, num_completed_flushes: 6
clientid: 1275547= num_caps: 94129, num_leases: 4, request_load_avg: 149, num_completed_requests: 2, num_completed_flushes: 4
clientid: 1275553= num_caps: 96460, num_leases: 4, request_load_avg: 155, num_completed_requests: 2, num_completed_flushes: 8
clientid: 1282139= num_caps: 108882, num_leases: 25, request_load_avg: 99, num_completed_requests: 2, num_completed_flushes: 4
clientid: 1275538= num_caps: 437162, num_leases: 0, request_load_avg: 101, num_completed_requests: 2, num_completed_flushes: 0
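
Since the warning means the MDS is asking clients to release caps, I also want to check how full the MDS cache itself is. A small sketch of what I plan to run next (assuming these admin commands are available on my version):

ceph tell mds.ud-data.ud-02.xcoojt cache status
ceph config get mds mds_cache_memory_limit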

--------------------------------------

*MY CLIENT:*

The client is actually idle and there is no reason for it to fail at all.

root@bmw-m4:~# apt list --installed |grep ceph
ceph-common/jammy-updates,now 17.2.6-0ubuntu0.22.04.2 amd64 [installed]
libcephfs2/jammy-updates,now 17.2.6-0ubuntu0.22.04.2 amd64 [installed,automatic]
python3-ceph-argparse/jammy-updates,now 17.2.6-0ubuntu0.22.04.2 amd64 [installed,automatic]
python3-ceph-common/jammy-updates,now 17.2.6-0ubuntu0.22.04.2 all [installed,automatic]
python3-cephfs/jammy-updates,now 17.2.6-0ubuntu0.22.04.2 amd64 [installed,automatic]
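
(The mount shows up under /sys/kernel/debug/ceph, so this is the kernel CephFS client; I assume the kernel version matters at least as much as the ceph-common userspace packages above. A quick sketch of how I check both:)

uname -r
grep ceph /proc/mounts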

Let's check metrics and stats:

root@bmw-m4:/sys/kernel/debug/ceph/e42fd4b0-313b-11ee-9a00-31da71873773.client1275577#
cat metrics
item                               total
------------------------------------------
opened files  / total inodes       2 / 12312
pinned i_caps / total inodes       12312 / 12312
opened inodes / total inodes       1 / 12312

item          total       avg_lat(us)     min_lat(us)     max_lat(us)     stdev(us)
-----------------------------------------------------------------------------------
read          22283       44409           430             1804853         15619
write         112702      419725          3658            8879541         6008
metadata      353322      5712            154             917903          5357

item          total       avg_sz(bytes)   min_sz(bytes)   max_sz(bytes)   total_sz(bytes)
----------------------------------------------------------------------------------------
read          22283       1701940         1               4194304         37924318602
write         112702      246211          1               4194304         27748469309

item          total           miss            hit
-------------------------------------------------
d_lease       62              63627           28564698
caps          12312           36658           44568261
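
(The same debugfs directory should also contain a 'caps' file that lists the capability counts and the individual caps still held; I assume it is present on this kernel and use it as a rough cross-check of the 12312 pinned inodes above:)

head caps
wc -l caps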


root@bmw-m4:/sys/kernel/debug/ceph/e42fd4b0-313b-11ee-9a00-31da71873773.client1275577#
cat bdi/stats
BdiWriteback:                0 kB
BdiReclaimable:            800 kB
BdiDirtyThresh:              0 kB
DirtyThresh:           5795340 kB
BackgroundThresh:      2894132 kB
BdiDirtied:           27316320 kB
BdiWritten:           27316320 kB
BdiWriteBandwidth:        1472 kBps
b_dirty:                     0
b_io:                        0
b_more_io:                   0
b_dirty_time:                0
bdi_list:                    1
state:                       1


dmesg output from the last 3 days:

[Wed Jan 24 16:45:13 2024] xfsettingsd[653036]: segfault at 18 ip
00007fbd12f5d337 sp 00007ffd254332a0 error 4 in
libxklavier.so.16.4.0[7fbd12f4d000+19000]
[Wed Jan 24 16:45:13 2024] Code: 4c 89 e7 e8 0b 56 ff ff 48 89 03 48 8b 5c
24 30 e9 d1 fd ff ff e8 b9 5b ff ff 66 0f 1f 84 00 00 00 00 00 41 54 55 48
89 f5 53 <48> 8b 42 18 48 89 d1 49 89 fc 48 89 d3 48 89 fa 48 89 ef 48 8b b0
[Thu Jan 25 06:51:31 2024] NVRM: GPU at PCI:0000:81:00:
GPU-02efbb18-c9e4-3a16-d615-598959520b99
[Thu Jan 25 06:51:31 2024] NVRM: GPU Board Serial Number: 1321421015411
[Thu Jan 25 06:51:31 2024] NVRM: Xid (PCI:0000:81:00): 43, pid=683281,
name=python, Ch 00000008
[Thu Jan 25 06:56:49 2024] NVRM: Xid (PCI:0000:81:00): 43, pid=683377,
name=python, Ch 00000018
[Thu Jan 25 20:14:13 2024] NVRM: Xid (PCI:0000:81:00): 43, pid=696062,
name=python, Ch 00000008
[Fri Jan 26 04:05:40 2024] NVRM: Xid (PCI:0000:81:00): 43, pid=700166,
name=python, Ch 00000008
[Fri Jan 26 05:05:12 2024] NVRM: Xid (PCI:0000:81:00): 43, pid=700320,
name=python, Ch 00000008
[Fri Jan 26 05:44:50 2024] NVRM: GPU at PCI:0000:82:00:
GPU-3af62a2c-e7eb-a7d5-c073-22f06dc7065f
[Fri Jan 26 05:44:50 2024] NVRM: GPU Board Serial Number: 1321421010400
[Fri Jan 26 05:44:50 2024] NVRM: Xid (PCI:0000:82:00): 43, pid=700757,
name=python, Ch 00000018
[Fri Jan 26 05:56:02 2024] NVRM: Xid (PCI:0000:81:00): 43, pid=701096,
name=python, Ch 00000028
[Fri Jan 26 06:34:20 2024] NVRM: Xid (PCI:0000:81:00): 43, pid=701226,
name=python, Ch 00000038

root@bmw-m4:/sys/kernel/debug/ceph/e42fd4b0-313b-11ee-9a00-31da71873773.client1275577#
free -h
               total        used        free      shared  buff/cache   available
Mem:            62Gi        34Gi        27Gi       0.0Ki       639Mi        27Gi
Swap:          1.8Ti        18Gi       1.8Ti

root@bmw-m4:/sys/kernel/debug/ceph/e42fd4b0-313b-11ee-9a00-31da71873773.client1275577#
cat /proc/vmstat
nr_free_pages 7231171
nr_zone_inactive_anon 7924766
nr_zone_active_anon 525190
nr_zone_inactive_file 44029
nr_zone_active_file 55966
nr_zone_unevictable 13042
nr_zone_write_pending 3
nr_mlock 13042
nr_bounce 0
nr_zspages 0
nr_free_cma 0
numa_hit 6701928919
numa_miss 312628341
numa_foreign 312628341
numa_interleave 31538
numa_local 6701864751
numa_other 312692567
nr_inactive_anon 7924766
nr_active_anon 525190
nr_inactive_file 44029
nr_active_file 55966
nr_unevictable 13042
nr_slab_reclaimable 61076
nr_slab_unreclaimable 63509
nr_isolated_anon 0
nr_isolated_file 0
workingset_nodes 3934
workingset_refault_anon 30325493
workingset_refault_file 14593094
workingset_activate_anon 5376050
workingset_activate_file 3250679
workingset_restore_anon 292317
workingset_restore_file 1166673
workingset_nodereclaim 488665
nr_anon_pages 8451968
nr_mapped 35731
nr_file_pages 138824
nr_dirty 3
nr_writeback 0
nr_writeback_temp 0
nr_shmem 242
nr_shmem_hugepages 0
nr_shmem_pmdmapped 0
nr_file_hugepages 0
nr_file_pmdmapped 0
nr_anon_transparent_hugepages 3588
nr_vmscan_write 33746573
nr_vmscan_immediate_reclaim 160
nr_dirtied 48165341
nr_written 80207893
nr_kernel_misc_reclaimable 0
nr_foll_pin_acquired 174002
nr_foll_pin_released 174002
nr_kernel_stack 60032
nr_page_table_pages 46041
nr_swapcached 36166
nr_dirty_threshold 1448010
nr_dirty_background_threshold 723121
pgpgin 129904699
pgpgout 299261581
pswpin 30325493
pswpout 45158221
pgalloc_dma 1024
pgalloc_dma32 57788566
pgalloc_normal 6956384725
pgalloc_movable 0
allocstall_dma 0
allocstall_dma32 0
allocstall_normal 188
allocstall_movable 63024
pgskip_dma 0
pgskip_dma32 0
pgskip_normal 0
pgskip_movable 0
pgfree 7222273815
pgactivate 1371753960
pgdeactivate 18329381
pglazyfree 10
pgfault 7795723861
pgmajfault 4600007
pglazyfreed 0
pgrefill 18575528
pgreuse 81910383
pgsteal_kswapd 980532060
pgsteal_direct 38942066
pgdemote_kswapd 0
pgdemote_direct 0
pgscan_kswapd 1135293298
pgscan_direct 58883653
pgscan_direct_throttle 15
pgscan_anon 220939938
pgscan_file 973237013
pgsteal_anon 46538607
pgsteal_file 972935519
zone_reclaim_failed 0
pginodesteal 0
slabs_scanned 25879882
kswapd_inodesteal 2179831
kswapd_low_wmark_hit_quickly 152797
kswapd_high_wmark_hit_quickly 32025
pageoutrun 204447
pgrotated 44963935
drop_pagecache 0
drop_slab 0
oom_kill 0
numa_pte_updates 2724410955
numa_huge_pte_updates 1695890
numa_hint_faults 1739823254
numa_hint_faults_local 1222358972
numa_pages_migrated 312611639
pgmigrate_success 510846802
pgmigrate_fail 875493
thp_migration_success 156413
thp_migration_fail 2
thp_migration_split 0
compact_migrate_scanned 1274073243
compact_free_scanned 8430842597
compact_isolated 400278352
compact_stall 145300
compact_fail 128562
compact_success 16738
compact_daemon_wake 170247
compact_daemon_migrate_scanned 35486283
compact_daemon_free_scanned 369870412
htlb_buddy_alloc_success 0
htlb_buddy_alloc_fail 0
unevictable_pgs_culled 2774290
unevictable_pgs_scanned 0
unevictable_pgs_rescued 2675031
unevictable_pgs_mlocked 2813622
unevictable_pgs_munlocked 2674972
unevictable_pgs_cleared 84231
unevictable_pgs_stranded 84225
thp_fault_alloc 416468
thp_fault_fallback 19181
thp_fault_fallback_charge 0
thp_collapse_alloc 17931
thp_collapse_alloc_failed 76
thp_file_alloc 0
thp_file_fallback 0
thp_file_fallback_charge 0
thp_file_mapped 0
thp_split_page 2
thp_split_page_failed 0
thp_deferred_split_page 66
thp_split_pmd 22451
thp_split_pud 0
thp_zero_page_alloc 1
thp_zero_page_alloc_failed 0
thp_swpout 22332
thp_swpout_fallback 0
balloon_inflate 0
balloon_deflate 0
balloon_migrate 0
swap_ra 25777929
swap_ra_hit 25658825
direct_map_level2_splits 1249
direct_map_level3_splits 49
nr_unstable 0



On Sat, 27 Jan 2024 at 02:36, Özkan Göksu <ozkangksu@xxxxxxxxx> wrote:

> Hello Frank.
>
> I have 84 clients (high-end servers) with: Ubuntu 20.04.5 LTS - Kernel:
> Linux 5.4.0-125-generic
>
> My cluster is 17.2.6 (Quincy).
> I have some client nodes with "ceph-common/stable,now 17.2.7-1focal" and I
> wonder whether using newer-version clients is the main problem.
> Maybe I have a communication error. For example, I hit this problem and I
> cannot collect client stats:
> https://github.com/ceph/ceph/pull/52127/files
>
> Best regards.
>
>
>
> On Fri, 26 Jan 2024 at 14:53, Frank Schilder <frans@xxxxxx> wrote:
>
>> Hi, this message is one of those that are often spurious. I don't recall
>> in which thread/PR/tracker I read it, but the story was something like this:
>>
>> If an MDS gets under memory pressure it will request dentry items back
>> from *all* clients, not just the active ones or the ones holding many of
>> them. If you have a client that's below the min-threshold for dentries (it's
>> one of the client/MDS tuning options), it will not respond. This client
>> will be flagged as not responding, which is a false positive.
>>
>> I believe the devs are working on a fix to get rid of these spurious
>> warnings. There is a "bug/feature" in the MDS that does not clear this
>> warning flag for inactive clients. Hence, the message hangs around and never
>> disappears. I usually clear it with an "echo 3 > /proc/sys/vm/drop_caches"
>> on the client. However, other than being annoying in the dashboard, it has
>> no performance or other negative impact.
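>>
>> (A minimal sketch of that clear-and-verify sequence, assuming a kernel-mounted
>> client; the per-client minimum I mentioned is, I believe, mds_min_caps_per_client:)
>>
>> # on the flagged client: flush dirty data, then drop the dentry/inode caches
>> sync
>> echo 3 > /proc/sys/vm/drop_caches
>>
>> # on a cluster node: verify the warning clears and check the minimum
>> ceph health detail
>> ceph config get mds mds_min_caps_per_client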
>>
>> Best regards,
>> =================
>> Frank Schilder
>> AIT Risø Campus
>> Bygning 109, rum S14
>>
>> ________________________________________
>> From: Eugen Block <eblock@xxxxxx>
>> Sent: Friday, January 26, 2024 10:05 AM
>> To: Özkan Göksu
>> Cc: ceph-users@xxxxxxx
>> Subject:  Re: 1 clients failing to respond to cache pressure
>> (quincy:17.2.6)
>>
>> Performance for small files is more about IOPS rather than throughput,
>> and the IOPS in your fio tests look okay to me. What you could try is
>> to split the PGs to get around 150 or 200 PGs per OSD. You're
>> currently at around 60 according to the ceph osd df output. Before you
>> do that, can you share 'ceph pg ls-by-pool cephfs.ud-data.data |
>> head'? I don't need the whole output, just to see how many objects
>> each PG has. We had a case once where that helped, but it was an older
>> cluster and the pool was backed by HDDs and separate rocksDB on SSDs.
>> So this might not be the solution here, but it could improve things as
>> well.
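>>
>> (If you do go that route, a rough sketch, assuming the data pool name from
>> your 'ceph fs status' output; the target pg_num is only a placeholder you'd
>> have to size to your OSD count and autoscaler settings:)
>>
>> # check how many objects each PG currently holds
>> ceph pg ls-by-pool cephfs.ud-data.data | head
>>
>> # example split to a higher (power-of-two) PG count
>> ceph osd pool set cephfs.ud-data.data pg_num 4096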
>>
>>
>> Zitat von Özkan Göksu <ozkangksu@xxxxxxxxx>:
>>
>> > Every user has one subvolume and I only have one pool.
>> > At the beginning we were using each subvolume for the LDAP home directory
>> > plus user data.
>> > When a user logged in to any Docker container on any host, it used the
>> > cluster for home, and for the user-related data we had a second directory
>> > in the same subvolume.
>> > From time to time users experienced a very slow home environment, and after
>> > a month it became almost impossible to use home. VNC sessions became
>> > unresponsive and slow, etc.
>> >
>> > Two weeks ago I had to migrate home to ZFS storage, and now the overall
>> > performance is better with only user_data and no home directories.
>> > But the performance is still not as good as I expected because of the
>> > problems related to the MDS.
>> > The usage is low but allocation is high, and CPU usage is high. You saw the
>> > IO op/s; it's nothing, but allocation is high.
>> >
>> > I developed a fio benchmark script and ran it on 4 test servers at the
>> > same time; the results are below:
>> > Script:
>> >
>> https://github.com/ozkangoksu/benchmark/blob/8f5df87997864c25ef32447e02fcd41fda0d2a67/iobench.sh
>> >
>> >
>> https://github.com/ozkangoksu/benchmark/blob/main/benchmark-results/iobench-client-01.txt
>> >
>> https://github.com/ozkangoksu/benchmark/blob/main/benchmark-results/iobench-client-02.txt
>> >
>> https://github.com/ozkangoksu/benchmark/blob/main/benchmark-results/iobench-client-03.txt
>> >
>> https://github.com/ozkangoksu/benchmark/blob/main/benchmark-results/iobench-client-04.txt
>> >
>> > While running the benchmark, I took sample values for each type of iobench
>> > run.
>> >
>> > Seq Write benchmarking: size=1G,direct=1,numjobs=3,iodepth=32
>> >     client:   70 MiB/s rd, 762 MiB/s wr, 337 op/s rd, 24.41k op/s wr
>> >     client:   60 MiB/s rd, 551 MiB/s wr, 303 op/s rd, 35.12k op/s wr
>> >     client:   13 MiB/s rd, 161 MiB/s wr, 101 op/s rd, 41.30k op/s wr
>> >
>> > Seq Read benchmarking: size=1G,direct=1,numjobs=3,iodepth=32
>> >     client:   1.6 GiB/s rd, 219 KiB/s wr, 28.76k op/s rd, 89 op/s wr
>> >     client:   370 MiB/s rd, 475 KiB/s wr, 90.38k op/s rd, 89 op/s wr
>> >
>> > Rand Write benchmarking: size=1G,direct=1,numjobs=3,iodepth=32
>> >     client:   63 MiB/s rd, 1.5 GiB/s wr, 8.77k op/s rd, 5.50k op/s wr
>> >     client:   14 MiB/s rd, 1.8 GiB/s wr, 81 op/s rd, 13.86k op/s wr
>> >     client:   6.6 MiB/s rd, 1.2 GiB/s wr, 61 op/s rd, 30.13k op/s wr
>> >
>> > Rand Read benchmarking: size=1G,direct=1,numjobs=3,iodepth=32
>> >     client:   317 MiB/s rd, 841 MiB/s wr, 426 op/s rd, 10.98k op/s wr
>> >     client:   2.8 GiB/s rd, 882 MiB/s wr, 25.68k op/s rd, 291 op/s wr
>> >     client:   4.0 GiB/s rd, 226 MiB/s wr, 89.63k op/s rd, 124 op/s wr
>> >     client:   2.4 GiB/s rd, 295 KiB/s wr, 197.86k op/s rd, 20 op/s wr
>> >
>> > It seems I only have problems with the 4K, 8K and 16K block sizes.
>> >
>> >
>> >
>> >
>> > On Thu, 25 Jan 2024 at 19:06, Eugen Block <eblock@xxxxxx> wrote:
>> >
>> >> I understand that your MDS shows a high CPU usage, but other than that
>> >> what is your performance issue? Do users complain? Do some operations
>> >> take longer than expected? Are OSDs saturated during those phases?
>> >> Because the cache pressure messages don’t necessarily mean that users
>> >> will notice.
>> >> MDS daemons are single-threaded so that might be a bottleneck. In that
>> >> case multi-active mds might help, which you already tried and
>> >> experienced OOM killers. But you might have to disable the mds
>> >> balancer as someone else mentioned. And then you could think about
>> >> pinning, is it possible to split the CephFS into multiple
>> >> subdirectories and pin them to different ranks?
>> >> But first I’d still like to know what the performance issue really is.
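>> >>
>> >> (For the pinning idea, a minimal sketch, assuming more than one active rank
>> >> and a mount point plus subdirectory names that are only examples:)
>> >>
>> >> # pin one subtree to rank 0 and another to rank 1
>> >> setfattr -n ceph.dir.pin -v 0 /mnt/ud-data/projects-a
>> >> setfattr -n ceph.dir.pin -v 1 /mnt/ud-data/projects-b
>> >>
>> >> # -v -1 removes a pin again
>> >> setfattr -n ceph.dir.pin -v -1 /mnt/ud-data/projects-a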
>> >>
>> >> Zitat von Özkan Göksu <ozkangksu@xxxxxxxxx>:
>> >>
>> >> > I will try my best to explain my situation.
>> >> >
>> >> > I don't have a separate MDS server. I have 5 identical nodes, 3 of them
>> >> > mons, and I use the other 2 as active and standby MDS. (Currently I have
>> >> > leftovers from max_mds 4.)
>> >> >
>> >> > root@ud-01:~# ceph -s
>> >> >   cluster:
>> >> >     id:     e42fd4b0-313b-11ee-9a00-31da71873773
>> >> >     health: HEALTH_WARN
>> >> >             1 clients failing to respond to cache pressure
>> >> >
>> >> >   services:
>> >> >     mon: 3 daemons, quorum ud-01,ud-02,ud-03 (age 9d)
>> >> >     mgr: ud-01.qycnol(active, since 8d), standbys: ud-02.tfhqfd
>> >> >     mds: 1/1 daemons up, 4 standby
>> >> >     osd: 80 osds: 80 up (since 9d), 80 in (since 5M)
>> >> >
>> >> >   data:
>> >> >     volumes: 1/1 healthy
>> >> >     pools:   3 pools, 2305 pgs
>> >> >     objects: 106.58M objects, 25 TiB
>> >> >     usage:   45 TiB used, 101 TiB / 146 TiB avail
>> >> >     pgs:     2303 active+clean
>> >> >              2    active+clean+scrubbing+deep
>> >> >
>> >> >   io:
>> >> >     client:   16 MiB/s rd, 3.4 MiB/s wr, 77 op/s rd, 23 op/s wr
>> >> >
>> >> > ------------------------------
>> >> > root@ud-01:~# ceph fs status
>> >> > ud-data - 84 clients
>> >> > =======
>> >> > RANK  STATE           MDS              ACTIVITY     DNS    INOS
>>  DIRS
>> >> > CAPS
>> >> >  0    active  ud-data.ud-02.xcoojt  Reqs:   40 /s  2579k  2578k
>>  169k
>> >> >  3048k
>> >> >         POOL           TYPE     USED  AVAIL
>> >> > cephfs.ud-data.meta  metadata   136G  44.9T
>> >> > cephfs.ud-data.data    data    44.3T  44.9T
>> >> >
>> >> > ------------------------------
>> >> > root@ud-01:~# ceph health detail
>> >> > HEALTH_WARN 1 clients failing to respond to cache pressure
>> >> > [WRN] MDS_CLIENT_RECALL: 1 clients failing to respond to cache
>> pressure
>> >> >     mds.ud-data.ud-02.xcoojt(mds.0): Client bmw-m4 failing to
>> respond to
>> >> > cache pressure client_id: 1275577
>> >> >
>> >> > ------------------------------
>> >> > When I check the failing client with session ls I see only "num_caps:
>> >> 12298"
>> >> >
>> >> > ceph tell mds.ud-data.ud-02.xcoojt session ls | jq -r '.[] |
>> "clientid:
>> >> > \(.id)= num_caps: \(.num_caps), num_leases: \(.num_leases),
>> >> > request_load_avg: \(.request_load_avg), num_completed_requests:
>> >> > \(.num_completed_requests), num_completed_flushes:
>> >> > \(.num_completed_flushes)"' | sort -n -t: -k3
>> >> >
>> >> > clientid: 1275577= num_caps: 12298, num_leases: 0, request_load_avg:
>> 0,
>> >> > num_completed_requests: 0, num_completed_flushes: 1
>> >> > clientid: 1294542= num_caps: 13000, num_leases: 12, request_load_avg:
>> >> 105,
>> >> > num_completed_requests: 0, num_completed_flushes: 6
>> >> > clientid: 1282187= num_caps: 16869, num_leases: 1, request_load_avg:
>> 0,
>> >> > num_completed_requests: 0, num_completed_flushes: 1
>> >> > clientid: 1275589= num_caps: 18943, num_leases: 0, request_load_avg:
>> 52,
>> >> > num_completed_requests: 0, num_completed_flushes: 1
>> >> > clientid: 1282154= num_caps: 24747, num_leases: 1, request_load_avg:
>> 57,
>> >> > num_completed_requests: 2, num_completed_flushes: 2
>> >> > clientid: 1275553= num_caps: 25120, num_leases: 2, request_load_avg:
>> 116,
>> >> > num_completed_requests: 2, num_completed_flushes: 8
>> >> > clientid: 1282142= num_caps: 27185, num_leases: 6, request_load_avg:
>> 128,
>> >> > num_completed_requests: 0, num_completed_flushes: 8
>> >> > clientid: 1275535= num_caps: 40364, num_leases: 6, request_load_avg:
>> 111,
>> >> > num_completed_requests: 2, num_completed_flushes: 8
>> >> > clientid: 1282130= num_caps: 41483, num_leases: 0, request_load_avg:
>> 135,
>> >> > num_completed_requests: 0, num_completed_flushes: 1
>> >> > clientid: 1275547= num_caps: 42953, num_leases: 4, request_load_avg:
>> 119,
>> >> > num_completed_requests: 2, num_completed_flushes: 6
>> >> > clientid: 1282139= num_caps: 45435, num_leases: 27,
>> request_load_avg: 84,
>> >> > num_completed_requests: 2, num_completed_flushes: 34
>> >> > clientid: 1282136= num_caps: 48374, num_leases: 8, request_load_avg:
>> 0,
>> >> > num_completed_requests: 1, num_completed_flushes: 1
>> >> > clientid: 1275532= num_caps: 48664, num_leases: 7, request_load_avg:
>> 115,
>> >> > num_completed_requests: 2, num_completed_flushes: 8
>> >> > clientid: 1191789= num_caps: 130319, num_leases: 0, request_load_avg:
>> >> 1753,
>> >> > num_completed_requests: 0, num_completed_flushes: 0
>> >> > clientid: 1275571= num_caps: 139488, num_leases: 0,
>> request_load_avg: 2,
>> >> > num_completed_requests: 0, num_completed_flushes: 1
>> >> > clientid: 1282133= num_caps: 145487, num_leases: 0,
>> request_load_avg: 8,
>> >> > num_completed_requests: 1, num_completed_flushes: 1
>> >> > clientid: 1534496= num_caps: 1041316, num_leases: 0,
>> request_load_avg: 0,
>> >> > num_completed_requests: 0, num_completed_flushes: 1
>> >> >
>> >> > ------------------------------
>> >> > When I check the dashboard/service/mds I see 120%+ CPU usage on the
>> >> > active MDS, but on the host everything is almost idle and disk waits are
>> >> > very low.
>> >> >
>> >> > avg-cpu:  %user   %nice %system %iowait  %steal   %idle
>> >> >            0.61    0.00    0.38    0.41    0.00   98.60
>> >> >
>> >> > Device            r/s     rMB/s   rrqm/s  %rrqm r_await rareq-sz
>>  w/s
>> >> >   wMB/s   wrqm/s  %wrqm w_await wareq-sz     d/s     dMB/s   drqm/s
>> >> %drqm
>> >> > d_await dareq-sz     f/s f_await  aqu-sz  %util
>> >> > sdc              2.00      0.01     0.00   0.00    0.50     6.00
>>  20.00
>> >> >    0.04     0.00   0.00    0.50     2.00    0.00      0.00     0.00
>> >>  0.00
>> >> >    0.00     0.00   10.00    0.60    0.02   1.20
>> >> > sdd              3.00      0.02     0.00   0.00    0.67     8.00
>> 285.00
>> >> >    1.84    77.00  21.27    0.44     6.61    0.00      0.00     0.00
>> >>  0.00
>> >> >    0.00     0.00  114.00    0.83    0.22  22.40
>> >> > sde              1.00      0.01     0.00   0.00    1.00     8.00
>>  36.00
>> >> >    0.08     3.00   7.69    0.64     2.33    0.00      0.00     0.00
>> >>  0.00
>> >> >    0.00     0.00   18.00    0.67    0.04   1.60
>> >> > sdf              5.00      0.04     0.00   0.00    0.40     7.20
>>  40.00
>> >> >    0.09     3.00   6.98    0.53     2.30    0.00      0.00     0.00
>> >>  0.00
>> >> >    0.00     0.00   20.00    0.70    0.04   2.00
>> >> > sdg             11.00      0.08     0.00   0.00    0.73     7.27
>>  36.00
>> >> >    0.09     4.00  10.00    0.50     2.44    0.00      0.00     0.00
>> >>  0.00
>> >> >    0.00     0.00   18.00    0.72    0.04   3.20
>> >> > sdh              5.00      0.03     0.00   0.00    0.60     5.60
>>  46.00
>> >> >    0.10     2.00   4.17    0.59     2.17    0.00      0.00     0.00
>> >>  0.00
>> >> >    0.00     0.00   23.00    0.83    0.05   2.80
>> >> > sdi              7.00      0.04     0.00   0.00    0.43     6.29
>>  36.00
>> >> >    0.07     1.00   2.70    0.47     2.11    0.00      0.00     0.00
>> >>  0.00
>> >> >    0.00     0.00   18.00    0.61    0.03   2.40
>> >> > sdj              5.00      0.04     0.00   0.00    0.80     7.20
>>  42.00
>> >> >    0.09     1.00   2.33    0.67     2.10    0.00      0.00     0.00
>> >>  0.00
>> >> >    0.00     0.00   21.00    0.81    0.05   3.20
>> >> >
>> >> > ------------------------------
>> >> > Other than this 5-node cluster, I also have a 3-node cluster with
>> >> > identical hardware, but it serves a different purpose and data workload.
>> >> > On that cluster I don't have any problems and the MDS default settings
>> >> > seem to be enough.
>> >> > The only difference between the two clusters is that the 5-node cluster
>> >> > is used directly by users, while the 3-node cluster is used heavily to
>> >> > read and write data via projects, not by users, so allocation and
>> >> > de-allocation behave better.
>> >> >
>> >> > I guess I just have a problematic use case on the 5-node cluster and, as
>> >> > I mentioned above, I might have a similar problem but I don't know how
>> >> > to debug it.
>> >> >
>> >> >
>> >>
>> https://lists.ceph.io/hyperkitty/list/ceph-users@xxxxxxx/thread/YO4SGL4DJQ6EKUBUIHKTFSW72ZJ3XLZS/
>> >> > quote:"A user running VSCodium, keeping 15k caps open.. the
>> opportunistic
>> >> > caps recall eventually starts recalling those but the (el7 kernel)
>> client
>> >> > won't release them. Stopping Codium seems to be the only way to
>> release."
>> >> >
>> >> > ------------------------------
>> >> > Before reading the osd df output you should know that I created 2 OSDs
>> >> > per "CT4000MX500SSD1" drive.
>> >> > # ceph osd df tree
>> >> > ID   CLASS  WEIGHT     REWEIGHT  SIZE     RAW USE  DATA     OMAP
>> >> META
>> >> >     AVAIL    %USE   VAR   PGS  STATUS  TYPE NAME
>> >> >  -1         145.54321         -  146 TiB   45 TiB   44 TiB   119
>> GiB  333
>> >> > GiB  101 TiB  30.81  1.00    -          root default
>> >> >  -3          29.10864         -   29 TiB  8.9 TiB  8.8 TiB    25
>> GiB   66
>> >> > GiB   20 TiB  30.54  0.99    -              host ud-01
>> >> >   0    ssd    1.81929   1.00000  1.8 TiB  616 GiB  610 GiB   1.4
>> GiB  4.5
>> >> > GiB  1.2 TiB  33.04  1.07   61      up          osd.0
>> >> >   1    ssd    1.81929   1.00000  1.8 TiB  527 GiB  521 GiB   1.5
>> GiB  4.0
>> >> > GiB  1.3 TiB  28.28  0.92   53      up          osd.1
>> >> >   2    ssd    1.81929   1.00000  1.8 TiB  595 GiB  589 GiB   2.3
>> GiB  4.0
>> >> > GiB  1.2 TiB  31.96  1.04   63      up          osd.2
>> >> >   3    ssd    1.81929   1.00000  1.8 TiB  527 GiB  521 GiB   1.8
>> GiB  4.2
>> >> > GiB  1.3 TiB  28.30  0.92   55      up          osd.3
>> >> >   4    ssd    1.81929   1.00000  1.8 TiB  525 GiB  520 GiB   1.3
>> GiB  3.9
>> >> > GiB  1.3 TiB  28.21  0.92   52      up          osd.4
>> >> >   5    ssd    1.81929   1.00000  1.8 TiB  592 GiB  586 GiB   1.8
>> GiB  3.8
>> >> > GiB  1.2 TiB  31.76  1.03   61      up          osd.5
>> >> >   6    ssd    1.81929   1.00000  1.8 TiB  559 GiB  553 GiB   1.8
>> GiB  4.3
>> >> > GiB  1.3 TiB  30.03  0.97   57      up          osd.6
>> >> >   7    ssd    1.81929   1.00000  1.8 TiB  602 GiB  597 GiB   836
>> MiB  4.4
>> >> > GiB  1.2 TiB  32.32  1.05   58      up          osd.7
>> >> >   8    ssd    1.81929   1.00000  1.8 TiB  614 GiB  609 GiB   1.2
>> GiB  4.5
>> >> > GiB  1.2 TiB  32.98  1.07   60      up          osd.8
>> >> >   9    ssd    1.81929   1.00000  1.8 TiB  571 GiB  565 GiB   2.2
>> GiB  4.2
>> >> > GiB  1.3 TiB  30.67  1.00   61      up          osd.9
>> >> >  10    ssd    1.81929   1.00000  1.8 TiB  528 GiB  522 GiB   1.3
>> GiB  4.1
>> >> > GiB  1.3 TiB  28.33  0.92   52      up          osd.10
>> >> >  11    ssd    1.81929   1.00000  1.8 TiB  551 GiB  546 GiB   1.5
>> GiB  3.6
>> >> > GiB  1.3 TiB  29.57  0.96   56      up          osd.11
>> >> >  12    ssd    1.81929   1.00000  1.8 TiB  594 GiB  588 GiB   1.8
>> GiB  4.4
>> >> > GiB  1.2 TiB  31.91  1.04   61      up          osd.12
>> >> >  13    ssd    1.81929   1.00000  1.8 TiB  561 GiB  555 GiB   1.1
>> GiB  4.3
>> >> > GiB  1.3 TiB  30.10  0.98   55      up          osd.13
>> >> >  14    ssd    1.81929   1.00000  1.8 TiB  616 GiB  609 GiB   1.9
>> GiB  4.2
>> >> > GiB  1.2 TiB  33.04  1.07   64      up          osd.14
>> >> >  15    ssd    1.81929   1.00000  1.8 TiB  525 GiB  520 GiB   1.1
>> GiB  4.0
>> >> > GiB  1.3 TiB  28.20  0.92   51      up          osd.15
>> >> >  -5          29.10864         -   29 TiB  9.0 TiB  8.9 TiB    22
>> GiB   67
>> >> > GiB   20 TiB  30.89  1.00    -              host ud-02
>> >> >  16    ssd    1.81929   1.00000  1.8 TiB  617 GiB  611 GiB   1.7
>> GiB  4.7
>> >> > GiB  1.2 TiB  33.12  1.08   63      up          osd.16
>> >> >  17    ssd    1.81929   1.00000  1.8 TiB  582 GiB  577 GiB   1.6
>> GiB  4.0
>> >> > GiB  1.3 TiB  31.26  1.01   59      up          osd.17
>> >> >  18    ssd    1.81929   1.00000  1.8 TiB  583 GiB  578 GiB   418
>> MiB  4.0
>> >> > GiB  1.3 TiB  31.29  1.02   54      up          osd.18
>> >> >  19    ssd    1.81929   1.00000  1.8 TiB  550 GiB  544 GiB   1.5
>> GiB  4.0
>> >> > GiB  1.3 TiB  29.50  0.96   56      up          osd.19
>> >> >  20    ssd    1.81929   1.00000  1.8 TiB  551 GiB  546 GiB   1.1
>> GiB  4.1
>> >> > GiB  1.3 TiB  29.57  0.96   54      up          osd.20
>> >> >  21    ssd    1.81929   1.00000  1.8 TiB  616 GiB  610 GiB   1.3
>> GiB  4.4
>> >> > GiB  1.2 TiB  33.04  1.07   60      up          osd.21
>> >> >  22    ssd    1.81929   1.00000  1.8 TiB  573 GiB  567 GiB   1.6
>> GiB  4.1
>> >> > GiB  1.3 TiB  30.75  1.00   58      up          osd.22
>> >> >  23    ssd    1.81929   1.00000  1.8 TiB  616 GiB  610 GiB   1.3
>> GiB  4.3
>> >> > GiB  1.2 TiB  33.06  1.07   60      up          osd.23
>> >> >  24    ssd    1.81929   1.00000  1.8 TiB  539 GiB  534 GiB   844
>> MiB  3.8
>> >> > GiB  1.3 TiB  28.92  0.94   51      up          osd.24
>> >> >  25    ssd    1.81929   1.00000  1.8 TiB  583 GiB  576 GiB   2.1
>> GiB  4.1
>> >> > GiB  1.3 TiB  31.27  1.02   61      up          osd.25
>> >> >  26    ssd    1.81929   1.00000  1.8 TiB  617 GiB  611 GiB   1.3
>> GiB  4.6
>> >> > GiB  1.2 TiB  33.12  1.08   61      up          osd.26
>> >> >  27    ssd    1.81929   1.00000  1.8 TiB  537 GiB  532 GiB   1.2
>> GiB  4.1
>> >> > GiB  1.3 TiB  28.84  0.94   53      up          osd.27
>> >> >  28    ssd    1.81929   1.00000  1.8 TiB  527 GiB  522 GiB   1.3
>> GiB  4.2
>> >> > GiB  1.3 TiB  28.29  0.92   53      up          osd.28
>> >> >  29    ssd    1.81929   1.00000  1.8 TiB  594 GiB  588 GiB   1.5
>> GiB  4.6
>> >> > GiB  1.2 TiB  31.91  1.04   59      up          osd.29
>> >> >  30    ssd    1.81929   1.00000  1.8 TiB  528 GiB  523 GiB   1.4
>> GiB  4.1
>> >> > GiB  1.3 TiB  28.35  0.92   53      up          osd.30
>> >> >  31    ssd    1.81929   1.00000  1.8 TiB  594 GiB  589 GiB   1.6
>> GiB  3.8
>> >> > GiB  1.2 TiB  31.89  1.03   61      up          osd.31
>> >> >  -7          29.10864         -   29 TiB  8.9 TiB  8.8 TiB    23
>> GiB   67
>> >> > GiB   20 TiB  30.66  1.00    -              host ud-03
>> >> >  32    ssd    1.81929   1.00000  1.8 TiB  593 GiB  588 GiB   1.1
>> GiB  4.3
>> >> > GiB  1.2 TiB  31.84  1.03   57      up          osd.32
>> >> >  33    ssd    1.81929   1.00000  1.8 TiB  617 GiB  611 GiB   1.8
>> GiB  4.4
>> >> > GiB  1.2 TiB  33.13  1.08   63      up          osd.33
>> >> >  34    ssd    1.81929   1.00000  1.8 TiB  537 GiB  532 GiB   2.0
>> GiB  3.8
>> >> > GiB  1.3 TiB  28.84  0.94   59      up          osd.34
>> >> >  35    ssd    1.81929   1.00000  1.8 TiB  562 GiB  556 GiB   1.7
>> GiB  4.2
>> >> > GiB  1.3 TiB  30.16  0.98   58      up          osd.35
>> >> >  36    ssd    1.81929   1.00000  1.8 TiB  529 GiB  523 GiB   1.3
>> GiB  3.9
>> >> > GiB  1.3 TiB  28.38  0.92   52      up          osd.36
>> >> >  37    ssd    1.81929   1.00000  1.8 TiB  527 GiB  521 GiB   1.7
>> GiB  4.2
>> >> > GiB  1.3 TiB  28.28  0.92   55      up          osd.37
>> >> >  38    ssd    1.81929   1.00000  1.8 TiB  574 GiB  568 GiB   1.2
>> GiB  4.3
>> >> > GiB  1.3 TiB  30.79  1.00   55      up          osd.38
>> >> >  39    ssd    1.81929   1.00000  1.8 TiB  605 GiB  599 GiB   1.6
>> GiB  4.2
>> >> > GiB  1.2 TiB  32.48  1.05   61      up          osd.39
>> >> >  40    ssd    1.81929   1.00000  1.8 TiB  573 GiB  567 GiB   1.2
>> GiB  4.4
>> >> > GiB  1.3 TiB  30.76  1.00   56      up          osd.40
>> >> >  41    ssd    1.81929   1.00000  1.8 TiB  526 GiB  520 GiB   1.7
>> GiB  3.9
>> >> > GiB  1.3 TiB  28.21  0.92   54      up          osd.41
>> >> >  42    ssd    1.81929   1.00000  1.8 TiB  613 GiB  608 GiB  1010
>> MiB  4.4
>> >> > GiB  1.2 TiB  32.91  1.07   58      up          osd.42
>> >> >  43    ssd    1.81929   1.00000  1.8 TiB  606 GiB  600 GiB   1.7
>> GiB  4.3
>> >> > GiB  1.2 TiB  32.51  1.06   61      up          osd.43
>> >> >  44    ssd    1.81929   1.00000  1.8 TiB  583 GiB  577 GiB   1.6
>> GiB  4.2
>> >> > GiB  1.3 TiB  31.29  1.02   60      up          osd.44
>> >> >  45    ssd    1.81929   1.00000  1.8 TiB  618 GiB  613 GiB   1.4
>> GiB  4.3
>> >> > GiB  1.2 TiB  33.18  1.08   62      up          osd.45
>> >> >  46    ssd    1.81929   1.00000  1.8 TiB  550 GiB  544 GiB   1.5
>> GiB  4.2
>> >> > GiB  1.3 TiB  29.50  0.96   54      up          osd.46
>> >> >  47    ssd    1.81929   1.00000  1.8 TiB  526 GiB  522 GiB   692
>> MiB  3.7
>> >> > GiB  1.3 TiB  28.25  0.92   50      up          osd.47
>> >> >  -9          29.10864         -   29 TiB  9.0 TiB  8.9 TiB    26
>> GiB   68
>> >> > GiB   20 TiB  31.04  1.01    -              host ud-04
>> >> >  48    ssd    1.81929   1.00000  1.8 TiB  540 GiB  534 GiB   2.2
>> GiB  3.6
>> >> > GiB  1.3 TiB  28.96  0.94   58      up          osd.48
>> >> >  49    ssd    1.81929   1.00000  1.8 TiB  617 GiB  611 GiB   1.4
>> GiB  4.5
>> >> > GiB  1.2 TiB  33.11  1.07   61      up          osd.49
>> >> >  50    ssd    1.81929   1.00000  1.8 TiB  618 GiB  612 GiB   1.2
>> GiB  4.8
>> >> > GiB  1.2 TiB  33.17  1.08   61      up          osd.50
>> >> >  51    ssd    1.81929   1.00000  1.8 TiB  618 GiB  612 GiB   1.5
>> GiB  4.5
>> >> > GiB  1.2 TiB  33.19  1.08   61      up          osd.51
>> >> >  52    ssd    1.81929   1.00000  1.8 TiB  526 GiB  521 GiB   1.4
>> GiB  4.1
>> >> > GiB  1.3 TiB  28.25  0.92   53      up          osd.52
>> >> >  53    ssd    1.81929   1.00000  1.8 TiB  618 GiB  611 GiB   2.4
>> GiB  4.3
>> >> > GiB  1.2 TiB  33.17  1.08   66      up          osd.53
>> >> >  54    ssd    1.81929   1.00000  1.8 TiB  550 GiB  544 GiB   1.5
>> GiB  4.3
>> >> > GiB  1.3 TiB  29.54  0.96   55      up          osd.54
>> >> >  55    ssd    1.81929   1.00000  1.8 TiB  527 GiB  522 GiB   1.3
>> GiB  4.0
>> >> > GiB  1.3 TiB  28.29  0.92   52      up          osd.55
>> >> >  56    ssd    1.81929   1.00000  1.8 TiB  525 GiB  519 GiB   1.2
>> GiB  4.1
>> >> > GiB  1.3 TiB  28.16  0.91   52      up          osd.56
>> >> >  57    ssd    1.81929   1.00000  1.8 TiB  615 GiB  609 GiB   2.3
>> GiB  4.2
>> >> > GiB  1.2 TiB  33.03  1.07   65      up          osd.57
>> >> >  58    ssd    1.81929   1.00000  1.8 TiB  527 GiB  522 GiB   1.6
>> GiB  3.7
>> >> > GiB  1.3 TiB  28.31  0.92   55      up          osd.58
>> >> >  59    ssd    1.81929   1.00000  1.8 TiB  615 GiB  609 GiB   1.2
>> GiB  4.6
>> >> > GiB  1.2 TiB  33.01  1.07   60      up          osd.59
>> >> >  60    ssd    1.81929   1.00000  1.8 TiB  594 GiB  588 GiB   1.2
>> GiB  4.4
>> >> > GiB  1.2 TiB  31.88  1.03   59      up          osd.60
>> >> >  61    ssd    1.81929   1.00000  1.8 TiB  616 GiB  610 GiB   1.9
>> GiB  4.1
>> >> > GiB  1.2 TiB  33.04  1.07   64      up          osd.61
>> >> >  62    ssd    1.81929   1.00000  1.8 TiB  620 GiB  614 GiB   1.9
>> GiB  4.4
>> >> > GiB  1.2 TiB  33.27  1.08   63      up          osd.62
>> >> >  63    ssd    1.81929   1.00000  1.8 TiB  527 GiB  522 GiB   1.5
>> GiB  4.0
>> >> > GiB  1.3 TiB  28.30  0.92   53      up          osd.63
>> >> > -11          29.10864         -   29 TiB  9.0 TiB  8.9 TiB    23
>> GiB   65
>> >> > GiB   20 TiB  30.91  1.00    -              host ud-05
>> >> >  64    ssd    1.81929   1.00000  1.8 TiB  608 GiB  601 GiB   2.3
>> GiB  4.5
>> >> > GiB  1.2 TiB  32.62  1.06   65      up          osd.64
>> >> >  65    ssd    1.81929   1.00000  1.8 TiB  606 GiB  601 GiB   628
>> MiB  4.2
>> >> > GiB  1.2 TiB  32.53  1.06   57      up          osd.65
>> >> >  66    ssd    1.81929   1.00000  1.8 TiB  583 GiB  578 GiB   1.3
>> GiB  4.3
>> >> > GiB  1.2 TiB  31.31  1.02   57      up          osd.66
>> >> >  67    ssd    1.81929   1.00000  1.8 TiB  537 GiB  533 GiB   436
>> MiB  3.6
>> >> > GiB  1.3 TiB  28.82  0.94   50      up          osd.67
>> >> >  68    ssd    1.81929   1.00000  1.8 TiB  541 GiB  535 GiB   2.5
>> GiB  3.8
>> >> > GiB  1.3 TiB  29.04  0.94   59      up          osd.68
>> >> >  69    ssd    1.81929   1.00000  1.8 TiB  606 GiB  601 GiB   1.1
>> GiB  4.4
>> >> > GiB  1.2 TiB  32.55  1.06   59      up          osd.69
>> >> >  70    ssd    1.81929   1.00000  1.8 TiB  604 GiB  598 GiB   1.8
>> GiB  4.1
>> >> > GiB  1.2 TiB  32.44  1.05   63      up          osd.70
>> >> >  71    ssd    1.81929   1.00000  1.8 TiB  606 GiB  600 GiB   1.9
>> GiB  4.5
>> >> > GiB  1.2 TiB  32.53  1.06   62      up          osd.71
>> >> >  72    ssd    1.81929   1.00000  1.8 TiB  602 GiB  598 GiB   612
>> MiB  4.1
>> >> > GiB  1.2 TiB  32.33  1.05   57      up          osd.72
>> >> >  73    ssd    1.81929   1.00000  1.8 TiB  571 GiB  565 GiB   1.8
>> GiB  4.5
>> >> > GiB  1.3 TiB  30.65  0.99   58      up          osd.73
>> >> >  74    ssd    1.81929   1.00000  1.8 TiB  608 GiB  602 GiB   1.8
>> GiB  4.2
>> >> > GiB  1.2 TiB  32.62  1.06   61      up          osd.74
>> >> >  75    ssd    1.81929   1.00000  1.8 TiB  536 GiB  531 GiB   1.9
>> GiB  3.5
>> >> > GiB  1.3 TiB  28.80  0.93   57      up          osd.75
>> >> >  76    ssd    1.81929   1.00000  1.8 TiB  605 GiB  599 GiB   1.4
>> GiB  4.5
>> >> > GiB  1.2 TiB  32.48  1.05   60      up          osd.76
>> >> >  77    ssd    1.81929   1.00000  1.8 TiB  537 GiB  532 GiB   1.2
>> GiB  3.9
>> >> > GiB  1.3 TiB  28.84  0.94   52      up          osd.77
>> >> >  78    ssd    1.81929   1.00000  1.8 TiB  525 GiB  520 GiB   1.3
>> GiB  3.8
>> >> > GiB  1.3 TiB  28.20  0.92   52      up          osd.78
>> >> >  79    ssd    1.81929   1.00000  1.8 TiB  536 GiB  531 GiB   1.1
>> GiB  3.3
>> >> > GiB  1.3 TiB  28.76  0.93   53      up          osd.79
>> >> >                           TOTAL  146 TiB   45 TiB   44 TiB   119
>> GiB  333
>> >> > GiB  101 TiB  30.81
>> >> > MIN/MAX VAR: 0.91/1.08  STDDEV: 1.90
>> >> >
>> >> >
>> >> >
>> >> > On Thu, 25 Jan 2024 at 16:52, Eugen Block <eblock@xxxxxx> wrote:
>> >> >
>> >> >> There is no definitive answer wrt mds tuning. As it is everywhere
>> >> >> mentioned, it's about finding the right setup for your specific
>> >> >> workload. If you can synthesize your workload (maybe scale down a
>> bit)
>> >> >> try optimizing it in a test cluster without interrupting your
>> >> >> developers too much.
>> >> >> But what you haven't explained yet is what are you experiencing as a
>> >> >> performance issue? Do you have numbers or a detailed description?
>> >> >>  From the fs status output you didn't seem to have too much activity
>> >> >> going on (around 140 requests per second), but that's probably not
>> the
>> >> >> usual traffic? What does ceph report in its client IO output?
>> >> >> Can you paste the 'ceph osd df' output as well?
>> >> >> Do you have dedicated MDS servers or are they colocated with other
>> >> >> services?
>> >> >>
>> >> >> Zitat von Özkan Göksu <ozkangksu@xxxxxxxxx>:
>> >> >>
>> >> >> > Hello  Eugen.
>> >> >> >
>> >> >> > I read all of your MDS-related topics, and thank you so much for your
>> >> >> > effort on this.
>> >> >> > There is not much information and I couldn't find an MDS tuning guide
>> >> >> > at all. It seems that you are the right person to discuss MDS
>> >> >> > debugging and tuning.
>> >> >> >
>> >> >> > Do you have any documents, or may I learn what the proper way to
>> >> >> > debug the MDS and clients is?
>> >> >> > Which debug logs will guide me to understand the limitations and help
>> >> >> > me tune according to the data flow?
>> >> >> >
>> >> >> > While searching, I found this:
>> >> >> >
>> >> >>
>> >>
>> https://lists.ceph.io/hyperkitty/list/ceph-users@xxxxxxx/thread/YO4SGL4DJQ6EKUBUIHKTFSW72ZJ3XLZS/
>> >> >> > quote:"A user running VSCodium, keeping 15k caps open.. the
>> >> opportunistic
>> >> >> > caps recall eventually starts recalling those but the (el7 kernel)
>> >> client
>> >> >> > won't release them. Stopping Codium seems to be the only way to
>> >> release."
>> >> >> >
>> >> >> > Because of this I think I also need to play around with the client
>> >> >> > side.
>> >> >> >
>> >> >> > My main goal is increasing speed and reducing latency, and I wonder
>> >> >> > whether these ideas are correct or not:
>> >> >> > - Maybe I need to increase the client-side cache size, because each
>> >> >> > client serves multiple users requesting a lot of objects and the
>> >> >> > client_cache_size=16 default is clearly not enough.
>> >> >> > - Maybe I need to increase the client-side maximum cache limits for
>> >> >> > objects "client_oc_max_objects=1000 to 10000" and data
>> >> >> > "client_oc_size=200mi to 400mi".
>> >> >> > - The client cache cleaning threshold is not aggressive enough to keep
>> >> >> > the free cache size in the desired range. I need to make it more
>> >> >> > aggressive, but this should not reduce speed or increase latency.
>> >> >> >
>> >> >> > mds_cache_memory_limit=4gi to 16gi
>> >> >> > client_oc_max_objects=1000 to 10000
>> >> >> > client_oc_size=200mi to 400mi
>> >> >> > client_permissions=false #to reduce latency.
>> >> >> > client_cache_size=16 to 128
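>> >> >> >
>> >> >> > (For reference, a rough sketch of how I could apply such overrides
>> >> >> > centrally, assuming userspace clients that read the cluster config;
>> >> >> > the values are only the candidates listed above, not tested settings:)
>> >> >> >
>> >> >> > ceph config set mds mds_cache_memory_limit 17179869184   # 16 GiB
>> >> >> > ceph config set client client_oc_max_objects 10000
>> >> >> > ceph config set client client_oc_size 419430400          # 400 MiB
>> >> >> > ceph config set client client_permissions false
>> >> >> > ceph config set client client_cache_size 128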
>> >> >> >
>> >> >> >
>> >> >> > What do you think?
>> >> >>
>> >> >>
>> >> >>
>> >> >>
>> >>
>> >>
>> >>
>> >>
>>
>>
>> _______________________________________________
>> ceph-users mailing list -- ceph-users@xxxxxxx
>> To unsubscribe send an email to ceph-users-leave@xxxxxxx
>>
>
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx



