Re: 1 clients failing to respond to cache pressure (quincy:17.2.6)

Hello Frank.

I have 84 clients (high-end servers) running Ubuntu 20.04.5 LTS with kernel
Linux 5.4.0-125-generic.

My cluster is on 17.2.6 quincy.
I have some client nodes with "ceph-common/stable,now 17.2.7-1focal" and I
wonder whether using newer-version clients is the main problem.
Maybe I have a communication error. For example, I hit this problem and I
cannot collect client stats: https://github.com/ceph/ceph/pull/52127/files
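
In case it helps, this is roughly how I plan to check what the connected
clients are actually running (a sketch; the MDS name is from my cluster, and
I'm assuming the client_metadata fields "ceph_version"/"kernel_version" are
present in the session ls output):

  # release/feature bits of everything connected to the cluster
  ceph features

  # per-session client version as seen by the active MDS
  ceph tell mds.ud-data.ud-02.xcoojt session ls \
    | jq -r '.[] | "\(.id) \(.client_metadata.ceph_version // .client_metadata.kernel_version)"'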

Best regards.



Frank Schilder <frans@xxxxxx> wrote on Fri, 26 Jan 2024 at 14:53:

> Hi, this message is one of those that are often spurious. I don't recall
> in which thread/PR/tracker I read it, but the story was something like this:
>
> If an MDS gets under memory pressure it will request dentry items back
> from *all* clients, not just the active ones or the ones holding many of
> them. If you have a client that's below the min-threshold for dentries (it's
> one of the client/mds tuning options), it will not respond. This client
> will be flagged as not responding, which is a false positive.
>
> I believe the devs are working on a fix to get rid of these spurious
> warnings. There is a "bug/feature" in the MDS that does not clear this
> warning flag for inactive clients. Hence, the message hangs around and never
> disappears. I usually clear it with an "echo 3 > /proc/sys/vm/drop_caches"
> on the client. However, apart from being annoying in the dashboard, it has
> no performance or other negative impact.
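>
> In case a concrete recipe helps, this is roughly what I do (a sketch, assuming
> you can reach the client host; the MDS name and client id are placeholders):
>
> # find the client id and its metadata behind the warning
> ceph health detail
> ceph tell mds.<active-mds> session ls | jq '.[] | select(.id == <client_id>) | .client_metadata'
>
> # then on that client host, drop the dentry/inode caches
> sync; echo 3 > /proc/sys/vm/drop_caches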
>
> Best regards,
> =================
> Frank Schilder
> AIT Risø Campus
> Bygning 109, rum S14
>
> ________________________________________
> From: Eugen Block <eblock@xxxxxx>
> Sent: Friday, January 26, 2024 10:05 AM
> To: Özkan Göksu
> Cc: ceph-users@xxxxxxx
> Subject:  Re: 1 clients failing to respond to cache pressure
> (quincy:17.2.6)
>
> Performance for small files is more about IOPS rather than throughput,
> and the IOPS in your fio tests look okay to me. What you could try is
> to split the PGs to get around 150 or 200 PGs per OSD. You're
> currently at around 60 according to the ceph osd df output. Before you
> do that, can you share 'ceph pg ls-by-pool cephfs.ud-data.data |
> head'? I don't need the whole output, just to see how many objects
> each PG has. We had a case once where that helped, but it was an older
> cluster and the pool was backed by HDDs and separate rocksDB on SSDs.
> So this might not be the solution here, but it could improve things as
> well.
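>
> For reference, the split itself would just be raising pg_num on the data pool,
> something like this (a sketch; check the autoscaler first so it doesn't fight
> a manual value, and pick the target from your OSD count and pool size/EC profile):
>
> ceph osd pool autoscale-status
> ceph osd pool set cephfs.ud-data.data pg_num 4096
> # quincy then splits gradually; with 3x replication, 4096 PGs over 80 OSDs
> # works out to roughly 150 PG replicas per OSD from this pool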
>
>
> Quoting Özkan Göksu <ozkangksu@xxxxxxxxx>:
>
> > Every user has one subvolume and I only have one pool.
> > At the beginning we were using each subvolume for the LDAP home directory +
> > user data.
> > When a user logged in to any Docker container on any host, it used the cluster
> > for home, and for the user-related data we had a second directory in the
> > same subvolume.
> > From time to time users experienced a very slow home environment, and after a
> > month it became almost impossible to use home. VNC sessions became
> > unresponsive and slow, etc.
> >
> > Two weeks ago I had to migrate home to ZFS storage, and now the overall
> > performance is better with only user_data and without home.
> > But the performance is still not as good as I expected because of the
> > problems related to the MDS.
> > The usage is low but allocation is high, and CPU usage is high. You saw the
> > IO op/s; it's nothing, yet allocation is high.
> >
> > I developed a fio benchmark script and ran it on 4 test servers at
> > the same time; the results are below:
> > Script:
> >
> > https://github.com/ozkangoksu/benchmark/blob/8f5df87997864c25ef32447e02fcd41fda0d2a67/iobench.sh
> >
> > https://github.com/ozkangoksu/benchmark/blob/main/benchmark-results/iobench-client-01.txt
> > https://github.com/ozkangoksu/benchmark/blob/main/benchmark-results/iobench-client-02.txt
> > https://github.com/ozkangoksu/benchmark/blob/main/benchmark-results/iobench-client-03.txt
> > https://github.com/ozkangoksu/benchmark/blob/main/benchmark-results/iobench-client-04.txt
> >
> > While running the benchmark, I took sample values for each type of iobench run.
> >
> > Seq Write benchmarking: size=1G,direct=1,numjobs=3,iodepth=32
> >     client:   70 MiB/s rd, 762 MiB/s wr, 337 op/s rd, 24.41k op/s wr
> >     client:   60 MiB/s rd, 551 MiB/s wr, 303 op/s rd, 35.12k op/s wr
> >     client:   13 MiB/s rd, 161 MiB/s wr, 101 op/s rd, 41.30k op/s wr
> >
> > Seq Read benchmarking: size=1G,direct=1,numjobs=3,iodepth=32
> >     client:   1.6 GiB/s rd, 219 KiB/s wr, 28.76k op/s rd, 89 op/s wr
> >     client:   370 MiB/s rd, 475 KiB/s wr, 90.38k op/s rd, 89 op/s wr
> >
> > Rand Write benchmarking: size=1G,direct=1,numjobs=3,iodepth=32
> >     client:   63 MiB/s rd, 1.5 GiB/s wr, 8.77k op/s rd, 5.50k op/s wr
> >     client:   14 MiB/s rd, 1.8 GiB/s wr, 81 op/s rd, 13.86k op/s wr
> >     client:   6.6 MiB/s rd, 1.2 GiB/s wr, 61 op/s rd, 30.13k op/s wr
> >
> > Rand Read benchmarking: size=1G,direct=1,numjobs=3,iodepth=32
> >     client:   317 MiB/s rd, 841 MiB/s wr, 426 op/s rd, 10.98k op/s wr
> >     client:   2.8 GiB/s rd, 882 MiB/s wr, 25.68k op/s rd, 291 op/s wr
> >     client:   4.0 GiB/s rd, 226 MiB/s wr, 89.63k op/s rd, 124 op/s wr
> >     client:   2.4 GiB/s rd, 295 KiB/s wr, 197.86k op/s rd, 20 op/s wr
> >
> > It seems I only have problems with the small block sizes (4K, 8K, 16K); the other sizes look fine.
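> >
> > For reference, the small-block case I am testing is roughly this fio invocation
> > (a sketch; the target directory is just an example path, and the parameters
> > match the ones listed above):
> >
> > fio --name=randwrite4k --directory=/mnt/ud-data/bench --rw=randwrite \
> >     --bs=4k --size=1G --direct=1 --numjobs=3 --iodepth=32 \
> >     --ioengine=libaio --group_reporting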
> >
> >
> >
> >
> > Eugen Block <eblock@xxxxxx> wrote on Thu, 25 Jan 2024 at 19:06:
> >
> >> I understand that your MDS shows a high CPU usage, but other than that
> >> what is your performance issue? Do users complain? Do some operations
> >> take longer than expected? Are OSDs saturated during those phases?
> >> Because the cache pressure messages don’t necessarily mean that users
> >> will notice.
> >> MDS daemons are single-threaded so that might be a bottleneck. In that
> >> case multi-active MDS might help, which you already tried but
> >> experienced OOM killers. You might also have to disable the MDS
> >> balancer, as someone else mentioned. And then you could think about
> >> pinning: is it possible to split the CephFS into multiple
> >> subdirectories and pin them to different ranks?
> >> But first I’d still like to know what the performance issue really is.
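> >>
> >> To illustrate the pinning idea, it would be roughly this (a sketch, not
> >> something to apply blindly; the directory paths are just examples):
> >>
> >> ceph fs set ud-data max_mds 2
> >> # pin one subtree to rank 0 and another to rank 1, from a client mount
> >> setfattr -n ceph.dir.pin -v 0 /mnt/ud-data/projects
> >> setfattr -n ceph.dir.pin -v 1 /mnt/ud-data/users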
> >>
> >> Quoting Özkan Göksu <ozkangksu@xxxxxxxxx>:
> >>
> >> > I will try my best to explain my situation.
> >> >
> >> > I don't have a separate MDS server. I have 5 identical nodes, 3 of them are
> >> > mons, and I use the other 2 as active and standby MDS. (Currently I have
> >> > leftovers from max_mds 4.)
> >> >
> >> > root@ud-01:~# ceph -s
> >> >   cluster:
> >> >     id:     e42fd4b0-313b-11ee-9a00-31da71873773
> >> >     health: HEALTH_WARN
> >> >             1 clients failing to respond to cache pressure
> >> >
> >> >   services:
> >> >     mon: 3 daemons, quorum ud-01,ud-02,ud-03 (age 9d)
> >> >     mgr: ud-01.qycnol(active, since 8d), standbys: ud-02.tfhqfd
> >> >     mds: 1/1 daemons up, 4 standby
> >> >     osd: 80 osds: 80 up (since 9d), 80 in (since 5M)
> >> >
> >> >   data:
> >> >     volumes: 1/1 healthy
> >> >     pools:   3 pools, 2305 pgs
> >> >     objects: 106.58M objects, 25 TiB
> >> >     usage:   45 TiB used, 101 TiB / 146 TiB avail
> >> >     pgs:     2303 active+clean
> >> >              2    active+clean+scrubbing+deep
> >> >
> >> >   io:
> >> >     client:   16 MiB/s rd, 3.4 MiB/s wr, 77 op/s rd, 23 op/s wr
> >> >
> >> > ------------------------------
> >> > root@ud-01:~# ceph fs status
> >> > ud-data - 84 clients
> >> > =======
> >> > RANK  STATE           MDS              ACTIVITY     DNS    INOS   DIRS
> >> > CAPS
> >> >  0    active  ud-data.ud-02.xcoojt  Reqs:   40 /s  2579k  2578k   169k
> >> >  3048k
> >> >         POOL           TYPE     USED  AVAIL
> >> > cephfs.ud-data.meta  metadata   136G  44.9T
> >> > cephfs.ud-data.data    data    44.3T  44.9T
> >> >
> >> > ------------------------------
> >> > root@ud-01:~# ceph health detail
> >> > HEALTH_WARN 1 clients failing to respond to cache pressure
> >> > [WRN] MDS_CLIENT_RECALL: 1 clients failing to respond to cache
> pressure
> >> >     mds.ud-data.ud-02.xcoojt(mds.0): Client bmw-m4 failing to respond
> to
> >> > cache pressure client_id: 1275577
> >> >
> >> > ------------------------------
> >> > When I check the failing client with session ls I see only "num_caps: 12298".
> >> >
> >> > ceph tell mds.ud-data.ud-02.xcoojt session ls | jq -r '.[] | "clientid:
> >> > \(.id)= num_caps: \(.num_caps), num_leases: \(.num_leases),
> >> > request_load_avg: \(.request_load_avg), num_completed_requests:
> >> > \(.num_completed_requests), num_completed_flushes:
> >> > \(.num_completed_flushes)"' | sort -n -t: -k3
> >> >
> >> > clientid: 1275577= num_caps: 12298, num_leases: 0, request_load_avg:
> 0,
> >> > num_completed_requests: 0, num_completed_flushes: 1
> >> > clientid: 1294542= num_caps: 13000, num_leases: 12, request_load_avg:
> >> 105,
> >> > num_completed_requests: 0, num_completed_flushes: 6
> >> > clientid: 1282187= num_caps: 16869, num_leases: 1, request_load_avg:
> 0,
> >> > num_completed_requests: 0, num_completed_flushes: 1
> >> > clientid: 1275589= num_caps: 18943, num_leases: 0, request_load_avg:
> 52,
> >> > num_completed_requests: 0, num_completed_flushes: 1
> >> > clientid: 1282154= num_caps: 24747, num_leases: 1, request_load_avg:
> 57,
> >> > num_completed_requests: 2, num_completed_flushes: 2
> >> > clientid: 1275553= num_caps: 25120, num_leases: 2, request_load_avg:
> 116,
> >> > num_completed_requests: 2, num_completed_flushes: 8
> >> > clientid: 1282142= num_caps: 27185, num_leases: 6, request_load_avg:
> 128,
> >> > num_completed_requests: 0, num_completed_flushes: 8
> >> > clientid: 1275535= num_caps: 40364, num_leases: 6, request_load_avg:
> 111,
> >> > num_completed_requests: 2, num_completed_flushes: 8
> >> > clientid: 1282130= num_caps: 41483, num_leases: 0, request_load_avg:
> 135,
> >> > num_completed_requests: 0, num_completed_flushes: 1
> >> > clientid: 1275547= num_caps: 42953, num_leases: 4, request_load_avg:
> 119,
> >> > num_completed_requests: 2, num_completed_flushes: 6
> >> > clientid: 1282139= num_caps: 45435, num_leases: 27, request_load_avg:
> 84,
> >> > num_completed_requests: 2, num_completed_flushes: 34
> >> > clientid: 1282136= num_caps: 48374, num_leases: 8, request_load_avg:
> 0,
> >> > num_completed_requests: 1, num_completed_flushes: 1
> >> > clientid: 1275532= num_caps: 48664, num_leases: 7, request_load_avg:
> 115,
> >> > num_completed_requests: 2, num_completed_flushes: 8
> >> > clientid: 1191789= num_caps: 130319, num_leases: 0, request_load_avg:
> >> 1753,
> >> > num_completed_requests: 0, num_completed_flushes: 0
> >> > clientid: 1275571= num_caps: 139488, num_leases: 0, request_load_avg:
> 2,
> >> > num_completed_requests: 0, num_completed_flushes: 1
> >> > clientid: 1282133= num_caps: 145487, num_leases: 0, request_load_avg:
> 8,
> >> > num_completed_requests: 1, num_completed_flushes: 1
> >> > clientid: 1534496= num_caps: 1041316, num_leases: 0,
> request_load_avg: 0,
> >> > num_completed_requests: 0, num_completed_flushes: 1
> >> >
> >> > ------------------------------
> >> > When I check the dashboard/service/mds I see 120%+ CPU usage on the active
> >> > MDS, but on the host everything is almost idle and disk waits are very low.
> >> >
> >> > avg-cpu:  %user   %nice %system %iowait  %steal   %idle
> >> >            0.61    0.00    0.38    0.41    0.00   98.60
> >> >
> >> > Device            r/s     rMB/s   rrqm/s  %rrqm r_await rareq-sz
>  w/s
> >> >   wMB/s   wrqm/s  %wrqm w_await wareq-sz     d/s     dMB/s   drqm/s
> >> %drqm
> >> > d_await dareq-sz     f/s f_await  aqu-sz  %util
> >> > sdc              2.00      0.01     0.00   0.00    0.50     6.00
>  20.00
> >> >    0.04     0.00   0.00    0.50     2.00    0.00      0.00     0.00
> >>  0.00
> >> >    0.00     0.00   10.00    0.60    0.02   1.20
> >> > sdd              3.00      0.02     0.00   0.00    0.67     8.00
> 285.00
> >> >    1.84    77.00  21.27    0.44     6.61    0.00      0.00     0.00
> >>  0.00
> >> >    0.00     0.00  114.00    0.83    0.22  22.40
> >> > sde              1.00      0.01     0.00   0.00    1.00     8.00
>  36.00
> >> >    0.08     3.00   7.69    0.64     2.33    0.00      0.00     0.00
> >>  0.00
> >> >    0.00     0.00   18.00    0.67    0.04   1.60
> >> > sdf              5.00      0.04     0.00   0.00    0.40     7.20
>  40.00
> >> >    0.09     3.00   6.98    0.53     2.30    0.00      0.00     0.00
> >>  0.00
> >> >    0.00     0.00   20.00    0.70    0.04   2.00
> >> > sdg             11.00      0.08     0.00   0.00    0.73     7.27
>  36.00
> >> >    0.09     4.00  10.00    0.50     2.44    0.00      0.00     0.00
> >>  0.00
> >> >    0.00     0.00   18.00    0.72    0.04   3.20
> >> > sdh              5.00      0.03     0.00   0.00    0.60     5.60
>  46.00
> >> >    0.10     2.00   4.17    0.59     2.17    0.00      0.00     0.00
> >>  0.00
> >> >    0.00     0.00   23.00    0.83    0.05   2.80
> >> > sdi              7.00      0.04     0.00   0.00    0.43     6.29
>  36.00
> >> >    0.07     1.00   2.70    0.47     2.11    0.00      0.00     0.00
> >>  0.00
> >> >    0.00     0.00   18.00    0.61    0.03   2.40
> >> > sdj              5.00      0.04     0.00   0.00    0.80     7.20
>  42.00
> >> >    0.09     1.00   2.33    0.67     2.10    0.00      0.00     0.00
> >>  0.00
> >> >    0.00     0.00   21.00    0.81    0.05   3.20
> >> >
> >> > ------------------------------
> >> > Other than this 5-node cluster, I also have a 3-node cluster with
> >> > identical hardware, but it serves a different purpose and data workload.
> >> > On that cluster I don't have any problems and the default MDS settings seem
> >> > to be enough.
> >> > The only difference between the two clusters is that the 5-node cluster is
> >> > used directly by users, while the 3-node cluster is used heavily to read and
> >> > write data via projects, not by users. So allocation and de-allocation work
> >> > out better there.
> >> >
> >> > I guess I just have a problematic use case on the 5-node cluster, and as I
> >> > mentioned above, I might have a similar problem but I don't know how to
> >> > debug it.
> >> >
> >> >
> >>
> https://lists.ceph.io/hyperkitty/list/ceph-users@xxxxxxx/thread/YO4SGL4DJQ6EKUBUIHKTFSW72ZJ3XLZS/
> >> > quote:"A user running VSCodium, keeping 15k caps open.. the
> opportunistic
> >> > caps recall eventually starts recalling those but the (el7 kernel)
> client
> >> > won't release them. Stopping Codium seems to be the only way to
> release."
> >> >
> >> > ------------------------------
> >> > Before reading the osd df output, you should know that I created 2 OSDs
> >> > per "CT4000MX500SSD1".
> >> > # ceph osd df tree
> >> > ID   CLASS  WEIGHT     REWEIGHT  SIZE     RAW USE  DATA     OMAP
> >> META
> >> >     AVAIL    %USE   VAR   PGS  STATUS  TYPE NAME
> >> >  -1         145.54321         -  146 TiB   45 TiB   44 TiB   119 GiB
> 333
> >> > GiB  101 TiB  30.81  1.00    -          root default
> >> >  -3          29.10864         -   29 TiB  8.9 TiB  8.8 TiB    25 GiB
>  66
> >> > GiB   20 TiB  30.54  0.99    -              host ud-01
> >> >   0    ssd    1.81929   1.00000  1.8 TiB  616 GiB  610 GiB   1.4 GiB
> 4.5
> >> > GiB  1.2 TiB  33.04  1.07   61      up          osd.0
> >> >   1    ssd    1.81929   1.00000  1.8 TiB  527 GiB  521 GiB   1.5 GiB
> 4.0
> >> > GiB  1.3 TiB  28.28  0.92   53      up          osd.1
> >> >   2    ssd    1.81929   1.00000  1.8 TiB  595 GiB  589 GiB   2.3 GiB
> 4.0
> >> > GiB  1.2 TiB  31.96  1.04   63      up          osd.2
> >> >   3    ssd    1.81929   1.00000  1.8 TiB  527 GiB  521 GiB   1.8 GiB
> 4.2
> >> > GiB  1.3 TiB  28.30  0.92   55      up          osd.3
> >> >   4    ssd    1.81929   1.00000  1.8 TiB  525 GiB  520 GiB   1.3 GiB
> 3.9
> >> > GiB  1.3 TiB  28.21  0.92   52      up          osd.4
> >> >   5    ssd    1.81929   1.00000  1.8 TiB  592 GiB  586 GiB   1.8 GiB
> 3.8
> >> > GiB  1.2 TiB  31.76  1.03   61      up          osd.5
> >> >   6    ssd    1.81929   1.00000  1.8 TiB  559 GiB  553 GiB   1.8 GiB
> 4.3
> >> > GiB  1.3 TiB  30.03  0.97   57      up          osd.6
> >> >   7    ssd    1.81929   1.00000  1.8 TiB  602 GiB  597 GiB   836 MiB
> 4.4
> >> > GiB  1.2 TiB  32.32  1.05   58      up          osd.7
> >> >   8    ssd    1.81929   1.00000  1.8 TiB  614 GiB  609 GiB   1.2 GiB
> 4.5
> >> > GiB  1.2 TiB  32.98  1.07   60      up          osd.8
> >> >   9    ssd    1.81929   1.00000  1.8 TiB  571 GiB  565 GiB   2.2 GiB
> 4.2
> >> > GiB  1.3 TiB  30.67  1.00   61      up          osd.9
> >> >  10    ssd    1.81929   1.00000  1.8 TiB  528 GiB  522 GiB   1.3 GiB
> 4.1
> >> > GiB  1.3 TiB  28.33  0.92   52      up          osd.10
> >> >  11    ssd    1.81929   1.00000  1.8 TiB  551 GiB  546 GiB   1.5 GiB
> 3.6
> >> > GiB  1.3 TiB  29.57  0.96   56      up          osd.11
> >> >  12    ssd    1.81929   1.00000  1.8 TiB  594 GiB  588 GiB   1.8 GiB
> 4.4
> >> > GiB  1.2 TiB  31.91  1.04   61      up          osd.12
> >> >  13    ssd    1.81929   1.00000  1.8 TiB  561 GiB  555 GiB   1.1 GiB
> 4.3
> >> > GiB  1.3 TiB  30.10  0.98   55      up          osd.13
> >> >  14    ssd    1.81929   1.00000  1.8 TiB  616 GiB  609 GiB   1.9 GiB
> 4.2
> >> > GiB  1.2 TiB  33.04  1.07   64      up          osd.14
> >> >  15    ssd    1.81929   1.00000  1.8 TiB  525 GiB  520 GiB   1.1 GiB
> 4.0
> >> > GiB  1.3 TiB  28.20  0.92   51      up          osd.15
> >> >  -5          29.10864         -   29 TiB  9.0 TiB  8.9 TiB    22 GiB
>  67
> >> > GiB   20 TiB  30.89  1.00    -              host ud-02
> >> >  16    ssd    1.81929   1.00000  1.8 TiB  617 GiB  611 GiB   1.7 GiB
> 4.7
> >> > GiB  1.2 TiB  33.12  1.08   63      up          osd.16
> >> >  17    ssd    1.81929   1.00000  1.8 TiB  582 GiB  577 GiB   1.6 GiB
> 4.0
> >> > GiB  1.3 TiB  31.26  1.01   59      up          osd.17
> >> >  18    ssd    1.81929   1.00000  1.8 TiB  583 GiB  578 GiB   418 MiB
> 4.0
> >> > GiB  1.3 TiB  31.29  1.02   54      up          osd.18
> >> >  19    ssd    1.81929   1.00000  1.8 TiB  550 GiB  544 GiB   1.5 GiB
> 4.0
> >> > GiB  1.3 TiB  29.50  0.96   56      up          osd.19
> >> >  20    ssd    1.81929   1.00000  1.8 TiB  551 GiB  546 GiB   1.1 GiB
> 4.1
> >> > GiB  1.3 TiB  29.57  0.96   54      up          osd.20
> >> >  21    ssd    1.81929   1.00000  1.8 TiB  616 GiB  610 GiB   1.3 GiB
> 4.4
> >> > GiB  1.2 TiB  33.04  1.07   60      up          osd.21
> >> >  22    ssd    1.81929   1.00000  1.8 TiB  573 GiB  567 GiB   1.6 GiB
> 4.1
> >> > GiB  1.3 TiB  30.75  1.00   58      up          osd.22
> >> >  23    ssd    1.81929   1.00000  1.8 TiB  616 GiB  610 GiB   1.3 GiB
> 4.3
> >> > GiB  1.2 TiB  33.06  1.07   60      up          osd.23
> >> >  24    ssd    1.81929   1.00000  1.8 TiB  539 GiB  534 GiB   844 MiB
> 3.8
> >> > GiB  1.3 TiB  28.92  0.94   51      up          osd.24
> >> >  25    ssd    1.81929   1.00000  1.8 TiB  583 GiB  576 GiB   2.1 GiB
> 4.1
> >> > GiB  1.3 TiB  31.27  1.02   61      up          osd.25
> >> >  26    ssd    1.81929   1.00000  1.8 TiB  617 GiB  611 GiB   1.3 GiB
> 4.6
> >> > GiB  1.2 TiB  33.12  1.08   61      up          osd.26
> >> >  27    ssd    1.81929   1.00000  1.8 TiB  537 GiB  532 GiB   1.2 GiB
> 4.1
> >> > GiB  1.3 TiB  28.84  0.94   53      up          osd.27
> >> >  28    ssd    1.81929   1.00000  1.8 TiB  527 GiB  522 GiB   1.3 GiB
> 4.2
> >> > GiB  1.3 TiB  28.29  0.92   53      up          osd.28
> >> >  29    ssd    1.81929   1.00000  1.8 TiB  594 GiB  588 GiB   1.5 GiB
> 4.6
> >> > GiB  1.2 TiB  31.91  1.04   59      up          osd.29
> >> >  30    ssd    1.81929   1.00000  1.8 TiB  528 GiB  523 GiB   1.4 GiB
> 4.1
> >> > GiB  1.3 TiB  28.35  0.92   53      up          osd.30
> >> >  31    ssd    1.81929   1.00000  1.8 TiB  594 GiB  589 GiB   1.6 GiB
> 3.8
> >> > GiB  1.2 TiB  31.89  1.03   61      up          osd.31
> >> >  -7          29.10864         -   29 TiB  8.9 TiB  8.8 TiB    23 GiB
>  67
> >> > GiB   20 TiB  30.66  1.00    -              host ud-03
> >> >  32    ssd    1.81929   1.00000  1.8 TiB  593 GiB  588 GiB   1.1 GiB
> 4.3
> >> > GiB  1.2 TiB  31.84  1.03   57      up          osd.32
> >> >  33    ssd    1.81929   1.00000  1.8 TiB  617 GiB  611 GiB   1.8 GiB
> 4.4
> >> > GiB  1.2 TiB  33.13  1.08   63      up          osd.33
> >> >  34    ssd    1.81929   1.00000  1.8 TiB  537 GiB  532 GiB   2.0 GiB
> 3.8
> >> > GiB  1.3 TiB  28.84  0.94   59      up          osd.34
> >> >  35    ssd    1.81929   1.00000  1.8 TiB  562 GiB  556 GiB   1.7 GiB
> 4.2
> >> > GiB  1.3 TiB  30.16  0.98   58      up          osd.35
> >> >  36    ssd    1.81929   1.00000  1.8 TiB  529 GiB  523 GiB   1.3 GiB
> 3.9
> >> > GiB  1.3 TiB  28.38  0.92   52      up          osd.36
> >> >  37    ssd    1.81929   1.00000  1.8 TiB  527 GiB  521 GiB   1.7 GiB
> 4.2
> >> > GiB  1.3 TiB  28.28  0.92   55      up          osd.37
> >> >  38    ssd    1.81929   1.00000  1.8 TiB  574 GiB  568 GiB   1.2 GiB
> 4.3
> >> > GiB  1.3 TiB  30.79  1.00   55      up          osd.38
> >> >  39    ssd    1.81929   1.00000  1.8 TiB  605 GiB  599 GiB   1.6 GiB
> 4.2
> >> > GiB  1.2 TiB  32.48  1.05   61      up          osd.39
> >> >  40    ssd    1.81929   1.00000  1.8 TiB  573 GiB  567 GiB   1.2 GiB
> 4.4
> >> > GiB  1.3 TiB  30.76  1.00   56      up          osd.40
> >> >  41    ssd    1.81929   1.00000  1.8 TiB  526 GiB  520 GiB   1.7 GiB
> 3.9
> >> > GiB  1.3 TiB  28.21  0.92   54      up          osd.41
> >> >  42    ssd    1.81929   1.00000  1.8 TiB  613 GiB  608 GiB  1010 MiB
> 4.4
> >> > GiB  1.2 TiB  32.91  1.07   58      up          osd.42
> >> >  43    ssd    1.81929   1.00000  1.8 TiB  606 GiB  600 GiB   1.7 GiB
> 4.3
> >> > GiB  1.2 TiB  32.51  1.06   61      up          osd.43
> >> >  44    ssd    1.81929   1.00000  1.8 TiB  583 GiB  577 GiB   1.6 GiB
> 4.2
> >> > GiB  1.3 TiB  31.29  1.02   60      up          osd.44
> >> >  45    ssd    1.81929   1.00000  1.8 TiB  618 GiB  613 GiB   1.4 GiB
> 4.3
> >> > GiB  1.2 TiB  33.18  1.08   62      up          osd.45
> >> >  46    ssd    1.81929   1.00000  1.8 TiB  550 GiB  544 GiB   1.5 GiB
> 4.2
> >> > GiB  1.3 TiB  29.50  0.96   54      up          osd.46
> >> >  47    ssd    1.81929   1.00000  1.8 TiB  526 GiB  522 GiB   692 MiB
> 3.7
> >> > GiB  1.3 TiB  28.25  0.92   50      up          osd.47
> >> >  -9          29.10864         -   29 TiB  9.0 TiB  8.9 TiB    26 GiB
>  68
> >> > GiB   20 TiB  31.04  1.01    -              host ud-04
> >> >  48    ssd    1.81929   1.00000  1.8 TiB  540 GiB  534 GiB   2.2 GiB
> 3.6
> >> > GiB  1.3 TiB  28.96  0.94   58      up          osd.48
> >> >  49    ssd    1.81929   1.00000  1.8 TiB  617 GiB  611 GiB   1.4 GiB
> 4.5
> >> > GiB  1.2 TiB  33.11  1.07   61      up          osd.49
> >> >  50    ssd    1.81929   1.00000  1.8 TiB  618 GiB  612 GiB   1.2 GiB
> 4.8
> >> > GiB  1.2 TiB  33.17  1.08   61      up          osd.50
> >> >  51    ssd    1.81929   1.00000  1.8 TiB  618 GiB  612 GiB   1.5 GiB
> 4.5
> >> > GiB  1.2 TiB  33.19  1.08   61      up          osd.51
> >> >  52    ssd    1.81929   1.00000  1.8 TiB  526 GiB  521 GiB   1.4 GiB
> 4.1
> >> > GiB  1.3 TiB  28.25  0.92   53      up          osd.52
> >> >  53    ssd    1.81929   1.00000  1.8 TiB  618 GiB  611 GiB   2.4 GiB
> 4.3
> >> > GiB  1.2 TiB  33.17  1.08   66      up          osd.53
> >> >  54    ssd    1.81929   1.00000  1.8 TiB  550 GiB  544 GiB   1.5 GiB
> 4.3
> >> > GiB  1.3 TiB  29.54  0.96   55      up          osd.54
> >> >  55    ssd    1.81929   1.00000  1.8 TiB  527 GiB  522 GiB   1.3 GiB
> 4.0
> >> > GiB  1.3 TiB  28.29  0.92   52      up          osd.55
> >> >  56    ssd    1.81929   1.00000  1.8 TiB  525 GiB  519 GiB   1.2 GiB
> 4.1
> >> > GiB  1.3 TiB  28.16  0.91   52      up          osd.56
> >> >  57    ssd    1.81929   1.00000  1.8 TiB  615 GiB  609 GiB   2.3 GiB
> 4.2
> >> > GiB  1.2 TiB  33.03  1.07   65      up          osd.57
> >> >  58    ssd    1.81929   1.00000  1.8 TiB  527 GiB  522 GiB   1.6 GiB
> 3.7
> >> > GiB  1.3 TiB  28.31  0.92   55      up          osd.58
> >> >  59    ssd    1.81929   1.00000  1.8 TiB  615 GiB  609 GiB   1.2 GiB
> 4.6
> >> > GiB  1.2 TiB  33.01  1.07   60      up          osd.59
> >> >  60    ssd    1.81929   1.00000  1.8 TiB  594 GiB  588 GiB   1.2 GiB
> 4.4
> >> > GiB  1.2 TiB  31.88  1.03   59      up          osd.60
> >> >  61    ssd    1.81929   1.00000  1.8 TiB  616 GiB  610 GiB   1.9 GiB
> 4.1
> >> > GiB  1.2 TiB  33.04  1.07   64      up          osd.61
> >> >  62    ssd    1.81929   1.00000  1.8 TiB  620 GiB  614 GiB   1.9 GiB
> 4.4
> >> > GiB  1.2 TiB  33.27  1.08   63      up          osd.62
> >> >  63    ssd    1.81929   1.00000  1.8 TiB  527 GiB  522 GiB   1.5 GiB
> 4.0
> >> > GiB  1.3 TiB  28.30  0.92   53      up          osd.63
> >> > -11          29.10864         -   29 TiB  9.0 TiB  8.9 TiB    23 GiB
>  65
> >> > GiB   20 TiB  30.91  1.00    -              host ud-05
> >> >  64    ssd    1.81929   1.00000  1.8 TiB  608 GiB  601 GiB   2.3 GiB
> 4.5
> >> > GiB  1.2 TiB  32.62  1.06   65      up          osd.64
> >> >  65    ssd    1.81929   1.00000  1.8 TiB  606 GiB  601 GiB   628 MiB
> 4.2
> >> > GiB  1.2 TiB  32.53  1.06   57      up          osd.65
> >> >  66    ssd    1.81929   1.00000  1.8 TiB  583 GiB  578 GiB   1.3 GiB
> 4.3
> >> > GiB  1.2 TiB  31.31  1.02   57      up          osd.66
> >> >  67    ssd    1.81929   1.00000  1.8 TiB  537 GiB  533 GiB   436 MiB
> 3.6
> >> > GiB  1.3 TiB  28.82  0.94   50      up          osd.67
> >> >  68    ssd    1.81929   1.00000  1.8 TiB  541 GiB  535 GiB   2.5 GiB
> 3.8
> >> > GiB  1.3 TiB  29.04  0.94   59      up          osd.68
> >> >  69    ssd    1.81929   1.00000  1.8 TiB  606 GiB  601 GiB   1.1 GiB
> 4.4
> >> > GiB  1.2 TiB  32.55  1.06   59      up          osd.69
> >> >  70    ssd    1.81929   1.00000  1.8 TiB  604 GiB  598 GiB   1.8 GiB
> 4.1
> >> > GiB  1.2 TiB  32.44  1.05   63      up          osd.70
> >> >  71    ssd    1.81929   1.00000  1.8 TiB  606 GiB  600 GiB   1.9 GiB
> 4.5
> >> > GiB  1.2 TiB  32.53  1.06   62      up          osd.71
> >> >  72    ssd    1.81929   1.00000  1.8 TiB  602 GiB  598 GiB   612 MiB
> 4.1
> >> > GiB  1.2 TiB  32.33  1.05   57      up          osd.72
> >> >  73    ssd    1.81929   1.00000  1.8 TiB  571 GiB  565 GiB   1.8 GiB
> 4.5
> >> > GiB  1.3 TiB  30.65  0.99   58      up          osd.73
> >> >  74    ssd    1.81929   1.00000  1.8 TiB  608 GiB  602 GiB   1.8 GiB
> 4.2
> >> > GiB  1.2 TiB  32.62  1.06   61      up          osd.74
> >> >  75    ssd    1.81929   1.00000  1.8 TiB  536 GiB  531 GiB   1.9 GiB
> 3.5
> >> > GiB  1.3 TiB  28.80  0.93   57      up          osd.75
> >> >  76    ssd    1.81929   1.00000  1.8 TiB  605 GiB  599 GiB   1.4 GiB
> 4.5
> >> > GiB  1.2 TiB  32.48  1.05   60      up          osd.76
> >> >  77    ssd    1.81929   1.00000  1.8 TiB  537 GiB  532 GiB   1.2 GiB
> 3.9
> >> > GiB  1.3 TiB  28.84  0.94   52      up          osd.77
> >> >  78    ssd    1.81929   1.00000  1.8 TiB  525 GiB  520 GiB   1.3 GiB
> 3.8
> >> > GiB  1.3 TiB  28.20  0.92   52      up          osd.78
> >> >  79    ssd    1.81929   1.00000  1.8 TiB  536 GiB  531 GiB   1.1 GiB
> 3.3
> >> > GiB  1.3 TiB  28.76  0.93   53      up          osd.79
> >> >                           TOTAL  146 TiB   45 TiB   44 TiB   119 GiB
> 333
> >> > GiB  101 TiB  30.81
> >> > MIN/MAX VAR: 0.91/1.08  STDDEV: 1.90
> >> >
> >> >
> >> >
> >> > Eugen Block <eblock@xxxxxx> wrote on Thu, 25 Jan 2024 at 16:52:
> >> >
> >> >> There is no definitive answer wrt MDS tuning. As is mentioned everywhere,
> >> >> it's about finding the right setup for your specific
> >> >> workload. If you can synthesize your workload (maybe scaled down a bit),
> >> >> try optimizing it in a test cluster without interrupting your
> >> >> developers too much.
> >> >> But what you haven't explained yet is: what are you experiencing as a
> >> >> performance issue? Do you have numbers or a detailed description?
> >> >> From the fs status output you didn't seem to have too much activity
> >> >> going on (around 140 requests per second), but that's probably not the
> >> >> usual traffic? What does ceph report in its client IO output?
> >> >> Can you paste the 'ceph osd df' output as well?
> >> >> Do you have dedicated MDS servers or are they colocated with other
> >> >> services?
> >> >>
> >> >> Quoting Özkan Göksu <ozkangksu@xxxxxxxxx>:
> >> >>
> >> >> > Hello Eugen.
> >> >> >
> >> >> > I have read all of your MDS-related topics, and thank you so much for your
> >> >> > effort on this.
> >> >> > There is not much information and I couldn't find an MDS tuning guide at
> >> >> > all. It seems that you are the right person to discuss MDS debugging and
> >> >> > tuning with.
> >> >> >
> >> >> > Do you have any documents, or could you tell me the proper way to debug
> >> >> > MDS and clients?
> >> >> > Which debug logs will guide me to understand the limitations and will help
> >> >> > me tune according to the data flow?
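> >> >> >
> >> >> > So far the only things I know to look at are roughly these (a sketch; the
> >> >> > MDS name is from my cluster and the debug level is just an example value):
> >> >> >
> >> >> > # per-counter view of what the MDS is doing (cache, log, request latencies)
> >> >> > ceph tell mds.ud-data.ud-02.xcoojt perf dump
> >> >> > # list of sessions with caps, leases and load per client
> >> >> > ceph tell mds.ud-data.ud-02.xcoojt session ls
> >> >> > # temporarily raise MDS debug logging
> >> >> > ceph config set mds debug_mds 10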
> >> >> >
> >> >> > While searching, I found this:
> >> >> > https://lists.ceph.io/hyperkitty/list/ceph-users@xxxxxxx/thread/YO4SGL4DJQ6EKUBUIHKTFSW72ZJ3XLZS/
> >> >> > quote:"A user running VSCodium, keeping 15k caps open.. the
> >> opportunistic
> >> >> > caps recall eventually starts recalling those but the (el7 kernel)
> >> client
> >> >> > won't release them. Stopping Codium seems to be the only way to
> >> release."
> >> >> >
> >> >> > Because of this, I think I also need to play around with the client
> >> >> > side.
> >> >> >
> >> >> > My main goal is increasing speed and reducing latency, and I wonder
> >> >> > if these ideas are correct or not:
> >> >> > - Maybe I need to increase the client-side cache size, because through each
> >> >> > client multiple users request a lot of objects and clearly the
> >> >> > client_cache_size=16 default is not enough.
> >> >> > - Maybe I need to increase the client-side maximum cache limits for
> >> >> > objects "client_oc_max_objects=1000 to 10000" and data "client_oc_size=200mi
> >> >> > to 400mi".
> >> >> > - The client cache cleaning threshold is not aggressive enough to keep the
> >> >> > free cache size in the desired range. I need to make it more aggressive, but
> >> >> > this should not reduce speed or increase latency.
> >> >> >
> >> >> > mds_cache_memory_limit=4gi to 16gi
> >> >> > client_oc_max_objects=1000 to 10000
> >> >> > client_oc_size=200mi to 400mi
> >> >> > client_permissions=false #to reduce latency.
> >> >> > client_cache_size=16 to 128
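> >> >> >
> >> >> > In case it's clearer as commands, applying the list above would look
> >> >> > roughly like this (a sketch; as far as I understand, the client_* options
> >> >> > only affect ceph-fuse/libcephfs clients, not kernel mounts):
> >> >> >
> >> >> > ceph config set mds mds_cache_memory_limit 17179869184   # 16 GiB
> >> >> > ceph config set client client_oc_max_objects 10000
> >> >> > ceph config set client client_oc_size 419430400          # 400 MiB
> >> >> > ceph config set client client_permissions false
> >> >> > ceph config set client client_cache_size 128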
> >> >> >
> >> >> >
> >> >> > What do you think?
> >> >>
> >> >>
> >> >>
> >> >>
> >>
> >>
> >>
> >>
>
>
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx



