Re: 50% performance drop after disk failure

Michael Eichenberger <michael.eichenberger@xxxxxxxxxxxxxxxxx> · Mon, 11 Jul 2022 16:07:12 +0200

Hi Joachim

Thank you for the clarification. I had already migrated the test vm to 
another node (therefore creating a new process).

We're currently back on pre-failure performance.

We used a brute force approach: We actually turned one storage node 
after the other off and restarted them again. After this, we achieved 
pre-failure performance ... If we find a logical reason for this 
behaviour, we'll let this list know.

We are going to expand the monitoring with your suggestions and we'll be 
updating Ceph over the next few weeks (the version as well as the 
backend from FileStore to BlueStore).

Many thanks for your help. The same goes to everybody else, who helped us!

With best regards, Michael

On 10/07/2022 13.21, Joachim Kraftmayer wrote:

Hi Michael,

i meant the compaction of the leveldb of the osd, see left picture in 
link:

https://ceph.io/en/news/blog/2017/new-luminous-bluestore/#how-does-bluestore-work%3F

xfs:

https://wiki.archlinux.org/title/XFS#Defragmentation

latency is ok

load is ok

test vm:

i would shut down the test vm and then start it again to create a new 
process. Also check how the cache for the rbd client is configured.

Regards, Joachim

Am 09.07.22 um 18:24 schrieb Michael Eichenberger:
Hi Joachim

I have the following starting points:
Has the read and write latency also changed?
I don't really understand, how I'd measure the latency. What we 
measure regularly:
fio --filename=/dev/vdb     --direct=1     --rw=randrw 
    --refill_buffers     --norandommap     --randrepeat=0 
    --ioengine=libaio     --bs=4k     --rwmixread=40
 --iodepth=16     --numjobs=16     --runtime=60     --group_reporting 
    --name=4k-mixed-read-write
4k-mixed-read-write: (g=0): rw=randrw, bs=(R) 4096B-4096B, (W) 
4096B-4096B, (T) 4096B-4096B, ioengine=libaio, iodepth=16
...
fio-3.7
Starting 16 processes
Jobs: 7 (f=7): 
[_(3),m(1),_(1),m(1),_(1),m(2),_(2),m(3),_(2)][10.2%][r=92KiB/s,w=592KiB/s][r=23,w=148 
IOPS][eta 08m:59s]
4k-mixed-read-write: (groupid=0, jobs=16): err= 0: pid=8208: Sat Jul 
 9 17:47:59 2022
  read: IOPS=4528, BW=17.7MiB/s (18.5MB/s)(1085MiB/61366msec)
   slat (usec): min=4, max=33583, avg=13.77, stdev=70.01
   clat (usec): min=169, max=5236.2k, avg=6859.71, stdev=49067.43
    lat (usec): min=412, max=5236.2k, avg=6876.42, stdev=49067.49
   clat percentiles (usec):
    |  1.00th=[    955],  5.00th=[   1270], 10.00th=[   1549],
    | 20.00th=[   1958], 30.00th=[   2343], 40.00th=[   2704],
    | 50.00th=[   3064], 60.00th=[   3523], 70.00th=[   4015],
    | 80.00th=[   4686], 90.00th=[   5735], 95.00th=[   7504],
    | 99.00th=[ 105382], 99.50th=[ 185598], 99.90th=[ 583009],
    | 99.95th=[ 809501], 99.99th=[1988101]
  bw (  KiB/s): min=    7, max= 2192, per=6.46%, avg=1169.56, 
stdev=439.96, samples=1900
  iops        : min=    1, max=  548, avg=292.36, stdev=110.02, 
samples=1900
 write: IOPS=6781, BW=26.5MiB/s (27.8MB/s)(1626MiB/61366msec)
   slat (usec): min=5, max=9466, avg=14.18, stdev=39.69
   clat (usec): min=1273, max=5206.5k, avg=32766.99, stdev=108764.53
    lat (usec): min=1281, max=5206.6k, avg=32784.08, stdev=108764.60
   clat percentiles (msec):
    |  1.00th=[    3],  5.00th=[    4], 10.00th=[    5], 20.00th=[ 
   7],
    | 30.00th=[    9], 40.00th=[   12], 50.00th=[   16], 60.00th=[ 
  21],
    | 70.00th=[   27], 80.00th=[   33], 90.00th=[   53], 95.00th=[ 
 106],
    | 99.00th=[  259], 99.50th=[  426], 99.90th=[ 1502], 99.95th=[ 
2333],
    | 99.99th=[ 4245]
  bw (  KiB/s): min=    6, max= 3064, per=6.40%, avg=1735.15, 
stdev=653.97, samples=1917
  iops        : min=    1, max=  766, avg=433.75, stdev=163.51, 
samples=1917
 lat (usec)   : 250=0.01%, 500=0.01%, 750=0.04%, 1000=0.53%
 lat (msec)   : 2=8.10%, 4=23.79%, 10=26.66%, 20=15.32%, 50=18.67%
 lat (msec)   : 100=3.33%, 250=2.81%, 500=0.44%, 750=0.14%, 1000=0.04%
 cpu          : usr=0.84%, sys=1.67%, ctx=777034, majf=0, minf=580
 IO depths    : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=100.0%, 32=0.0%, 
>=64=0.0%
    submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, 
>=64=0.0%
    complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.1%, 32=0.0%, 64=0.0%, 
>=64=0.0%
    issued rwts: total=277884,416138,0,0 short=0,0,0,0 dropped=0,0,0,0
    latency   : target=0, window=0, percentile=100.00%, depth=16

Run status group 0 (all jobs):
  READ: bw=17.7MiB/s (18.5MB/s), 17.7MiB/s-17.7MiB/s 
(18.5MB/s-18.5MB/s), io=1085MiB (1138MB), run=61366-61366msec
 WRITE: bw=26.5MiB/s (27.8MB/s), 26.5MiB/s-26.5MiB/s 
(27.8MB/s-27.8MB/s), io=1626MiB (1705MB), run=61366-61366msec

Disk stats (read/write):
 vdb: ios=277964/416121, merge=0/7, ticks=1883252/13426595, 
in_queue=7721698, util=100.00

These values are about half as high as before the disk failure.

How does the utilisation of the hard disks look (iostat)?
An example output of one storage node:
avg-cpu:  %user   %nice %system %iowait  %steal   %idle
          3.97    0.00    2.85    0.74    0.00   92.45

Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s 
avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
nvme1n1           0.00     0.06    1.16   17.01    98.42   227.18 
   35.84     1.00    0.06    0.27    0.05  55.02  99.95
nvme2n1           0.00     0.31    5.14   83.86   380.94  2736.85 
   70.06     1.02    0.29    0.19    0.29  11.23  99.97
nvme0n1           0.27    17.17  240.90 1190.43 15757.58 36318.89 
   72.77     5.69    3.28    0.20    3.91   0.70 100.00
sda               0.19     0.97    0.22   29.11    11.29  2848.95 
  195.02     0.05    1.69    0.27    1.70   0.23   0.67
sdd               0.05     3.62   24.85  161.70  2217.69  5640.61 
   84.25     0.73    3.93    0.58    4.45   0.17   3.22
sdc               0.05     1.75   26.03  152.82  1959.02  5376.83 
   82.03     0.45    2.52    0.55    2.85   0.18   3.25
sde               0.06     4.52   28.41  154.77  2444.19  4660.30 
   77.57     0.59    3.22    0.53    3.72   0.15   2.68
sdf               0.08     5.16   28.99  192.36  3150.86  7024.10 
   91.94     1.39    6.27    0.74    7.11   0.18   3.93
sdg               0.05     3.39   26.78  181.90  2592.68  7402.03 
   95.79     1.30    6.21    0.67    7.03   0.18   3.86
sdh               0.11     5.33   39.63  231.45  3214.61  7086.68 
   76.00     1.41    5.22    0.63    6.00   0.16   4.40
md1               0.00     0.00    0.49   25.44    14.30  2846.64 
  220.69     0.00    0.00    0.00    0.00   0.00   0.00
md0               0.00     0.00    0.00    0.00     0.12     0.01 
   51.55     0.00    0.00    0.00    0.00   0.00   0.00
sdi               0.01     0.95    0.08   29.14     3.33  2848.95 
  195.26     0.05    1.71    0.28    1.71   0.24   0.70
sdk               0.08     4.89   26.78  196.32  2292.12  6333.01 
   77.32     0.90    4.01    0.59    4.48   0.17   3.76
sdj               0.05     1.89   18.91  126.88  2278.79  4909.42 
   98.61     0.53    3.62    0.76    4.05   0.20   2.88
sdm               0.07     4.83   35.83  203.28  3197.73  7063.41 
   85.83     0.87    3.66    0.65    4.19   0.18   4.20
sdl               0.07     4.60   28.30  183.15  2414.73  6478.64 
   84.12     0.83    3.91    0.59    4.42   0.17   3.64
sdp               0.05     3.30   26.35  154.14  2485.64  6156.47 
   95.76     1.00    5.56    0.67    6.39   0.18   3.30
sdn               0.07     2.66   27.31  153.58  3237.96  6450.18 
  107.12     0.97    5.35    0.79    6.16   0.20   3.56
sdq               0.06     3.33   27.89  129.79  2768.62  4435.46 
   91.37     0.59    3.76    0.62    4.44   0.19   2.98
sdo               0.08     7.37   41.22  234.76  2468.76  7442.73 
   71.83     1.61    5.83    0.55    6.75   0.16   4.38
sdr               0.10     6.78   37.88  240.03  2680.16  7912.17 
   76.23     1.12    4.02    0.58    4.56   0.17   4.60
sds               0.07     3.84   31.48  206.88  2272.33  6364.27 
   72.47     0.74    3.08    0.55    3.47   0.16   3.92
sdt               0.07     2.96   28.18  158.38  2888.65  5934.89 
   94.59     1.09    5.83    0.71    6.74   0.18   3.41
sdu               0.04     2.30   24.03  123.84  2058.85  5431.40 
  101.31     0.49    3.29    0.56    3.81   0.16   2.39
dm-0              0.00     0.00    0.26   20.79    13.29  2845.47 
  271.73     0.07    3.55    0.27    3.59   0.88   1.85
dm-1              0.00     0.00    0.23    0.29     0.98     1.17 
    8.18     0.00    2.26    0.26    3.85   0.11   0.01
sdb               0.04     1.01   17.49   74.93  2301.97 16607.51 
  409.22     0.90    9.72    0.75   11.81   0.53   4.88

As the first three entries are NVMe SSDs, the value %util doesn't 
have a real value.

Check the fragmentation and saturation of the file system.
Following is an example of a disk fragmentation:
[0] storage-node-001(cloud_ceph production)~ #xfs_db -r /dev/sde1
xfs_db> frag -d
actual 4994, ideal 776, fragmentation factor 84.46%
Note, this number is largely meaningless.
Files on this filesystem average 6.44 extents per file
xfs_db> frag -f
actual 533266, ideal 394548, fragmentation factor 26.01%
Note, this number is largely meaningless.
Files on this filesystem average 1.35 extents per file
xfs_db>

I'm not really sure, what you mean with the saturation of the file 
system.
[0] storage-node-001(cloud_ceph production)~ #ceph osd df
ID  WEIGHT   REWEIGHT SIZE   USE    AVAIL  %USE  VAR  PGS
...
...
 0  1.74100  1.00000  1782G   984G   798G 55.21 1.50  44
 1  1.74100  1.00000  1782G  1148G   634G 64.41 1.75  57
 2  1.74100  1.00000  1782G   930G   852G 52.19 1.42  44
 4  1.74100  1.00000  1782G  1149G   633G 64.48 1.75  61
 5  1.74100  1.00000  1782G   991G   791G 55.62 1.51  53
 6  1.74100  1.00000  1782G  1084G   698G 60.84 1.66  52
 8  1.74100  1.00000  1782G  1036G   746G 58.15 1.58  51
 9  1.74100  1.00000  1782G   906G   876G 50.83 1.38  46
36  1.74100  1.00000  1782G  1086G   695G 60.96 1.66  53
39  1.74100  1.00000  1782G  1196G   586G 67.12 1.83  55
44  1.74100  1.00000  1782G  1137G   644G 63.82 1.74  54
47  1.74100  1.00000  1782G  1073G   709G 60.22 1.64  59
48  1.74100  1.00000  1782G  1144G   638G 64.21 1.75  59
51  1.74100  1.00000  1782G  1035G   746G 58.10 1.58  45
54  1.74100  1.00000  1782G  1131G   651G 63.46 1.73  55
17  1.74100  1.00000  1782G   770G  1012G 43.22 1.18  43
29  1.74100  1.00000  1782G  1086G   695G 60.96 1.66  53
158  1.74080  1.00000  1782G   844G   938G 47.35 1.29  44
...
...

What about the leveldb compaction?
We've compacted all three mons without any positive impact:
[0] ctrl-vm-020(cloud ceph (monitor))~ #time ceph compact
compacted leveldb in 3.067166

real    0m3.436s
user    0m0.275s
sys     0m0.045s

[0] storage-node-001(cloud_ceph production)~ #time ceph compact
compacted leveldb in 3.190430

real    0m3.480s
user    0m0.180s
sys     0m0.037s

[0] storage-node-002(cloud_ceph production)~ #time ceph compact
compacted leveldb in 2.284909

real    0m2.559s
user    0m0.170s
sys     0m0.041s

The fio benchmark as shown above has not changed.

Has the CPU load on the nodes changed?

No, it is more or less the same as before. An example:
top - 18:03:02 up 3 days, 10:24,  2 users,  load average: 3.19, 2.74, 
2.83
Tasks:693 total,  1 running,692 sleeping,  0 stopped,  0 zombie
top - 18:04:09 up 3 days, 10:25,  2 users,  load average: 3.04, 2.79, 
2.85
Tasks:697 total,  1 running,696 sleeping,  0 stopped,  0 zombie
%Cpu0  : 5.9 us, 3.6 sy, 0.0 ni,88.5 id, 0.3 wa, 0.0 hi, 1.6 si, 0.0 st
%Cpu1  : 5.6 us, 3.7 sy, 0.0 ni,89.4 id, 0.3 wa, 0.0 hi, 1.0 si, 0.0 st
%Cpu2  : 7.3 us, 4.0 sy, 0.0 ni,87.7 id, 0.7 wa, 0.0 hi, 0.3 si, 0.0 st
%Cpu3  : 7.0 us, 3.6 sy, 0.0 ni,88.4 id, 0.7 wa, 0.0 hi, 0.3 si, 0.0 st
%Cpu4  : 7.6 us, 3.6 sy, 0.0 ni,88.1 id, 0.7 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu5  : 7.9 us, 3.3 sy, 0.0 ni,87.5 id, 1.0 wa, 0.0 hi, 0.3 si, 0.0 st
%Cpu6  : 7.3 us, 4.0 sy, 0.0 ni,86.8 id, 1.7 wa, 0.0 hi, 0.3 si, 0.0 st
%Cpu7  : 6.6 us, 4.0 sy, 0.0 ni,88.1 id, 1.0 wa, 0.0 hi, 0.3 si, 0.0 st
%Cpu8  : 7.4 us, 3.7 sy, 0.0 ni,88.0 id, 0.7 wa, 0.0 hi, 0.3 si, 0.0 st
%Cpu9  : 6.6 us, 4.0 sy, 0.0 ni,87.7 id, 1.0 wa, 0.0 hi, 0.7 si, 0.0 st
%Cpu10 : 6.9 us, 3.3 sy, 0.0 ni,88.8 id, 0.7 wa, 0.0 hi, 0.3 si, 0.0 st
%Cpu11 : 6.9 us, 4.2 sy, 0.0 ni,87.6 id, 1.0 wa, 0.0 hi, 0.3 si, 0.0 st
%Cpu12 : 3.6 us, 1.7 sy, 0.0 ni,94.1 id, 0.3 wa, 0.0 hi, 0.3 si, 0.0 st
%Cpu13 : 3.3 us, 3.0 sy, 0.0 ni,93.4 id, 0.3 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu14 : 3.9 us, 2.6 sy, 0.0 ni,93.4 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu15 : 3.6 us, 2.3 sy, 0.0 ni,94.1 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu16 : 4.3 us, 2.3 sy, 0.0 ni,92.7 id, 0.7 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu17 : 3.9 us, 2.6 sy, 0.0 ni,93.1 id, 0.3 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu18 : 3.9 us, 2.3 sy, 0.0 ni,92.4 id, 1.3 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu19 : 3.0 us, 2.3 sy, 0.0 ni,91.4 id, 3.3 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu20 : 3.3 us, 2.0 sy, 0.0 ni,94.4 id, 0.3 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu21 : 3.3 us, 2.6 sy, 0.0 ni,93.7 id, 0.3 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu22 : 2.6 us, 2.3 sy, 0.0 ni,94.7 id, 0.3 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu23 : 3.6 us, 2.3 sy, 0.0 ni,93.8 id, 0.3 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu24 : 2.0 us, 0.7 sy, 0.0 ni,97.0 id, 0.0 wa, 0.0 hi, 0.3 si, 0.0 st
%Cpu25 : 2.0 us, 1.3 sy, 0.0 ni,96.7 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu26 : 2.3 us, 1.0 sy, 0.0 ni,96.1 id, 0.3 wa, 0.0 hi, 0.3 si, 0.0 st
%Cpu27 : 2.3 us, 1.7 sy, 0.0 ni,95.7 id, 0.0 wa, 0.0 hi, 0.3 si, 0.0 st
%Cpu28 : 1.7 us, 2.3 sy, 0.0 ni,95.7 id, 0.0 wa, 0.0 hi, 0.3 si, 0.0 st
%Cpu29 : 2.3 us, 1.0 sy, 0.0 ni,96.3 id, 0.0 wa, 0.0 hi, 0.3 si, 0.0 st
%Cpu30 : 2.0 us, 1.0 sy, 0.0 ni,96.7 id, 0.0 wa, 0.0 hi, 0.3 si, 0.0 st
%Cpu31 : 2.6 us, 1.0 sy, 0.0 ni,95.8 id, 0.3 wa, 0.0 hi, 0.3 si, 0.0 st
%Cpu32 : 2.6 us, 1.0 sy, 0.0 ni,96.1 id, 0.0 wa, 0.0 hi, 0.3 si, 0.0 st
%Cpu33 : 3.0 us, 1.0 sy, 0.0 ni,95.4 id, 0.0 wa, 0.0 hi, 0.7 si, 0.0 st
%Cpu34 : 2.3 us, 1.0 sy, 0.0 ni,96.4 id, 0.0 wa, 0.0 hi, 0.3 si, 0.0 st
%Cpu35 : 2.0 us, 0.7 sy, 0.0 ni,97.0 id, 0.0 wa, 0.0 hi, 0.3 si, 0.0 st
%Cpu36 : 0.3 us, 1.0 sy, 0.0 ni,98.4 id, 0.3 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu37 : 1.0 us, 1.0 sy, 0.0 ni,97.7 id, 0.0 wa, 0.0 hi, 0.3 si, 0.0 st
%Cpu38 : 1.3 us, 0.7 sy, 0.0 ni,98.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu39 : 0.7 us, 0.7 sy, 0.0 ni,98.7 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu40 : 0.3 us, 0.3 sy, 0.0 ni,99.3 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu41 : 1.0 us, 1.0 sy, 0.0 ni,97.4 id, 0.3 wa, 0.0 hi, 0.3 si, 0.0 st
%Cpu42 : 0.3 us, 0.3 sy, 0.0 ni,99.3 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu43 : 1.0 us, 0.7 sy, 0.0 ni,98.4 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu44 : 1.0 us, 1.0 sy, 0.0 ni,98.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu45 : 0.7 us, 1.0 sy, 0.0 ni,98.4 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu46 : 0.7 us, 0.7 sy, 0.0 ni,98.0 id, 0.7 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu47 : 0.7 us, 0.3 sy, 0.0 ni,99.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
KiB Mem :13174224+total, 3221368 free,30296824 used,98224048 buff/cache
KiB Swap: 8388604 total, 8312828 free,   75776 used.88300832 avail Mem

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ 
COMMAND
11097 ceph      20   0 6995508   1.4g   5956 S  63.3  1.1   3911:53 
ceph-osd
10380 ceph      20   0 4538152 591284  13052 S  24.0  0.4 464:08.13 
ceph-osd
17588 ceph      20   0 4934448 690908  11932 S  24.0  0.5 802:59.36 
ceph-osd
11770 ceph      20   0 4384968 606812   6504 S  16.2  0.5 652:43.91 
ceph-osd
18986 ceph      20   0 4434068 672880   5868 S  13.0  0.5 636:27.15 
ceph-osd
 8731 ceph      20   0 4781112 697388   5876 S  12.7  0.5 820:54.88 
ceph-osd
...
...

During rebalancing (recorvery), we experienced a recovery io between 
a fee 100 MB/s and 3500 MB/s. The following output was an average 
example:
recovery io 1702 MB/s, 426 objects/s
   client io 551 MB/s rd, 131 MB/s wr, 4433 op/s rd, 3510 op/s wr

With best regards, Michael

On 09/07/2022 16.08, Joachim Kraftmayer wrote:
Hi Michael,
I have the following starting points:
Has the read and write latency also changed?
How does the utilisation of the hard disks look (iostat)?
Check the fragmentation and saturation of the file system.
What about the leveldb compaction?
Has the CPU load on the nodes changed?

Regards, Joachim

Am 09.07.2022 15:18 schrieb Michael Eichenberger 
<michael.eichenberger@xxxxxxxxxxxxxxxxx>:

    Hi all,

    We currently run out of ideas what causes a 50% performance
    decrease on
    disk io, after removing a OSD out of our cluster.

    Performance measurement is regularly done by a canary virtual
    machine
    which does an hourly disk IO measurement. Performance
    degradation is
    reported by customers on other virtual machines also.

    Symptoms:
    - After taking a failed disk out of our ceph cluster ('ceph osd
    out X')
      the canary VM measures a 50% performance degree.
    - Finishing re-balancing did not have an impact on performance
    - 'recovery io' as reported by ceph status is as high a usually
    - 'client io' as reported by ceph status is significant lower than
      usual, peaks are approximately factor 10 lower than 'recovery io'
      which was not the case before.

    Actions/Checks done all without impact on performance:
    - Logs do not show any indication of failures or irregularities
      (but we found a flapping OSD, which we also took out without
      further performance impact !).
    - No full or near full osd, PG's are balanced on OSDs.
    - Network operates in usual manner (and did not change); no
    saturation
      or high usage on links; bonds are ok; MTU settings checked and
    ok.
    - Crush map does not show any unexpected entries.
    - Boot of mons (one after another).
    - Boot of storage nodes (one after another).

    Cluster information:
    - Version: ceph version 10.2.11 (Jewel)
    - Operating System: CentOS 7
    - 3 mons, 180 OSD on 8 storage nodes
    - 382 TB used, 661 TB / 1043 TB avail
    - OSD's on NVME, SSD and Disks, pools mapped
      to either type (no mixed pools).
    - All OSD's use filestore

    Since we currently run out of ideas what could cause these
    performance
    troubles, we appreciate any hint that increases the probability
    to find
    a solution !

    Thanks in advance.

    With best regards, Michael

    -- 
    stepping stone AG
    Wasserwerkgasse 7
    CH-3011 Bern

    Telefon: +41 31 332 53 63
    www.stepping-stone.ch
    michael.eichenberger@xxxxxxxxxxxxxxxxx

    _______________________________________________
    ceph-users mailing list -- ceph-users@xxxxxxx
    To unsubscribe send an email to ceph-users-leave@xxxxxxx

--
stepping stone AG
Wasserwerkgasse 7
CH-3011 Bern

Telefon: +41 31 332 53 63
www.stepping-stone.ch
michael.eichenberger@xxxxxxxxxxxxxxxxx

--
stepping stone AG
Wasserwerkgasse 7
CH-3011 Bern

Telefon: +41 31 332 53 63
www.stepping-stone.ch
michael.eichenberger@xxxxxxxxxxxxxxxxx
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx