Hello,

Yes, we got several slow ops stuck for many seconds. What we noted:
CPU/memory usage is lower than on Nautilus
( https://drive.google.com/file/d/1NGa5sA8dlQ65ld196Ku2hm_Y0xxvfvNs/view?usp=sharingt )
Same behaviour as you.

For the moment, the rebuild of one of our nodes seems to fix the latency
issue for it.

Example:
Disk write request avg waiting time (HDD)
  Nautilus: 8-11 ms
  Pacific before rebuild: 29-46 ms
  Pacific after rebuild: 4-5 ms
Disk average queue size
  Nautilus: 3-5
  Pacific before rebuild: 6-10
  Pacific after rebuild: 1-2

*As a part of this upgrade, did you also migrate the OSDs to sharded
rocksdb column families? This would have been done by setting bluestore's
"quick fix on mount" setting to true or by issuing a "ceph-bluestore-tool
repair" offline, perhaps in response to a BLUESTORE_NO_PER_POOL_OMAP
warning post-upgrade*
=> I'm going to let my colleague answer parts of that (he will probably
answer tomorrow).

Regards,

On Mon, 16 May 2022 at 17:20, Wesley Dillingham <wes@xxxxxxxxxxxxxxxxx>
wrote:

> In our case it appears that file deletes have a very high impact on osd
> operations. Not a significant delete either: ~20T on a 1PB utilized
> filesystem (large files as well).
>
> We are trying to tune down cephfs delayed deletes via:
> "mds_max_purge_ops": "512",
> "mds_max_purge_ops_per_pg": "0.100000",
>
> with some success, but we are still experimenting with how we can reduce
> the throughput impact from osd slow ops.
>
> Respectfully,
>
> *Wes Dillingham*
> wes@xxxxxxxxxxxxxxxxx
> LinkedIn <http://www.linkedin.com/in/wesleydillingham>
>
>
> On Mon, May 16, 2022 at 9:49 AM Wesley Dillingham <wes@xxxxxxxxxxxxxxxxx>
> wrote:
>
>> We have a newly built Pacific (16.2.7) cluster running 8+3 EC jerasure,
>> ~250 OSDs across 21 hosts, which has significantly lower than expected
>> IOPS. We are only doing about 30 IOPS per spinning disk (with
>> appropriately sized SSD bluestore db) at around ~100 PGs per OSD. We
>> have around 100 CephFS (ceph-fuse 16.2.7) clients using the cluster. The
>> cluster regularly reports slow ops from the OSDs, but the vast majority,
>> 90% plus of the OSDs, are only <50% IOPS utilized. Plenty of
>> cpu/ram/network is left on all cluster nodes. We have looked for
>> hardware (disk/bond/network/mce) issues across the cluster with no
>> findings, and checked send queues and receive queues across the cluster
>> to try to narrow in on an individual failing component, but nothing was
>> found there. Slow ops are also spread equally across the servers in the
>> cluster. Does your cluster report any health warnings (slow ops etc.)
>> alongside your reduced performance?
>>
>> Respectfully,
>>
>> *Wes Dillingham*
>> wes@xxxxxxxxxxxxxxxxx
>> LinkedIn <http://www.linkedin.com/in/wesleydillingham>
>>
>>
>> On Mon, May 16, 2022 at 2:00 AM Martin Verges <martin.verges@xxxxxxxx>
>> wrote:
>>
>>> Hello,
>>>
>>> Depending on your workload, drives and OSD allocation size, 3+2 can be
>>> way slower than 4+2. Maybe run a small benchmark and see if there is a
>>> huge difference; we ran some benchmarks like that and they showed quite
>>> ugly results in some tests. The best way to deploy EC, in our findings,
>>> is with a power of 2, like 2+x, 4+x, 8+x, 16+x. Especially if you
>>> deployed OSDs before the Ceph allocation-size change, you might end up
>>> consuming much more space if you don't use a power of 2. With the 4k
>>> allocation size, at least, this has been greatly improved for newly
>>> deployed OSDs.
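For reference, below is a minimal sketch of the kind of small benchmark
suggested above, assuming a throwaway test pool on a non-production
cluster; the profile and pool names (ec42profile, ec42test) and the PG
count are placeholders only:

    # create a 4+2 erasure-code profile and a test pool that uses it
    ceph osd erasure-code-profile set ec42profile k=4 m=2 crush-failure-domain=host
    ceph osd pool create ec42test 64 64 erasure ec42profile

    # 60-second write and random-read benchmarks, then clean up the objects
    rados bench -p ec42test 60 write --no-cleanup
    rados bench -p ec42test 60 rand
    rados -p ec42test cleanup

Repeating the same steps with a 3+2 profile on the same devices gives a
rough comparison between the two layouts; results will vary with object
size and allocation size.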
>>>
>>> --
>>> Martin Verges
>>> Managing director
>>>
>>> Mobile: +49 174 9335695 | Chat: https://t.me/MartinVerges
>>>
>>> croit GmbH, Freseniusstr. 31h, 81247 Munich
>>> CEO: Martin Verges - VAT-ID: DE310638492
>>> Com. register: Amtsgericht Munich HRB 231263
>>> Web: https://croit.io | YouTube: https://goo.gl/PGE1Bx
>>>
>>>
>>> On Sun, 15 May 2022 at 20:30, stéphane chalansonnet <schalans@xxxxxxxxx>
>>> wrote:
>>>
>>> > Hi,
>>> >
>>> > Thank you for your answer.
>>> > This is not good news if you also notice a performance decrease on
>>> > your side.
>>> > No, as far as we know, you cannot downgrade to Octopus.
>>> > Going forward seems to be the only way, so Quincy.
>>> > We have a qualification cluster, so we can try it there (but it is a
>>> > fully virtual configuration).
>>> >
>>> >
>>> > We are using 4+2 and 3+2 profiles.
>>> > Are you on the same profiles on your cluster?
>>> > Maybe replicated profiles are not impacted?
>>> >
>>> > Currently, we are recreating the OSDs one by one;
>>> > some parameters can only be set this way.
>>> > The first storage node is almost rebuilt; we will see if the latencies
>>> > on it are lower than on the others...
>>> >
>>> > Wait and see...
>>> >
>>> > On Sun, 15 May 2022 at 10:16, Martin Verges <martin.verges@xxxxxxxx>
>>> > wrote:
>>> >
>>> >> Hello,
>>> >>
>>> >> What exact EC level do you use?
>>> >>
>>> >> I can confirm that our internal data shows a performance drop when
>>> >> using Pacific. So far Octopus is faster and better than Pacific, but
>>> >> I doubt you can roll back to it. We haven't rerun our benchmarks on
>>> >> Quincy yet, but according to some presentations it should be faster
>>> >> than Pacific. Maybe try to jump away from the Pacific release into
>>> >> the unknown!
>>> >>
>>> >> --
>>> >> Martin Verges
>>> >> Managing director
>>> >>
>>> >> Mobile: +49 174 9335695 | Chat: https://t.me/MartinVerges
>>> >>
>>> >> croit GmbH, Freseniusstr. 31h, 81247 Munich
>>> >> CEO: Martin Verges - VAT-ID: DE310638492
>>> >> Com. register: Amtsgericht Munich HRB 231263
>>> >> Web: https://croit.io | YouTube: https://goo.gl/PGE1Bx
>>> >>
>>> >>
>>> >> On Sat, 14 May 2022 at 12:27, stéphane chalansonnet
>>> >> <schalans@xxxxxxxxx> wrote:
>>> >>
>>> >>> Hello,
>>> >>>
>>> >>> After a successful update from Nautilus to Pacific on CentOS 8.5,
>>> >>> we observed some high latencies on our cluster.
>>> >>>
>>> >>> We did not find much in the community related to latencies
>>> >>> post-migration.
>>> >>>
>>> >>> Our setup is:
>>> >>> 6x storage nodes (256 GB RAM, 2 SSD OSDs + 5x 6 TB SATA HDDs)
>>> >>> Erasure coding profile
>>> >>> We have two EC pools:
>>> >>> -> Pool1: full 6 TB SAS HDD drives
>>> >>> -> Pool2: full SSD drives
>>> >>>
>>> >>> S3 object and RBD block workloads
>>> >>>
>>> >>> Our performance on Nautilus, before the upgrade, was acceptable.
>>> >>> However, the next day, performance had dropped by a factor of 3 or 4.
>>> >>> Benchmarks showed 15K IOPS on the flash drives; before the upgrade
>>> >>> we had almost 80K IOPS.
>>> >>> Also, the HDD pool is almost unusable (latencies are far too high).
>>> >>>
>>> >>> We suspect, maybe, an impact of the erasure coding configuration on
>>> >>> Pacific.
>>> >>> Has anyone observed the same behaviour? Any tuning?
>>> >>>
>>> >>> Thank you for your help.
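As a side note to the per-pool omap question quoted at the top of this
thread, here is a minimal sketch of how the conversion state could be
checked, assuming a non-containerized deployment with the default data
path (osd.9 and its path are examples only); the offline repair must run
while the OSD is stopped:

    # OSDs that still need the conversion raise this health warning
    ceph health detail | grep -i BLUESTORE_NO_PER_POOL_OMAP

    # check whether the conversion would run automatically at OSD startup
    ceph config get osd bluestore_fsck_quick_fix_on_mount

    # or convert a single OSD offline
    systemctl stop ceph-osd@9
    ceph-bluestore-tool repair --path /var/lib/ceph/osd/ceph-9
    systemctl start ceph-osd@9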
>>> >>>
>>> >>> ceph osd tree
>>> >>> ID   CLASS  WEIGHT     TYPE NAME                 STATUS  REWEIGHT  PRI-AFF
>>> >>>  -1         347.61304  root default
>>> >>>  -3          56.71570      host cnp31tcephosd01
>>> >>>   0    hdd    5.63399          osd.0                 up   1.00000  1.00000
>>> >>>   1    hdd    5.63399          osd.1                 up   1.00000  1.00000
>>> >>>   2    hdd    5.63399          osd.2                 up   1.00000  1.00000
>>> >>>   3    hdd    5.63399          osd.3                 up   1.00000  1.00000
>>> >>>   4    hdd    5.63399          osd.4                 up   1.00000  1.00000
>>> >>>   5    hdd    5.63399          osd.5                 up   1.00000  1.00000
>>> >>>   6    hdd    5.63399          osd.6                 up   1.00000  1.00000
>>> >>>   7    hdd    5.63399          osd.7                 up   1.00000  1.00000
>>> >>>  40    ssd    5.82190          osd.40                up   1.00000  1.00000
>>> >>>  48    ssd    5.82190          osd.48                up   1.00000  1.00000
>>> >>>  -5          56.71570      host cnp31tcephosd02
>>> >>>   8    hdd    5.63399          osd.8                 up   1.00000  1.00000
>>> >>>   9    hdd    5.63399          osd.9               down   1.00000  1.00000
>>> >>>  10    hdd    5.63399          osd.10                up   1.00000  1.00000
>>> >>>  11    hdd    5.63399          osd.11                up   1.00000  1.00000
>>> >>>  12    hdd    5.63399          osd.12                up   1.00000  1.00000
>>> >>>  13    hdd    5.63399          osd.13                up   1.00000  1.00000
>>> >>>  14    hdd    5.63399          osd.14                up   1.00000  1.00000
>>> >>>  15    hdd    5.63399          osd.15                up   1.00000  1.00000
>>> >>>  49    ssd    5.82190          osd.49                up   1.00000  1.00000
>>> >>>  50    ssd    5.82190          osd.50                up   1.00000  1.00000
>>> >>>  -7          56.71570      host cnp31tcephosd03
>>> >>>  16    hdd    5.63399          osd.16                up   1.00000  1.00000
>>> >>>  17    hdd    5.63399          osd.17                up   1.00000  1.00000
>>> >>>  18    hdd    5.63399          osd.18                up   1.00000  1.00000
>>> >>>  19    hdd    5.63399          osd.19                up   1.00000  1.00000
>>> >>>  20    hdd    5.63399          osd.20                up   1.00000  1.00000
>>> >>>  21    hdd    5.63399          osd.21                up   1.00000  1.00000
>>> >>>  22    hdd    5.63399          osd.22                up   1.00000  1.00000
>>> >>>  23    hdd    5.63399          osd.23                up   1.00000  1.00000
>>> >>>  51    ssd    5.82190          osd.51                up   1.00000  1.00000
>>> >>>  52    ssd    5.82190          osd.52                up   1.00000  1.00000
>>> >>>  -9          56.71570      host cnp31tcephosd04
>>> >>>  24    hdd    5.63399          osd.24                up   1.00000  1.00000
>>> >>>  25    hdd    5.63399          osd.25                up   1.00000  1.00000
>>> >>>  26    hdd    5.63399          osd.26                up   1.00000  1.00000
>>> >>>  27    hdd    5.63399          osd.27                up   1.00000  1.00000
>>> >>>  28    hdd    5.63399          osd.28                up   1.00000  1.00000
>>> >>>  29    hdd    5.63399          osd.29                up   1.00000  1.00000
>>> >>>  30    hdd    5.63399          osd.30                up   1.00000  1.00000
>>> >>>  31    hdd    5.63399          osd.31                up   1.00000  1.00000
>>> >>>  53    ssd    5.82190          osd.53                up   1.00000  1.00000
>>> >>>  54    ssd    5.82190          osd.54                up   1.00000  1.00000
>>> >>> -11          56.71570      host cnp31tcephosd05
>>> >>>  32    hdd    5.63399          osd.32                up   1.00000  1.00000
>>> >>>  33    hdd    5.63399          osd.33                up   1.00000  1.00000
>>> >>>  34    hdd    5.63399          osd.34                up   1.00000  1.00000
>>> >>>  35    hdd    5.63399          osd.35                up   1.00000  1.00000
>>> >>>  36    hdd    5.63399          osd.36                up   1.00000  1.00000
>>> >>>  37    hdd    5.63399          osd.37                up   1.00000  1.00000
>>> >>>  38    hdd    5.63399          osd.38                up   1.00000  1.00000
>>> >>>  39    hdd    5.63399          osd.39                up   1.00000  1.00000
>>> >>>  55    ssd    5.82190          osd.55                up   1.00000  1.00000
>>> >>>  56    ssd    5.82190          osd.56                up   1.00000  1.00000
>>> >>> -13          64.03453      host cnp31tcephosd06
>>> >>>  41    hdd    7.48439          osd.41                up   1.00000  1.00000
>>> >>>  42    hdd    7.48439          osd.42                up   1.00000  1.00000
>>> >>>  43    hdd    7.48439          osd.43                up   1.00000  1.00000
>>> >>>  44    hdd    7.48439          osd.44                up   1.00000  1.00000
>>> >>>  45    hdd    7.48439          osd.45                up   1.00000  1.00000
>>> >>>  46    hdd    7.48439          osd.46                up   1.00000  1.00000
>>> >>>  47    hdd    7.48439          osd.47                up   1.00000  1.00000
>>> >>>  57    ssd    5.82190          osd.57                up   1.00000  1.00000
>>> >>>  58    ssd    5.82190          osd.58                up   1.00000  1.00000
>>> >>> _______________________________________________
>>> >>> ceph-users mailing list -- ceph-users@xxxxxxx
>>> >>> To unsubscribe send an email to ceph-users-leave@xxxxxxx
>>> >>>
>>> >>
>>> _______________________________________________
>>> ceph-users mailing list -- ceph-users@xxxxxxx
>>> To unsubscribe send an email to ceph-users-leave@xxxxxxx
>>>
>>
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx
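For completeness, a minimal sketch of how the per-OSD latency and disk
queue-size figures quoted at the top of this thread could be collected for
a before/after comparison; osd.9 is only an example id, and "ceph daemon"
has to be run on the host that carries that OSD:

    # commit/apply latency per OSD as seen by the cluster
    ceph osd perf

    # slowest recently completed operations on a suspect OSD
    ceph daemon osd.9 dump_historic_ops

    # device-level write wait time and average queue size, sampled every 5 seconds
    # (column names depend on the sysstat version: w_await/aqu-sz or await/avgqu-sz)
    iostat -x 5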