Re: High OSD latencies after Upgrade 14.2.16 -> 14.2.22

Hello Josh,

thank you very much for your answer. Well, we do not use encryption on our OSDs, but the symptoms you describe are quite similar to what I observed after the upgrade to 14.2.22.

We also use the new default of bluefs_buffered_io=true, which probably causes the higher OSD latencies we see. Since we do not use dmcrypt but still see higher latencies as well as higher throughput, dmcrypt does not seem to be the only way to run into these symptoms.
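
For reference, what a running OSD actually uses can be checked, for example, like this (osd.0 is just an example ID here, and the second command has to be run on the host where that OSD lives):

# configuration of the running daemon as reported to the monitors
ceph config show osd.0 | grep bluefs_buffered_io
# or ask the daemon directly over its admin socket
ceph daemon osd.0 config get bluefs_buffered_io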

How did you find out that dmcrypt causes this situation in your case?

Thank you also for the description of what bluefs_buffered_io=true actually does. I googled a bit but did not find any documentation about this buffer. Do you know where more information can be found?

Thanks a lot
Rainer

On 16.07.21 at 16:35, Josh Baergen wrote:
Hi Rainer,

Are you using dmcrypt on your OSDs? I ask because I'm wondering if you're seeing something similar to what we saw in our systems with bluefs_buffered_io=true (as it is by default in 14.2.22, whereas it's false in 14.2.16): With this set, bluefs writes go through Linux's buffer cache, and large writes actually get chopped up at buffer boundaries, with the expectation that the I/O scheduler will merge those writes into larger ones again. However, because the dmcrypt layer is a bit slower, we found that the individual writes would leak through to the scheduler at a slow enough pace that they wouldn't all be merged, resulting in a bunch of smaller writes to the HDDs that were missing revolutions, increasing OSD commit/apply latency.

Like you, we didn't see a major effect on end user performance for most of our systems, but the effect was pretty drastic for one of them. Not as big of an issue for SSDs, of course, because most of them have some sort of cache (SLC or otherwise) that can absorb these smaller writes and internally commit as larger ones. (Theoretically a writeback cache in front of HDDs should help with this as well, but we tend to avoid those.)
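
One rough way to check whether this is happening on a given system (a generic sketch, not dmcrypt-specific; sdX is a placeholder for the device backing an OSD) is to watch the average write request size that actually reaches the disk:

# extended device statistics, sampled every 5 seconds;
# look at avgrq-sz (older sysstat) or wareq-sz (newer sysstat) --
# large client writes but a small average request size on the HDD
# suggest the split-up writes are not being merged again
iostat -x 5 sdX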

bluefs_buffered_io=true helps with some RocksDB iterate-and-delete workloads in particular, such as snaptrim. PG removal was optimized in 14.2.17 to avoid the performance issues that bluefs_buffered_io is needed to solve. We only run HDDs in some of our RGW clusters, so we chose to set bluefs_buffered_io=false on all of our HDD nodes to avoid the latency hit, since PG removal is the only workload of this type in a typical RGW system.
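
If it helps, a minimal sketch of how that can be set through the central config (just one way of doing it; on Nautilus the OSDs most likely need a restart before the change takes effect):

# disable buffered BlueFS I/O for all OSDs ...
ceph config set osd bluefs_buffered_io false
# ... or only for OSDs whose device class is hdd
ceph config set osd/class:hdd bluefs_buffered_io false
# check the value stored in the config database
ceph config get osd bluefs_buffered_io

The same can of course be put into the [osd] section of ceph.conf on the affected nodes as bluefs_buffered_io = false.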

Josh

On Fri, Jul 16, 2021 at 6:58 AM Rainer Krienke <krienke@xxxxxxxxxxxxxx> wrote:

    Hello,

    Today I upgraded a Ceph (HDD) cluster consisting of 9 hosts with 16
    OSDs each (144 in total) to the latest Nautilus version, 14.2.22.
    The upgrade proceeded without problems and the cluster is healthy.
    After all hosts were on 14.2.22 I saw in Grafana that OSD latencies
    were around 85 msec; after an hour they dropped to about 45 msec.
    Now, probably because the cluster faces somewhat higher I/O demand
    from the Proxmox client side, the OSD latencies are at 57 msec again.

    Before the upgrade, on 14.2.16, this value was about 33 msec.

    I looked at ceph osd perf, where I can see an ever-changing set of
    OSDs with latencies of around 300 msec; right after the upgrade some
    had up to 800 msec. Now there are always roughly 20 OSDs between 100
    and 400 msec. They are not all from one host, and within this
    high-latency set some OSDs stay in this state for longer while
    others drop back to a lower value more quickly:

    # ceph osd perf|sort -n -k 2|tail -30
    134                 37                37
       19                 38                38
    112                 39                39
       12                 42                42
       75                 42                42
       67                 43                43
       51                 45                45
       81                 45                45
       92                 50                50
       40                 56                56
       63                 60                60
       59                 61                61
    128                 65                65
    135                 65                65
    124                 66                66
    117                 94                94
       35                 94                94
       26                112               112
       14                127               127
       56                135               135
    100                164               164
       83                168               168
       62                177               177
       82                182               182
       30                186               186
       72                186               186
    102                203               203
    131                211               211
    121                247               247
       46                254               254
    137                340               340

    On the other hand, if I test performance on a Linux VM running on
    Proxmox that uses this cluster as its storage backend, for example
    I/O performance with bonnie++, I do not have the feeling that it is
    slower than before. It actually seems to be faster. But why, then,
    the higher OSD latencies?
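
    (A typical bonnie++ run for such a test, just as a sketch;
    /mnt/testdir and the 16G size are placeholders, and the size should
    be roughly twice the VM's RAM:)

    # sequential write/rewrite/read test, skipping the small-file phase
    bonnie++ -d /mnt/testdir -s 16G -n 0 -u root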

    Does anyone have an idea why these latencies could have nearly
    doubled? How can I find out more about this strange behaviour? Any
    ideas?

    Thanks
    Rainer


--
Rainer Krienke, Uni Koblenz, Rechenzentrum, A22, Universitaetsstrasse  1
56070 Koblenz, Web: http://www.uni-koblenz.de/~krienke, Tel: +49261287 1312
PGP: http://www.uni-koblenz.de/~krienke/mypgp.html, Fax: +49261287 1001312
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx



