Re: Upgrading to 16.2.11 timing out on ceph-volume due to raw list performance bug, downgrade isn't possible due to new OP code in bluestore

Thank you for the suggestion, Frank. We've managed to avoid patches so far,
but I guess that convenience ends now :(
With
# lsblk -P -p -o 'NAME' | wc -l
137
devices, the raw list takes about 10 minutes to run. 70 devices would probably
also push you over the 2-minute timeout window, so I certainly wouldn't
consider upgrading unless you have this bug patched.
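
If you want to gauge the impact before upgrading, something along these lines
should give a rough idea (this assumes cephadm's --image flag and the stock
16.2.11 image; adjust to whatever image you would actually upgrade to):

# time cephadm --image quay.io/ceph/ceph:v16.2.11 shell -- ceph-volume raw list

If that gets anywhere near the 120-second TimeoutStartSec in the generated
unit files, the OSD services will start flapping after the upgrade.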

Best regards, Mikael


On Wed, Apr 5, 2023 at 9:35 AM Frank Schilder <frans@xxxxxx> wrote:

> Hi Mikael, thanks for sharing this (see also
> https://www.stroustrup.com/whitespace98.pdf, python ha ha ha). We would
> probably have observed the same problem (70+ OSDs per host). You might want
> to consider configuring deployment against a local registry and using a
> patched image. Local container images are always a good idea; post-release
> patches are common, not an exception.
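>
> To sketch what that could look like (registry host and image tag below are
> placeholders, adjust to your local setup): push the patched build to your
> registry, then point the cluster at it for the upgrade:
>
> # ceph config set global container_image registry.local:5000/ceph/ceph:v16.2.11-patched
> # ceph orch upgrade start --image registry.local:5000/ceph/ceph:v16.2.11-patched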
>
> Best regards,
> =================
> Frank Schilder
> AIT Risø Campus
> Bygning 109, rum S14
>
> ________________________________________
> From: Mikael Öhman <micketeer@xxxxxxxxx>
> Sent: Wednesday, April 5, 2023 1:18 AM
> To: ceph-users@xxxxxxx
> Subject:  Upgrading to 16.2.11 timing out on ceph-volume due
> to raw list performance bug, downgrade isn't possible due to new OP code in
> bluestore
>
> Trying to upgrade a containerized setup from 16.2.10 to 16.2.11 gave us two
> big surprises, which I wanted to share in case anyone else encounters the
> same. I don't see any nice solution to this apart from a new release that
> fixes the performance regression, which completely breaks the container
> setup in cephadm due to timeouts:
>
> After some digging, we found that it was the "ceph-volume" command that
> kept timing out, and after a ton more digging, found that it does so because
> of
>
> https://github.com/ceph/ceph/commit/bea9f4b643ce32268ad79c0fc257b25ff2f8333c#diff-29697ff230f01df036802c8b2842648267767b3a7231ea04a402eaf4e1819d29R104
> which was introduced in 16.2.11.
> Unfortunately, the vital fix for this,
>
> https://github.com/ceph/ceph/commit/8d7423c3e75afbe111c91e699ef3cb1c0beee61b
> was not included in 16.2.11.
>
> So, in a setup like ours, with *many* devices, a simple "ceph-volume raw
> list" now takes over 10 minutes to run (instead of 5 seconds in 16.2.10).
> As a result, the service files that cephadm generates
>
> [Service]
> LimitNOFILE=1048576
> LimitNPROC=1048576
> EnvironmentFile=-/etc/environment
> ExecStart=/bin/bash
> /var/lib/ceph/5406fed0-d52b-11ec-beff-7ed30a54847b/%i/unit.run
> ExecStop=-/bin/bash -c '/bin/podman stop
> ceph-5406fed0-d52b-11ec-beff-7ed30a54847b-%i ; bash
> /var/lib/ceph/5406fed0-d52b-11ec-beff-7ed30a54847b/%i/unit.stop'
> ExecStopPost=-/bin/bash
> /var/lib/ceph/5406fed0-d52b-11ec-beff-7ed30a54847b/%i/unit.poststop
> KillMode=none
> Restart=on-failure
> RestartSec=10s
> TimeoutStartSec=120
> TimeoutStopSec=120
> StartLimitInterval=30min
> StartLimitBurst=5
> ExecStartPre=-/bin/rm -f %t/%n-pid %t/%n-cid
> ExecStopPost=-/bin/rm -f %t/%n-pid %t/%n-cid
> Type=forking
> PIDFile=%t/%n-pid
> Delegate=yes
>
> will repeatedly be marked as failed, as they now take over 2 minutes to
> start. This tells systemd to restart them, and we end up in an infinite
> loop: since the 5 restart attempts take over 50 minutes, they never fall
> within the 30-minute StartLimitInterval, so the StartLimitBurst is never
> reached, leaving this OSD endlessly re-listing the n^2 devices (which, as a
> bonus, also fills up the root disk with an enormous amount of repeated
> logging in ceph-volume.log as it keeps trying to figure out whether each
> block device is a bluestore device).
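>
> (Rough back-of-the-envelope with an assumed per-probe cost: ~137 devices
> means on the order of 137^2 ~ 18,800 per-device probes, and at even ~30 ms
> per probe that is already around 9-10 minutes, which matches what we see.)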
> Trying to fix the service or unit files manually, at least to stop this
> container from being incorrectly restarted over and over, is also a dead
> end, since the orchestration layer just overwrites them automatically and
> restarts the services again.
> I found that it seemed to be
>
> /var/lib/ceph/5406fed0-d52b-11ec-beff-7ed30a54847b/cephadm.8d0364fef6c92fc3580b0d022e32241348e6f11a7694d2b957cdafcb9d059ff2
> on my system that generated these files, so I tried tweaking that to use the
> necessary 1200-second TimeoutStartSec, and that finally got the darn
> container to stop restarting endlessly. (I admit I'm very fuzzy on how these
> services and the orchestration are triggered, as I usually don't work on our
> storage stuff.)
> Still, it now takes 11 minutes to start each OSD service, so this isn't
> great.
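>
> For anyone needing the same stopgap, what I did boils down to something like
> the following, assuming the unit template inside that cephadm file carries
> the literal TimeoutStartSec=120 value shown above (the digest in the
> filename will differ on your system, and cephadm may well replace this file
> later, so treat it as temporary):
>
> # sed -i 's/TimeoutStartSec=120/TimeoutStartSec=1200/' \
>     /var/lib/ceph/5406fed0-d52b-11ec-beff-7ed30a54847b/cephadm.8d0364fef6c92fc3580b0d022e32241348e6f11a7694d2b957cdafcb9d059ff2
>
> As far as I can tell, the new timeout only takes effect once the
> orchestration has regenerated the unit files and systemd has reloaded them
> (systemctl daemon-reload).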
>
> We wanted to revert back to 16.2.10, but that turns out to be a no-go as
> well, since a new operation was added to bluefs in 16.2.11
> (https://github.com/ceph/ceph/pull/42750 - this isn't mentioned in the
> changelog; I had to compare the source code to see that it was in fact added
> in 16.2.11). Trying to revert an OSD therefore fails with:
>
> debug 2023-04-04T11:42:45.927+0000 7f2c12f6a200 -1 bluefs _replay 0x100000:
> stop: unrecognized op 12
> debug 2023-04-04T11:42:45.927+0000 7f2c12f6a200 -1 bluefs mount failed to
> replay log: (5) Input/output error
> debug 2023-04-04T11:42:45.927+0000 7f2c12f6a200 -1
> bluestore(/var/lib/ceph/osd/ceph-10) _open_bluefs failed bluefs mount: (5)
> Input/output error
> debug 2023-04-04T11:42:45.927+0000 7f2c12f6a200 -1
> bluestore(/var/lib/ceph/osd/ceph-10) _open_db failed to prepare db
> environment:
> debug 2023-04-04T11:42:45.927+0000 7f2c12f6a200  1 bdev(0x5590e80a0400
> /var/lib/ceph/osd/ceph-10/block) close
> debug 2023-04-04T11:42:46.153+0000 7f2c12f6a200 -1 osd.10 0 OSD:init:
> unable to mount object store
> debug 2023-04-04T11:42:46.153+0000 7f2c12f6a200 -1  ** ERROR: osd init
> failed: (5) Input/output error
>
> Ouch
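>
> If anyone else is about to attempt a downgrade, a quick way to see which
> daemons are already running 16.2.11 (and are therefore likely past this
> point of no return) is:
>
> # ceph versions
>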
> Best regards, Mikael
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx



