Hi all,

Unfortunately, we experienced some issues with the upgrade to 16.2.8 on one
of our larger clusters. Within a few hours of the upgrade, all 5 of our
managers had become unavailable. We found that they were all deadlocked due
to what appears to be a regression in GIL and mutex handling. See
https://tracker.ceph.com/issues/39264 and
https://github.com/ceph/ceph/pull/38677 for context on previous
manifestations of the issue.

I discovered some mistakes in a recent Pacific backport that seem to be
responsible. Here is the tracker for the regression:
https://tracker.ceph.com/issues/55687. Here is an open PR that should
resolve the problem: https://github.com/ceph/ceph/pull/38677.

Note that this is a sort of race condition, so the issue tends to manifest
more frequently on larger clusters. Enabling certain modules may also make
it more likely to occur. On our cluster, MGRs consistently deadlock within
about an hour.

Hopefully this is useful to others who are considering an upgrade!

Thanks,

Cory Snyder

On Mon, May 16, 2022 at 3:46 PM David Galloway <dgallowa@xxxxxxxxxx> wrote:
>
> We're happy to announce the 8th backport release in the Pacific series.
> We recommend that users update to this release. For detailed release
> notes with links and a changelog, please refer to the official blog entry
> at https://ceph.io/en/news/blog/2022/v16-2-8-pacific-released
>
> Notable Changes
> ---------------
>
> * MON/MGR: Pools can now be created with the `--bulk` flag. Any pools
>   created with `--bulk` will use a profile of the `pg_autoscaler` that
>   provides more performance from the start. Pools created without the
>   `--bulk` flag will retain the old behavior by default. For more
>   details, see:
>   https://docs.ceph.com/en/latest/rados/operations/placement-groups/
>
> * MGR: The pg_autoscaler can now be turned `on` and `off` globally with
>   the `noautoscale` flag. By default this flag is unset and the default
>   pg_autoscale mode remains the same. For more details, see:
>   https://docs.ceph.com/en/latest/rados/operations/placement-groups/
>
> * A health warning will now be reported if the ``require-osd-release``
>   flag is not set to the appropriate release after a cluster upgrade.
>
> * CephFS: Upgrading Ceph Metadata Servers when using multiple active
>   MDSs requires ensuring that no pending stray entries which are
>   directories are present for active ranks except rank 0. See
>   https://docs.ceph.com/en/latest/releases/pacific/#upgrading-from-octopus-or-nautilus.
>
> Getting Ceph
> ------------
> * Git at git://github.com/ceph/ceph.git
> * Tarball at https://download.ceph.com/tarballs/ceph-16.2.8.tar.gz
> * Containers at https://quay.io/repository/ceph/ceph
> * For packages, see https://docs.ceph.com/docs/master/install/get-packages/
> * Release git sha1: 209e51b856505df4f2f16e54c0d7a9e070973185
>
> _______________________________________________
> ceph-users mailing list -- ceph-users@xxxxxxx
> To unsubscribe send an email to ceph-users-leave@xxxxxxx
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx