Dear Ceph Users,
We have discovered a bug with the pg-upmap-primary interface (related to the offline read balancer [1]) that affects all Reef releases.
In all Reef versions, users must set `require-min-compat-client=reef` before using the pg-upmap-primary interface, so that pre-Reef clients, which do not understand the new interface, cannot connect. We found that this setting is not actually enforced [2], which leads to miscommunication between older and newer peers or, depending on the version, to an assert in the mons and/or osds [3]. In either case, a cluster is only affected if it makes use of the new `pg-upmap-primary` feature.
If you have not yet upgraded to v18.2.2, we recommend holding off until a later version with a fix is released. We also recommend removing any existing pg-upmap-primary mappings, both to avoid hitting the assert [3] and to prevent miscommunication between older and newer peers about pg primaries [2].
To remove existing mappings, first dump the OSD map:
$ `ceph osd dump`
Then, for each pg_upmap_primary entry in the output, run:
$ `ceph osd rm-pg-upmap-primary <pgid>`
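On clusters with many mappings, the two steps above could be scripted roughly as follows. This is only a sketch, not a tested tool: it assumes that each mapping appears in the plain-text `ceph osd dump` output as a line of the form `pg_upmap_primary <pgid> <osd>`, and the function names are made up for illustration. It only prints the removal commands, so they can be reviewed before anything is run.

```shell
#!/bin/sh
# Read `ceph osd dump` text on stdin and print the pgid of each
# pg_upmap_primary entry, one per line.
# Assumption: entries look like "pg_upmap_primary <pgid> <osd>".
list_upmap_primary_pgids() {
    awk '$1 == "pg_upmap_primary" { print $2 }'
}

# Print (without running) one removal command per mapped PG.
print_removal_commands() {
    list_upmap_primary_pgids | while read -r pgid; do
        echo "ceph osd rm-pg-upmap-primary $pgid"
    done
}

# Example (review the printed commands before piping them to sh):
#   ceph osd dump | print_removal_commands
```

If the printed list looks right, the same pipeline can be piped to `sh` to apply it. Running `ceph osd dump` again afterwards should show no remaining pg_upmap_primary entries.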
If you have already upgraded to v18.2.2, your cluster is more likely to hit the osd/mon assert [3] when you set a `pg-upmap-primary` mapping (this would involve explicitly setting a mapping via the osdmaptool or the CLI command). As long as you refrain from setting any pg-upmap-primary mappings, your cluster will NOT be affected by [3].
Follow the trackers below for further updates.
1. pg-upmap-primary documentation: https://docs.ceph.com/en/reef/rados/operations/read-balancer/
2. mon, osd, *: require-min-compat-client is not really honored - https://tracker.ceph.com/issues/66260
3. Failed assert "pg_upmap_primaries.empty()" in the read balancer - https://tracker.ceph.com/issues/61948
Thanks,
Laura Flores
--
Laura Flores
She/Her/Hers
Software Engineer, Ceph Storage
Chicago, IL
lflores@xxxxxxx | lflores@xxxxxxxxxx
M: +17087388804