Dups trimming https://tracker.ceph.com/issues/53729

Hi everyone,

We recently reverted both the offline and the online versions of the dups trimming fixes that were made as part of https://github.com/ceph/ceph/pull/45529.

1. During testing of the offline version of the tool, we found that the dups trimming code also trims PG log entries excessively, which leaves the objectstore version of the PG log inconsistent with what is stored in PG info after the OSD is restarted (this can be seen by comparing the head/tail of the PG log in the output of pg query against the PG log dumped with ceph-objectstore-tool (COT)). This is unintended and may have unforeseen consequences.
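
For illustration, here is a rough Python sketch of that comparison. The JSON field names (info/log_tail/last_update from pg query, and pg_log_t/head/tail from the COT dump) are assumptions about the output layout and may need adjusting for your release; note that the OSD must be stopped before running COT against it.

#!/usr/bin/env python3
# Sketch: compare the PG log head/tail reported by `ceph pg query` (PG info)
# with the on-disk PG log dumped by ceph-objectstore-tool (COT) for the same PG.
# Field names are assumed and may differ between releases.
import json
import subprocess
import sys

def pg_query_bounds(pgid):
    out = subprocess.check_output(["ceph", "pg", pgid, "query"])
    info = json.loads(out)["info"]
    # Assumption: log_tail is the tail and last_update is the head of the PG log.
    return info["log_tail"], info["last_update"]

def cot_log_bounds(osd_data_path, pgid):
    # The OSD owning osd_data_path must be stopped before running COT.
    out = subprocess.check_output([
        "ceph-objectstore-tool",
        "--data-path", osd_data_path,
        "--pgid", pgid,
        "--op", "log",
    ])
    # Assumption: the dump contains a pg_log_t object with tail/head fields.
    log = json.loads(out)["pg_log_t"]
    return log["tail"], log["head"]

if __name__ == "__main__":
    pgid, osd_path = sys.argv[1], sys.argv[2]  # e.g. 1.2f /var/lib/ceph/osd/ceph-0
    q_tail, q_head = pg_query_bounds(pgid)
    c_tail, c_head = cot_log_bounds(osd_path, pgid)
    print(f"pg query : tail={q_tail} head={q_head}")
    print(f"COT dump : tail={c_tail} head={c_head}")
    if (q_tail, q_head) != (c_tail, c_head):
        print("MISMATCH: objectstore PG log disagrees with PG info")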

Fix: https://github.com/ceph/ceph/pull/46630/commits/a2190f901abf2fed20c65e59f53b38c10545cb5a has been created to make the offline dups trimming independent of PG log trimming. We are also including a minimal patch, https://github.com/ceph/ceph/commit/8f1c8a7309976098644bb978d2c1095089522846, which will help users check whether or not they are affected by this bug.

We intend to release this as early as 17.2.1. Pacific and Octopus backports are also in the pipeline, but we will decide the release vehicle depending on the ETA of (2), i.e. whether we ship only the tool change or the complete fix along with it.

2. It has come to our attention that this bug can cause peering stalls due to the accumulation and transfer of millions of dups during peering. We have reverted the online version of the fix because we intend to add more guardrails around how many dups we read off disk and load into memory, so as to not cause any surprises during or after an upgrade (especially for users who have been affected by this issue without any visible effects). The current code would attempt to trim all the dups in one go after the upgrade, which can be a problem when there are more than 80 million dups. We also feel the need to perform extensive tests that replicate the bug, to validate our upgrade approach and not worsen the situation for existing clusters. https://github.com/ceph/ceph/pull/46694 is a step in that direction. We do not have an ETA for this at the moment, but we will keep you informed as we finalize our plan of action.
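
If you want a rough idea of how many dups your PGs are carrying, something like the sketch below can count them per PG from COT log dumps taken on a stopped OSD. This is only an illustration: it assumes the --op log output contains a pg_log_t object with a dups array, which may differ between releases.

#!/usr/bin/env python3
# Sketch: count dup entries per PG on a stopped OSD using ceph-objectstore-tool.
# Assumes `--op log` output has pg_log_t.dups; layout is an assumption.
import json
import subprocess
import sys

def list_pgs(osd_data_path):
    out = subprocess.check_output([
        "ceph-objectstore-tool", "--data-path", osd_data_path, "--op", "list-pgs",
    ])
    return out.decode().split()

def dup_count(osd_data_path, pgid):
    out = subprocess.check_output([
        "ceph-objectstore-tool", "--data-path", osd_data_path,
        "--pgid", pgid, "--op", "log",
    ])
    return len(json.loads(out).get("pg_log_t", {}).get("dups", []))

if __name__ == "__main__":
    osd_path = sys.argv[1]  # e.g. /var/lib/ceph/osd/ceph-0 (OSD must be stopped)
    for pgid in list_pgs(osd_path):
        print(f"{pgid}: {dup_count(osd_path, pgid)} dups")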

Hope this clarifies things.

Thanks,
Neha
