Hi,

As I described in another mail (*1), my development Ceph cluster was corrupted when I used a problematic binary. When I upgraded to v16.2.7 + some patches (*2) + the PR#45963 patch, unfound PGs and inconsistent PGs appeared. In the end, I deleted this cluster.

  pacific: bluestore: set upper and lower bounds on rocksdb omap iterators
  https://github.com/ceph/ceph/pull/45963

This problem happened because PR#45963 causes data corruption on OSDs which were created in octopus or older. This patch was reverted, and the correct version (PR#46096) was applied later.

  pacific: revival and backport of fix for RocksDB optimized iterators
  https://github.com/ceph/ceph/pull/46096

It happened mainly because I carelessly applied a patch that was not well tested. To prevent the same mistake from happening again, let me ask some questions.

a. About the QA process

a.1 In my understanding, the test cases differ between the QA for merging a PR and the QA for a release. For example, the upgrade test is run only in the release QA process. Is my understanding correct? I think so because the bug in #45963 was not detected in the QA for merging but was detected in the QA for release.

a.2 If a.1 is correct, is it possible to run all test cases in both QA processes? I guess that some time-consuming tests are skipped to keep development efficient.

a.3 Is there any detailed document about how to run Teuthology in a user's local environment? I once tried this by following the official document, but it didn't work well.

  https://docs.ceph.com/en/quincy/dev/developer_guide/testing_integration_tests/tests-integration-testing-teuthology-intro/#how-to-run-integration-tests

At that time, Teuthology failed to connect to paddles.front.sepia.ceph.com, which isn't mentioned in this document:
```
requests.exceptions.ConnectionError: HTTPConnectionPool(host='paddles.front.sepia.ceph.com', port=80): Max retries exceeded with url: /nodes/?machine_type=vps&count=1 (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7fc945880490>: Failed to establish a new connection: [Errno 110] Connection timed out'))
```

b. To minimize the risk, I'd like to use the newest data format for both OSDs and MONs wherever possible. More precisely, I'd like to re-create all OSDs and MONs whenever their default data format changes. Please let me know if there is a convenient way to find out the data format of each OSD and MON.

As an example, when I re-created the OSDs in my pacific cluster that had been created in octopus or older, I assumed that the OSDs older than the upgrade-to-pacific date were the ones created in octopus or older. It seemed to work, but a more straightforward way would be better.

*1) https://lists.ceph.io/hyperkitty/list/dev@xxxxxxx/message/TT6ZQ5LUS54ZK4NNXSDJIOBS5A2ZFAGT/
*2) PR#43581, 44413, 45502, 45654. These patches are not related to the topic of this mail.

Best,
Satoru
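
P.S. To make the date-based heuristic in (b) concrete, here is a minimal Python sketch of the idea. It is only an illustration: the `created_at` mapping (OSD id to creation time) and the upgrade date are hypothetical inputs that would have to be collected by hand, and the function name is my own, not part of any Ceph tool.

```python
from datetime import datetime


def osds_created_before(created_at, upgrade_time):
    """Return the sorted ids of OSDs created strictly before upgrade_time.

    created_at: dict mapping OSD id (int) to its creation time (datetime).
    upgrade_time: the datetime at which the cluster was upgraded.
    """
    return sorted(
        osd_id for osd_id, ts in created_at.items() if ts < upgrade_time
    )


if __name__ == "__main__":
    # Hypothetical example data, not taken from a real cluster.
    created_at = {
        0: datetime(2021, 3, 1),
        1: datetime(2022, 5, 10),
        2: datetime(2021, 11, 20),
    }
    upgrade_time = datetime(2022, 1, 15)  # assumed upgrade-to-pacific date
    print(osds_created_before(created_at, upgrade_time))
```

This only tells which OSDs predate the upgrade. It does not inspect the on-disk format itself, which is why I am asking for a more direct way.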