Hi Satoru,

Apologies for the delay in responding to your questions.

In the case of https://github.com/ceph/ceph/pull/45963, we caught the bug in an upgrade test (as described in https://tracker.ceph.com/issues/55444) and not in the rados test suite. Our upgrade test suites are meant to be run before we do point releases, to ensure backward compatibility with N-2 and N-1 releases. As I mentioned in https://tracker.ceph.com/issues/55444#note-12, the same test scenario was run as a part of the rados suite before the original PR was merged, but we did not catch this bug then, because we were testing the patch against pacific, where resharding is on by default. When the same test scenario was run as a part of the upgrade test, resharding was not on by default, so we caught this issue. We do have minimal upgrade tests within the rados suite, but unfortunately, they weren't enough to catch this particular issue.

The rest of the answers are inline.

On Fri, Aug 26, 2022 at 2:52 AM Satoru Takeuchi <satoru.takeuchi@xxxxxxxxx> wrote:
>
> Could anyone answer this question? There are many questions but it's of
> course really helpful to know the answer to just one question.
>
> The summary of my questions.
>
> - a. About QA process
>   - a.1 The number of test cases differs between the QA for merging a PR and
>     the QA for release?
>   - a.2 If a.1 is correct, is it possible to change the CI system to run
>     all test cases in both QAs?

Our testing framework, teuthology, uses a concept of subsets to determine the number of jobs scheduled in a run; you can find detailed information about it in https://docs.ceph.com/en/latest/dev/developer_guide/testing_integration_tests/tests-integration-testing-teuthology-intro/#nested-subsets. I have already explained what was different in the case of the bug you referred to. We can certainly encourage more use of our upgrade suites before merging critical PRs.

>   - a.3 How to run Teuthology in my local environment?

At this point, we have the ability to run some tests locally using teuthology. Junior (cc'ed here) did a presentation on this topic, which was recorded here: https://www.youtube.com/watch?v=wZHcg0oVzhY.

> - b. Is there any way to know the data format of each OSD and MON (e.g.
>   created in version A with configuration B) by issuing Ceph commands?

I am not quite sure I understand your question.

Hope this helps!

Thanks,
Neha

>
> The detail is in the following quoted description.
>
> On Fri, Aug 19, 2022 at 15:39, Satoru Takeuchi <satoru.takeuchi@xxxxxxxxx> wrote:
>
> > Hi,
> >
> > As I described in another mail(*1), my development Ceph cluster was
> > corrupted when using a problematic binary.
> > When I upgraded to v16.2.7 + some patches (*2) + PR#45963 patch,
> > unfound pgs and inconsistent pgs appeared. In the end, I deleted this
> > cluster.
> >
> > pacific: bluestore: set upper and lower bounds on rocksdb omap iterators
> > https://github.com/ceph/ceph/pull/45963
> >
> > This problem happened because PR#45963 causes data corruption on OSDs
> > which were created in octopus or older.
> >
> > This patch was reverted, and the correct version (PR#46096) was applied
> > later.
> >
> > pacific: revival and backport of fix for RocksDB optimized iterators
> > https://github.com/ceph/ceph/pull/46096
> >
> > It's mainly because I applied the not-well-tested patch carelessly. To
> > prevent the same mistake from happening again, let me ask some questions.
> >
> > a. About QA process
> >   a.1 In my understanding, the test cases differ between the QA for
> >     merging a PR and the QA for release. For example, the upgrade test
> >     was run only in the release QA process. Is my understanding correct?
> >     I thought so because the bug in #45963 was not detected in
> >     the QA for merging but was detected in the QA for release.
> >   a.2 If a.1 is correct, is it possible to run all test cases in both
> >     QA?
> >     I guess that some time-consuming tests are skipped to make
> >     development more efficient.
> >   a.3 Is there any detailed document about how to run Teuthology in
> >     the user's local environment?
> >     I once tried this by following the official document, but it didn't
> >     work well.
> >
> >
> >     https://docs.ceph.com/en/quincy/dev/developer_guide/testing_integration_tests/tests-integration-testing-teuthology-intro/#how-to-run-integration-tests
> >
> >     At that time, Teuthology failed to connect to
> >     paddles.front.sepia.ceph.com, which wasn't mentioned in this document.
> >
> > ```
> > requests.exceptions.ConnectionError:
> > HTTPConnectionPool(host='paddles.front.sepia.ceph.com', port=80): Max
> > retries exceeded with url: /nodes/?machine_type=vps&count=1 (Caused by
> > NewConnectionError('<urllib3.connection.HTTPConnection object at
> > 0x7fc945880490>: Failed to establish a new connection: [Errno 110]
> > Connection timed out'))
> > ```
> > b. To minimize the risk, I'd like to use the newest possible data format
> >   for both OSDs and MONs.
> >   More precisely, I'd like to re-create all OSDs and MONs if their
> >   default data format was changed.
> >   Please let me know if there is a convenient way to know the data
> >   format of each OSD and MON.
> >
> >   As an example, when I re-created some OSDs created in octopus or
> >   older in my pacific cluster,
> >   I assumed that OSDs older than the upgrade-to-pacific date
> >   were created in octopus or older.
> >   It seemed to work, but it's better to use a more straightforward way.
> >
> > *1)
> > https://lists.ceph.io/hyperkitty/list/dev@xxxxxxx/message/TT6ZQ5LUS54ZK4NNXSDJIOBS5A2ZFAGT/
> > *2) PR#43581, 44413, 45502, 45654, these patches don't relate to the
> > topic of this mail
> >
> Thanks,
> Satoru
> _______________________________________________
> ceph-users mailing list -- ceph-users@xxxxxxx
> To unsubscribe send an email to ceph-users-leave@xxxxxxx
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx