Hello,

- gibba nodes are used inefficiently
  - used a lot closer to the end of the major release cycle (or for
    specific projects, e.g. mclock), but largely idle in the middle of
    the release cycle
  - a considerable waste of hardware resources if used only to exercise
    upgrading to some (currently reef) backport releases
  - proposal to release gibba nodes for teuthology (Patrick)
    - for special-purpose suites where jobs require more nodes and/or
      more time than usual (e.g. running for 10h with 6-8 nodes)?
    - run tests for different components on the same cluster
      concurrently, this is lacking today except for a few bits in
      upgrade suites
    - ... or even just existing suites (Casey)
  - need Neha to weigh in as gibba cluster caretaker
- 18.2.1 blockers
  - MDS crashing on old kernel clients
    - https://github.com/ceph/ceph/pull/54677 is a temporary stop-gap
      change in smoke and powercycle suites needed for reproducing
      - increases the number of jobs in reef (scheduling with --subset
        would defeat the purpose of the change)
      - needs ack from core
    - https://github.com/ceph/ceph/pull/54407 is the fix
    - Venky to test with amended smoke suite, merge and hand off to
      Yuri for LRC upgrade
    - discussion on test suite changes would be held separately
  - https://tracker.ceph.com/issues/63618 (next item)
- potential data corruption in bluestore (!!!)
  - can occur under heavy fragmentation if db is co-located with the
    main device or after bluefs spillover to the main device, when the
    main device is configured with 64k alloc size
    - affects OSDs that were upgraded without redeploying from octopus
      and earlier releases
  - a crash on ceph_assert(available >= allocated) during OSD startup
    is an indicator
    - more likely than actual data corruption? (Igor)
    - Laura to check telemetry for instances of this assert
  - assumed to be caused by https://github.com/ceph/ceph/pull/48854
    which shipped in 18.2.0 and was backported to 16.2.14 and 17.2.6,
    meaning that all release streams are vulnerable
  - tracked in https://tracker.ceph.com/issues/63618 (hit on 17.2.7)
    - https://tracker.ceph.com/issues/62282 was hit by Adam on 17.2.6,
      Igor believes the root cause to be the same
  - for now, this is a blocker for 16.2.15 and 18.2.1
    - might necessitate hot fixes (also for quincy)
- regression for RHEL tests on main ("nothing provides lua-devel")
  - https://tracker.ceph.com/issues/63672
- 42 pacific PRs left to be triaged
  - https://github.com/ceph/ceph/pulls?q=is%3Aopen+is%3Apr+milestone%3Apacific
  - move to v16.2.15 milestone or close PR and reject backport

Thanks,

                Ilya