Sure, adding more test scenarios can never hurt!

On Mon, Nov 5, 2018 at 3:39 PM, Yuri Weinstein <yweinste@xxxxxxxxxx> wrote:
> Neha,
>
> Josh and I are testing mods related to this here
> https://github.com/ceph/ceph/pull/24938
> And will do the same for mimic-x in preparation for nautilus, if that sounds good?
>
> Thx
> YuriW
> On Mon, Nov 5, 2018 at 3:37 PM Neha Ojha <nojha@xxxxxxxxxx> wrote:
>>
>> Nathan, I don't think we want to revert it for 13.2.2.
>>
>> This is because the pg log hard limit feature currently doesn't seem
>> to work well in a partial upgrade, recovery/backfill scenario. So,
>> even if we do revert it in 13.2.3, this still leaves us a chance of
>> going into a split scenario, where some osds in the field, running
>> 13.2.2 (with hard limit code) and others on 13.2.3 (without the code),
>> may encounter http://tracker.ceph.com/issues/36686.
>>
>> Therefore, users who have successfully upgraded to 13.2.2 shouldn't be
>> at any risk.
>> For users trying to upgrade to a version >= 13.2.2, I am going to make
>> a note of this issue and add the suggested workaround in Pending
>> Release Notes for mimic.
>>
>> Does that make sense?
>>
>> Thanks,
>> Neha
>>
>> On Mon, Nov 5, 2018 at 2:43 PM, Yuri Weinstein <yweinste@xxxxxxxxxx> wrote:
>> > Acknowledged
>> >
>> > On Mon, Nov 5, 2018 at 2:35 PM Nathan Cutler <ncutler@xxxxxxx> wrote:
>> >>
>> >> Thanks, Neha. The luminous revert was just merged and we'll cut 12.2.10
>> >> to push it out to users.
>> >>
>> >> Regarding Mimic, will there be a revert there as well? Since the pg hard
>> >> limit patches are present in 13.2.2, it sounds like we'll need to revert
>> >> them before we release 13.2.3?
>> >>
>> >> (Note that Yuri was planning to start QE for 13.2.3 - Yuri, please hold
>> >> off on that for now?)
>> >>
>> >> Nathan
>> >>
>> >> On 11/5/18 6:50 PM, Neha Ojha wrote:
>> >> > Hi All,
>> >> >
>> >> > We have discovered an issue with the pg log hard limit
>> >> > patches (https://github.com/ceph/ceph/pull/23211,
>> >> > https://github.com/ceph/ceph/pull/24308), where a partial upgrade
>> >> > during backfill can cause the osds on the previous version to fail
>> >> > with "assert(trim_to <= info.last_complete)". Full description of the
>> >> > bug is here: http://tracker.ceph.com/issues/36686.
>> >> >
>> >> > These changes are in 13.2.2 and 12.2.9, and a workaround for users is
>> >> > to upgrade and restart all OSDs to a version with the pg hard limit,
>> >> > or only upgrade when all PGs are active+clean.
>> >> >
>> >> > Until we add capability to have the pg log hard limit work smoothly in
>> >> > the upgrade case, we will be reverting these changes,
>> >> > https://github.com/ceph/ceph/pull/24903, and releasing 12.2.10 as
>> >> > early as possible.
>> >> >
>> >> > We are also reverting https://github.com/ceph/ceph/pull/24902, which
>> >> > is a low impact bug, but might cause issues in the field.
>> >> >
>> >> > Sorry for any inconvenience caused due to this.
>> >> >
>> >> > Thanks,
>> >> > Neha
>> >> >
>> >>
>> >> --
>> >> Nathan Cutler
>> >> Software Engineer Distributed Storage
>> >> SUSE LINUX, s.r.o.
>> >> Tel.: +420 284 084 037
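
For anyone scripting the workaround quoted above (only upgrade when all PGs are active+clean, and avoid leaving OSDs split across versions), here is a minimal sketch of the two checks. The function names and the input dictionaries are hypothetical: they are assumed to be filled in from whatever cluster status tooling you already use, and they are not the output format of any particular ceph command.

```python
from typing import Dict


def all_pgs_active_clean(pg_state_counts: Dict[str, int]) -> bool:
    """True only if there is at least one PG and every PG is exactly active+clean.

    pg_state_counts is a hypothetical mapping of PG state string -> PG count.
    """
    total = sum(pg_state_counts.values())
    return total > 0 and pg_state_counts.get("active+clean", 0) == total


def osds_on_mixed_versions(osd_versions: Dict[int, str]) -> bool:
    """True if the OSDs report more than one distinct version string.

    osd_versions is a hypothetical mapping of OSD id -> version string.
    """
    return len(set(osd_versions.values())) > 1


if __name__ == "__main__":
    # Illustrative values only, not taken from a real cluster.
    pg_states = {"active+clean": 126, "active+remapped+backfilling": 2}
    versions = {0: "13.2.2", 1: "13.2.2", 2: "13.2.1"}
    print("all PGs active+clean:", all_pgs_active_clean(pg_states))
    print("OSDs on mixed versions:", osds_on_mixed_versions(versions))
```

With backfill still in progress, as in the illustrative values, the first check fails, which is the situation where the thread advises holding off on a partial upgrade.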