Neha,

Josh and I are testing changes related to this here:
https://github.com/ceph/ceph/pull/24938

We will do the same for mimic-x in preparation for nautilus, if that sounds good?

Thx
YuriW

On Mon, Nov 5, 2018 at 3:37 PM Neha Ojha <nojha@xxxxxxxxxx> wrote:
>
> Nathan, I don't think we want to revert it for 13.2.2.
>
> This is because the pg log hard limit feature currently doesn't seem
> to work well in a partial upgrade, recovery/backfill scenario. So,
> even if we do revert it in 13.2.3, this still leaves us a chance of
> going into a split scenario, where some OSDs in the field, running
> 13.2.2 (with the hard limit code) and others on 13.2.3 (without the code),
> may encounter http://tracker.ceph.com/issues/36686.
>
> Therefore, users who have successfully upgraded to 13.2.2 shouldn't be
> at any risk.
> For users trying to upgrade to a version >= 13.2.2, I am going to make
> a note of this issue and add the suggested workaround in the Pending
> Release Notes for mimic.
>
> Does that make sense?
>
> Thanks,
> Neha
>
> On Mon, Nov 5, 2018 at 2:43 PM, Yuri Weinstein <yweinste@xxxxxxxxxx> wrote:
> > Acknowledged
> >
> > On Mon, Nov 5, 2018 at 2:35 PM Nathan Cutler <ncutler@xxxxxxx> wrote:
> >>
> >> Thanks, Neha. The luminous revert was just merged and we'll cut 12.2.10
> >> to push it out to users.
> >>
> >> Regarding Mimic, will there be a revert there as well? Since the pg hard
> >> limit patches are present in 13.2.2, it sounds like we'll need to revert
> >> them before we release 13.2.3?
> >>
> >> (Note that Yuri was planning to start QE for 13.2.3 - Yuri, please hold
> >> off on that for now?)
> >>
> >> Nathan
> >>
> >> On 11/5/18 6:50 PM, Neha Ojha wrote:
> >> > Hi All,
> >> >
> >> > We have discovered an issue with the pg log hard limit
> >> > patches (https://github.com/ceph/ceph/pull/23211,
> >> > https://github.com/ceph/ceph/pull/24308), where a partial upgrade
> >> > during backfill can cause the OSDs on the previous version to fail
> >> > with "assert(trim_to <= info.last_complete)". A full description of the
> >> > bug is here: http://tracker.ceph.com/issues/36686.
> >> >
> >> > These changes are in 13.2.2 and 12.2.9. A workaround for users is
> >> > to upgrade and restart all OSDs to a version with the pg hard limit,
> >> > or to only upgrade when all PGs are active+clean.
> >> >
> >> > Until we add the capability for the pg log hard limit to work smoothly
> >> > in the upgrade case, we will be reverting these changes
> >> > (https://github.com/ceph/ceph/pull/24903) and releasing 12.2.10 as
> >> > early as possible.
> >> >
> >> > We are also reverting https://github.com/ceph/ceph/pull/24902, which
> >> > is a low-impact bug but might cause issues in the field.
> >> >
> >> > Sorry for any inconvenience caused by this.
> >> >
> >> > Thanks,
> >> > Neha
> >> >
> >>
> >> --
> >> Nathan Cutler
> >> Software Engineer Distributed Storage
> >> SUSE LINUX, s.r.o.
> >> Tel.: +420 284 084 037