Re: pg log hard limit upgrade bug

Sure, adding more test scenarios can never hurt!

On Mon, Nov 5, 2018 at 3:39 PM, Yuri Weinstein <yweinste@xxxxxxxxxx> wrote:
> Neha,
>
> Josh and I are testing changes related to this here:
> https://github.com/ceph/ceph/pull/24938
> We will do the same for mimic-x in preparation for nautilus, if that sounds good?
>
> Thx
> YuriW
> On Mon, Nov 5, 2018 at 3:37 PM Neha Ojha <nojha@xxxxxxxxxx> wrote:
>>
>> Nathan, I don't think we want to revert it for 13.2.2.
>>
>> This is because the pg log hard limit feature currently doesn't seem
>> to work well in a partial upgrade during recovery/backfill. So, even
>> if we do revert it in 13.2.3, that still leaves a chance of a split
>> scenario, where some OSDs in the field are running 13.2.2 (with the
>> hard limit code) and others 13.2.3 (without the code), and may
>> encounter http://tracker.ceph.com/issues/36686.
>>
>> Therefore, users who have successfully upgraded to 13.2.2 shouldn't be
>> at any risk.
>> For users trying to upgrade to a version >= 13.2.2, I am going to make
>> a note of this issue and add the suggested workaround to the Pending
>> Release Notes for mimic.
>>
>> Does that make sense?
>>
>> Thanks,
>> Neha
>>
>> On Mon, Nov 5, 2018 at 2:43 PM, Yuri Weinstein <yweinste@xxxxxxxxxx> wrote:
>> > Acknowledged
>> >
>> > On Mon, Nov 5, 2018 at 2:35 PM Nathan Cutler <ncutler@xxxxxxx> wrote:
>> >>
>> >> Thanks, Neha. The luminous revert was just merged and we'll cut 12.2.10
>> >> to push it out to users.
>> >>
>> >> Regarding Mimic, will there be a revert there as well? Since the pg hard
>> >> limit patches are present in 13.2.2, it sounds like we'll need to revert
>> >> them before we release 13.2.3?
>> >>
>> >> (Note that Yuri was planning to start QE for 13.2.3 - Yuri, please hold
>> >> off on that for now?)
>> >>
>> >> Nathan
>> >>
>> >> On 11/5/18 6:50 PM, Neha Ojha wrote:
>> >> > Hi All,
>> >> >
>> >> > We have discovered an issue with the pg log hard limit
>> >> > patches (https://github.com/ceph/ceph/pull/23211,
>> >> > https://github.com/ceph/ceph/pull/24308), where a partial upgrade
>> >> > during backfill can cause the OSDs on the previous version to fail
>> >> > with "assert(trim_to <= info.last_complete)". A full description of
>> >> > the bug is here: http://tracker.ceph.com/issues/36686.
>> >> >
>> >> > These changes are in 13.2.2 and 12.2.9, and a workaround for users is
>> >> > to upgrade and restart all OSDs to a version with the pg hard limit,
>> >> > or only upgrade when all PGs are active+clean.
>> >> >
>> >> > Until we add the capability for the pg log hard limit to work
>> >> > smoothly in the upgrade case, we will be reverting these changes
>> >> > (https://github.com/ceph/ceph/pull/24903) and releasing 12.2.10 as
>> >> > early as possible.
>> >> >
>> >> > We are also reverting https://github.com/ceph/ceph/pull/24902, which
>> >> > is a low-impact bug but might cause issues in the field.
>> >> >
>> >> > Sorry for any inconvenience this causes.
>> >> >
>> >> > Thanks,
>> >> > Neha
>> >> >
>> >>
>> >> --
>> >> Nathan Cutler
>> >> Software Engineer Distributed Storage
>> >> SUSE LINUX, s.r.o.
>> >> Tel.: +420 284 084 037
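
The workaround quoted above (only upgrade when all PGs are active+clean, or get every OSD onto a version with the pg hard limit) could be checked before an upgrade with something like the following. This is only a rough sketch, not anything from the tracker issue or the release notes. It assumes the input dicts are the parsed JSON from `ceph status --format json` (fields `pgmap.num_pgs` and `pgmap.pgs_by_state`) and from `ceph versions` (field `osd`); the function names are made up for illustration, and the field names should be verified against your own cluster's output.

```python
from typing import Any, Dict


def all_pgs_active_clean(status: Dict[str, Any]) -> bool:
    """Return True only if every PG is reported as exactly active+clean.

    `status` is assumed to be the parsed output of
    `ceph status --format json` (an assumption, check your release).
    """
    pgmap = status.get("pgmap", {})
    num_pgs = pgmap.get("num_pgs", 0)
    # Sum the PGs whose state is exactly "active+clean"; states such as
    # "active+clean+scrubbing" or "active+remapped+backfilling" do not count.
    clean = sum(
        entry.get("count", 0)
        for entry in pgmap.get("pgs_by_state", [])
        if entry.get("state_name") == "active+clean"
    )
    # Missing or empty PG information is treated as "not safe".
    return num_pgs > 0 and clean == num_pgs


def osds_on_single_version(versions: Dict[str, Any]) -> bool:
    """Return True if all OSDs report one and the same version string.

    `versions` is assumed to be the parsed output of `ceph versions`.
    This does not tell you whether that version has the hard limit code.
    """
    return len(versions.get("osd", {})) == 1
```

The dicts would typically come from `json.loads()` on the command output. A mixed set of OSD versions together with PGs that are not active+clean is the combination the original report warns about.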


