Re: pg log hard limit upgrade bug

Nathan, I don't think we want to revert it for 13.2.2.

This is because the pg log hard limit feature currently doesn't seem
to work well when an upgrade is only partially complete while
recovery/backfill is in progress. So, even if we do revert it in
13.2.3, we could still end up in a split scenario, where some OSDs in
the field run 13.2.2 (with the hard limit code) and others run 13.2.3
(without the code), and may encounter
http://tracker.ceph.com/issues/36686.

Therefore, users who have successfully upgraded to 13.2.2 shouldn't be
at any risk.
For users trying to upgrade to a version >= 13.2.2, I am going to make
a note of this issue and add the suggested workaround to the Pending
Release Notes for mimic.
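
As a rough illustration of the "only upgrade when all PGs are
active+clean" workaround, the sketch below checks the cluster state
before an upgrade is started. It is not part of any release or of the
release notes. It assumes the JSON layouts of `ceph status --format
json` (pgmap.num_pgs, pgmap.pgs_by_state) and `ceph versions` (an "osd"
map of version string to count); the helper names are hypothetical.

```python
import json
import subprocess


def all_pgs_active_clean(status):
    """True only if every PG is reported exactly as active+clean.

    `status` is the parsed output of `ceph status --format json`
    (field names are an assumption about that output).
    """
    pgmap = status.get("pgmap", {})
    total = pgmap.get("num_pgs", 0)
    clean = sum(
        entry["count"]
        for entry in pgmap.get("pgs_by_state", [])
        if entry["state_name"] == "active+clean"
    )
    return total > 0 and clean == total


def osds_on_mixed_versions(versions):
    """True if OSDs report more than one version string.

    `versions` is the parsed output of `ceph versions`
    (the "osd" key is an assumption about that output).
    """
    return len(versions.get("osd", {})) > 1


def ceph_json(*args):
    """Run a ceph CLI command and parse its JSON output."""
    out = subprocess.check_output(["ceph", *args, "--format", "json"])
    return json.loads(out)
```

On a live cluster one might call
all_pgs_active_clean(ceph_json("status")) and hold off on upgrading
while it returns False. The match on "active+clean" is deliberately
strict, so PGs in states such as active+clean+scrubbing also delay the
upgrade.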

Does that make sense?

Thanks,
Neha

On Mon, Nov 5, 2018 at 2:43 PM, Yuri Weinstein <yweinste@xxxxxxxxxx> wrote:
> Acknowledged
>
> On Mon, Nov 5, 2018 at 2:35 PM Nathan Cutler <ncutler@xxxxxxx> wrote:
>>
>> Thanks, Neha. The luminous revert was just merged and we'll cut 12.2.10
>> to push it out to users.
>>
>> Regarding Mimic, will there be a revert there as well? Since the pg hard
>> limit patches are present in 13.2.2, it sounds like we'll need to revert
>> them before we release 13.2.3?
>>
>> (Note that Yuri was planning to start QE for 13.2.3 - Yuri, please hold
>> off on that for now?)
>>
>> Nathan
>>
>> On 11/5/18 6:50 PM, Neha Ojha wrote:
>> > Hi All,
>> >
>> > We have discovered an issue with the pg log hard limit
>> > patches (https://github.com/ceph/ceph/pull/23211,
>> > https://github.com/ceph/ceph/pull/24308), where a partial upgrade
>> > during backfill can cause the OSDs on the previous version to fail
>> > with "assert(trim_to <= info.last_complete)". A full description of
>> > the bug is here: http://tracker.ceph.com/issues/36686.
>> >
>> > These changes are in 13.2.2 and 12.2.9, and a workaround for users is
>> > to upgrade and restart all OSDs to a version with the pg hard limit,
>> > or only upgrade when all PGs are active+clean.
>> >
>> > Until we add the capability for the pg log hard limit to work
>> > smoothly in the upgrade case, we will be reverting these changes
>> > (https://github.com/ceph/ceph/pull/24903) and releasing 12.2.10 as
>> > early as possible.
>> >
>> > We are also reverting https://github.com/ceph/ceph/pull/24902, which
>> > is a low impact bug, but might cause issues in the field.
>> >
>> > Sorry for any inconvenience caused due to this.
>> >
>> > Thanks,
>> > Neha
>> >
>>
>> --
>> Nathan Cutler
>> Software Engineer Distributed Storage
>> SUSE LINUX, s.r.o.
>> Tel.: +420 284 084 037


