Re: OSDs RocksDB corrupted when upgrading nautilus->octopus: unknown WriteBatch tag

Igor;

Does this only impact CephFS then?

Thank you,

Dominic L. Hilsbos, MBA 
Director – Information Technology 
Perform Air International Inc.
DHilsbos@xxxxxxxxxxxxxx 
www.PerformAir.com


-----Original Message-----
From: Igor Fedotov [mailto:ifedotov@xxxxxxx] 
Sent: Monday, April 12, 2021 9:16 AM
To: Dominic Hilsbos; ceph-users@xxxxxxx
Subject: Re:  Re: OSDs RocksDB corrupted when upgrading nautilus->octopus: unknown WriteBatch tag

The workaround would be to disable bluestore_fsck_quick_fix_on_mount, do 
the upgrade, and then run a regular fsck.

Depending on the fsck results, either proceed with a repair or not.
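
Roughly like this (a sketch; adjust the OSD id and data path to your 
deployment):

     # before restarting OSDs on the new version:
     ceph config set osd bluestore_fsck_quick_fix_on_mount false

     # after the upgrade, with the OSD stopped, run a regular fsck:
     ceph-bluestore-tool fsck --path /var/lib/ceph/osd/ceph-<id>

     # only if fsck reports errors needing repair:
     ceph-bluestore-tool repair --path /var/lib/ceph/osd/ceph-<id>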


Thanks,

Igor


On 4/12/2021 6:35 PM, DHilsbos@xxxxxxxxxxxxxx wrote:
> Is there a way to check for these zombie blobs, and other issues needing repair, prior to the upgrade?  That would allow us to know that issues might be coming, and perhaps address them before they result in corrupt OSDs.
>
> I'm considering upgrading our clusters from 14 to 15, and would really like to avoid these kinds of issues.
>
> Thank you,
>
> Dominic L. Hilsbos, MBA
> Director - Information Technology
> Perform Air International Inc.
> DHilsbos@xxxxxxxxxxxxxx
> www.PerformAir.com
>
> -----Original Message-----
> From: Igor Fedotov [mailto:ifedotov@xxxxxxx]
> Sent: Monday, April 12, 2021 7:55 AM
> To: ceph-users@xxxxxxx
> Subject:  Re: OSDs RocksDB corrupted when upgrading nautilus->octopus: unknown WriteBatch tag
>
> Sorry for being too late to the party...
>
> I think the root cause is related to the high number of repairs made
> during the first post-upgrade fsck run.
>
> The check (and fix) for zombie spanning blobs was backported to
> v15.2.9 (here is the PR: https://github.com/ceph/ceph/pull/39256). I
> presume it's the one causing the BlueFS data corruption, due to the
> huge transaction happening during such a repair.
>
> I haven't seen this exact issue before (having that many zombie blobs
> is a rarely encountered bug in itself), but we had a somewhat similar
> issue with upgrading omap names, see:
> https://github.com/ceph/ceph/pull/39377
>
> The resulting huge transaction could cause an oversized write to the
> WAL, which in turn caused data corruption (see
> https://github.com/ceph/ceph/pull/39701).
>
> Although the fix for the latter has been merged for 15.2.10, some
> additional issues with huge transactions might still exist...
>
>
> If someone can afford another OSD loss, it would be interesting to get
> an OSD log for such a repair with debug-bluefs set to 20...
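>
> Something like this should do it (a sketch; set it before the upgraded
> OSD starts, so the repair itself is captured):
>
>      ceph config set osd debug_bluefs 20/20
>      # then start the affected OSD, let the repair run, and collect
>      # the OSD log (typically /var/log/ceph/ceph-osd.<id>.log)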
>
> I'm planning to make a fix to cap the transaction size for repair in
> the near future anyway, though.
>
>
> Thanks,
>
> Igor
>
>
> On 4/12/2021 5:15 PM, Dan van der Ster wrote:
>> Too bad. Let me continue trying to invoke Cunningham's Law for you ... ;)
>>
>> Have you excluded any possible hardware issues?
>>
>> 15.2.10 has a new option to check for all-zero reads; maybe try setting it to true?
>>
>>       Option("bluefs_check_for_zeros", Option::TYPE_BOOL, Option::LEVEL_DEV)
>>       .set_default(false)
>>       .set_flag(Option::FLAG_RUNTIME)
>>       .set_description("Check data read for suspicious pages")
>>       .set_long_description("Looks into data read to check if there is a
>> 4K block entirely filled with zeros. "
>>                           "If this happens, we re-read data. If there is
>> difference, we print error to log.")
>>       .add_see_also("bluestore_retry_disk_reads"),
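>>
>> If you want to try it, something like this should work (untested;
>> FLAG_RUNTIME suggests it can be changed on a running OSD):
>>
>>       ceph config set osd bluefs_check_for_zeros true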
>>
>> The "fix zombie spanning blobs" feature was added in 15.2.9. Does
>> 15.2.8 work for you?
>>
>> Cheers, Dan
>>
>> On Sun, Apr 11, 2021 at 10:17 PM Jonas Jelten <jelten@xxxxxxxxx> wrote:
>>> Thanks for the idea. I've tried it with one thread, and it shredded another OSD.
>>> I've updated the tracker ticket :)
>>>
>>> At least non-race-condition bugs are hopefully easier to spot...
>>>
>>> I wouldn't just disable the fsck and upgrade anyway until the cause is rooted out.
>>>
>>> -- Jonas
>>>
>>>
>>> On 29/03/2021 14.34, Dan van der Ster wrote:
>>>> Hi,
>>>>
>>>> Saw that, looks scary!
>>>>
>>>> I have no experience with that particular crash, but I was thinking
>>>> that if you have already backfilled the degraded PGs, and can afford
>>>> to try another OSD, you could try:
>>>>
>>>>       "bluestore_fsck_quick_fix_threads": "1",  # because
>>>> https://github.com/facebook/rocksdb/issues/5068 showed a similar crash
>>>> and the dev said it occurs because WriteBatch is not thread safe.
>>>>
>>>>       "bluestore_fsck_quick_fix_on_mount": "false", # should disable the
>>>> fsck during upgrade. See https://github.com/ceph/ceph/pull/40198
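>>>>
>>>> For example, both could be applied via the config database before
>>>> restarting the OSDs on the new version (a sketch, not something I've
>>>> tested on an actual upgrade):
>>>>
>>>>       ceph config set osd bluestore_fsck_quick_fix_threads 1
>>>>       ceph config set osd bluestore_fsck_quick_fix_on_mount false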
>>>>
>>>> -- Dan
>>>>
>>>> On Mon, Mar 29, 2021 at 2:23 PM Jonas Jelten <jelten@xxxxxxxxx> wrote:
>>>>> Hi!
>>>>>
>>>>> After upgrading MONs and MGRs successfully, the first OSD host I upgraded on Ubuntu Bionic from 14.2.16 to 15.2.10
>>>>> shredded all OSDs on it by corrupting RocksDB, and they now refuse to boot.
>>>>> RocksDB complains "Corruption: unknown WriteBatch tag".
>>>>>
>>>>> The initial crash/corruption occurred when the automatic fsck ran and committed the changes for a lot of "zombie spanning blobs".
>>>>>
>>>>> Tracker issue with logs: https://tracker.ceph.com/issues/50017
>>>>>
>>>>>
>>>>> Anyone else encountered this error? I've "suspended" the upgrade for now :)
>>>>>
>>>>> -- Jonas
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx



