Re: Corruption on cluster

Wow!  Thanks everyone!

The bug report at https://tracker.ceph.com/issues/51948 describes 
exactly the behaviour that we are seeing.  I'll update and let everyone 
know when I've finished the upgrade.  This will probably take a few days 
as I need to wait for a window to do the work.
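
For anyone who wants to check which clients still need the upgrade, here is a minimal sketch that compares a kernel release string against the fixed build mentioned in this thread (4.18.0-305.19.1). The helper names `numeric_parts` and `has_fix` are made up for illustration, and the comparison assumes an EL8-style release string such as the one `uname -r` prints on CentOS 8.4; it says nothing useful about kernels from other distributions or release streams.

```python
import platform
import re

# First kernel build reported in this thread as carrying the fix.
FIXED = "4.18.0-305.19.1"


def numeric_parts(release: str) -> tuple:
    """Return the leading numeric fields of a kernel release string.

    "4.18.0-305.10.2.el8_4.x86_64" -> (4, 18, 0, 305, 10, 2)
    """
    parts = []
    for field in re.split(r"[.-]", release):
        if not field.isdigit():
            break  # stop at the first non-numeric field such as "el8_4"
        parts.append(int(field))
    return tuple(parts)


def has_fix(release: str, fixed: str = FIXED) -> bool:
    """True if `release` is at or above the `fixed` build (EL8 stream only)."""
    return numeric_parts(release) >= numeric_parts(fixed)


if __name__ == "__main__":
    rel = platform.release()
    print(rel, "has the fix" if has_fix(rel) else "predates the fix")
```

Run on each client, this reports whether the running kernel is older than the fixed build. A client only picks up the new kernel after a reboot, so the check is meaningful once the machine has been restarted.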

Sincerely

-Dave

On 2021-09-21 11:42 a.m., Dan van der Ster wrote:
> It's this: https://tracker.ceph.com/issues/51948
>
>
> The fix just landed in 4.18.0-305.19.1
>
> https://access.redhat.com/errata/RHSA-2021:3548
>
>
>
> On Tue, 21 Sep 2021, 19:35 Marc, <Marc@xxxxxxxxxxxxxxxxx> wrote:
>
>> I do not have access to this page. Others may not either, so it would be
>> better to paste its content here.
>>
>>> -----Original Message-----
>>> From: Patrick Donnelly <pdonnell@xxxxxxxxxx>
>>> Sent: Tuesday, 21 September 2021 19:30
>>> To: David Schulz <dschulz@xxxxxxxxxxx>
>>> Cc: ceph-users@xxxxxxx
>>> Subject: Re: Corruption on cluster
>>>
>>> Hi Dave,
>>>
>>> On Tue, Sep 21, 2021 at 1:20 PM David Schulz <dschulz@xxxxxxxxxxx>
>>> wrote:
>>>> Hi Everyone,
>>>>
>>>> For a couple of weeks I've been battling corruption in CephFS.  It
>>>> happens when a writer on one node writes a line and calls sync, as is
>>>> typical with logging, while the same file is being read from another
>>>> client; the file ends up corrupted.
>>>>
>>>> The cluster is Nautilus 14.2.9 and the clients are all kernel clients
>>>> mounting the filesystem with the CentOS 8.4 kernel
>>>> 4.18.0-305.10.2.el8_4.x86_64.  BlueStore OSDs and erasure coding are
>>>> both used.  The cluster was upgraded from Mimic (the first installed
>>>> version) at some point.
>>>>
>>>> Here is a little python3 program that triggers the issue:
>>>>
>>>> import os
>>>> import time
>>>>
>>>> # Append mode, so every write lands at the current end of the file.
>>>> fh = open("test.log", "a")
>>>>
>>>> while True:
>>>>     start = time.time()
>>>>     fh.write("test2\n")
>>>>     end = time.time()
>>>>     fh.flush()  # hand the buffered line to the OS
>>>>     junk = os.getpid()  # result is never used
>>>>     fh.write(f"took {end - start}\n")
>>>>     fh.flush()
>>>>     time.sleep(1)
>>>>
>>>> If I run this on one client and repeatedly run "wc -l" on the file from
>>>> a different client, I see two different failures: sometimes NUL bytes
>>>> get scribbled into the file and the next line of output is appended,
>>>> and other times the file gets truncated.
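
The two symptoms described here can also be watched for from the reading client with a small script instead of repeated "wc -l" runs. This is only a sketch: `check_snapshot` and `watch` are hypothetical names, "test.log" is the file name from the reproducer, and it assumes that re-reading the whole file once a second exercises the same path that "wc -l" does.

```python
import time


def check_snapshot(data: bytes, previous_size: int) -> list:
    """Return a list of problems found in one full read of the log file."""
    problems = []
    # An append-only log should never get shorter between two reads.
    if len(data) < previous_size:
        problems.append(f"truncated: {previous_size} -> {len(data)} bytes")
    # The writer only emits text lines, so any NUL byte is corruption.
    nul = data.find(b"\x00")
    if nul != -1:
        problems.append(f"NUL byte at offset {nul}")
    return problems


def watch(path: str = "test.log", interval: float = 1.0) -> None:
    """Poll `path` forever, printing any problem each read reveals."""
    previous_size = 0
    while True:
        with open(path, "rb") as fh:
            data = fh.read()
        for problem in check_snapshot(data, previous_size):
            print(time.strftime("%H:%M:%S"), problem)
        previous_size = len(data)
        time.sleep(interval)
```

Calling `watch()` on the second client while the writer loop runs on the first should print a timestamped line whenever a read comes back shorter than the previous one or contains a NUL byte.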
>>>>
>>>> I did update from 14.2.2 to 14.2.9 (I had a clone of the 14.2.9 repo
>>>> on hand).  I read the release notes and there did seem to be some
>>>> related fixes between 14.2.2 and 14.2.9 but nothing after 14.2.9.
>>>>
>>>> I can't seem to find any references to a problem like this anywhere.
>>>> Does anyone have any ideas?
>>> You're probably hitting this bug:
>>> https://bugzilla.redhat.com/show_bug.cgi?id=1996680
>>>
>>> Try upgrading your kernel.
>>>
>>> --
>>> Patrick Donnelly, Ph.D.
>>> He / Him / His
>>> Principal Software Engineer
>>> Red Hat Sunnyvale, CA
>>> GPG: 19F28A586F808C2402351B93C3301A3E258DD79D
>>>
>>> _______________________________________________
>>> ceph-users mailing list -- ceph-users@xxxxxxx
>>> To unsubscribe send an email to ceph-users-leave@xxxxxxx



