Wow! Thanks everyone!

The bug report at https://tracker.ceph.com/issues/51948 describes exactly the behaviour that we are seeing. I'll update and let everyone know when I've finished the upgrade. This will probably take a few days as I need to wait for a window to do the work.

Sincerely

-Dave

On 2021-09-21 11:42 a.m., Dan van der Ster wrote:
> It's this: https://tracker.ceph.com/issues/51948
>
> The fix just landed in 4.18.0-305.19.1
>
> https://access.redhat.com/errata/RHSA-2021:3548
>
> On Tue, 21 Sep 2021, 19:35 Marc, <Marc@xxxxxxxxxxxxxxxxx> wrote:
>
>> I do not have access to this page. Maybe others do not either, so it is
>> better to paste its content here.
>>
>>> -----Original Message-----
>>> From: Patrick Donnelly <pdonnell@xxxxxxxxxx>
>>> Sent: Tuesday, 21 September 2021 19:30
>>> To: David Schulz <dschulz@xxxxxxxxxxx>
>>> Cc: ceph-users@xxxxxxx
>>> Subject: Re: Corruption on cluster
>>>
>>> Hi Dave,
>>>
>>> On Tue, Sep 21, 2021 at 1:20 PM David Schulz <dschulz@xxxxxxxxxxx>
>>> wrote:
>>>> Hi Everyone,
>>>>
>>>> For a couple of weeks I've been battling corruption in Ceph FS. It
>>>> happens when a writer on one node writes a line and calls sync, as is
>>>> typical with logging, and the file is corrupted when it is read from
>>>> another client while it is still being written.
>>>>
>>>> The cluster is Nautilus 14.2.9 and the clients are all kernel clients
>>>> mounting the filesystem with CentOS 8.4 kernel
>>>> 4.18.0-305.10.2.el8_4.x86_64. Bluestore OSDs and erasure coding are
>>>> both used. The cluster was upgraded from Mimic (the first installed
>>>> version) at some point.
>>>>
>>>> Here is a little python3 program that triggers the issue:
>>>>
>>>> import os
>>>> import time
>>>>
>>>> fh=open("test.log", "a")
>>>>
>>>> while True:
>>>>     start = time.time()
>>>>     fh.writelines("test2\n")
>>>>     end = time.time()
>>>>     fh.flush()
>>>>     junk=os.getpid()
>>>>     fh.writelines(f"took {(end - start)}\n")
>>>>     fh.flush()
>>>>     time.sleep(1)
>>>>
>>>> If I run this on one client and repeatedly run "wc -l" on a different
>>>> client, the wc triggers one of 2 different behaviours: sometimes NULL
>>>> bytes get scribbled in the file and the next line of output is
>>>> appended, and other times the file gets truncated.
>>>>
>>>> I did update from 14.2.2 to 14.2.9 (I had a clone of the 14.2.9 repo
>>>> on hand). I read the release notes and there did seem to be some
>>>> related fixes between 14.2.2 and 14.2.9 but nothing after 14.2.9.
>>>>
>>>> I can't seem to find any references to a problem like this anywhere.
>>>> Does anyone have any ideas?
>>>
>>> You're probably hitting this bug:
>>> https://bugzilla.redhat.com/show_bug.cgi?id=1996680
>>>
>>> Try upgrading your kernel.
>>>
>>> --
>>> Patrick Donnelly, Ph.D.
>>> He / Him / His
>>> Principal Software Engineer
>>> Red Hat Sunnyvale, CA
>>> GPG: 19F28A586F808C2402351B93C3301A3E258DD79D
>>>
>>> _______________________________________________
>>> ceph-users mailing list -- ceph-users@xxxxxxx
>>> To unsubscribe send an email to ceph-users-leave@xxxxxxx
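
For anyone who wants to watch for the two symptoms from the original report (NULL bytes in the file, or the file getting shorter) without eyeballing "wc -l" output, here is a minimal reader-side sketch to run on the second client. It is not from the thread: the names check_snapshot and watch are hypothetical, the file name test.log is taken from the writer program above, and it assumes the writer only ever appends plain text, so any NUL byte or any shrinkage counts as corruption.

```python
import time


def check_snapshot(data: bytes, previous_size: int) -> list:
    """Return a list of problems found in one full read of the log file.

    Hypothetical helper, assuming an append-only writer of plain text.
    """
    problems = []
    # An append-only file should never be shorter than on the last read.
    if len(data) < previous_size:
        problems.append(f"truncated: {previous_size} -> {len(data)} bytes")
    # The writer emits only text lines, so a NUL byte should never appear.
    nul_count = data.count(b"\x00")
    if nul_count:
        first = data.index(b"\x00")
        problems.append(f"{nul_count} NUL byte(s), first at offset {first}")
    return problems


def watch(path: str, interval: float = 1.0) -> None:
    """Re-read the file forever and print any problem with a timestamp."""
    previous_size = 0
    while True:
        with open(path, "rb") as fh:
            data = fh.read()
        for problem in check_snapshot(data, previous_size):
            print(f"{time.strftime('%H:%M:%S')} {path}: {problem}", flush=True)
        previous_size = len(data)
        time.sleep(interval)
```

Calling watch("test.log") from the directory where the writer runs (mounted on the other client) should stay silent on a healthy mount. Whether it reproduces the bug as reliably as "wc -l" did is untested.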