Corruption on cluster

Hi Everyone,

For a couple of weeks I've been battling a corruption in CephFS. It 
happens when a writer on one node writes a line and calls sync, as is 
typical with logging, while the same file is being read from another 
client: the file ends up corrupted.

The cluster is running Nautilus 14.2.9 and the clients all use the kernel 
client, mounting the filesystem with the CentOS 8.4 kernel 
4.18.0-305.10.2.el8_4.x86_64.  Bluestore OSDs and erasure coding are 
both used.  The cluster was upgraded from Mimic (the first installed 
version) at some point.

Here is a little python3 program that triggers the issue:

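import os
import time

# Open in append mode, as a typical logger would.
fh = open("test.log", "a")

while True:
    start = time.time()
    fh.write("test2\n")
    end = time.time()
    fh.flush()  # hand the buffered line to the kernel
    junk = os.getpid()  # an extra call between the two writes; result unused
    fh.write(f"took {end - start}\n")
    fh.flush()
    time.sleep(1)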

If I run this on one client and repeatedly run "wc -l" on the file from a 
different client, I see two different failure modes: sometimes NULL bytes 
get scribbled into the file and the next line of output is appended after 
them, and other times the file gets truncated.
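
A minimal sketch of a reader-side check that could stand in for "wc -l" on the second client, so that the two symptoms are reported explicitly. The function name scan_log and its return layout are hypothetical, chosen only for illustration; it assumes the file is small enough to read in one go.

```python
def scan_log(path, last_size=0):
    """Read the whole file and report size, NUL bytes, and shrinkage.

    last_size is the size seen on the previous scan (0 on the first scan).
    Returns (size, nul_count, first_nul_offset, truncated).
    """
    with open(path, "rb") as fh:
        data = fh.read()
    size = len(data)
    # Number of NUL bytes anywhere in the file.
    nul_count = data.count(b"\x00")
    # Offset of the first NUL byte, or -1 if there is none.
    first_nul = data.find(b"\x00")
    # An append-only log should never get smaller between scans.
    truncated = size < last_size
    return size, nul_count, first_nul, truncated
```

Calling this once a second in a loop, and passing the returned size back in as last_size, would log when NUL bytes first appear and when the file shrinks.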

I did update from 14.2.2 to 14.2.9 (I had a clone of the 14.2.9 repo 
on hand).  I read the release notes and there did seem to be some 
related fixes between 14.2.2 and 14.2.9, but nothing after 14.2.9.

I can't seem to find any references to a problem like this anywhere.  
Does anyone have any ideas?

Sincerely

-Dave

_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx



