Re: Data corruption with XFS on Debian 11 and 12 under heavy load.

On Tue, Aug 29, 2023 at 06:15:36PM +0100, Jose M Calhariz wrote:
> 
> Hi,
> 
> I have been chasing a data corruption problem under heavy load on 4
> servers in my care.  At first I suspected a hardware problem,
> because it only happens with RAID 6 disks.  So I reported it to Debian:
> 
> https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1032391
> 
> Further research pointed to XFS as the common factor, not a
> hardware issue.  So I informally asked a friend at a software
> house that relies heavily on XFS for his thoughts on this issue.  He
> referred to several problems fixed in kernel 6.2 and to a
> discussion on this mailing list about backporting the fixes to the
> 6.1 kernel.
> 
> With this information I tried the then-latest kernel from Debian
> testing on Debian 12 and could not reproduce the
> problem.  So I filed another bug report:
> 
> https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1040416
> 
> My questions to this mailing list:
> 
>   - Has anyone experienced corruption under heavy load on XFS,
>   under Debian or with vanilla kernels?

Yes.  There was a rash of corruption problems that got fixed in 6.2:
https://git.kernel.org/pub/scm/fs/xfs/xfs-linux.git/tag/?h=xfs-6.2-merge-8

My guess, with no other information, is either the write invalidation
problem in iomap, or maybe COW extent allocations racing with the log.
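
As a rough first check, one can compare the running kernel release
against 6.2, where the fixes above were merged.  The following is only a
sketch: the function names are made up for illustration, and a distro
kernel may carry backports (or lack them) regardless of its version
number, so treat the result as a hint, not proof.

```python
import platform
import re

# Assumption for this sketch: the fixes referenced above landed in
# mainline 6.2, so anything with a lower (major, minor) predates them.
FIXED_IN = (6, 2)


def parse_kernel_release(release):
    """Return (major, minor) from a release string like '6.1.0-11-amd64'."""
    match = re.match(r"(\d+)\.(\d+)", release)
    if match is None:
        raise ValueError(f"unrecognised kernel release: {release!r}")
    return int(match.group(1)), int(match.group(2))


def predates_fixes(release, fixed_in=FIXED_IN):
    """True if the release's (major, minor) is older than fixed_in."""
    return parse_kernel_release(release) < fixed_in


def running_kernel_predates_fixes():
    """Apply the same check to the kernel this script is running on."""
    return predates_fixes(platform.release())
```

For example, predates_fixes("6.1.0-11-amd64") is true and
predates_fixes("6.4.0-3-amd64") is false; both release strings are
illustrative, not taken from the reporter's machines.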

Most of these haven't been backported to 6.1 because our only choices as
a community were (a) let a dumb bot shovel in patches with zero QA or
(b) try to scare up volunteers to backport things to LTS kernels.  (a)
wasn't acceptable, but then with (b)...

>   - Should I stop waiting for the fixes to be backported to vanilla
>   6.1 and run the latest kernel from Debian testing anyway?  Note
>   that kernels from testing get fewer timely security updates
>   than stable kernels, especially for security issues with limited
>   disclosure.

...there isn't really a designated 6.1 LTS backport engineer right now.
A couple of folks from Cloudflare, Amir Goldstein, and Ted Ts'o have
been sharing the work when they have spare time.

--D

> I am happy to provide more info about my setup or my stability tests
> that fail under XFS.
> 
> 
> Kind regards
> Jose M Calhariz
> 
> -- 
> --
> A false friend never curses at you
> 
> A true friend has already called you every swear word
> that exists, and has even invented a few new ones
