Re: [general question] rare silent data corruption when writing data


 



On Wed, May 13, 2020 at 01:49:10PM -0400, John Stoffel wrote:
I wonder if this problem can be replicated on loop devices?  Once
there's a way to cause it reliably, we can then start doing a
bisection of the kernel to try and find out where this is happening.

I spent a week or so trying to reproduce the problem in a VM on loop devices that mirrored the LVM/RAID config, without success. The workload was basically a random mix of 1-25 concurrent writers banging out medium to largish files.

The fact that it wasn't reproducible in that environment could point towards the LSI driver or hardware, or I simply wasn't able to match the conditions well enough.
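
For anyone wanting to try a similar reproduction, here is a minimal sketch of that kind of workload: several concurrent writers each write a file of random data, record a SHA-256 of what they wrote, and the files are then read back and compared. This is not the script used for the test above. The function names, file names, and sizes are hypothetical. Note also that a readback right after the write is likely served from the page cache, so a realistic check would re-read after dropping caches or remounting.

```python
import hashlib
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor


def write_and_hash(path, size_bytes, chunk_size=1 << 20):
    """Write random data to path; return the SHA-256 of the bytes written."""
    digest = hashlib.sha256()
    remaining = size_bytes
    with open(path, "wb") as f:
        while remaining > 0:
            chunk = os.urandom(min(chunk_size, remaining))
            f.write(chunk)
            digest.update(chunk)
            remaining -= len(chunk)
        f.flush()
        os.fsync(f.fileno())  # push the data out to the storage stack
    return digest.hexdigest()


def hash_file(path, chunk_size=1 << 20):
    """Return the SHA-256 of the file contents as read back from path."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def run_round(directory, writers, size_bytes):
    """Run `writers` concurrent writers; return paths whose readback differs."""
    paths = [os.path.join(directory, f"writer-{i}.bin") for i in range(writers)]
    with ThreadPoolExecutor(max_workers=writers) as pool:
        expected = list(pool.map(lambda p: write_and_hash(p, size_bytes), paths))
    # Caveat: this readback may come from the page cache, not the disks.
    return [p for p, want in zip(paths, expected) if hash_file(p) != want]


if __name__ == "__main__":
    # Hypothetical small run: 4 writers, 4 MiB each, in a temp directory.
    with tempfile.TemporaryDirectory() as workdir:
        bad = run_round(workdir, writers=4, size_bytes=4 << 20)
        print("mismatched files:", bad)
```

To approximate the real load, the writer count would be varied between rounds (1-25 here) and the target directory pointed at the filesystem under test instead of a temp directory.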

So far, it looks like it happens sometimes on bare RAID6 systems
without lv-thin in place, which is both good and bad.  And without
using VMs on top of the storage either.  So this helps narrow down the
cause.

Note: We don't have any bare RAID6, so I haven't seen it there: our main fs is XFS on sequential LVM on RAID6 (6 x 11-disk sets), and we saw it once on XFS directly on an HDD partition.

Is there any info on the workload on these systems?  Lots of small
files which are added/removed?  Large files which are just written to
and not touched again?

Large files written and not touched again. Most of the time 2-5 concurrent writers but regularly (daily) up to 20-25 concurrent.

I assume finding a bad file with corruption and then doing a cp of the
file keeps the same corruption?

Yep.
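
Since a copy carries the same corruption, reads of the bad file consistently return the same bad data. If a known-good copy of the file still exists somewhere (an assumption), comparing the two block by block shows where the damage sits and how large it is. A minimal sketch, with a hypothetical function name and an arbitrary 4096-byte block size:

```python
def differing_blocks(path_a, path_b, block_size=4096):
    """Return the byte offsets of blocks that differ between two files."""
    offsets = []
    with open(path_a, "rb") as a, open(path_b, "rb") as b:
        offset = 0
        while True:
            block_a = a.read(block_size)
            block_b = b.read(block_size)
            if not block_a and not block_b:
                break  # both files exhausted
            if block_a != block_b:
                # Also catches a length mismatch in the final block.
                offsets.append(offset)
            offset += block_size
    return offsets
```

Offsets and run lengths that line up with the RAID chunk or stripe size would be worth noting in a report.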


