>>>>> "Alex" == Alex Lieflander <atlief@xxxxxxxxxx> writes:

>> On May 7, 2022, at 4:41 PM, Stuart D Gathman wrote:
>>
>>> On Fri, 6 May 2022, Alex Lieflander wrote:
>>>
>>> Thanks. I really don’t want to give up the DM-Integrity
>>> management. Less complexity is just a bonus.
>>
>> What are you trying to get out of RAID6?  If redundancy and integrity
>> are already managed at another layer, then just use RAID0 for striping.
>>
>> I like to use RAID10 for mirror + striping, but I understand parity
>> disks give redundancy without halving capacity.  Parity means RMW
>> cycles of largish blocks, whereas straight mirroring (RAID1, RAID10)
>> can write single sectors without a RMW cycle.

Alex> I don’t trust the hardware I’m running on very much, but it’s
Alex> all I have to work with at the moment; it’s important that the
Alex> array is resilient to *any* (and multiple) single chunk
Alex> corruptions because such corruptions are likely to happen in the
Alex> future.

Ouch!  I hope you have good backups somewhere, because I suspect
you're going to suffer a complete failure at some point.

Alex> For the last several months I’ve periodically been seeing
Alex> (DM-Integrity) checksum mismatch warnings at various locations
Alex> on all of my disks. I stopped using a few SATA ports that were
Alex> explicitly throwing SATA errors, but I suspect that the
Alex> remaining connections are unpredictably (albeit infrequently)
Alex> corrupting data in ways that are more difficult to detect.

This is interesting.  And worrisome, because I would not expect moving
from one SATA port to another to cure problems, unless it was A)
moving to a different controller, or B) you changed/reseated the SATA
cable.

But I also wonder about your power supply and what it's rated for.
You might just be hitting the ragged edge of what it can supply, and
so you're running into problems with voltage dropping just enough to
make things slightly flaky.

Alex> I’ve tried to “check” and “repair” my array on multiple kernel
Alex> versions and live recovery USB sticks, but the “check” always
Alex> seems to freeze and all subsequent IO to the array hangs until
Alex> reboot; at the moment, a chunk is only ever made consistent when
Alex> its data is overwritten, so it needs to survive periodic, random
Alex> corruption for as long as possible.

This is also a warning to me that maybe you have power supply issues.
Can you give a summary of your hardware configuration and model
numbers?  If you're running a smallish power supply, maybe look for a
replacement which can get you more power.  Go from a 430W one to 600W,
or 500W to 750W and see if that makes a difference.

Looking at your data from before, I see you have 12 disks on the
system, 11 spinning disks and one NVMe device.  So I *really* suspect
you have an overloaded power supply.

Are you also using a disk controller?  And which version of Linux?

Alex> I also have a disk that infrequently fails to read from a
Alex> particular area, but the rest of the disk is fine. I wouldn’t
Alex> trust that disk with valuable data, but it seems like a perfect
Alex> candidate to hold additional parity (raid6_ls_6) that I
Alex> hopefully never need.

This is not how RAID6 parity works.  The entire disk (or partition) is
used to write data and/or parity.  It's RAID4 which dedicates a single
disk to parity duties.  So thinking that a known flaky disk will be ok
for just parity use isn't really a good idea.

I'd also look at the output of 'smartctl --all /dev/sd<letter>' for
all your disks and see what the numbers say.

But honestly, it sounds like you have some serious hardware issues
which you're trying to paper over with DM-Integrity and RAID6.  And I
suspect it will all end in tears sooner or later.

You do have backups of your data, right?  Even onto a single new 10TB
disk that's now connected to the system all the time?
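
To make the RAID4 versus rotating-parity point above concrete, here is a
minimal sketch, assuming a simplified model. The names parity_disk and
parity_map are made up for illustration, and the rotation formula is a
generic left-symmetric-style rotation; the real md/LVM on-disk layouts
differ in detail, so treat this as a picture of the idea only.

```python
def parity_disk(stripe: int, n_disks: int, layout: str) -> int:
    """Return the index of the disk holding parity for one stripe.

    Hypothetical helper for illustration; not an LVM or md API.
    """
    if layout == "raid4":
        # Dedicated parity: every stripe puts parity on the last disk.
        return n_disks - 1
    if layout == "rotating":
        # Rotating parity: the parity block moves one disk per stripe,
        # so every disk ends up holding a mix of data and parity.
        return (n_disks - 1) - (stripe % n_disks)
    raise ValueError(f"unknown layout: {layout}")


def parity_map(n_disks: int, n_stripes: int, layout: str) -> list[int]:
    """List the parity disk index for stripes 0..n_stripes-1."""
    return [parity_disk(s, n_disks, layout) for s in range(n_stripes)]


if __name__ == "__main__":
    for layout in ("raid4", "rotating"):
        print(layout, parity_map(4, 8, layout))
```

With four disks, the "raid4" map names the same disk for every stripe,
while the "rotating" map cycles through all four, which is why a flaky
disk in a rotating layout still ends up holding ordinary data.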

Good luck,
John

_______________________________________________
linux-lvm mailing list
linux-lvm@xxxxxxxxxx
https://listman.redhat.com/mailman/listinfo/linux-lvm
read the LVM HOW-TO at http://tldp.org/HOWTO/LVM-HOWTO/