On 17-08-2017 14:41, Roger Heflin wrote:
Here is a guess based on what you determined was the cause. The mid-layer does not know the writes were lost. The writes were in the drive's write cache (already submitted to the drive and confirmed back to the mid-layer as done, even though they were not yet on the platter), and when the drive lost power and "rebooted", those writes disappeared. The writes the mid-layer had in progress, which never got a completion from the drive, failed, were retried, and succeeded after the drive reset was completed. In high-reliability RAID the solution is to turn off that write cache, *but* if you do direct I/O writes (as most databases do) with the drive's write cache off and no battery-backed cache between the two, then the drive becomes horribly slow, since it must actually write the data to the platter before telling the next level up that the data is safe.
Sure, disabling caching should at least greatly reduce the problem (torn writes remain a problem, but they are inevitable).
However, the entire idea of barriers/cache flushes/FUAs was to *safely enable* unprotected write caches, even in the face of power loss. Indeed, for full-system power loss they are adequate. However, device-level micro power losses seem to pose a bigger threat to data reliability.
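For readers less familiar with where the cache flush comes from: on Linux, an application normally requests it through fsync(), which on common filesystems is expected to end with a flush of the drive's volatile write cache. The following is a minimal Python sketch under that assumption; the function name durable_write is hypothetical. It only covers the full-system power loss case. It cannot detect the device-level case discussed here, where the drive resets and silently drops writes it had already acknowledged.

```python
import os


def durable_write(path: str, data: bytes) -> int:
    """Write data to path and ask the kernel to push it to stable storage.

    Hypothetical helper for illustration. os.fsync() returns only after the
    kernel reports the data as written; whether that includes a drive cache
    flush depends on the filesystem and device (assumption, not verified here).
    """
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        written = 0
        # os.write() may write fewer bytes than requested, so loop.
        while written < len(data):
            written += os.write(fd, data[written:])
        # Request that the file's data and metadata reach the device.
        os.fsync(fd)
        return written
    finally:
        os.close(fd)
```

A successful return here means only that every layer reported success, which is exactly the guarantee that a drive-level power glitch can break without anyone noticing.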
I suspect that the recurrent "my RAID1 array develops a huge number of mismatch_cnt sectors" question, which is often answered with "don't worry about RAID1 mismatches", really has a strong tie with this specific problem.
I suggest that anyone reading this list also read the current thread on the linux-scsi list - it is very interesting.
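For anyone who wants to watch this counter on their own arrays, md exposes it in sysfs as /sys/block/mdX/md/mismatch_cnt, and it is updated by a "check" pass started through sync_action. A minimal Python sketch for reading it follows; the function name and the sysfs_root parameter are hypothetical, added only so the code can be tried against a test directory.

```python
import os


def read_mismatch_cnt(md_name: str, sysfs_root: str = "/sys/block") -> int:
    """Return the mismatch_cnt value (in sectors) for an md array.

    md_name is the array's block device name, e.g. "md0" (illustrative).
    sysfs_root is a hypothetical parameter that defaults to the real sysfs
    location and can be pointed at a fake tree for testing.
    """
    path = os.path.join(sysfs_root, md_name, "md", "mismatch_cnt")
    with open(path, "r", encoding="ascii") as f:
        return int(f.read().strip())
```

On a real system this would be called as read_mismatch_cnt("md0") after a check has completed; a non-zero value on RAID1 is the situation described above.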
Regards.

--
Danti Gionatan
Supporto Tecnico
Assyoma S.r.l. - www.assyoma.it
email: g.danti@xxxxxxxxxx - info@xxxxxxxxxx
GPG public key ID: FF5F32A8