On Thu, Aug 17, 2017 at 3:23 AM, Gionatan Danti <g.danti@xxxxxxxxxx> wrote:
> On 14/07/2017 12:46, Gionatan Danti wrote:
> Hi, so a premature/preventive
> drive detachment is not a silver bullet,
> but is this the right solution)?
> - how to deal with this problem (other than being 100% sure power is never
> lost by any disks)?
>
> Thank you all,
> regards.

Here is a guess based on what you determined was the cause.

The mid-layer does not know the writes were lost. Those writes were in
the drive's write cache: they had already been submitted to the drive
and confirmed back to the mid-layer as done, even though they were not
yet on the platter. When the drive lost power and "rebooted", those
writes disappeared. The write(s) the mid-layer still had in progress,
which never got a "done" from the drive, failed, were retried, and
succeeded once the drive reset had completed. So from the mid-layer's
point of view nothing went wrong.

In high-reliability RAID the solution is to turn off the drive's write
cache, *but* if you do direct I/O writes (as most databases do) with the
drive's write cache off and no battery-backed cache between the two,
the drive becomes horribly slow, since it must actually write the data
to the platter before telling the next level up that the data is safe.
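
To make that sequence concrete, here is a toy Python model of it. This is only a sketch under my own assumptions, not kernel or drive firmware code: ToyDrive, its methods, and lost_after_power_cut are made-up names for illustration. It shows writes that were acknowledged from the volatile cache vanishing on a power cut, while the one in-flight write is retried and lands fine.

```python
class ToyDrive:
    """Hypothetical model of a disk with a volatile write cache."""

    def __init__(self, write_cache=True):
        self.write_cache = write_cache
        self.cache = {}    # acknowledged, but held only in volatile RAM
        self.platter = {}  # actually durable

    def write(self, lba, data):
        """Return True once the drive reports the write as done."""
        if self.write_cache:
            self.cache[lba] = data    # acked before it reaches the platter
        else:
            self.platter[lba] = data  # acked only once durable (slow on real disks)
        return True

    def flush(self):
        """Move everything still in the cache to the platter."""
        self.platter.update(self.cache)
        self.cache.clear()

    def power_loss(self):
        """Drive loses power and resets: cached data vanishes silently."""
        self.cache.clear()


def lost_after_power_cut(write_cache):
    """Return the LBAs the upper layer believes are written but are not."""
    drive = ToyDrive(write_cache=write_cache)
    acked = {}  # what the mid-layer thinks is safely on disk
    for lba in range(3):
        data = f"block-{lba}"
        if drive.write(lba, data):
            acked[lba] = data
    drive.power_loss()  # power cut happens before any flush
    # A write in flight during the reset gets no completion, so it is
    # retried after the drive comes back, and succeeds.
    if drive.write(3, "block-3"):
        acked[3] = "block-3"
    drive.flush()
    return sorted(lba for lba, data in acked.items()
                  if drive.platter.get(lba) != data)


if __name__ == "__main__":
    print(lost_after_power_cut(write_cache=True))   # [0, 1, 2]
    print(lost_after_power_cut(write_cache=False))  # []
```

With the cache on, blocks 0-2 are gone and nothing above the drive ever saw an error. With the cache off nothing is lost, at the price (on a real disk) of waiting for the platter on every write.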