Re: Filesystem corruption on RAID1

On 18-08-2017 00:51, Wols Lists wrote:
> Except that that is not what should be happening. I don't know my hard
> drive details, but I believe drives have an instruction "async write
> this data and let me know when you have done so".
>
> This should NOT return "yes I've flushed it TO cache". Which is how you
> get your problem - the level above thinks it's been safely flushed to
> disk (because the disk has said "yes I've got it"), but it then gets
> lost because of your power fluctuation. It should only acknowledge it
> *after* it's been flushed *from* cache.
>
> And this is apparently exactly what cheap drives do ...
>
> If the level above says "tell me when it's safely on disk", and the
> drive truly does as its told, your problem won't happen because the disk
> block layer will time out waiting for the acknowledgement and retry the
> write.

With SATA drives, persistence on the physical medium is generally guaranteed by issuing *two* separate FLUSH_CACHE commands, which do *not* form an atomic operation. In other words, it's not a problem of "cheap drives" or "lying hardware"; rather, it seems to be a specific SATA limitation.

This means the problem cannot be solved by simply "buying better disks". The traditional flushing/barrier infrastructure simply has *no* method to ensure an atomic commit at the hardware level, and if something goes wrong between the two flushes, there is a (small) possibility of ending up with corrupted writes without any I/O error being reported to the upper layer, even in the case of sync() writes. It's basically like a failing DRAM cache, but with *no* real failure...
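
To make the window between the two flushes more concrete, here is a small toy model in Python. It is only a sketch of the scenario as I understand it: ToyDisk, power_glitch() and the block numbers are all made up for illustration, and no real firmware or kernel code works like this. The assumption is that a power fluctuation drops the volatile cache while the drive keeps answering commands normally.

```python
class ToyDisk:
    """Toy model of a drive with a volatile write-back cache (hypothetical)."""

    def __init__(self):
        self.cache = {}   # volatile: LBA -> data not yet on the medium
        self.media = {}   # persistent: LBA -> data

    def write(self, lba, data):
        # A normal (non-FUA) write is acknowledged once it sits in cache.
        self.cache[lba] = data
        return True

    def flush(self):
        # FLUSH_CACHE: move everything cached to the medium, then acknowledge.
        self.media.update(self.cache)
        self.cache.clear()
        return True

    def power_glitch(self):
        # Assumed failure mode: the cache content is lost, but the drive
        # reports no error to the host.
        self.cache.clear()


def journal_commit(disk, glitch_between_flushes=False):
    """Two-flush commit: journal block, flush, commit record, flush."""
    acks = []
    acks.append(disk.write(100, "journal data"))
    acks.append(disk.flush())                 # first flush
    acks.append(disk.write(101, "commit record"))
    if glitch_between_flushes:
        disk.power_glitch()                   # happens silently
    acks.append(disk.flush())                 # second flush
    return all(acks)                          # what the upper layer sees


good, bad = ToyDisk(), ToyDisk()
print(journal_commit(good), sorted(good.media))
print(journal_commit(bad, glitch_between_flushes=True), sorted(bad.media))
```

In the second run every command is acknowledged, so the upper layer believes the commit is durable, yet block 101 never reaches the medium. That is the "no I/O error" case described above, reduced to its simplest form.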

Newer drives should implement FUA (Force Unit Access), but I don't know if libata already uses it by default. Anyway, the disk's firmware is free to split a single FUA write into several internal operations, so I am not sure it solves all the problems.
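
For anyone who wants to check their own system, the following Python sketch reads a few sysfs files that, on reasonably recent kernels, should show whether FUA is in use. The paths are the ones I believe the kernel exposes (the libata "fua" module parameter and the per-device queue attributes); the device name "sda" is just a placeholder, and on older kernels some of these files may simply not exist, in which case the script reports them as not available.

```python
from pathlib import Path


def read_flag(path):
    """Return the stripped content of a sysfs file, or None if unreadable."""
    try:
        return Path(path).read_text().strip()
    except OSError:
        return None


def fua_report(device="sda", sysfs_root="/sys"):
    """Collect FUA / write-cache related settings for one block device."""
    root = Path(sysfs_root)
    queue = root / "block" / device / "queue"
    return {
        # libata module parameter: whether libata is allowed to use FUA
        "libata_fua_param": read_flag(root / "module/libata/parameters/fua"),
        # whether the block layer sees the device as supporting FUA writes
        "queue_fua": read_flag(queue / "fua"),
        # "write back" or "write through", as seen by the block layer
        "write_cache": read_flag(queue / "write_cache"),
    }


if __name__ == "__main__":
    # "sda" is a placeholder; replace it with the device you care about.
    for name, value in fua_report("sda").items():
        print(f"{name}: {value if value is not None else 'not available'}")
```

Note that this only tells you what the kernel thinks; it says nothing about what the firmware does internally with a FUA write.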

I really found the linux-scsi discussion interesting. Give it a look...

Regards.

--
Danti Gionatan
Technical Support
Assyoma S.r.l. - www.assyoma.it
email: g.danti@xxxxxxxxxx - info@xxxxxxxxxx
GPG public key ID: FF5F32A8