Re: add volatile flag to PV/LVs (for cache) to avoid degraded state on reboot

On Thu, Jan 18, 2024 at 04:40:47PM +0100, Zdenek Kabelac wrote:
> Cache can contain blocks that are still being 'synchronized' to the
> cache origin. So while the 'writing' process doesn't get ACK for
> writes - the cache may have valid blocks that are 'dirty' in terms of
> being synchronized to origin device.
> 
> And while this is usually not a problem when the system works properly,
> it gets into a weird 'state machine' model when e.g. the origin device
> has errors - which might even be 'transient', given all the variety of
> storage types and raid arrays with integrity and self-healing and so on...
> 
> So while it's usually not a problem for a laptop with 2 disks, the world is
> more complex...

Ehm, but wouldn't anything other than discarding that block from the cache and using whatever is on the backing storage introduce unpredictable errors?
As you already said, the write was never ACKed, so the software that issued it never expected it to be written.
Why exactly are we allowed to use the data from the write-through cache to modify the data on the backing storage in such cases?
I.e., why can we safely consider it valid data?
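
For reference, here's how I currently sanity-check a cached LV before
detaching the cache (just a sketch; vg/origin is a placeholder, and
cache_dirty_blocks is the reporting field as listed by 'lvs -o help'
on my version):

  # Dirty block count for the cached LV; for a pure write-through
  # cache I'd expect this to stay at 0
  lvs -o lv_name,cache_dirty_blocks,cache_used_blocks vg/origin

  # Raw device-mapper view; the cache target's status line likewise
  # reports used/total and dirty block counts
  dmsetup status vg-origin

  # Detach the cache pool; lvm flushes any remaining dirty blocks to
  # the origin before the split completes
  lvconvert --splitcache vg/origin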

> metadata - so if there is again some 'reboot' and PV with cache appears back
> - it will not interfere with the system (aka providing some historical
> cached blocks, so just like a mirrored leg needs some care...)

Same here: why do we have to consider these blocks at all instead of just discarding them? We know when a drive re-appears, so we could simply refuse to use it without validation, or, if the volatile flag I suggested were in use, just wipe it and start over...
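
The closest manual equivalent to that "wipe and start over" behaviour
today is, as far as I know, something like this (a sketch; vg/origin,
the size and /dev/nvme0n1 are placeholders):

  # Drop the cache entirely; --uncache flushes what it can and then
  # deletes the cache pool
  lvconvert --uncache vg/origin

  # Once the fast device is back (or replaced), re-attach an empty
  # write-through cache and start over
  lvcreate --type cache --cachemode writethrough -L 100G vg/origin /dev/nvme0n1

The volatile flag would essentially automate exactly this whenever the
cache PV disappears and re-appears.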

After all, I don't know anyone who designs their storage systems on the assumption that the write-through cache has to be redundant.
Even more, I know enough people in data center environments who reuse their "failing but still kinda good" SSDs and NVMe drives as write-through caches, on the assumption that a failure at most impacts read performance, not data safety.
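
Their setup typically looks roughly like this (a sketch; device names
and sizes are placeholders):

  # Add the (possibly flaky) SSD to the VG and attach it as a
  # write-through cache; each write is ACKed only after it has reached
  # the origin, so losing the SSD should cost read performance, not data
  pvcreate /dev/sdb
  vgextend vg /dev/sdb
  lvcreate --type cache --cachemode writethrough -L 200G vg/data /dev/sdb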

Is there some common misconception at play? Or what exactly am I missing here?

Sincerely,
Klaus Frank



