Re: resend patch - bcache may mistakenly write data to another disk when writes error

Joe Thornber <thornber@xxxxxxxxxx> · Wed, 23 Oct 2019 22:31:01 +0100

On Tue, Oct 22, 2019 at 09:47:32AM +0000, Heming Zhao wrote:
> Hello List & David,
> 
> This patch addresses the issue reported in an earlier mail:
>  pvresize will cause a meta-data corruption with error message "Error writing device at 4096 length 512"
> 
> I have sent it to our customer, and the code ran as expected. I think this code is enough to fix the issue.
> 
> Thanks
> zhm
> 
> ------(patch for branch stable-2.02) ----------
>  From d0d77d0bdad6136c792c9664444d73dd47b809cb Mon Sep 17 00:00:00 2001
> From: Zhao Heming <heming.zhao@xxxxxxxx>
> Date: Tue, 22 Oct 2019 17:22:17 +0800
> Subject: [PATCH] bcache may mistakenly write data to another disk when writes
>   error
> 
> When a bcache write fails, the errored fd and its data are saved in
> cache->errored, and then the fd is closed. Later, lvm reuses the closed
> fd number for newly opened devs, but the data belonging to the old fd is
> still in cache->errored and flagged BF_DIRTY. As a result, that data may
> mistakenly be written to another disk.

I think the real issue here is that the flush fails, and the error path for
that calls invalidate dev, which also fails, but its return value is not
checked. The fd is subsequently closed and reopened with the dirty data still
in the cache.

So I think the correct fix is to have a variant of invalidate that doesn't
bother retrying the IO and just throws away the dirty data.  bcache_abort()?
This should be called when the flush() fails.
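
A minimal sketch of the idea, on a toy model rather than the real bcache
code. Every name here (toy_cache, toy_invalidate_fd, toy_abort_fd,
toy_flush_and_close) is hypothetical and only illustrates the proposed
error path: try the normal invalidate, check its return value, and if it
fails drop the dirty blocks without issuing any more IO.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_NR_BLOCKS 8

/* Hypothetical stand-in for a cached block; not the lvm2 struct. */
struct toy_block {
	int fd;      /* owning file descriptor, -1 if the slot is free */
	bool dirty;  /* data not yet written back */
};

struct toy_cache {
	struct toy_block blocks[TOY_NR_BLOCKS];
	bool (*write_block)(int fd);  /* stand-in IO engine, true on success */
};

static bool toy_write_ok(int fd)    { (void) fd; return true; }
static bool toy_write_fails(int fd) { (void) fd; return false; }

static void toy_init(struct toy_cache *c, bool (*write_block)(int fd))
{
	assert(c != NULL);
	for (size_t i = 0; i < TOY_NR_BLOCKS; i++) {
		c->blocks[i].fd = -1;
		c->blocks[i].dirty = false;
	}
	c->write_block = write_block;
}

/* Record one dirty block for fd; returns false if the cache is full. */
static bool toy_dirty(struct toy_cache *c, int fd)
{
	for (size_t i = 0; i < TOY_NR_BLOCKS; i++) {
		if (c->blocks[i].fd == -1) {
			c->blocks[i].fd = fd;
			c->blocks[i].dirty = true;
			return true;
		}
	}
	return false;
}

static size_t toy_count_dirty(const struct toy_cache *c, int fd)
{
	size_t n = 0;
	for (size_t i = 0; i < TOY_NR_BLOCKS; i++)
		if (c->blocks[i].fd == fd && c->blocks[i].dirty)
			n++;
	return n;
}

/* Invalidate with writeback: a block whose write fails stays dirty. */
static bool toy_invalidate_fd(struct toy_cache *c, int fd)
{
	bool ok = true;
	for (size_t i = 0; i < TOY_NR_BLOCKS; i++) {
		struct toy_block *b = &c->blocks[i];
		if (b->fd != fd)
			continue;
		if (b->dirty && !c->write_block(fd)) {
			ok = false;  /* keep the block; the caller must react */
			continue;
		}
		b->fd = -1;
		b->dirty = false;
	}
	return ok;
}

/* Abort: drop every block for fd without attempting any IO. */
static void toy_abort_fd(struct toy_cache *c, int fd)
{
	for (size_t i = 0; i < TOY_NR_BLOCKS; i++) {
		if (c->blocks[i].fd == fd) {
			c->blocks[i].fd = -1;
			c->blocks[i].dirty = false;
		}
	}
}

/* Error path: check the invalidate result and abort on failure, so no
 * dirty data survives to be written through a reused fd number. */
static bool toy_flush_and_close(struct toy_cache *c, int fd)
{
	if (toy_invalidate_fd(c, fd))
		return true;
	toy_abort_fd(c, fd);
	return false;
}
```

With a failing write_block, toy_invalidate_fd() alone leaves the dirty
blocks in place, which is the state that bites once the fd number is
reused. toy_flush_and_close() reports the failure and leaves nothing
behind for that fd.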

- Joe

_______________________________________________
linux-lvm mailing list
linux-lvm@xxxxxxxxxx