Re: bcache failure hangs something in kernel

On 10/13/2017 12:59 AM, Alexandr Kuznetsov wrote:
> Hi
> 
>> It looks like probably the superblock of md0p2 and other data structures
>> were corrupted during the lvm commands, and in turn this is triggering
>> bugs with bcache (bcache should detect the situation and abort
>> everything, but instead is left with the bucket_lock held and freezes).
> This immediately raises questions about the reliability and safety of lvm
> and bcache.

Neither is safe if you overwrite the superblock with an errant command.
If you pvcreate'd on the backing device directly, or did something
similar, that would be expected to go badly.
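
One way to guard against this kind of mistake is to look for existing
on-disk signatures before running a destructive command. The sketch below
is illustrative only and is not part of bcache or lvm: the function name
find_signatures is hypothetical, and the offsets are assumptions (bcache
superblock at byte 4096 with its 16-byte magic 24 bytes in; an LVM2 label
starting with "LABELONE" in one of the first four 512-byte sectors). Check
them against your kernel and lvm versions before relying on them.

```python
# Assumed on-disk layout; verify against your kernel/lvm sources.
BCACHE_SB_OFFSET = 4096       # bcache superblock assumed at sector 8
BCACHE_MAGIC_OFFSET = 24      # magic assumed to follow three 64-bit fields
BCACHE_MAGIC = bytes.fromhex("c68573f64e1a45ca8265f57f48ba6d81")
LVM_LABEL = b"LABELONE"       # LVM2 label id, assumed in sectors 0..3
SECTOR_SIZE = 512


def find_signatures(path):
    """Return the signatures ("bcache", "lvm2") found at the start of path.

    Hypothetical helper: it only reads the first few KiB and never writes.
    """
    magic_start = BCACHE_SB_OFFSET + BCACHE_MAGIC_OFFSET
    magic_end = magic_start + len(BCACHE_MAGIC)

    with open(path, "rb") as dev:
        head = dev.read(magic_end)

    found = []
    if head[magic_start:magic_end] == BCACHE_MAGIC:
        found.append("bcache")

    # The LVM2 label may sit at the start of any of the first four sectors.
    for sector in range(4):
        start = sector * SECTOR_SIZE
        if head[start:start + len(LVM_LABEL)] == LVM_LABEL:
            found.append("lvm2")
            break

    return found
```

Calling find_signatures("/dev/md0p2") (read access to the device is
needed) and refusing to run pvcreate when the result contains "bcache"
would catch the case described above. Existing tools such as blkid or
wipefs --no-act should report similar information.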

> I thought that lvm was an old, mature and safe technology, but here it
> got stuck, was manually interrupted, and the result is catastrophic data
> corruption. lvm sits on top of that sandwich of block devices, on the
> layer of /dev/bcache* devices. Another question is how a misbehaving lvm
> could damage data outside of the /dev/bcache* devices. This would mean
> that some necessary io buffer range checks are missing inside bcache.

I don't know what commands you ran.  I've never seen/heard of a bcache
superblock corrupted, and I believe the mappings/shrink are appropriate.

> Unfortunately these md0p* block devices are not separate from each other
> - there is one 2TB volume on top of them inside lvm. Loss of one 100GiB
> part and dirty data in another 100GiB part can kill the entire file system
> with very high probability. Yesterday I read that bcache failures
> are nasty, because file system root data often resides in the cache and
> is dirty on the backing device.
>
> Is there any tool like fsck that can check and maybe try to
> recover data from the caching and backing devices? Or can developers get
> these corrupted images to experiment with for bugfixing?

Sorry, no.  Other filesystems / block devices will not behave well if
you overwrite their superblock, either.  This is not behavior bcache is
expected to recover gracefully from (though it shouldn't hang).

re: the dirty data in the 100GB part, having a filesystem with a
superblock marked dirty is fine if the cache device is available.

Mike

> -- 
> To unsubscribe from this list: send the line "unsubscribe linux-bcache" in
> the body of a message to majordomo@xxxxxxxxxxxxxxx
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
