On 10/13/2017 12:59 AM, Alexandr Kuznetsov wrote:
Hi
It looks like the superblock of md0p2 and other data structures were
probably corrupted during the lvm commands, and this in turn is
triggering bugs in bcache (bcache should detect the situation and abort
everything, but instead it is left with the bucket_lock held and freezes).
This immediately raises questions about the reliability and safety of
lvm and bcache.
Neither is safe if you overwrite the superblock with an errant command.
If you pvcreate'd on the backing device directly, or did something
similar, that would be expected to go badly.
I thought that lvm was an old, mature and safe technology, but here it
got stuck, was manually interrupted, and the result is catastrophic data
corruption.
lvm sits on top of that sandwich of block devices, on the layer of
/dev/bcache* devices. Another question here is how a crazy lvm could
damage data outside of the /dev/bcache* devices. This means that some
necessary io buffer range checks are missing inside bcache.
I don't know what commands you ran. I've never seen/heard of a bcache
superblock corrupted, and I believe the mappings/shrink are appropriate.
I was not directly manipulating the backing devices or the lvm PVs. I
was not doing anything illegal from the lvm or bcache point of view;
otherwise I would not be writing here, because then I would know that
the file system was killed by me.
There were only lvcreate and lvremove commands, which create and remove
logical volumes inside lvm, nothing more. There wasn't any direct access
outside of the /dev/bcache* devices. That's why I wrote "This means that
some necessary io buffer range checks are missing inside bcache". So how
did bcache allow data outside of the bcache* devices to be damaged if
all access to them went through bcache, not directly? I'm sure that's a
bug. Why does bcache freeze when it meets corrupted data instead of
reporting errors? I'm sure that's a bug.
Sorry, no. Other filesystems / block devices will not behave well if
you overwrite their superblock, either. This is not behavior bcache is
expected to recover gracefully from (though it shouldn't hang).
re: the dirty data in the 100GB part, having a filesystem with a
superblock marked dirty is fine if the cache device is available.
Mike
The cache device is available and looks fine at first, but it behaves
the same way even when the cache meets a backing device that is marked
as "clean". So I am not sure that everything is fine with the caching
device. I need at least to check it, and I am asking for an appropriate
tool for that. Does one exist?
Registering any device (caching or backing) alone behaves nicely... at
first sight. But if bcache tries to connect the cache and a backing
device marked "clean" to each other during the register process, it
hangs. That's all nasty because I was not doing anything wrong, but I
lost my data due to bugs in both lvm and bcache :(
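
On the question of a tool for checking a device: bcache-tools ships
bcache-super-show, which prints the superblock of a cache or backing
device. As a rough illustration of the most basic check, here is a
minimal read-only sketch in Python. It assumes the on-disk layout from
the bcache-tools header (superblock at sector 8, starting with csum,
offset, version and a 16-byte magic, all little-endian). The function
name check_bcache_sb is hypothetical, and the sketch does not verify the
checksum or any of the cache metadata, so a pass here does not mean the
device is healthy.

```python
import struct
import sys

# Assumed layout, taken from the bcache-tools header:
# the superblock lives at sector 8 (byte offset 4096).
SB_SECTOR = 8
SB_OFFSET = SB_SECTOR * 512
BCACHE_MAGIC = bytes([
    0xc6, 0x85, 0x73, 0xf6, 0x4e, 0x1a, 0x45, 0xca,
    0x82, 0x65, 0xf5, 0x7f, 0x48, 0xba, 0x6d, 0x81,
])
# csum, offset, version (three little-endian u64) followed by magic[16]
HEADER = struct.Struct("<QQQ16s")


def check_bcache_sb(path):
    """Read-only sanity check of the first superblock fields (hypothetical helper)."""
    with open(path, "rb") as dev:
        dev.seek(SB_OFFSET)
        raw = dev.read(HEADER.size)
    if len(raw) < HEADER.size:
        return {"readable": False, "magic_ok": False,
                "offset_ok": False, "version": None}
    _csum, offset, version, magic = HEADER.unpack(raw)
    return {
        "readable": True,
        "magic_ok": magic == BCACHE_MAGIC,   # overwritten superblocks fail here
        "offset_ok": offset == SB_SECTOR,    # sector the superblock claims to sit at
        "version": version,
    }


if __name__ == "__main__" and len(sys.argv) > 1:
    # The device path is supplied by the caller; reading a block device
    # normally requires root.
    print(check_bcache_sb(sys.argv[1]))
```

If magic_ok comes back False for a backing device, the superblock was
most likely overwritten, which would match the pvcreate-on-the-backing-
device scenario described earlier in the thread.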