Re: bcache fails after reboot if discard is enabled

Dan Merillat <dan.merillat@xxxxxxxxx> · Wed, 29 Apr 2015 13:48:38 -0400

Killed it again - enabled bcache discard, copied a few TB of data from
the backup to the drive, rebooted, and got a different error:
"bcache: bch_cached_dev_attach() Couldn't find uuid for <REDACTED> in set"

The exciting failure that required a reboot this time was an infinite
spin in bcache_writeback.

I'll give it another shot at narrowing down exactly what causes the
failure before I give up on bcache entirely.

On Sun, Apr 12, 2015 at 1:56 AM, Dan Merillat <dan.merillat@xxxxxxxxx> wrote:
> On Sat, Apr 11, 2015 at 4:09 PM, Kai Krakow <hurikhan77@xxxxxxxxx> wrote:
>
>> With this knowledge, I guess that bcache could probably detect its backing
>> device signature twice - once through the underlying raw device and once
>> through the md device. From your logs I'm not sure if they were complete
>
> It doesn't, the system is smarter than you think it is.
>
>> enough to see that case. But to be sure I'd modify the udev rules to exclude
>> the md parent devices from being run through probe-bcache. Otherwise all
>> sorts of strange things may happen (like one process accessing the backing
>> device through md, while bcache accesses it through the parent device -
>> probably even on different mirror stripes).
>
> This didn't occur, I copied all the lines pertaining to bcache but
> skipped the superfluous ones.
>
>> It's your setup, but personally I'd avoid MD for that reason and go with
>> lvm. MD is just not modern, nor appropriate for modern system setups. It
>> should really be just there for legacy setups and migration paths.
>
> Not related to bcache at all.  Perhaps complain about MD on the
> appropriate list?  I'm not seeing any evidence that MD had anything to
> do with this, especially since the issues with bcache are entirely
> confined to the direct SATA access to /dev/sda4.
>
> In that vein, I'm reading the on-disk format of bcache and seeing
> exactly what's still valid on my system.  It looks like I've got
> 65,000 good buckets before the first bad one.  My idea is to go
> through, look for valid data in the buckets and use a COW in
> user-mode-linux to write that data back to the (copy-on-write version
> of) the backing device.  Basically, anything that passes checksum and
> is still 'dirty', force-write-it-out.  Then see what the status of my
> backing-store is.  If it works, do it outside UML to the real backing
> store.
>
> Are there any diagnostic tools outside the bcache-tools repo? Not much
> there beyond showing the superblock info.  Otherwise I'll just finish
> writing it myself.
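
A rough sketch of the salvage pass described in the quoted message: walk the
cache bucket by bucket, keep whatever passes its checksum and is still
dirty, and write it to a copy-on-write view rather than the real backing
device. Everything concrete here is an assumption made for illustration:
the record layout, the use of CRC32, and the names salvage() and
make_bucket() are hypothetical and do NOT match the real bcache on-disk
format. A plain dict stands in for the COW overlay.

```python
import struct
import zlib

# HYPOTHETICAL record layout (not the real bcache format), little-endian:
#   u32 crc32 of the rest of the record, u8 dirty flag,
#   u64 offset on the backing device, u32 payload length, then the payload.
HEADER = struct.Struct("<IBQI")


def make_bucket(offset, payload, dirty, bucket_size):
    """Build one zero-padded bucket in the hypothetical layout (test helper)."""
    body = struct.pack("<BQI", 1 if dirty else 0, offset, len(payload)) + payload
    raw = struct.pack("<I", zlib.crc32(body)) + body
    return raw.ljust(bucket_size, b"\0")


def salvage(cache, bucket_size, overlay):
    """Copy dirty, checksum-valid payloads into overlay (offset -> bytes).

    The overlay plays the role of the copy-on-write layer: the backing
    device itself is never written. Returns (replayed, skipped), where
    skipped counts buckets that failed validation.
    """
    replayed = skipped = 0
    for start in range(0, len(cache) - bucket_size + 1, bucket_size):
        bucket = cache[start:start + bucket_size]
        crc, dirty, offset, length = HEADER.unpack_from(bucket)
        end = HEADER.size + length
        # Reject impossible lengths and checksum mismatches, then move on
        # to the next bucket instead of aborting the whole scan.
        if end > bucket_size or zlib.crc32(bucket[4:end]) != crc:
            skipped += 1
            continue
        if dirty:
            overlay[offset] = bucket[HEADER.size:end]
            replayed += 1
    return replayed, skipped
```

If the overlay looks sane after such a pass, the same writes could then be
repeated against the real backing store, as the quoted plan suggests.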