Re: bcache failure hangs something in kernel

On 14 Nov 2017, Michael Lyle stated:

> On Tue, Nov 14, 2017 at 10:25 AM, Nix <nix@xxxxxxxxxxxxx> wrote:
>> [   11.497914] bad checksum at bucket 28262, block 0, 36185 keys
>
> That's no good-- shouldn't have checksum errors.  It means either the
> metadata we wrote got corrupted by the disk, or a metadata write
> didn't happen in the order we requested.

Ugh!!! That would cause definite problems for any fs...
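
(In case it points anywhere useful, I can at least check whether the
SSD's volatile write cache is switched on, and whether the drive itself
thinks it has seen surprise power-offs -- a rough sketch, with /dev/sda
standing in for whatever the SSD really is:

    # is the drive's volatile write cache enabled?
    hdparm -W /dev/sda
    # has the drive itself logged unexpected power losses?
    smartctl -A /dev/sda | grep -i unsafe

If the write cache is on and flushes aren't making it all the way down
the stack, that would be one way for metadata writes to land out of
order.)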

>> is way too short for the SSD in question (an ATA-connected DC3510) to
>> write more than a GiB or so, a small fraction of the 350GiB I have
>> devoted to bcache.
>
> I've seen things hit this couple second timeout before.  It basically
> means that garbage collection is busy analyzing stuff on the disk and
> doesn't get around to checking the "should I exit now?" flag in time.

Note that at its peak the cache had 120GiB of stuff in it. The cache is
350GiB. I find it hard to understand why GC would be running at all,
let alone taking ages to do anything.
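
(Next time round I can at least look at what bcache thinks the cache
contains, and poke GC by hand to see what it does -- a rough sketch,
with <cset-uuid> standing in for the real cache set UUID:

    # how full does bcache think the cache is?
    cat /sys/fs/bcache/<cset-uuid>/cache_available_percent
    cat /sys/fs/bcache/<cset-uuid>/cache0/priority_stats
    # force a garbage collection run by hand
    echo 1 > /sys/fs/bcache/<cset-uuid>/internal/trigger_gc

if that would tell you anything useful.)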

> Even if it did, as long as acknowledged IO is written it's OK.  That
> is, it's OK for anything we're trying to write to be lost, as long as
> the drive hasn't told us it's done and then later that write gets
> "undone".
>
> I think there has to be something somewhat unique to your
> environment-- at an environment I used to administrate (before working

Oh I'm sure it is. One of the unusual things is that my shutdown
procedure is gross: kill as many processes as possible, toposort and
lazily unmount everything, wait a bit, sync, wait a bit more, reboot...
nothing saner seems to work reliably in the presence of the maze of bind
mounts and unshared fs hierarchies on my system.

Hence my plan to revisit this and redesign it so it can reliably unmount
everything, pivot to an initramfs, unmount the root, and stop the bcache
before I try to enable the caches again.
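
At which point the tail end would be roughly this (an untested sketch:
bcache0 and <cset-uuid> stand in for the real bcache device and cache
set UUID):

    # deactivate the VG sitting on top of bcache
    vgchange -a n
    # cleanly shut down the bcache device (closes the backing device)
    echo 1 > /sys/block/bcache0/bcache/stop
    # then shut down the cache set itself (closes the cache device)
    echo 1 > /sys/fs/bcache/<cset-uuid>/stop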

(There is nothing unusual about the hardware, and the storage stack is
just WD disks -> partitions -> md6 -> bcache -> LVM PV (and then xfs and
LUKSed xfs in that). The LVM PV is part of a VG that extends over
unbcached md6 too. The SSD is just partitioned with one partition
devoted to a cache device. No unusual controllers or anything, just
ordinary Intel S2600CWTR built-in mobo ATA stuff.)
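
The bcache pieces themselves went together in the stock make-bcache
way -- morally something like this, with the device names as stand-ins
and the cache mode just illustrative:

    make-bcache -B /dev/md125       # backing device on the md array
    make-bcache -C /dev/sda4        # cache device on the SSD partition
    echo <cset-uuid> > /sys/block/bcache0/bcache/attach
    echo writeback > /sys/block/bcache0/bcache/cache_mode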

> have a bad arc-fault circuit breaker in my home that has dumped power
> on my two ext4 root-on bcache-on md machines three times in the past
> couple weeks without issue.  Each of my production machines has 15
> unsafe shutdowns in smartctl -- a number that I can't quite explain
> because I think the real number should be 7-8 or so... and my bcache
> development test rig has 145 (!).

Hm. Maybe I should re-enable it and see what happens? If it goes wrong,
is there anything I can do with the wreckage to help track this down?
(In particular the wreckage left on the cache device after I've flipped
it back into none mode?)
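
(Concretely, I could grab something like this before and after the
experiment -- a rough sketch, with /dev/sda as the SSD and /dev/sda4 as
the cache partition, both stand-ins:

    # the drive's own counters (unsafe shutdowns and friends)
    smartctl -A /dev/sda
    # the cache superblock as bcache left it
    bcache-super-show /dev/sda4
    # and whatever the kernel said about it
    dmesg | grep -i bcache

if any of that would be useful to you.)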

-- 
NULL && (void)