Re: bcache failure hangs something in kernel

Michael Lyle <mlyle@xxxxxxxx> · Tue, 14 Nov 2017 11:03:58 -0800

On Tue, Nov 14, 2017 at 10:25 AM, Nix <nix@xxxxxxxxxxxxx> wrote:
> [   11.497914] bad checksum at bucket 28262, block 0, 36185 keys

That's no good-- shouldn't have checksum errors.  It means either the
metadata we wrote got corrupted by the disk, or a metadata write
didn't happen in the order we requested.
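
To make the two failure modes concrete, here is a minimal sketch in
Python.  It is not bcache's on-disk format or its real checksum: CRC32
from zlib is only a stand-in, and seal() and verify() are made-up names
for illustration.

```python
import zlib


def seal(payload: bytes) -> bytes:
    """Prepend a 4-byte CRC32 of the payload (stand-in checksum)."""
    return zlib.crc32(payload).to_bytes(4, "little") + payload


def verify(block: bytes) -> bool:
    """Recompute the checksum over the payload and compare it to the header."""
    stored = int.from_bytes(block[:4], "little")
    return stored == zlib.crc32(block[4:])


good = seal(b"keys")

# Case 1: the media flipped a bit in the payload after it was written.
corrupted = good[:-1] + bytes([good[-1] ^ 1])

# Case 2: writes landed out of order, so a new header sits on an old payload.
stale = seal(b"new")[:4] + b"old"
```

Both the corrupted block and the stale one fail verify(), which is why
the log message alone can't tell the two causes apart.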

> Reboots with the cache enabled always featured a message from bcache an
> instant before reboot saying it had timed out: from the code, the
> timeout is based on a (short!) delay without any concern for whether,
> say, the SSD is in the middle of writing a bunch of data, and the delay
> is way too short for the SSD in question (an ATA-connected DC3510) to
> write more than a GiB or so, a small fraction of the 350GiB I have
> devoted to bcache.

I've seen things hit this couple-of-seconds timeout before.  It basically
means that garbage collection is busy analyzing stuff on the disk and
doesn't get around to checking the "should I exit now?" flag in time.
Not ideal but relatively harmless.  (It's not trying to write back the
dirty data at this phase or anything).
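
Roughly the pattern, as a sketch in Python rather than kernel C.  This
is not the bcache code: gc_worker and stop_with_timeout are
hypothetical names, and the sleep stands in for one long scan step
during which the flag isn't looked at.

```python
import threading
import time


def gc_worker(stop, chunk_seconds, chunks):
    """Hypothetical worker that checks the stop flag only between chunks."""
    for _ in range(chunks):
        if stop.is_set():
            return
        # Stands in for one scan step that cannot be interrupted.
        time.sleep(chunk_seconds)


def stop_with_timeout(chunk_seconds, timeout, chunks=1000):
    """Ask the worker to stop; return True if it exited within `timeout`."""
    stop = threading.Event()
    worker = threading.Thread(
        target=gc_worker, args=(stop, chunk_seconds, chunks), daemon=True
    )
    worker.start()
    time.sleep(0.01)      # let the worker get into a chunk
    stop.set()            # the "should I exit now?" flag
    worker.join(timeout)  # the short wait before giving up
    return not worker.is_alive()
```

With short chunks the worker notices the flag well inside the timeout;
with a chunk longer than the timeout the caller gives up first, even
though the worker does exit once its current chunk finishes.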

> I note that the SMART data's bus reset count on the SSD suggests that
> rebooting resets the bus as part of POST (the count of bus resets is
> identical to the count of OS reboots plus firmware upgrades from the
> IPMI event log), which likely halts any ongoing writes.

Even if it did, as long as acknowledged IO is written it's OK.  That
is, it's OK for anything we're trying to write to be lost, as long as
the drive hasn't told us it's done and then later that write gets
"undone".
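
Stated as a check, under simplifying assumptions (whole-block writes,
no torn writes, and a hypothetical trace format of my own invention):

```python
def acked_writes_survived(writes, on_disk):
    """Check the 'acknowledged IO is durable' rule after a crash.

    writes:  list of (block, data, acked) tuples in issue order.
    on_disk: dict mapping block -> data found after the crash.
    """
    acceptable = {}  # block -> set of contents we may legitimately find
    for block, data, acked in writes:
        if acked:
            # An acknowledged write supersedes everything issued before it.
            acceptable[block] = {data}
        elif block in acceptable:
            # A later unacknowledged write may or may not have landed.
            acceptable[block].add(data)
    # Blocks that never had an acknowledged write are unconstrained.
    return all(on_disk.get(block) in ok for block, ok in acceptable.items())


trace = [(1, "a", True), (1, "b", False), (2, "x", False)]
```

For that trace, finding either "a" or "b" in block 1 is fine, and block
2 can hold anything.  Finding older contents in block 1 is the "undone"
case.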

I think there has to be something somewhat unique to your
environment-- in an environment I used to administer (before working
on bcache myself), there were about 100 bcache roots in writeback
mode-- and we both unceremoniously lost power under active workload a
couple of times and did several clean shutdowns for upgrades without
losing a volume to corruption (though we did lose many disks that
didn't feel like working at all again after power failure).  And now I
have a bad arc-fault circuit breaker in my home that has dumped power
on my two ext4-root-on-bcache-on-md machines three times in the past
couple of weeks without issue.  Each of my production machines has 15
unsafe shutdowns in smartctl -- a number that I can't quite explain
because I think the real number should be 7-8 or so... and my bcache
development test rig has 145 (!).

Mike
--
To unsubscribe from this list: send the line "unsubscribe linux-bcache" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html