Re: md-raid5 with bcache member devices => kernel panic

Kent Overstreet <kmo@xxxxxxxxxxxxx> · Thu, 5 Dec 2013 14:52:34 -0800

On Thu, Dec 05, 2013 at 10:29:13PM +0100, Matthias Ferdinand wrote:
> Hi,
> 
> I am currently experimenting with bcache. The hardware is rather old:
> Intel Core2 6600, 2.4GHz, 8GB RAM. I intend using it as a KVM host. OS
> is Ubuntu 13.10 amd64.
> 
> SSD: single Intel 530 series 120G (SSDSC2BW120A4), i.e. same cache
> device for all backing devices
> 
> But not only is it rather slow, it reliably (but nondeterministically)
> produces kernel panics. It might panic while copying the first VM image
> (dd_rescue), or during startup of the first VM, while the copy process
> for the second VM image (dd_rescue) is already running.
> 
> Tried with different kernels, all produce the panics:
>   - Ubuntu 3.11.0-13.20
>   - kernel.org 3.12.2
>   - kernel.org 3.13-rc2
> 
> Having so many layers on top of bcache may be stupid, but surely it should
> not panic :-)
> 
> You can find the complete serial console output of those crashing runs
> at http://dl.mfedv.net/md5raid_on_bcache_panic/
> 
> I can't see bcache mentioned in those kernel backtraces - perhaps it's
> not really bcache's fault (there is a single bcache line in the 3.12.2
> trace, though).

Erk. I thought I was done with these bugs. Nick, do you think you could try and
track this down?

Looking at this:
http://dl.mfedv.net/md5raid_on_bcache_panic/mdraid5_on_bcache_panic_3.12.2.txt

that's a null pointer deref; if Matthias could get the exact line number it
happened on, we could tell which variable was null. I _think_ it's *sg, because
it's running off the end of the scatterlist. If that's the case (and you should
verify that it is), then what's going on is that bcache is sending down a bio
larger than the device expects.
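To make the suspected failure concrete, here is a userspace sketch (not kernel code). struct sg_entry, sg_init() and map_segments() are hypothetical stand-ins for a driver's preallocated scatterlist: the list is sized for the device's segment limit, so a bio with more segments walks past the last entry. Assuming the driver advances through its list the way this sketch does, the next access would go through a NULL sg; the sketch reports that case instead of crashing.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for a scatterlist entry;
 * not the kernel's struct scatterlist. */
struct sg_entry {
    unsigned int length;
    struct sg_entry *next;   /* NULL after the last preallocated entry */
};

/* Link a preallocated array into a list, sized (in a real driver)
 * from the device's maximum segment count. */
static void sg_init(struct sg_entry *sg, size_t nents)
{
    assert(nents > 0);
    for (size_t i = 0; i < nents; i++) {
        sg[i].length = 0;
        sg[i].next = (i + 1 < nents) ? &sg[i + 1] : NULL;
    }
}

/* Fill one entry per bio segment. Returns the number of entries used,
 * or -1 at the point where the walk runs off the end of the list,
 * i.e. where a driver that does not check would dereference NULL. */
static int map_segments(struct sg_entry *head, const unsigned int *seg_len,
                        size_t nsegs)
{
    struct sg_entry *sg = NULL;

    for (size_t i = 0; i < nsegs; i++) {
        sg = (sg == NULL) ? head : sg->next;
        if (sg == NULL)
            return -1;       /* bio has more segments than the list holds */
        sg->length = seg_len[i];
    }
    return (int)nsegs;
}
```

With a four-entry list, mapping four 4096-byte segments succeeds, while mapping five hits the NULL at the end of the list.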

Assuming that's the case, the bug would be in bch_bio_max_sectors(), which is in
drivers/md/bcache/io.c. Backstory:

In current kernels, to do an IO through the block layer you fill out a struct
bio and pass it down. BUT the bio is not allowed to be bigger than whatever the
device can do atomically as a single request.
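As a rough model of "what the device can do as a single request", the check below tests a bio against three limits. struct dev_limits and bio_fits() are hypothetical; the field names are borrowed from the kernel's struct queue_limits, but the real block layer checks more than this.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SECTOR_SHIFT 9   /* 512-byte sectors */

/* Hypothetical subset of a device's limits, named after fields in
 * the kernel's struct queue_limits; not the real structure. */
struct dev_limits {
    unsigned int max_sectors;       /* largest request, in sectors */
    unsigned int max_segments;      /* most segments per request */
    unsigned int max_segment_size;  /* largest single segment, in bytes */
};

/* True if a bio made of these segments (lengths in bytes) could be
 * sent down as one request under the limits above. */
static bool bio_fits(const struct dev_limits *lim,
                     const unsigned int *seg_len, size_t nsegs)
{
    unsigned long bytes = 0;

    assert(lim != NULL);
    if (nsegs > lim->max_segments)
        return false;
    for (size_t i = 0; i < nsegs; i++) {
        if (seg_len[i] > lim->max_segment_size)
            return false;
        bytes += seg_len[i];
    }
    /* Round up to whole sectors before comparing. */
    return ((bytes + (1u << SECTOR_SHIFT) - 1) >> SECTOR_SHIFT)
           <= lim->max_sectors;
}
```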

The way this is normally done: a filesystem adds data to a bio a page at a
time with bio_add_page() (in fs/bio.c). bio_add_page() checks all the device
constraints and may fail, at which point the filesystem sends the bio down and
starts building a new one.
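The bio_add_page() pattern can be modelled in userspace like this. toy_add_page() only imitates the contract described above (add a page if every limit still holds, otherwise refuse and leave the bio unchanged); it is not bio_add_page()'s real signature or logic. struct toy_bio, struct dev_limits and write_pages() are hypothetical, and pages are assumed to be 4096 bytes.

```c
#include <assert.h>
#include <stdbool.h>

#define PAGE_BYTES   4096u
#define PAGE_SECTORS (PAGE_BYTES >> 9)   /* 8 sectors of 512 bytes */

/* Hypothetical device limits (names borrowed from struct queue_limits). */
struct dev_limits {
    unsigned int max_sectors;
    unsigned int max_segments;
};

/* Hypothetical, simplified bio: just a page count and a size. */
struct toy_bio {
    unsigned int nr_pages;
    unsigned int sectors;
};

/* Add one page only if the bio would still satisfy every limit;
 * otherwise refuse and change nothing. */
static bool toy_add_page(struct toy_bio *bio, const struct dev_limits *lim)
{
    if (bio->nr_pages + 1 > lim->max_segments ||
        bio->sectors + PAGE_SECTORS > lim->max_sectors)
        return false;
    bio->nr_pages++;
    bio->sectors += PAGE_SECTORS;
    return true;
}

/* Filesystem-style loop: fill a bio until a page is refused, "submit"
 * it, start a fresh bio. Returns the number of bios submitted. */
static unsigned int write_pages(unsigned int npages,
                                const struct dev_limits *lim)
{
    struct toy_bio bio = {0, 0};
    unsigned int submitted = 0;

    assert(lim->max_segments > 0 && lim->max_sectors >= PAGE_SECTORS);
    for (unsigned int i = 0; i < npages; i++) {
        if (!toy_add_page(&bio, lim)) {
            submitted++;                   /* send the full bio down */
            bio = (struct toy_bio){0, 0};
            bool ok = toy_add_page(&bio, lim);
            assert(ok);                    /* an empty bio always takes a page */
            (void)ok;
        }
    }
    if (bio.nr_pages > 0)
        submitted++;                       /* send the last partial bio */
    return submitted;
}
```

For example, with max_sectors = 64 and max_segments = 16, write_pages(20, &lim) submits three bios of 8, 8 and 4 pages; the limits are enforced page by page, so no bio is ever too big.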

Anyway, bcache doesn't do things this way, because that approach is braindead
and gets obscenely complicated when you have stacked block devices. Instead,
when bcache goes to submit a bio, it first splits the bio if necessary.
bch_bio_max_sectors() is supposed to check all the same constraints and
essentially replicate the behaviour of building up a bio with bio_add_page().
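A sketch of the split-before-submit alternative, under simplified assumptions: segment sizes are given in 512-byte sectors, only max_sectors and max_segments are modelled, and a split may fall at any sector boundary, even inside a segment. max_prefix_sectors() is a hypothetical analogue of the job bch_bio_max_sectors() does, not its actual code, and split_and_submit() is likewise hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical device limits (names borrowed from struct queue_limits). */
struct dev_limits {
    unsigned int max_sectors;    /* largest request, in 512-byte sectors */
    unsigned int max_segments;   /* most segments per request */
};

/* How many sectors from the front of the bio fit in one request.
 * Hypothetical analogue of bch_bio_max_sectors(); not its real code. */
static unsigned int max_prefix_sectors(const struct dev_limits *lim,
                                       const unsigned int *seg, size_t nsegs)
{
    unsigned int sectors = 0;

    for (size_t i = 0; i < nsegs && i < lim->max_segments; i++) {
        if (sectors + seg[i] >= lim->max_sectors)
            return lim->max_sectors;   /* may end partway through seg[i] */
        sectors += seg[i];
    }
    return sectors;
}

/* Split-before-submit: carve the bio into pieces that each satisfy the
 * limits. Consumes seg[] in place; returns the number of pieces that
 * would be submitted. */
static unsigned int split_and_submit(const struct dev_limits *lim,
                                     unsigned int *seg, size_t nsegs)
{
    size_t first = 0;                  /* first segment with data left */
    unsigned int pieces = 0;

    assert(lim->max_sectors > 0 && lim->max_segments > 0);
    while (first < nsegs) {
        if (seg[first] == 0) {         /* skip empty segments */
            first++;
            continue;
        }

        unsigned int n = max_prefix_sectors(lim, seg + first, nsegs - first);
        pieces++;                      /* "submit" a piece of n sectors */

        /* Remove those n sectors from the front of the remaining bio. */
        while (n > 0) {
            unsigned int take = seg[first] < n ? seg[first] : n;
            seg[first] -= take;
            n -= take;
            if (seg[first] == 0)
                first++;
        }
    }
    return pieces;
}
```

With max_sectors = 16 and max_segments = 4, ten 8-sector segments go down as five 16-sector pieces. A real implementation has to honour more limits than these two (such as max_segment_size, segment boundary masks and alignment); missing any one of them is the kind of mismatch suspected in bch_bio_max_sectors() above.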

Matthias - I'm running bcache on top of a raid6 at home and I've never seen
this, so there's probably something unusual about your setup that's required to
trigger this. Can you help Nick out with reproducing the bug and/or getting him
more information?