size limit for backing store? block sizes? ("[sdx] Bad block number requested")

Hi *,

we're facing an obscure problem with a fresh bcache setup:

After creating an 8 TB (net) RAID5 device (hardware RAID controller), setting it up for bcache (using an existing cache set) and populating it with data, we were hit by massive dmesg reports of "[sdx] Bad block number requested" during writeback of dirty data - both with our 4.1.x kernel and with a 4.9.8 kernel.
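
For completeness, the bcache side of the setup was done roughly along these lines (device names and the UUID are placeholders; the cache set already existed):

  # create the backing device superblock on the RAID5 volume
  make-bcache -B /dev/sdX

  # register the device (if udev hasn't already) and attach it
  # to the existing cache set via its UUID
  echo /dev/sdX > /sys/fs/bcache/register
  echo <cset-uuid> > /sys/block/bcacheN/bcache/attach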

After recreating the backing store with 3 TB (net) and recreating the bcache setup, population went through without any noticeable errors.

While the 8 TB device was populated with only the same amount of data (2.7 TB), block placement probably spread across all of the 8 TB of available space.

Another parameter that catches the eye is block size - the 8 TB backing store was created in a way that exposed a 4k block size to the OS, while the 3 TB backing store was created so that a 512b block size was reported. The cache set is on a PCI SSD with a 512b block size.
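
(The block sizes above are what the kernel reports for the devices, e.g.:

  # what the devices expose to the OS (4k on the 8 TB store, 512b on the 3 TB one)
  cat /sys/block/sdX/queue/logical_block_size
  cat /sys/block/sdX/queue/physical_block_size
  blockdev --getss --getpbsz /dev/sdX

and, if I read the tools right, "bcache-super-show /dev/sdX" should show the block size bcache recorded in its superblock.)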

So with backing:4k, cache:512b and an 8 TB backing store, bcache went mad during writeback ("echo 0 > writeback_running" immediately made the messages stop). With backing:512b, cache:512b and a 3 TB backing store, we had no error reports at all.
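
(For the record, the full path of that knob, assuming the backing device registered as bcache0:

  echo 0 > /sys/block/bcache0/bcache/writeback_running )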

On a second node, we have (had) a similar situation - backing:4k and cache:512b, but a 4 TB backing store. We've seen the errors there, too, when accessing an especially big logical volume that likely crossed some magic limit (block number on the "physical volume"?). We still see the message there today, only much less frequently, since we no longer use that large volume on the bcache device. Other volumes are there now, probably with a few data extents at high block numbers, leading to the occasional error message (every few minutes) during writeback?

Even more puzzling, we have a third node, identical to the latter one - except that its bcache device is filled with more data, and we see no such errors there (yet)...

So here we are - what are we facing? Is it a size limit on the backing store? Or does the error result from mixing block sizes, plus some other trigger?

If the former, where's the limit?

If it is about block sizes, the questions pile up: are the dos and don'ts documented anywhere? It's a rather common situation for us to run multiple backing devices on a single cache set, with both whole HDDs and logical volumes as backing stores. So it's very easy to end up with different block sizes between backing store and caching device, or even differing block sizes between the various backing stores.

- using 512b for the cache and 4k for the backing device seems not to work, unless the above is purely a size-limit problem

- 512b for cache and 512b for backing store seems to work

- 4k for cache and 4k for backing store will probably work as well

- will 4k for the cache and 512b for the backing store work? (Sounds likely, as there should be no alignment problem in the backing store. OTOH, will bcache try to write 4k of data (a cache block) into a 512b block (backing store), or will it then write 8 blocks, mapping the block size difference?)

- if the latter works, will using both 4k and 512b backing stores in parallel work when using a 4k cache? (A rough sketch of how we'd force the respective block sizes follows below.)
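
In case it helps with testing these combinations: as far as I can tell, make-bcache lets you force the block size at format time, so something along these lines should reproduce the various pairings (device names are placeholders):

  # cache device formatted with a 4k block size
  make-bcache --block 4k -C /dev/nvme0nX

  # backing devices formatted with 4k resp. 512b block sizes
  make-bcache --block 4k -B /dev/sdX
  make-bcache --block 512 -B /dev/sdY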

Any insight and/or help tracking down the error is most welcome!

Regards,
Jens



