Re: Consistent failure of bcache upgrading from 5.10 to 5.15.2

Coly Li <colyli@xxxxxxx> · Thu, 6 Jan 2022 23:49:06 +0800

On 1/6/22 10:51 AM, Eric Wheeler wrote:
On Tue, 23 Nov 2021, Coly Li wrote:
On 11/20/21 8:06 AM, Eric Wheeler wrote:
Hi Coly, Kai, and Kent, I hope you are well!

On Thu, 18 Nov 2021, Kai Krakow wrote:

Hi Coly!

Reading the commit logs, it seems to come from using a non-default
block size, 512 in my case (although I'm pretty sure that *is* the
default on the affected system). I've checked:
```
dev.sectors_per_block   1
dev.sectors_per_bucket  1024
```

The non-affected machines use 4k blocks (sectors per block = 8).
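
As a quick sanity check on those numbers, here is a minimal sketch that converts the superblock fields to bytes. It assumes the values are counted in 512-byte sectors, which matches the 512 vs. 4k figures quoted here; the helper name `sectors_to_bytes` is made up for illustration.

```shell
# Convert a count of 512-byte sectors to bytes.
# Assumption: the sectors_per_* fields are in 512-byte units.
sectors_to_bytes() {
    echo $(( $1 * 512 ))
}

sectors_to_bytes 1      # sectors_per_block on the affected machine
sectors_to_bytes 8      # sectors_per_block on the unaffected machines
sectors_to_bytes 1024   # sectors_per_bucket
```

So the affected cache uses 512-byte blocks with 512 KiB buckets, while the unaffected ones use 4096-byte blocks.
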
If it is the cache device with 4k blocks, then this could be a known issue
(perhaps) not directly related to the 5.15 release. We've hit it before:
    https://www.spinics.net/lists/linux-bcache/msg05983.html

and I just talked to Frédéric Dumas this week who hit it too (cc'ed).
His solution was to use manufacturer disk tools to change the cachedev's
logical block size from 4k to 512-bytes and reformat (see below).

We've not seen issues with the backing device using 4k blocks, but bcache
doesn't always seem to make 4k-aligned IOs to the cachedev.  It would be
nice to find a long-term fix; more and more SSDs support 4k blocks, which
is a nice x86 page-alignment and may provide for less CPU overhead.

I think this was the last message on the subject from Kent and Coly:

  > On 2018/5/9 3:59 PM, Kent Overstreet wrote:
  > > Have you checked extent merging?
  >
  > Hi Kent,
  >
  > Not yet. Let me look into it.
  >
  > Thanks for the hint.
  >
  > Coly Li
I tried, and I still remember this. The headache is that I don't have a 4Kn SSD
to debug and trace with; just looking at the code is hard...

Hi Eric,

The scsi_debug driver can do it:
	modprobe scsi_debug sector_size=4096 dev_size_mb=$((128*1024))

That will give you a 128 GB SCSI RAM disk with 4k sectors.  If that is
enough for a cache to test against then you could run your super-high-IO
test against it and see what you get.  I would be curious how testing
bcache on the scsi_debug ramdisk in writeback performs!
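
A rough sketch of that setup as a dry run, in case a smaller RAM disk is easier to fit: the helper below only prints the commands for a given size. The name `plan_4k_ramdisk` and the `/dev/sdX` path are placeholders, not anything from this thread; `blockdev --getss` is the util-linux call that reports the logical sector size.

```shell
# Dry-run helper: prints the commands instead of running them.
# $1 = RAM disk size in GiB, $2 = placeholder path of the resulting disk.
plan_4k_ramdisk() {
    size_gib=$1
    dev=$2
    # Load scsi_debug with 4k sectors; dev_size_mb is given in MiB.
    echo "modprobe scsi_debug sector_size=4096 dev_size_mb=$(( size_gib * 1024 ))"
    # Confirm the logical sector size the kernel sees for the new disk.
    echo "blockdev --getss $dev"
}

plan_4k_ramdisk 128 /dev/sdX
```

Run the printed commands as root once the size looks right, substituting the real device name that scsi_debug creates.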

The DRAM is not big enough on my testing server....

If anybody can send me (to Beijing, China) a 4Kn SSD for debugging and testing,
maybe I can make some progress. Or can I configure the kernel to force a
specific non-4Kn SSD to accept only 4K-aligned I/O?
I think the scsi_debug option above might be cheaper ;)

But seriously, Frédéric who reported this error was using an Intel P3700
if someone (SUSE?) wants to fund testing on real hardware.  <$150 used on
eBay:

Currently all my testing SSDs are provided by Lenovo and Memblaze. I
tried the hdparm command that Kai Krakow told me about, and it didn't work out.

Thanks for the hint about the Intel P3700. I will try to find one and
reproduce the problem.

I'm not sure how to format it 4k, but this is how Frédéric set it to 512
bytes and fixed his issue:

# intelmas start -intelssd 0 -nvmeformat LBAFormat=0
# intelmas start -intelssd 1 -nvmeformat LBAFormat=0
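
For going the other way (to 4k), a possible starting point with the generic nvme-cli tool rather than intelmas: list the namespace's LBA formats with `nvme id-ns -H` and pick the one with 4096-byte data and no metadata. This is only a sketch. The helper name `pick_4k_lbaf` is hypothetical, and the parsing assumes the human-readable "LBA Format N : Metadata Size: ... - Data Size: ..." line layout, which may differ between nvme-cli versions.

```shell
# Read `nvme id-ns -H` output on stdin and print the index of the first
# LBA format with 4096-byte data and 0-byte metadata (nothing if none).
# Assumption: lines look like
#   "LBA Format  N : Metadata Size: 0   bytes - Data Size: 4096 bytes - ..."
pick_4k_lbaf() {
    awk '/LBA Format/ && /Data Size: 4096 bytes/ && /Metadata Size: 0 / {
        print $3    # third field is the format index N
        exit
    }'
}

# The index would then be passed to `nvme format <device> --lbaf=<index>`,
# which destroys all data on the namespace.
```

Usage would be something like `nvme id-ns -H /dev/nvme0n1 | pick_4k_lbaf`, with the device path being a placeholder for the actual cache device.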

Copied. Let me try to find an Intel P3700 first.

Coly Li