Re: Consistent failure of bcache upgrading from 5.10 to 5.15.2

On Tue, 23 Nov 2021, Coly Li wrote:
> On 11/20/21 8:06 AM, Eric Wheeler wrote:
> > Hi Coly, Kai, and Kent, I hope you are well!
> >
> > On Thu, 18 Nov 2021, Kai Krakow wrote:
> >
> >> Hi Coly!
> >>
> >> Reading the commit logs, it seems to come from using a non-default
> >> block size, 512 in my case (although I'm pretty sure that *is* the
> >> default on the affected system). I've checked:
> >> ```
> >> dev.sectors_per_block   1
> >> dev.sectors_per_bucket  1024
> >> ```
> >>
> >> The non-affected machines use 4k blocks (sectors per block = 8).
> > If it is the cache device with 4k blocks, then this could be a known issue
> > (perhaps) not directly related to the 5.15 release. We've hit it before:
> >    https://www.spinics.net/lists/linux-bcache/msg05983.html
> >
> > and I just talked to Frédéric Dumas this week who hit it too (cc'ed).
> > His solution was to use manufacturer disk tools to change the cachedev's
> > logical block size from 4k to 512-bytes and reformat (see below).
> >
> > We've not seen issues with the backing device using 4k blocks, but bcache
> > doesn't always seem to make 4k-aligned IOs to the cachedev.  It would be
> > nice to find a long-term fix; more and more SSDs support 4k blocks, which
> > matches the x86 page size and may reduce CPU overhead.
> >
> > I think this was the last message on the subject from Kent and Coly:
> >
> >  > On 2018/5/9 3:59 PM, Kent Overstreet wrote:
> >  > > Have you checked extent merging?
> >  >
> >  > Hi Kent,
> >  >
> >  > Not yet. Let me look into it.
> >  >
> >  > Thanks for the hint.
> >  >
> >  > Coly Li
> 
> I tried, and I still remember this. The headache is that I don't have a 4Kn
> SSD to debug and trace with; just looking at the code is hard...

The scsi_debug driver can do it:
	modprobe scsi_debug sector_size=4096 dev_size_mb=$((128*1024)) 

That will give you a 128 GB SCSI RAM disk with 4k sectors.  If that is
enough for a cache to test against, then you could run your super-high-IO
test against it and see what you get.  I would be curious how bcache
performs in writeback mode on the scsi_debug RAM disk!
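
If 128 GB of RAM is more than the test box has, the dev_size_mb arithmetic
can be wrapped in a small helper. This is only a sketch: scsi_debug_cmd is a
hypothetical name, and it prints the modprobe line instead of running it, so
it can be tried without root.

```shell
# Hypothetical helper: print the modprobe line for a scsi_debug RAM disk.
# $1 = size in GiB, $2 = logical sector size in bytes (defaults to 4096).
scsi_debug_cmd() {
	size_gib=$1
	sector_size=${2:-4096}
	# dev_size_mb is expressed in MiB, so convert GiB to MiB.
	echo "modprobe scsi_debug sector_size=${sector_size} dev_size_mb=$((size_gib * 1024))"
}

scsi_debug_cmd 128   # same parameters as the command above
scsi_debug_cmd 8     # a smaller 8 GiB disk, still with 4k sectors
```

The printed line can then be pasted into a root shell once the size looks
right for the machine.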

> If anybody can send me (to Beijing, China) a 4Kn SSD for debugging and
> testing, maybe I can make some progress. Or can I configure the kernel to
> force a specific non-4Kn SSD to only accept 4K-aligned I/O?

I think the scsi_debug option above might be cheaper ;) 

But seriously, Frédéric, who reported this error, was using an Intel P3700,
if someone (SUSE?) wants to fund testing on real hardware.  <$150 used on
eBay:

I'm not sure how to format it to 4k, but this is how Frédéric set it to
512 bytes and fixed his issue:

# intelmas start -intelssd 0 -nvmeformat LBAFormat=0
# intelmas start -intelssd 1 -nvmeformat LBAFormat=0
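
Either way, it is worth confirming what the kernel reports after the format.
A minimal sketch, assuming the device shows up as nvme0n1 (a placeholder
name); the helper name and its optional sysfs-root argument are hypothetical,
the latter only there so the function can be tried against a fake directory
tree:

```shell
# Print the logical block size (in bytes) the kernel reports for a device.
# $1 = device name, e.g. nvme0n1 (placeholder); $2 = sysfs root (default /sys).
logical_block_size() {
	cat "${2:-/sys}/block/$1/queue/logical_block_size"
}

# Example: logical_block_size nvme0n1
```

A result of 512 corresponds to the "sectors per block = 1" case Kai
reported, and 4096 to "sectors per block = 8".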

-Eric


> 
> Coly Li
> 
> 
> 
> 
