Re: Bad/Unaligned block number requested

Eric Wheeler <bcache@xxxxxxxxxxxxxxxxxx> · Thu, 26 Apr 2018 20:32:22 +0000 (UTC)

On Thu, 26 Apr 2018, Eric Wheeler wrote:

> On Fri, 20 Apr 2018, Coly Li wrote:
> > On 2018/4/20 12:29 AM, Eric Wheeler wrote:
> > > On Thu, 19 Apr 2018, Coly Li wrote:
> > > 
> > >> On 2018/4/19 8:12 AM, Eric Wheeler wrote:
> > >>> On Tue, 17 Apr 2018, Coly Li wrote:
> > >>>
> > >>>> On 2018/4/17 6:45 AM, Eric Wheeler wrote:
> > >>>>> On Sat, 14 Apr 2018, Coly Li wrote:
> > >>>>>
> > >>>>>> On 2018/4/14 6:46 AM, Eric Wheeler wrote:
> > >>>>>>> Hello all,
> > >>>>>>>
> > >>>>>>> We are running bcache in 4.1.49 with both the cache and backing device 
> > >>>>>>> having 4k blocks. The disk stack is DRBD->dm-thin->bcache->[sdc->sdb] 
> > >>>>>>> Where sdc is the cache.
> > >>>>>>>
> > >>>>>>> Sometimes we get errors like the following:
> > >>>>>>>
> > >>>>>>> [432015.934869] block drbd8065: Began resync as SyncTarget (will sync 880 KB [220 bits set]).
> > >>>>>>> [432015.949469] sd 0:0:0:1: [sdb] Unaligned block number requested: sector_size=4096, block=15724561783, blk_rq=9
> > >>>>>>> [432015.950347] sd 0:0:0:2: [sdc] Unaligned block number requested: sector_size=4096, block=353041040, blk_rq=7
> > >>>>>>> [432015.951146] bcache: bch_count_io_errors() dm-6: IO error on reading from cache, recovering
> > >>>>>>> [432015.952015] block drbd8065: read: error=-5 s=19281488s
> > >>>>>>> [432015.952866] block drbd8065: Local IO failed in drbd_endio_read_sec_final.
> > >>>>>>> [432015.953777] sd 0:0:0:2: [sdc] Unaligned block number requested: sector_size=4096, block=387084784, blk_rq=7
> > >>>>>>> [432015.954710] bcache: bch_count_io_errors() dm-6: IO error on reading from cache, recovering
> > >>>>>>> [432015.959037] sd 0:0:0:1: [sdb] Unaligned block number requested: sector_size=4096, block=15725385535, blk_rq=1
> > >>>>>>> [432015.959938] block drbd8065: read: error=-5 s=19391384s
> > >>>>>>> [432015.960862] block drbd8065: Local IO failed in drbd_endio_read_sec_final.
> > >>>>>>>
> > >>>>>>>
> > >>>>>>> Note that 15724561783 is not divisible by 8, thus it is unaligned to 4k 
> > >>>>>>> blocks.
> > >>>>>>>
> > >>>>>>> Does anyone know if the bcache code is enforcing correct alignment?
> > >>>>>>>
> > >>>>>>> Is there any way that bcache could introduce misalignment?
> > >>>>>>>
> > >>>>>>> We ran blockdev --getbsz and --getpbsz all the way down the stack and 
> > >>>>>>> everything reports 4k.
> > >>>>>>
> > >>>>>> Hi Eric,
> > >>>>>>
> > >>>>>> Do you use 4.1 stable tree, or with your extra patches ? It would be
> > >>>>>> helpful if I may access your kernel tree.
> > >>>>>>
> > >>>>>> So far I cannot tell where the problem is, I just feel there might be
> > >>>>>> some hidden issue triggered by 4KB sector size hard drive. Maybe adding
> > >>>>>> a garden code to detect unaligned I/O request from bcache will be
> > >>>>>> helpful to diagnose the root cause.
> > >>>>>
> > >>>>> Hi Coly,
> > >>>>>
> > >>>>> I just pushed the branch that we built for this system to bitbucket:
> > >>>>> 	https://bitbucket.org/ewheelerinc/linux/branch/ewi-4.1.49-rpmbuild
> > >>>>>
> > >>>>> There are a few changes, but probably nothing that you haven't seen:
> > >>>>>
> > >>>>> 1. We use the BFQ scheduler (but the problem presents in CFQ also)
> > >>>>> 2. dmthin fixes were backported from 4.2/4.3 to fix pool space issues on 
> > >>>>> rolling snapshots
> > >>>>> 3. bcache and dmcrypt have my ioprio patches, but no dmcrypt in this bug
> > >>>>> 4. ibrs patches from Oracle are included for Spectre mitigation
> > >>>>> 5. We attempted to use Mauricio's patch to fix 4k alignment issues, which 
> > >>>>> shows in our tree as 4a595ccc, but it did not fix the issue.
> > >>>>> 6. We updated the error strings in sd.c with our commits aa372d91 and 
> > >>>>> f603bf7e while troubleshooting this issue.
> > >>>>>
> > >>>>> Thank you for your help, let me know if there is anything else that you 
> > >>>>> need to help troubleshoot. I can produce this pretty reliably at the 
> > >>>>> moment.
> > >>>>
> > >>>> Eric,
> > >>>>
> > >>>> Is it possible to apply the attached patch ? This patch will dump call
> > >>>> trace if the request to backing device is not 4KB aligned. Then we can
> > >>>> see what happens indeed.
> > >>>
> > >>>
> > >>> Thanks Coly.  It produced this trace.  Can you tell what it is doing?
> > >>>
> > >>> [   81.426439] bcache: check_4k_alignment() PTR_OFFSET is not 4KB aligned: 4368280612101
> > >>> [   81.426439] CPU: 4 PID: 37 Comm: kworker/4:0 Tainted: G        W  O    4.1.49-2.el7.x86_64 #1
> > >>> [   81.426440] Hardware name: Supermicro X9SCL/X9SCM/X9SCL/X9SCM, BIOS 2.10 01/09/2014
> > >>> [   81.426443] Workqueue: bcache bch_data_insert_keys [bcache]
> > >>> [   81.426444]  0000000000000286 c3f07f1af1e6f322 ffff88080ba276d8 ffffffff816ff534
> > >>> [   81.426445]  0000000000000000 ffff8807f8d5d4e8 ffff88080ba27728 ffffffffa057bfa4
> > >>> [   81.426445]  ffff88080ba27718 00000000812a17ec 0000000000000086 ffff8807f8d5d4e8
> > >>> [   81.426446] Call Trace:
> > >>> [   81.426447]  [<ffffffff816ff534>] dump_stack+0x63/0x81
> > >>> [   81.426450]  [<ffffffffa057bfa4>] __bch_cut_front+0x1a4/0x1b0 [bcache]
> > >>> [   81.426454]  [<ffffffffa0585cb3>] bch_extent_insert_fixup+0x593/0x680 [bcache]
> > >>> [   81.426458]  [<ffffffffa057c347>] ? __bch_btree_iter_init+0x77/0xc0 [bcache]
> > >>> [   81.426461]  [<ffffffffa057c474>] bch_btree_insert_key+0xc4/0x320 [bcache]
> > >>> [   81.426465]  [<ffffffffa057ddae>] btree_insert_key+0x5e/0x120 [bcache]
> > >>> [   81.426468]  [<ffffffffa057e6f2>] bch_btree_insert_keys+0x62/0x250 [bcache]
> > >>> [   81.426472]  [<ffffffffa0581c4c>] bch_btree_insert_node+0x13c/0x3d0 [bcache]
> > >>> [   81.426476]  [<ffffffffa0582e88>] btree_insert_fn+0x28/0x50 [bcache]
> > >>> [   81.426479]  [<ffffffffa05804f4>] bch_btree_map_nodes_recurse+0x54/0x190 [bcache]
> > >>> [   81.426482]  [<ffffffffa0582e60>] ? bch_btree_insert_check_key+0x1c0/0x1c0 [bcache]
> > >>> [   81.426483]  [<ffffffff817051a6>] ? down_write+0x16/0x50
> > >>> [   81.426487]  [<ffffffffa0580278>] ? bch_btree_node_get+0x78/0x2a0 [bcache]
> > >>> [   81.426490]  [<ffffffffa05805a5>] bch_btree_map_nodes_recurse+0x105/0x190 [bcache]
> > >>> [   81.426494]  [<ffffffffa0582e60>] ? bch_btree_insert_check_key+0x1c0/0x1c0 [bcache]
> > >>> [   81.426497]  [<ffffffffa05802d5>] ? bch_btree_node_get+0xd5/0x2a0 [bcache]
> > >>> [   81.426500]  [<ffffffffa05805a5>] bch_btree_map_nodes_recurse+0x105/0x190 [bcache]
> > >>> [   81.426504]  [<ffffffffa0582e60>] ? bch_btree_insert_check_key+0x1c0/0x1c0 [bcache]
> > >>> [   81.426507]  [<ffffffffa058376c>] __bch_btree_map_nodes+0x11c/0x150 [bcache]
> > >>> [   81.426510]  [<ffffffffa0582e60>] ? bch_btree_insert_check_key+0x1c0/0x1c0 [bcache]
> > >>> [   81.426514]  [<ffffffffa0583894>] bch_btree_insert+0xf4/0x170 [bcache]
> > >>> [   81.426515]  [<ffffffff810e7000>] ? prepare_to_wait_event+0xf0/0xf0
> > >>> [   81.426519]  [<ffffffffa058b2be>] bch_data_insert_keys+0x3e/0x160 [bcache]
> > >>> [   81.426520]  [<ffffffff810bac60>] process_one_work+0x150/0x450
> > >>> [   81.426521]  [<ffffffff810bb8c2>] worker_thread+0x112/0x530
> > >>> [   81.426522]  [<ffffffff810bb7b0>] ? rescuer_thread+0x3e0/0x3e0
> > >>> [   81.426523]  [<ffffffff810c10d8>] kthread+0xd8/0xf0
> > >>> [   81.426524]  [<ffffffff810c1000>] ? kthread_create_on_node+0x1b0/0x1b0
> > >>> [   81.426525]  [<ffffffff817074d2>] ret_from_fork+0x42/0x70
> > >>> [   81.426525]  [<ffffffff810c1000>] ? kthread_create_on_node+0x1b0/0x1b0
> > >>
> > >> Hi Eric,
> > >>
> > >> I will post another patch for your test. This patch only provides
> > >> information of LBA offset on cache device, backing device LBA check is
> > >> missing. For cache device, 4KB align is not hardware mandatory IMHO.
> > >>
> > >> Will post an update patch soon.
> > > 
> > > Thank you Coly!
> > > 
> > > Note that our cache and backing device both have 4k logical blocks.
> > 
> > Aha, thanks for the hint. So it is good to check 4K alignment for both
> > cache and cached devices :-)
> 
> Hi Coly,
> 
> I'm sure you've been busy with the v4.17 merge but I thought I 
> would check in:  
> 
> Have you had a chance to look at this?  It is an opportunity to fix this 
> 4k bug for the future since we can still reproduce the error.
> 
> >>> bcache: bch_count_io_errors() dm-6: IO error on reading from cache, recovering <<<
> 
> What do you think, is there data corruption exposure here since 4.1.49 
> still has the dirty-cache-recorvery bug? 

I just noticed that "bcache: only permit to recovery read error when cache 
device is clean" is in v4.1.49, but would this recover gracefully in the 
4k error situation?

> Also, would your failure-recovery patch series address this type of 
> failure?

-Eric 

> 
> --
> Eric Wheeler
> 
> 
> > 
> > Coly Li
> > 
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-bcache" in
> the body of a message to majordomo@xxxxxxxxxxxxxxx
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 
--
To unsubscribe from this list: send the line "unsubscribe linux-bcache" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html