On 2018/4/27 4:32 AM, Eric Wheeler wrote: > On Thu, 26 Apr 2018, Eric Wheeler wrote: > >> On Fri, 20 Apr 2018, Coly Li wrote: >>> On 2018/4/20 12:29 AM, Eric Wheeler wrote: >>>> On Thu, 19 Apr 2018, Coly Li wrote: >>>> >>>>> On 2018/4/19 8:12 AM, Eric Wheeler wrote: >>>>>> On Tue, 17 Apr 2018, Coly Li wrote: >>>>>> >>>>>>> On 2018/4/17 6:45 AM, Eric Wheeler wrote: >>>>>>>> On Sat, 14 Apr 2018, Coly Li wrote: >>>>>>>> >>>>>>>>> On 2018/4/14 6:46 AM, Eric Wheeler wrote: >>>>>>>>>> Hello all, >>>>>>>>>> >>>>>>>>>> We are running bcache in 4.1.49 with both the cache and backing device >>>>>>>>>> having 4k blocks. The disk stack is DRBD->dm-thin->bcache->[sdc->sdb] >>>>>>>>>> Where sdc is the cache. >>>>>>>>>> >>>>>>>>>> Sometimes we get errors like the following: >>>>>>>>>> >>>>>>>>>> [432015.934869] block drbd8065: Began resync as SyncTarget (will sync 880 KB [220 bits set]). >>>>>>>>>> [432015.949469] sd 0:0:0:1: [sdb] Unaligned block number requested: sector_size=4096, block=15724561783, blk_rq=9 >>>>>>>>>> [432015.950347] sd 0:0:0:2: [sdc] Unaligned block number requested: sector_size=4096, block=353041040, blk_rq=7 >>>>>>>>>> [432015.951146] bcache: bch_count_io_errors() dm-6: IO error on reading from cache, recovering >>>>>>>>>> [432015.952015] block drbd8065: read: error=-5 s=19281488s >>>>>>>>>> [432015.952866] block drbd8065: Local IO failed in drbd_endio_read_sec_final. >>>>>>>>>> [432015.953777] sd 0:0:0:2: [sdc] Unaligned block number requested: sector_size=4096, block=387084784, blk_rq=7 >>>>>>>>>> [432015.954710] bcache: bch_count_io_errors() dm-6: IO error on reading from cache, recovering >>>>>>>>>> [432015.959037] sd 0:0:0:1: [sdb] Unaligned block number requested: sector_size=4096, block=15725385535, blk_rq=1 >>>>>>>>>> [432015.959938] block drbd8065: read: error=-5 s=19391384s >>>>>>>>>> [432015.960862] block drbd8065: Local IO failed in drbd_endio_read_sec_final. >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> Note that 15724561783 is not divisible by 8, thus it is unaligned to 4k >>>>>>>>>> blocks. >>>>>>>>>> >>>>>>>>>> Does anyone know if the bcache code is enforcing correct alignment? >>>>>>>>>> >>>>>>>>>> Is there any way that bcache could introduce misalignment? >>>>>>>>>> >>>>>>>>>> We ran blockdev --getbsz and --getpbsz all the way down the stack and >>>>>>>>>> everything reports 4k. >>>>>>>>> >>>>>>>>> Hi Eric, >>>>>>>>> >>>>>>>>> Do you use 4.1 stable tree, or with your extra patches ? It would be >>>>>>>>> helpful if I may access your kernel tree. >>>>>>>>> >>>>>>>>> So far I cannot tell where the problem is, I just feel there might be >>>>>>>>> some hidden issue triggered by 4KB sector size hard drive. Maybe adding >>>>>>>>> a garden code to detect unaligned I/O request from bcache will be >>>>>>>>> helpful to diagnose the root cause. >>>>>>>> >>>>>>>> Hi Coly, >>>>>>>> >>>>>>>> I just pushed the branch that we built for this system to bitbucket: >>>>>>>> https://bitbucket.org/ewheelerinc/linux/branch/ewi-4.1.49-rpmbuild >>>>>>>> >>>>>>>> There are a few changes, but probably nothing that you haven't seen: >>>>>>>> >>>>>>>> 1. We use the BFQ scheduler (but the problem presents in CFQ also) >>>>>>>> 2. dmthin fixes were backported from 4.2/4.3 to fix pool space issues on >>>>>>>> rolling snapshots >>>>>>>> 3. bcache and dmcrypt have my ioprio patches, but no dmcrypt in this bug >>>>>>>> 4. ibrs patches from Oracle are included for Spectre mitigation >>>>>>>> 5. We attempted to use Mauricio's patch to fix 4k alignment issues, which >>>>>>>> shows in our tree as 4a595ccc, but it did not fix the issue. >>>>>>>> 6. We updated the error strings in sd.c with our commits aa372d91 and >>>>>>>> f603bf7e while troubleshooting this issue. >>>>>>>> >>>>>>>> Thank you for your help, let me know if there is anything else that you >>>>>>>> need to help troubleshoot. I can produce this pretty reliably at the >>>>>>>> moment. >>>>>>> >>>>>>> Eric, >>>>>>> >>>>>>> Is it possible to apply the attached patch ? This patch will dump call >>>>>>> trace if the request to backing device is not 4KB aligned. Then we can >>>>>>> see what happens indeed. >>>>>> >>>>>> >>>>>> Thanks Coly. It produced this trace. Can you tell what it is doing? >>>>>> >>>>>> [ 81.426439] bcache: check_4k_alignment() PTR_OFFSET is not 4KB aligned: 4368280612101 >>>>>> [ 81.426439] CPU: 4 PID: 37 Comm: kworker/4:0 Tainted: G W O 4.1.49-2.el7.x86_64 #1 >>>>>> [ 81.426440] Hardware name: Supermicro X9SCL/X9SCM/X9SCL/X9SCM, BIOS 2.10 01/09/2014 >>>>>> [ 81.426443] Workqueue: bcache bch_data_insert_keys [bcache] >>>>>> [ 81.426444] 0000000000000286 c3f07f1af1e6f322 ffff88080ba276d8 ffffffff816ff534 >>>>>> [ 81.426445] 0000000000000000 ffff8807f8d5d4e8 ffff88080ba27728 ffffffffa057bfa4 >>>>>> [ 81.426445] ffff88080ba27718 00000000812a17ec 0000000000000086 ffff8807f8d5d4e8 >>>>>> [ 81.426446] Call Trace: >>>>>> [ 81.426447] [<ffffffff816ff534>] dump_stack+0x63/0x81 >>>>>> [ 81.426450] [<ffffffffa057bfa4>] __bch_cut_front+0x1a4/0x1b0 [bcache] >>>>>> [ 81.426454] [<ffffffffa0585cb3>] bch_extent_insert_fixup+0x593/0x680 [bcache] >>>>>> [ 81.426458] [<ffffffffa057c347>] ? __bch_btree_iter_init+0x77/0xc0 [bcache] >>>>>> [ 81.426461] [<ffffffffa057c474>] bch_btree_insert_key+0xc4/0x320 [bcache] >>>>>> [ 81.426465] [<ffffffffa057ddae>] btree_insert_key+0x5e/0x120 [bcache] >>>>>> [ 81.426468] [<ffffffffa057e6f2>] bch_btree_insert_keys+0x62/0x250 [bcache] >>>>>> [ 81.426472] [<ffffffffa0581c4c>] bch_btree_insert_node+0x13c/0x3d0 [bcache] >>>>>> [ 81.426476] [<ffffffffa0582e88>] btree_insert_fn+0x28/0x50 [bcache] >>>>>> [ 81.426479] [<ffffffffa05804f4>] bch_btree_map_nodes_recurse+0x54/0x190 [bcache] >>>>>> [ 81.426482] [<ffffffffa0582e60>] ? bch_btree_insert_check_key+0x1c0/0x1c0 [bcache] >>>>>> [ 81.426483] [<ffffffff817051a6>] ? down_write+0x16/0x50 >>>>>> [ 81.426487] [<ffffffffa0580278>] ? bch_btree_node_get+0x78/0x2a0 [bcache] >>>>>> [ 81.426490] [<ffffffffa05805a5>] bch_btree_map_nodes_recurse+0x105/0x190 [bcache] >>>>>> [ 81.426494] [<ffffffffa0582e60>] ? bch_btree_insert_check_key+0x1c0/0x1c0 [bcache] >>>>>> [ 81.426497] [<ffffffffa05802d5>] ? bch_btree_node_get+0xd5/0x2a0 [bcache] >>>>>> [ 81.426500] [<ffffffffa05805a5>] bch_btree_map_nodes_recurse+0x105/0x190 [bcache] >>>>>> [ 81.426504] [<ffffffffa0582e60>] ? bch_btree_insert_check_key+0x1c0/0x1c0 [bcache] >>>>>> [ 81.426507] [<ffffffffa058376c>] __bch_btree_map_nodes+0x11c/0x150 [bcache] >>>>>> [ 81.426510] [<ffffffffa0582e60>] ? bch_btree_insert_check_key+0x1c0/0x1c0 [bcache] >>>>>> [ 81.426514] [<ffffffffa0583894>] bch_btree_insert+0xf4/0x170 [bcache] >>>>>> [ 81.426515] [<ffffffff810e7000>] ? prepare_to_wait_event+0xf0/0xf0 >>>>>> [ 81.426519] [<ffffffffa058b2be>] bch_data_insert_keys+0x3e/0x160 [bcache] >>>>>> [ 81.426520] [<ffffffff810bac60>] process_one_work+0x150/0x450 >>>>>> [ 81.426521] [<ffffffff810bb8c2>] worker_thread+0x112/0x530 >>>>>> [ 81.426522] [<ffffffff810bb7b0>] ? rescuer_thread+0x3e0/0x3e0 >>>>>> [ 81.426523] [<ffffffff810c10d8>] kthread+0xd8/0xf0 >>>>>> [ 81.426524] [<ffffffff810c1000>] ? kthread_create_on_node+0x1b0/0x1b0 >>>>>> [ 81.426525] [<ffffffff817074d2>] ret_from_fork+0x42/0x70 >>>>>> [ 81.426525] [<ffffffff810c1000>] ? kthread_create_on_node+0x1b0/0x1b0 >>>>> >>>>> Hi Eric, >>>>> >>>>> I will post another patch for your test. This patch only provides >>>>> information of LBA offset on cache device, backing device LBA check is >>>>> missing. For cache device, 4KB align is not hardware mandatory IMHO. >>>>> >>>>> Will post an update patch soon. >>>> >>>> Thank you Coly! >>>> >>>> Note that our cache and backing device both have 4k logical blocks. >>> >>> Aha, thanks for the hint. So it is good to check 4K alignment for both >>> cache and cached devices :-) >> >> Hi Coly, >> Hi Eric, >> I'm sure you've been busy with the v4.17 merge but I thought I >> would check in: >> Yes, our partner found several bugs of the device failure handling patchset, just finish fixing them. If you may offer code review, that will be very helpful to make them into 4.17. >> Have you had a chance to look at this? It is an opportunity to fix this >> 4k bug for the future since we can still reproduce the error. >> I just have time back to look at this today. And I am quite uncomfortable with this issue. Now I don't have idea what the problem is before I compose a new debug patch and get some clue. But I need to look close to this issue. Any issue might related to data corruption is very high priority. >>>>> bcache: bch_count_io_errors() dm-6: IO error on reading from cache, recovering <<< >> >> What do you think, is there data corruption exposure here since 4.1.49 >> still has the dirty-cache-recorvery bug? > Yes, you should have the patches which fixes the issue. Recent patches for failed cache handling may help to avoid potential data corruption. Unfortunately there are some bugs in my cache device failure handling patch set, and now I am looking for peer review before ask Jens to pick these patches. Hopefully these fixes may go into 4.17 together. > I just noticed that "bcache: only permit to recovery read error when cache > device is clean" is in v4.1.49, but would this recover gracefully in the > 4k error situation? > Hmm, I don't think this fix has impact to 4K address alignment. So far I am understanding the btree/bset code by fixing bcache bug, I am not able to imagine how the code running in my brain. Let me compose the debug patch ASAP and check what information it may provide. >> Also, would your failure-recovery patch series address this type of >> failure? I doubt. The cache device failure handling patch set dose not touch any LBA address stuff. Thanks. Coly Li -- To unsubscribe from this list: send the line "unsubscribe linux-bcache" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html