On 5/9/18 3:36 PM, Coly Li wrote:
On 2018/5/9 12:57 AM, Eric Wheeler wrote:
On Tue, 8 May 2018, Coly Li wrote:
Hi Coly,
We did get traces overnight, so hopefully these are useful. In summary,
these are the ones that hit:
check_4k_alignment() KEY_OFFSET(&w->key) is not 4KB aligned
check_4k_alignment() KEY_OFFSET(l) + KEY_SIZE(r) is not 4KB aligned
check_4k_alignment() KEY_START(k) is not 4KB aligned
The whole dmesg output that we have is here: https://pastebin.com/nuYFi66K
And some of the traces separated by error message are shown below. The
ones below have a unique backtrace, but they may not cover all unique
backtraces.
====================================================================
Of those that hit, these are the ones that were accompanied by SCSI errors:
[54947.892574] bcache: check_4k_alignment() KEY_OFFSET(&w->key) is not 4KB aligned: 15724561783
[54947.893173] CPU: 5 PID: 1166 Comm: bcache_writebac Tainted: G O 4.1.49-5.el7.x86_64 #1
[54947.893757] Hardware name: Supermicro X9SCL/X9SCM/X9SCL/X9SCM, BIOS 2.10 01/09/2014
[54947.894323] 0000000000000286 8c136ca15cff4205 ffff8807ebea3d58 ffffffff816ff534
[54947.894907] ffff88080a7b6aa0 ffff88080a7b0000 ffff8807ebea3d68 ffffffffa05beb63
[54947.895515] ffff8807ebea3e08 ffffffffa05be174 00000003a93e4e90 ffff8807ef36c4c0
[54947.896132] Call Trace:
[54947.896705] [<ffffffff816ff534>] dump_stack+0x63/0x81
[54947.897285] [<ffffffffa05beb63>] check_4k_alignment.part.9+0x24/0x26 [bcache]
[54947.897853] [<ffffffffa05be174>] read_dirty+0x444/0x4a0 [bcache]
[54947.898418] [<ffffffffa05be1d0>] ? read_dirty+0x4a0/0x4a0 [bcache]
[54947.898980] [<ffffffffa05be5cc>] bch_writeback_thread+0x3fc/0x4e0 [bcache]
[54947.899544] [<ffffffffa05be1d0>] ? read_dirty+0x4a0/0x4a0 [bcache]
[54947.900121] [<ffffffff810c10d8>] kthread+0xd8/0xf0
[54947.900673] [<ffffffff810c1000>] ? kthread_create_on_node+0x1b0/0x1b0
[54947.901226] [<ffffffff817074d2>] ret_from_fork+0x42/0x70
[54947.901783] [<ffffffff810c1000>] ? kthread_create_on_node+0x1b0/0x1b0
[54947.902401] sd 0:0:0:2: [sdc] Unaligned block number requested: sector_size=4096, block=353041024, blk_rq=23
[54947.903054] bcache: bch_count_io_errors() dm-6: IO error on reading dirty data from cache, recovering
[54947.903874] sd 0:0:0:1: [sdb] Unaligned block number requested: sector_size=4096, block=15724561760, blk_rq=23
[54958.301274] bcache: check_4k_alignment() KEY_OFFSET(&w->key) is not 4KB aligned: 15725385535
[54958.301889] CPU: 2 PID: 1166 Comm: bcache_writebac Tainted: G O 4.1.49-5.el7.x86_64 #1
[54958.302532] Hardware name: Supermicro X9SCL/X9SCM/X9SCL/X9SCM, BIOS 2.10 01/09/2014
[54958.303144] 0000000000000286 8c136ca15cff4205 ffff8807ebea3d58 ffffffff816ff534
[54958.303805] ffff88080a7b7dc0 ffff88080a7b0000 ffff8807ebea3d68 ffffffffa05beb63
[54958.304423] ffff8807ebea3e08 ffffffffa05be174 00000003a949ec10 ffff8807ef36c4c0
[54958.305080] Call Trace:
[54958.305728] [<ffffffff816ff534>] dump_stack+0x63/0x81
[54958.306371] [<ffffffffa05beb63>] check_4k_alignment.part.9+0x24/0x26 [bcache]
[54958.307049] [<ffffffffa05be174>] read_dirty+0x444/0x4a0 [bcache]
[54958.307694] [<ffffffffa05be1d0>] ? read_dirty+0x4a0/0x4a0 [bcache]
[54958.308338] [<ffffffffa05be5cc>] bch_writeback_thread+0x3fc/0x4e0 [bcache]
[54958.308986] [<ffffffffa05be1d0>] ? read_dirty+0x4a0/0x4a0 [bcache]
[54958.309631] [<ffffffff810c10d8>] kthread+0xd8/0xf0
[54958.310267] [<ffffffff810c1000>] ? kthread_create_on_node+0x1b0/0x1b0
[54958.310914] [<ffffffff817074d2>] ret_from_fork+0x42/0x70
[54958.311533] [<ffffffff810c1000>] ? kthread_create_on_node+0x1b0/0x1b0
[54958.312265] sd 0:0:0:2: [sdc] Unaligned block number requested: sector_size=4096, block=387084760, blk_rq=31
[54958.313064] bcache: bch_count_io_errors() dm-6: IO error on reading dirty data from cache, recovering
[54958.314154] sd 0:0:0:1: [sdb] Unaligned block number requested: sector_size=4096, block=15725385504, blk_rq=31
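As a rough cross-check of the numbers in the two traces above, here is a
small sketch in Python. It assumes that KEY_OFFSET and the sd "block"
values are both counted in 512-byte sectors, and it relies on the bcache
definition KEY_START(k) = KEY_OFFSET(k) - KEY_SIZE(k), so KEY_OFFSET is
the end of the extent. The helper names (is_4k_aligned, describe) are
made up for illustration; they are not bcache functions.

```python
# Assumption: all values below are in 512-byte sectors.
SECTORS_PER_4K = 4096 // 512  # 8 sectors per 4KB block

# (KEY_OFFSET, sd block on sdb, blk_rq) copied from the two traces above
samples = [
    (15724561783, 15724561760, 23),
    (15725385535, 15725385504, 31),
]

def is_4k_aligned(sectors):
    """True if a sector count or sector address is a multiple of 4KB."""
    return sectors % SECTORS_PER_4K == 0

def describe(key_offset, block, nr_sectors):
    """Relate the bcache key end offset to the request seen by sd."""
    return {
        # Does KEY_OFFSET minus the request length equal the sd block?
        "start_matches": key_offset - nr_sectors == block,
        "start_aligned": is_4k_aligned(block),
        "length_aligned": is_4k_aligned(nr_sectors),
        "end_aligned": is_4k_aligned(key_offset),
    }

results = [describe(*s) for s in samples]
for sample, result in zip(samples, results):
    print(sample, result)
```

If those assumptions hold, both requests on sdb start on a 4KB boundary
and only the length (23 and 31 sectors) is not a multiple of 8, which
would point at the key size rather than the start offset.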
Hi Eric,
Wow, the above lines are very informative, thanks!
I will start looking into what happens here. In the meantime I will
compose another patch which adds an extra LBA 4K alignment check to the
make_request() entry points, to make sure I don't miss anything.
Hi Eric,
Now I have two 4Kn SSDs (formatted with intelmas, following your hint).
I use the 800G SSD as the cache device and the 2TB SSD as the backing
device. Both are formatted with a 4K sector size by intelmas.
Currently I am running fio with random 4K writes on a Linux v5.16
kernel, and plan to let it run overnight. Can you suggest a workload
that is closer to your conditions?
Thanks.
Coly Li