Re: Critical bug on bcache kernel module in Fedora 30

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 2019/5/17 1:38 下午, Coly Li wrote:
> On 2019/5/17 12:45 下午, Pierre JUHEN wrote:
>> Le 16/05/2019 à 16:12, Coly Li a écrit
>>
>>
>>> Such problem is very easy to produce, only a few sequential I/Os may
>>> trigger a panic from bch_btree_iter_next_check().
>>>
>>> Here is content of my fio script,
>>> [global]
>>> lockmem=1
>>> direct=1
>>> ioengine=psync
>>>
>>> [job1]
>>> filename=/dev/bcache0
>>> readwrite=write
>>> blocksize=4k
>>> iodepth=1
>>> numjobs=1
>>> io_size=64K
>>>
>>> In my observation, 2 sequential I/Os may trigger this panic. Here is the
>>> kernel output when panic in bch_btree_iter_next_check() triggered,
>>>
>>> [  153.478620] bcache: bch_dump_bset() block 1 key 0/9:
>>> [  153.478621] bcache: bch_bkey_dump()  0x0:0x10 len 0x8 -> [check dev]
>>> [  153.478623] bcache: bch_bkey_dump()  bucket 7560168717
>>> [  153.478624] bcache: bch_bkey_dump()
>>> [  153.478624]
>>> [  153.478625] bcache: bch_dump_bset() block 1 key 3/9:
>>> [  153.478626] bcache: bch_bkey_dump()  0x0:0x18 len 0x8 -> [check dev]
>>> [  153.478627] bcache: bch_bkey_dump()  bucket 4400861924
>>> [  153.478628] bcache: bch_bkey_dump()
>>> [  153.478628]
>>> [  153.478629] bcache: bch_dump_bset() Key skipped backwards
>>> [  153.478629]
>>> [  153.478630] bcache: bch_dump_bset() block 1 key 6/9:
>>> [  153.478631] bcache: bch_bkey_dump()  0x0:0x10 len 0x8 -> [0:392192
>>> gen 1] dirty
>>> [  153.478632] bcache: bch_bkey_dump()  bucket 383
>>> [  153.478635] bcache: bch_bkey_dump()
>>> [  153.478635]
>>> [  153.532890] Kernel panic - not syncing: Key skipped backwards
>>> [  153.535924] CPU: 0 PID: 790 Comm: bcache_writebac Tainted: G        W
>>>          5.1.0+ #3
>>> [  153.539656] Hardware name: VMware, Inc. VMware Virtual Platform/440BX
>>> Desktop Reference Platform, BIOS 6.00 04/13/2018
>>> [  153.545002] Call Trace:
>>> [  153.546702]  dump_stack+0x85/0xc0
>>> [  153.548675]  panic+0x106/0x2da
>>> [  153.550560]  ? bch_ptr_invalid+0x10/0x10 [bcache]
>>> [  153.553178]  bch_btree_iter_next_filter.cold+0xff/0x12e [bcache]
>>> [  153.556117]  ? btree_insert_key+0x190/0x190 [bcache]
>>> [  153.558646]  bch_btree_map_keys_recurse+0x5c/0x190 [bcache]
>>> [  153.561557]  bch_btree_map_keys+0x177/0x1a0 [bcache]
>>> [  153.564085]  ? btree_insert_key+0x190/0x190 [bcache]
>>> [  153.566688]  ? dirty_init+0x80/0x80 [bcache]
>>> [  153.569224]  bch_refill_keybuf+0xcc/0x290 [bcache]
>>> [  153.571609]  ? finish_wait+0x90/0x90
>>> [  153.573525]  ? dirty_init+0x80/0x80 [bcache]
>>> [  153.575705]  bch_writeback_thread+0x3b9/0x5c0 [bcache]
>>> [  153.578527]  ? __kthread_parkme+0x58/0x80
>>> [  153.580662]  kthread+0x108/0x140
>>> [  153.582541]  ? read_dirty+0x620/0x620 [bcache]
>>> [  153.584998]  ? kthread_park+0x90/0x90
>>> [  153.586991]  ret_from_fork+0x3a/0x50
>>> [  153.589236] Kernel Offset: 0x12000000 from 0xffffffff81000000
>>> (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
>>> [  153.594868] ---[ end Kernel panic - not syncing: Key skipped
>>> backwards ]---
>>>
>>> The panic happens in bch_writeback context, because bkeys in btree node
>>> is not in linear increasing order. Adjacent two sequential write
>>> requests is very common condition in bcache, such corrupted btree node
>>> is not reported in recent 2~3 years. Unless the kernel is compiled with
>>> gcc9...
>>>
>>> It is not clear to me why the key 0:0x10 appears in same btree node
>>> twice, and why there are 3 keys for two 4K write requests.
>>>
>>> If anyone may have any clue, please offer. Now I continue to check how
>>> this happens.
>>>
>>> Thanks.
>>>
>> Hi,
>>
>> Either Rolf and I didn't experiment a kernel panic, probably because we
>> are working a bcache device already used at 100% or close.
>>
> 
> It seems something wrong in btree related code with GCC9.
> 
>> The trace shows that's the btree structure that is corrupted, and
>> probably the adresses of the next btree nodes. This could be in the
>> tasks that walk's inside the btree :
>>
>> Could you check that the size of each btree node is the same with the
>> two versions of the compiler  ? (alignment issue)
>>
> 
> Not yet, but since the keys are combined with 64bit words, it should not
> be problematic for bits alignment. bkey assignment is not by data
> structure, is by several uint64 words, this is reliable operation.
> 
> 
>> It could be also a problem where the end of the btree are not correctly
>> detected and you jump on un-itinialized node.
>>
> 
> This is suspicious. I only sent two 4K write, but there are three keys
> in the btree node. Currently I am not fully understand how bcache btree
> code works, still on the way to dig out.
> 
>> Since you are able to reproduce it very fast, could you put a trace that
>> dump the btree nodes that are crossed during the search ?

There is a btree dump sysinterface already,
/sys/kernel/debug/bcache/bcache-<UUID>, which requires bcache debug to
be configured.

The panic I observe in bch_btree_iter_next_check() was because, when
kernel is compiled with gcc9, all adjacent keys are not merged at all.
So when same sequential I/Os submitted again, bcache code just tries to
appends same keys on the tail of existing keys on btree node, then panic
triggered inside bch_btree_iter_next_check().

When the code is compiled with gcc7/8, the sequential keys are merged
into one single key. So the skipped backward key does not exist at all.

The bkey insert/merge is very essential and hot code path, there is very
small possibility that some hidden bug exists. So now I come to think,
maybe there is something wrong in gcc9 code ?

Also I try to analyze the assemble code of bcache, just find out the
generated assembly code between gcc9 and gcc7 is quite different. For
gcc9 there is a XXXX.cold part. So far I can not tell where the problem
is from yet.

Now I feel such problem is beyond my current understanding of bcache
code. It will be great if Kent has time and may have a look on what is
wrong in key insert code path.

Sequential keys not merging is only a simplest problem I observe, I
believe this is not a simple point issue, the gcc9 generated code may
have problem in other multiple places, this might be the reason why we
see different buggy behavior from different people.

Thanks.

-- 

Coly Li



[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
[Index of Archives]     [Linux ARM Kernel]     [Linux Filesystem Development]     [Linux ARM]     [Linux Omap]     [Fedora ARM]     [IETF Annouce]     [Security]     [Bugtraq]     [Linux OMAP]     [Linux MIPS]     [ECOS]     [Asterisk Internet PBX]     [Linux API]

  Powered by Linux