Re: bcache: workqueue lockup

On 2018/8/7 3:41 AM, Stefan Priebe - Profihost AG wrote:
> Am 06.08.2018 um 16:21 schrieb Coly Li:
>> On 2018/8/6 9:33 PM, Stefan Priebe - Profihost AG wrote:
>>> Hi Coly,
>>> Am 06.08.2018 um 15:06 schrieb Coly Li:
>>>> On 2018/8/6 2:33 PM, Stefan Priebe - Profihost AG wrote:
>>>>> Hi Coly,
>>>>>
>>>>> while running the SLES15 kernel i observed a workqueue lockup and a
>>>>> totally crashed system today.
>>>>>
>>>>> dmesg output is about 3,5mb but it seems it just repeats the
>>>>> bch_data_insert_keys msg.
>>>>>
>>>>
>>>> Hi Stefan,
>>>>
>>>> Thanks for your information!
>>>>
>>>> Could you please give me a hint on how to reproduce it? Even if it is
>>>> not stably reproducible, a detailed procedure would help me a lot.
>>>
>>> I'm sorry, but I can't reproduce it. It just happens out of nothing in
>>> our Ceph production cluster.
>>
>> I see. Could you please share the configuration information? E.g.:
> 
> sure.
> 
>> - How many CPU cores
> 12
> 
>> - How much physical memory
> 64GB
> 
>> - How large the SSD is, NVMe or SATA
> bcache cache size: 250GB SATA SSD
> 
>> - How many SSDs
> 1x
> 
>> - How large (and how many) the backing hard drives are
> 2x 1TB
> 
>> I will try to simulate a similar workload with fio and see how lucky I am.
> 
> Thanks!
> 
> Generally, the workload in that timeframe was mostly reads, fsync inside
> guests, and fstrim.

Hi Stefan,

From your information, I suspect this is a journal-related deadlock.

If too many small I/Os make the btree inside bcache grow too fast, and
in turn exhaust the journal space, a deadlock-like hang can happen.

Junhui tried to fix it by increasing the journal slot size, but the root
cause is not fixed yet. The journal operation in bcache is not atomic;
that means a btree node write goes into the journal first, and is then
inserted into the btree node by journal replay. If the btree node has to
be split during the journal replay, the split metadata needs to go into
the journal first; if the journal space is already exhausted, a deadlock
may happen.

A real fix is to make the bcache journal operation atomic. That means:
1. Reserve the estimated number of journal slots before a journal I/O.
2. If the reservation succeeds, go ahead; if it fails, wait and try again.
3. If journal replay results in a btree split, the journal slots for the
new metadata are already reserved in the journal, so that step can never
fail.

This fix is not simple, and I am currently working on other fixes (4Kn
hard drives and big endian...). If no one else helps with it, it will be
a while before I can focus on it.

Because you mentioned fstrim happened in your guests: if the backing
device of bcache supports DISCARD/TRIM, bcache will also invalidate the
fstrim range in its internal btree, which may generate more btree
metadata I/O. Therefore I guess it might be related to the journal.

Hmm, how about I compose a patch to display the number of free journal
slots? If such an issue happens next time and you can still access
sysfs, let's check whether this is a journal issue. Maybe I am wrong,
but it's worth a try.

Thanks.

Coly Li


>>>>> The beginning is:
>>>>>
>>>>> 2018-08-06 02:08:06     BUG: workqueue lockup - pool cpus=1 node=0
>>>>> flags=0x1 nice=0 stuck for 51s!
>>>>> 2018-08-06 02:08:06     pending: memcg_kmem_cache_create_func
>>>>> 2018-08-06 02:08:06     delayed: memcg_kmem_cache_create_func
>>>>> 2018-08-06 02:08:06     workqueue bcache: flags=0x8
>>>>> 2018-08-06 02:08:06     pwq 22: cpus=11 node=0 flags=0x0 nice=0 active=1/256
>>>>> 2018-08-06 02:08:06     in-flight: 1764369:bch_data_insert_keys [bcache]
>>>>> 2018-08-06 02:08:06     pwq 18: cpus=9 node=0 flags=0x1 nice=0
>>>>> active=256/256 MAYDAY
>>>>> 2018-08-06 02:08:06     in-flight: 1765894:bch_data_insert_keys
>>>>> [bcache], 1765908:bch_data_insert_keys [bcache],
>>>>> 1765931:bch_data_insert_keys [bcache], 1765984:bch_data_insert_keys
>>>>> [bcache], 1765815:bch_data_insert_keys [bcache],
>>>>> 1765893:bch_data_insert_keys [bcache], 1765981:bch_data_insert_keys
>>>>> [bcache], 1765875:bch_data_insert_keys [bcache],
>>>>> 1765963:bch_data_insert_keys [bcache], 1765960:bch_data_insert_keys
>>>>> [bcache], 1765889:bch_data_insert_keys [bcache],
>>>>> 1765989:bch_data_insert_keys [bcache], 1765897:bch_data_insert_keys
>>>>> [bcache], 1765911:bch_data_insert_keys [bcache],
>>>>> 1765924:bch_data_insert_keys [bcache], 1765808:bch_data_insert_keys
>>>>> [bcache], 1765879:bch_data_insert_keys [bcache],
>>>>> 1765948:bch_data_insert_keys [bcache], 1765970:bch_data_insert_keys
>>>>> [bcache], 1765859:bch_data_insert_keys [bcache],
>>>>> 1765884:bch_data_insert_keys [bcache]
>>>>> 2018-08-06 02:08:06     , 1765952:bch_data_insert_keys [bcache],
>>>>> 1765990:bch_data_insert_keys [bcache], 1765817:bch_data_insert_keys
>>>>> [bcache], 1765858:bch_data_insert_keys [bcache],
>>>>> 1765928:bch_data_insert_keys [bcache], 1765936:bch_data_insert_keys
>>>>> [bcache], 1762396:bch_data_insert_keys [bcache],
>>>>> 1765831:bch_data_insert_keys [bcache], 1765847:bch_data_insert_keys
>>>>> [bcache], 1765895:bch_data_insert_keys [bcache],
>>>>> 1765925:bch_data_insert_keys [bcache], 1765967:bch_data_insert_keys
>>>>> [bcache], 1765798:bch_data_insert_keys [bcache],
>>>>> 1765827:bch_data_insert_keys [bcache], 1765857:bch_data_insert_keys
>>>>> [bcache], 1765979:bch_data_insert_keys [bcache],
>>>>> 1765809:bch_data_insert_keys [bcache], 1765856:bch_data_insert_keys
>>>>> [bcache], 1765878:bch_data_insert_keys [bcache],
>>>>> 1765918:bch_data_insert_keys [bcache], 1765934:bch_data_insert_keys [bcache]
>>>>> 2018-08-06 02:08:06     , 1765982:bch_data_insert_keys [bcache],
>>>>> 1765813:bch_data_insert_keys [bcache], 1765883:bch_data_insert_keys
>>>>> [bcache], 1765993:bch_data_insert_keys [bcache],
>>>>> 1765834:bch_data_insert_keys [bcache], 1765920:bch_data_insert_keys
>>>>> [bcache], 1765962:bch_data_insert_keys [bcache],
>>>>> 1765788:bch_data_insert_keys [bcache], 1765882:bch_data_insert_keys
>>>>> [bcache], 1765942:bch_data_insert_keys [bcache],
>>>>> 1765825:bch_data_insert_keys [bcache], 1765854:bch_data_insert_keys
>>>>> [bcache], 1765902:bch_data_insert_keys [bcache],
>>>>> 1765838:bch_data_insert_keys [bcache], 1765868:bch_data_insert_keys
>>>>> [bcache], 1765932:bch_data_insert_keys [bcache],
>>>>> 1765944:bch_data_insert_keys [bcache], 1765975:bch_data_insert_keys
>>>>> [bcache], 1765983:bch_data_insert_keys [bcache],
>>>>> 1765810:bch_data_insert_keys [bcache], 1765863:bch_data_insert_keys [bcache]
>>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-bcache" in
> the body of a message to majordomo@xxxxxxxxxxxxxxx
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 
