Re: bcache: workqueue lockup

Hi,
On 07.08.2018 at 16:35, Coly Li wrote:
> On 2018/8/7 3:41 AM, Stefan Priebe - Profihost AG wrote:
>> On 06.08.2018 at 16:21, Coly Li wrote:
>>> On 2018/8/6 9:33 PM, Stefan Priebe - Profihost AG wrote:
>>>> Hi Coly,
>>>> On 06.08.2018 at 15:06, Coly Li wrote:
>>>>> On 2018/8/6 2:33 PM, Stefan Priebe - Profihost AG wrote:
>>>>>> Hi Coly,
>>>>>>
>>>>>> while running the SLES15 kernel I observed a workqueue lockup and a
>>>>>> completely crashed system today.
>>>>>>
>>>>>> The dmesg output is about 3.5 MB, but it seems to just repeat the
>>>>>> bch_data_insert_keys message.
>>>>>>
>>>>>
>>>>> Hi Stefan,
>>>>>
>>>>> Thanks for your information!
>>>>>
>>>>> Could you please give me any hint on how to reproduce it? Even if it
>>>>> is not reliably reproducible, a detailed procedure would help me a lot.
>>>>
>>>> I'm sorry, but I can't reproduce it. It just happens out of nowhere in
>>>> our Ceph production cluster.
>>>
>>> I see. Could you please share the configuration information? E.g.
>>
>> sure.
>>
>>> - How many CPU cores
>> 12
>>
>>> - How much physical memory
>> 64GB
>>
>>> - How large the SSD is, and whether it is NVMe or SATA
>> bcache cache size: 250GB SATA SSD
>>
>>> - How many SSDs
>> 1x
>>
>>> - How large (and how many) the backing hard drives are
>> 2x 1TB
>>
>>> I will try to simulate a similar workload with fio and see how lucky I am.
>>
>> Thanks!
>>
>> Generally, the workload in that timeframe was mostly reads, fsync inside
>> the guests, and fstrim.
> 
> Hi Stefan,
> 
> From your information, I suspect this was a journal-related deadlock.
> 
> If too many small I/Os make the btree inside bcache grow too fast, and
> in turn exhaust the journal space, a deadlock-like hang will probably
> happen.
> 
> Junhui tried to fix it by increasing the journal slot size, but the root
> cause is not fixed yet. The journal operation in bcache is not atomic;
> that means a btree node write goes into the journal first, and is then
> inserted into the btree node by journal replay. If the btree node has to
> be split during the journal replay, the split metadata needs to go into
> the journal first; if the journal space is already exhausted, a deadlock
> may happen.
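
To make the hang easier to picture, here is a tiny user-space toy model of
the scenario described above. The slot count and all names are invented for
illustration; this is not bcache code.

/* Toy model of journal exhaustion: every insert needs a journal slot,
 * and a btree split needs one more.  Not bcache code. */
#include <stdio.h>
#include <stdbool.h>

#define JOURNAL_SLOTS 4			/* pretend the journal has 4 slots */
static int free_slots = JOURNAL_SLOTS;

static bool reserve_slot(void)
{
	if (free_slots == 0)
		return false;		/* in the kernel this would wait */
	free_slots--;
	return true;
}

static void insert_key(int keyno, bool causes_split)
{
	if (!reserve_slot()) {
		printf("key %d: journal full before insert -> insert waits\n", keyno);
		return;
	}
	if (causes_split && !reserve_slot()) {
		/* The split metadata also needs a slot, but the journal is
		 * already exhausted and nothing frees it up: this is the
		 * deadlock-like state, and the bch_data_insert_keys workers
		 * pile up just like in the workqueue lockup report. */
		printf("key %d: split needs a slot but journal is empty -> hang\n", keyno);
		return;
	}
	printf("key %d: inserted (free slots left: %d)\n", keyno, free_slots);
}

int main(void)
{
	for (int i = 0; i < 6; i++)
		insert_key(i, i == 3);	/* key 3 happens to trigger a split */
	return 0;
}
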
> 
> A real fix is to make the bcache journal operation atomic, which means:
> 1. Reserve the estimated journal slots before a journal I/O.
> 2. If the reservation succeeds, go ahead; if it fails, wait and try again.
> 3. If journal replay results in a btree split, the journal slots for the
> new metadata are already reserved in the journal and can never fail.
> 
> This fix is not simple, and I am currently working on other fixes (4Kn
> hard drives and big endian...). If no one else helps with the fix, it
> will be a while before I can focus on it.
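
A minimal sketch of that reserve-first idea, reusing the toy journal model
from the snippet above; the reclaim step and all names are invented, and
this illustrates the approach rather than the actual planned patch.

/* "Reserve first, insert later": steps 1-3 above, in the toy model. */
#include <stdio.h>
#include <stdbool.h>

#define JOURNAL_SLOTS 4
static int free_slots = JOURNAL_SLOTS;

/* Pretend older journal entries were written back to the btree,
 * freeing their slots for reuse. */
static void journal_reclaim(void)
{
	free_slots = JOURNAL_SLOTS;
}

/* Step 1: reserve everything the operation could possibly need,
 * including a potential split, in a single step. */
static bool reserve_slots(int estimated)
{
	if (free_slots < estimated)
		return false;
	free_slots -= estimated;
	return true;
}

static void insert_key_atomically(int keyno, bool may_split)
{
	int need = may_split ? 2 : 1;	/* worst-case journal usage */

	/* Step 2: if the reservation fails, wait for reclaim and retry.
	 * Nothing has been written yet, so backing off is safe. */
	while (!reserve_slots(need)) {
		printf("key %d: journal full, waiting for reclaim\n", keyno);
		journal_reclaim();
	}

	/* Step 3: the insert (and any split it causes) already owns its
	 * journal space, so it can no longer hang at this point. */
	printf("key %d: inserted, %d slot(s) were pre-reserved\n", keyno, need);
}

int main(void)
{
	for (int i = 0; i < 6; i++)
		insert_key_atomically(i, i == 3);
	return 0;
}
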
> 
> Because you mentioned that fstrim happened in your guests: if the backing
> device of bcache supports DISCARD/TRIM, bcache will also invalidate the
> fstrim range in its internal btree, which may generate more btree
> metadata I/O. Therefore I guess it might be related to the journal.
> 
> Hmm, how about I compose a patch to display the number of free journal
> slots? If such an issue happens again and you can still access sysfs,
> let's check and see whether this is a journal issue. Maybe I am wrong,
> but it's worth a try.
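
For reference, such a read-only sysfs attribute usually boils down to a
show() callback. Below is a generic, self-contained sketch of that pattern;
the free_journal_slots counter, the kobject name, and its placement under
/sys/kernel are all made up here, and the real patch would instead read the
journal state from within drivers/md/bcache/sysfs.c.

/* Generic sketch of a read-only sysfs attribute; not the actual bcache
 * patch.  The counter is a stand-in for the real journal state. */
#include <linux/init.h>
#include <linux/kernel.h>
#include <linux/kobject.h>
#include <linux/module.h>
#include <linux/sysfs.h>

static unsigned int free_journal_slots;		/* placeholder value */
static struct kobject *demo_kobj;

static ssize_t free_journal_slots_show(struct kobject *kobj,
				       struct kobj_attribute *attr, char *buf)
{
	return sprintf(buf, "%u\n", free_journal_slots);
}

static struct kobj_attribute free_journal_slots_attr =
	__ATTR_RO(free_journal_slots);

static int __init demo_init(void)
{
	int ret;

	demo_kobj = kobject_create_and_add("bcache_journal_demo", kernel_kobj);
	if (!demo_kobj)
		return -ENOMEM;

	ret = sysfs_create_file(demo_kobj, &free_journal_slots_attr.attr);
	if (ret)
		kobject_put(demo_kobj);
	return ret;
}

static void __exit demo_exit(void)
{
	kobject_put(demo_kobj);
}

module_init(demo_init);
module_exit(demo_exit);
MODULE_LICENSE("GPL");

Reading /sys/kernel/bcache_journal_demo/free_journal_slots would then print
the value; in the real patch the attribute would hang off the existing
bcache cache set kobject instead.
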

I don't believe the journal was full - the workload at that time (02:00
AM) is mostly read-only plus deleting / truncating files, and the cache
device is pretty big at 250GB. Ceph handles fstrim inside the guests as
truncates and file deletes outside the guests. So the real workload for
bcache was:
- reads (backup time)
- file deletes (XFS)
- file truncates (XFS)

Greets,
Stefan

> Thanks.
> 
> Coly Li
> 
> 
>>>>>> The beginning is:
>>>>>>
>>>>>> 2018-08-06 02:08:06     BUG: workqueue lockup - pool cpus=1 node=0
>>>>>> flags=0x1 nice=0 stuck for 51s!
>>>>>> 2018-08-06 02:08:06     pending: memcg_kmem_cache_create_func
>>>>>> 2018-08-06 02:08:06     delayed: memcg_kmem_cache_create_func
>>>>>> 2018-08-06 02:08:06     workqueue bcache: flags=0x8
>>>>>> 2018-08-06 02:08:06     pwq 22: cpus=11 node=0 flags=0x0 nice=0 active=1/256
>>>>>> 2018-08-06 02:08:06     in-flight: 1764369:bch_data_insert_keys [bcache]
>>>>>> 2018-08-06 02:08:06     pwq 18: cpus=9 node=0 flags=0x1 nice=0
>>>>>> active=256/256 MAYDAY
>>>>>> 2018-08-06 02:08:06     in-flight: 1765894:bch_data_insert_keys
>>>>>> [bcache], 1765908:bch_data_insert_keys [bcache],
>>>>>> 1765931:bch_data_insert_keys [bcache], 1765984:bch_data_insert_keys
>>>>>> [bcache], 1765815:bch_data_insert_keys [bcache],
>>>>>> 1765893:bch_data_insert_keys [bcache], 1765981:bch_data_insert_keys
>>>>>> [bcache], 1765875:bch_data_insert_keys [bcache],
>>>>>> 1765963:bch_data_insert_keys [bcache], 1765960:bch_data_insert_keys
>>>>>> [bcache], 1765889:bch_data_insert_keys [bcache],
>>>>>> 1765989:bch_data_insert_keys [bcache], 1765897:bch_data_insert_keys
>>>>>> [bcache], 1765911:bch_data_insert_keys [bcache],
>>>>>> 1765924:bch_data_insert_keys [bcache], 1765808:bch_data_insert_keys
>>>>>> [bcache], 1765879:bch_data_insert_keys [bcache],
>>>>>> 1765948:bch_data_insert_keys [bcache], 1765970:bch_data_insert_keys
>>>>>> [bcache], 1765859:bch_data_insert_keys [bcache],
>>>>>> 1765884:bch_data_insert_keys [bcache]
>>>>>> 2018-08-06 02:08:06     , 1765952:bch_data_insert_keys [bcache],
>>>>>> 1765990:bch_data_insert_keys [bcache], 1765817:bch_data_insert_keys
>>>>>> [bcache], 1765858:bch_data_insert_keys [bcache],
>>>>>> 1765928:bch_data_insert_keys [bcache], 1765936:bch_data_insert_keys
>>>>>> [bcache], 1762396:bch_data_insert_keys [bcache],
>>>>>> 1765831:bch_data_insert_keys [bcache], 1765847:bch_data_insert_keys
>>>>>> [bcache], 1765895:bch_data_insert_keys [bcache],
>>>>>> 1765925:bch_data_insert_keys [bcache], 1765967:bch_data_insert_keys
>>>>>> [bcache], 1765798:bch_data_insert_keys [bcache],
>>>>>> 1765827:bch_data_insert_keys [bcache], 1765857:bch_data_insert_keys
>>>>>> [bcache], 1765979:bch_data_insert_keys [bcache],
>>>>>> 1765809:bch_data_insert_keys [bcache], 1765856:bch_data_insert_keys
>>>>>> [bcache], 1765878:bch_data_insert_keys [bcache],
>>>>>> 1765918:bch_data_insert_keys [bcache], 1765934:bch_data_insert_keys [bcache]
>>>>>> 2018-08-06 02:08:06     , 1765982:bch_data_insert_keys [bcache],
>>>>>> 1765813:bch_data_insert_keys [bcache], 1765883:bch_data_insert_keys
>>>>>> [bcache], 1765993:bch_data_insert_keys [bcache],
>>>>>> 1765834:bch_data_insert_keys [bcache], 1765920:bch_data_insert_keys
>>>>>> [bcache], 1765962:bch_data_insert_keys [bcache],
>>>>>> 1765788:bch_data_insert_keys [bcache], 1765882:bch_data_insert_keys
>>>>>> [bcache], 1765942:bch_data_insert_keys [bcache],
>>>>>> 1765825:bch_data_insert_keys [bcache], 1765854:bch_data_insert_keys
>>>>>> [bcache], 1765902:bch_data_insert_keys [bcache],
>>>>>> 1765838:bch_data_insert_keys [bcache], 1765868:bch_data_insert_keys
>>>>>> [bcache], 1765932:bch_data_insert_keys [bcache],
>>>>>> 1765944:bch_data_insert_keys [bcache], 1765975:bch_data_insert_keys
>>>>>> [bcache], 1765983:bch_data_insert_keys [bcache],
>>>>>> 1765810:bch_data_insert_keys [bcache], 1765863:bch_data_insert_keys [bcache]