On 2018/8/7 3:41 AM, Stefan Priebe - Profihost AG wrote:
> On 06.08.2018 at 16:21, Coly Li wrote:
>> On 2018/8/6 9:33 PM, Stefan Priebe - Profihost AG wrote:
>>> Hi Coly,
>>> On 06.08.2018 at 15:06, Coly Li wrote:
>>>> On 2018/8/6 2:33 PM, Stefan Priebe - Profihost AG wrote:
>>>>> Hi Coly,
>>>>>
>>>>> While running the SLES15 kernel I observed a workqueue lockup and a
>>>>> totally crashed system today.
>>>>>
>>>>> The dmesg output is about 3.5 MB, but it seems to just repeat the
>>>>> bch_data_insert_keys message.
>>>>>
>>>>
>>>> Hi Stefan,
>>>>
>>>> Thanks for your information!
>>>>
>>>> Could you please give me any hint on how to reproduce it? Even if it
>>>> is not reliably reproducible, a detailed procedure may help me a lot.
>>>
>>> I'm sorry, but I can't reproduce it. It happens out of nothing in
>>> our Ceph production cluster.
>>
>> I see. Could you please share the configuration information? E.g.:
>
> Sure.
>
>> - How many CPU cores?
> 12
>
>> - How much physical memory?
> 64 GB
>
>> - How large is the SSD, and is it NVMe or SATA?
> bcache cache size: 250 GB SATA SSD
>
>> - How many SSDs?
> 1x
>
>> - How large (and how many) are the backing hard drives?
> 2x 1 TB
>
>> I will try to simulate a similar workload with fio and see how lucky I am.
>
> Thanks!
>
> Generally the workload in that timeframe was mostly reads, fsync inside
> the guests, and fstrim.

Hi Stefan,

From your information, I suspect this is a journal-related deadlock. If too many small I/Os make the btree inside bcache grow too fast, and in turn exhaust the journal space, a deadlock-like hang can happen. Junhui has tried to mitigate this by increasing the journal slot size, but the root cause is not fixed yet.

The journal operation in bcache is not atomic: a btree update goes into the journal first, and is then inserted into the btree node when the journal is replayed. If the btree node has to be split during journal replay, the split metadata also needs to go into the journal first; if the journal space is already exhausted at that point, a deadlock may happen.

A real fix is to make the bcache journal operation atomic, that is:
1. Reserve the estimated number of journal slots before a journal I/O.
2. If the reservation succeeds, go ahead; if it fails, wait and try again.
3. If the journal replay results in a btree split, the journal slots for the new metadata have already been reserved, so the split can never fail for lack of journal space.
(A toy model of this reservation idea is appended at the end of this mail.)

This fix is not simple, and I am currently working on other fixes (4Kn hard drives and big endian...). If no one else helps with it, it will be a while before I can focus on it.

Because you mentioned that fstrim happened in your guests: if the backing device of bcache supports DISCARD/TRIM, bcache will also invalidate the trimmed range in its internal btree, which may generate more btree metadata I/O. Therefore I guess it might be related to the journal.

Hmm, how about I compose a patch to display the number of free journal slots (a rough sketch is appended below as well)? If such an issue happens again and you can still access sysfs, we can check whether this really is a journal issue. Maybe I am wrong, but it is worth trying.

Thanks.

Coly Li

>>>>> The beginning is:
>>>>>
>>>>> 2018-08-06 02:08:06 BUG: workqueue lockup - pool cpus=1 node=0
>>>>> flags=0x1 nice=0 stuck for 51s!
>>>>> 2018-08-06 02:08:06 pending: memcg_kmem_cache_create_func
>>>>> 2018-08-06 02:08:06 delayed: memcg_kmem_cache_create_func
>>>>> 2018-08-06 02:08:06 workqueue bcache: flags=0x8
>>>>> 2018-08-06 02:08:06 pwq 22: cpus=11 node=0 flags=0x0 nice=0 active=1/256
>>>>> 2018-08-06 02:08:06 in-flight: 1764369:bch_data_insert_keys [bcache]
>>>>> 2018-08-06 02:08:06 pwq 18: cpus=9 node=0 flags=0x1 nice=0 active=256/256 MAYDAY
>>>>> 2018-08-06 02:08:06 in-flight: 1765894:bch_data_insert_keys [bcache],
>>>>> 1765908:bch_data_insert_keys [bcache], 1765931:bch_data_insert_keys [bcache],
>>>>> 1765984:bch_data_insert_keys [bcache], 1765815:bch_data_insert_keys [bcache],
>>>>> 1765893:bch_data_insert_keys [bcache], 1765981:bch_data_insert_keys [bcache],
>>>>> 1765875:bch_data_insert_keys [bcache], 1765963:bch_data_insert_keys [bcache],
>>>>> 1765960:bch_data_insert_keys [bcache], 1765889:bch_data_insert_keys [bcache],
>>>>> 1765989:bch_data_insert_keys [bcache], 1765897:bch_data_insert_keys [bcache],
>>>>> 1765911:bch_data_insert_keys [bcache], 1765924:bch_data_insert_keys [bcache],
>>>>> 1765808:bch_data_insert_keys [bcache], 1765879:bch_data_insert_keys [bcache],
>>>>> 1765948:bch_data_insert_keys [bcache], 1765970:bch_data_insert_keys [bcache],
>>>>> 1765859:bch_data_insert_keys [bcache], 1765884:bch_data_insert_keys [bcache]
>>>>> 2018-08-06 02:08:06 , 1765952:bch_data_insert_keys [bcache],
>>>>> 1765990:bch_data_insert_keys [bcache], 1765817:bch_data_insert_keys [bcache],
>>>>> 1765858:bch_data_insert_keys [bcache], 1765928:bch_data_insert_keys [bcache],
>>>>> 1765936:bch_data_insert_keys [bcache], 1762396:bch_data_insert_keys [bcache],
>>>>> 1765831:bch_data_insert_keys [bcache], 1765847:bch_data_insert_keys [bcache],
>>>>> 1765895:bch_data_insert_keys [bcache], 1765925:bch_data_insert_keys [bcache],
>>>>> 1765967:bch_data_insert_keys [bcache], 1765798:bch_data_insert_keys [bcache],
>>>>> 1765827:bch_data_insert_keys [bcache], 1765857:bch_data_insert_keys [bcache],
>>>>> 1765979:bch_data_insert_keys [bcache], 1765809:bch_data_insert_keys [bcache],
>>>>> 1765856:bch_data_insert_keys [bcache], 1765878:bch_data_insert_keys [bcache],
>>>>> 1765918:bch_data_insert_keys [bcache], 1765934:bch_data_insert_keys [bcache]
>>>>> 2018-08-06 02:08:06 , 1765982:bch_data_insert_keys [bcache],
>>>>> 1765813:bch_data_insert_keys [bcache], 1765883:bch_data_insert_keys [bcache],
>>>>> 1765993:bch_data_insert_keys [bcache], 1765834:bch_data_insert_keys [bcache],
>>>>> 1765920:bch_data_insert_keys [bcache], 1765962:bch_data_insert_keys [bcache],
>>>>> 1765788:bch_data_insert_keys [bcache], 1765882:bch_data_insert_keys [bcache],
>>>>> 1765942:bch_data_insert_keys [bcache], 1765825:bch_data_insert_keys [bcache],
>>>>> 1765854:bch_data_insert_keys [bcache], 1765902:bch_data_insert_keys [bcache],
>>>>> 1765838:bch_data_insert_keys [bcache], 1765868:bch_data_insert_keys [bcache],
>>>>> 1765932:bch_data_insert_keys [bcache], 1765944:bch_data_insert_keys [bcache],
>>>>> 1765975:bch_data_insert_keys [bcache], 1765983:bch_data_insert_keys [bcache],
>>>>> 1765810:bch_data_insert_keys [bcache], 1765863:bch_data_insert_keys [bcache]
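
P.S. To make the reservation idea above concrete, here is a toy user-space
model of it. This is not bcache code: JOURNAL_SLOTS, journal_reserve() and
journal_commit() are invented names, and the numbers are made up for
illustration only.

/* Toy model of "reserve journal slots before the journal I/O". */
#include <stdio.h>
#include <stdbool.h>

#define JOURNAL_SLOTS	128	/* invented slot count, for illustration */

static int free_slots = JOURNAL_SLOTS;

/* Steps 1 and 2: reserve the estimated worst-case number of slots up
 * front; the caller waits and retries if the reservation fails. */
static bool journal_reserve(int nr_slots)
{
	if (free_slots < nr_slots)
		return false;	/* caller must wait for reclaim, then retry */
	free_slots -= nr_slots;
	return true;
}

/* Step 3: a btree split during replay only consumes slots that were
 * already reserved, so it can never block on journal space.  Any
 * over-estimate is returned here. */
static void journal_commit(int used, int reserved)
{
	free_slots += reserved - used;
}

int main(void)
{
	/* Worst-case estimate: 1 slot for the insert + 2 for a split. */
	int reserved = 3;

	while (!journal_reserve(reserved))
		;	/* in the kernel this would wait on journal reclaim */

	/* ... journal write and btree insert would happen here ... */

	journal_commit(1, reserved);	/* no split happened: 2 slots back */
	printf("free slots: %d\n", free_slots);
	return 0;
}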
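
P.P.S. And here is the kind of sysfs patch I have in mind, as a rough,
untested sketch in the style of drivers/md/bcache/sysfs.c. Whether
fifo_free(&c->journal.pin) is the right counter to export as "free journal
slots" is an assumption I still need to verify against journal.c.

/* Untested sketch only.  The attribute name free_journal_slots and the
 * use of fifo_free(&c->journal.pin) as the counter are assumptions. */

read_attribute(free_journal_slots);

SHOW(__bch_cache_set)
{
	struct cache_set *c = container_of(kobj, struct cache_set, kobj);

	/* ... existing attributes ... */

	sysfs_printf(free_journal_slots, "%zu",
		     fifo_free(&c->journal.pin));

	return 0;
}

The new &sysfs_free_journal_slots entry would also have to be added to the
cache set's internal attribute array, so that the counter shows up under
/sys/fs/bcache/<uuid>/internal/.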