Hi Eric,

On 23/02/2018 9:35 PM, Eric Tork wrote:
> Yes, I can help with testing once there is a patch to try.
>

This will help a lot, thanks in advance.

> To clarify, my setup uses backing device sdh1 and a cache set to create
> bcache1, which is then sent through a LUKS layer and becomes a new cache
> set for backing device secondleveltest, with an LVM LV, which then becomes
> bcache2. So the stacking is a bit unusual: both bcache devices are not
> truly using the same cache set; bcache1 is used to form a cache set that
> is used by bcache2.

I see. Thanks for the information.

Coly Li

>
> ----- Original Message -----
> From: "Coly Li" <colyli@xxxxxxx>
> To: "Eric A Tork" <etork@xxxxxxxxxxxxxx>, linux-bcache@xxxxxxxxxxxxxxx
> Sent: Thursday, February 22, 2018 8:33:40 PM
> Subject: Re: possible writeback race/bch_data_insert_keys fails
>
> On 23/02/2018 8:04 AM, Eric A Tork wrote:
>>
>> Hello, I am hitting a lock issue with bcache while doing some
>> testing, and only a reboot brings the system back after encountering
>> this issue.
>>
>> Here is my lsblk:
>>
>> sdh
>> └─sdh1                               zfs_member
>>   └─bcache1                          crypto_LUKS
>>     └─loopcrypto1                    bcache
>>       └─bcache2                      LVM2_member
>>         └─secondleveltest-fullzfsnfs zfs_member thirdlevelzfs
>>
>> There appears to be a race: the system is performing normally, and
>> then all activity on the bcache devices hits 100% and no more I/O
>> happens.
>>
>> This is with two stacked bcache devices (LUKS in between) with writeback
>> turned on. It does the same if set to writethrough as well.
>>
>> [root@centos-7 log]# uname -a
>> Linux centos-7.1-test.talentbankonline.com 4.15.4-1.el7.elrepo.x86_64 #1
>> SMP Sat Feb 17 13:35:20 EST 2018 x86_64 x86_64 x86_64 GNU/Linux
>>
>> And here is the kernel trace when the system stalls:
>>
>
> Hi Eric,
>
> At first glance, the race very probably comes from a global work queue:
> bcache_wq.
> Although there are two different bcache devices stacked, they
> share the single work queue bcache_wq in request.c.
>
> I guess the bcache code was not originally designed to be stacked on
> itself; this is why you hit this bug.
>
> I guess the stacked bcache devices may also share the same cache set, so
> the fix might be to change bcache_wq into a per-bcache-device queue.
>
> Could you please help with testing once I have a patch for your issue?
>
> Thanks in advance.
>
> Coly Li

[snipped]