Hi Stefan,
This bug is triggered by the following conditions:
1. Little system memory is available to allocate.
2. The journal defers its operations to system_wq, which needs to
allocate memory to execute them.
3. Due to the lack of memory, the kernel starts to reclaim system
memory and triggers writeback to the file system on top of the bcache
device.
4. The writeback I/O hits the bcache device via the upper-layer file
system, requiring more bcache journal operations.
5. A circular blocking dependency then forms in the bcache journal.
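
To make the chain above easier to follow, here is a toy model of it as a
wait-for graph, where an edge A -> B means "A cannot make progress until B
does". This is only a sketch: the node names are illustrative labels I made
up, not kernel symbols, and the has_rescuer switch is my simplification of
what the WQ_MEM_RECLAIM workqueue in the patch quoted later in this thread
provides.

```python
def find_cycle(graph, start):
    """Return the nodes forming a cycle reachable from start, or None."""
    path = []        # current depth-first search path
    on_path = set()  # nodes currently on that path
    done = set()     # nodes fully explored without finding a cycle

    def visit(node):
        if node in on_path:
            # Walked back onto our own path: report the loop.
            return path[path.index(node):] + [node]
        if node in done:
            return None
        on_path.add(node)
        path.append(node)
        for nxt in graph.get(node, ()):
            cycle = visit(nxt)
            if cycle:
                return cycle
        path.pop()
        on_path.remove(node)
        done.add(node)
        return None

    return visit(start)


def build_graph(has_rescuer):
    """Build the hypothetical wait-for graph for the journal write path."""
    graph = {
        "memory_allocation": ["memory_reclaim"],  # low memory forces reclaim
        "memory_reclaim": ["fs_writeback"],       # reclaim writes back dirty data
        "fs_writeback": ["bcache_io"],            # file system sits on bcache
        "bcache_io": ["journal_write"],           # I/O needs journal operations
    }
    if has_rescuer:
        # Assumption: a pre-created rescuer thread runs the work item,
        # so no new allocation is needed to make progress.
        graph["journal_write"] = ["rescuer_thread"]
        graph["rescuer_thread"] = []
    else:
        # On the shared workqueue a new kworker may have to be created,
        # and creating it needs memory.
        graph["journal_write"] = ["new_kworker"]
        graph["new_kworker"] = ["memory_allocation"]
    return graph


if __name__ == "__main__":
    print(find_cycle(build_graph(has_rescuer=False), "journal_write"))
    print(find_cycle(build_graph(has_rescuer=True), "journal_write"))
```

Without the rescuer the search walks from journal_write back to
journal_write, which is the loop in step 5. With the rescuer the chain ends
at the rescuer thread and no cycle is found.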
If your system is under heavy memory pressure, this deadlock may also
happen in your environment. In any case, I suggest applying this patch
because it fixes a real deadlock that is likely to occur when system
memory is exhausted.
Thanks.
Coly Li
On 9/28/18 1:16 AM, Stefan Priebe - Profihost AG wrote:
Hi Coly,
is this the deadlock I reported some weeks ago?
Greets,
Stefan
Excuse my typo sent from my mobile phone.
On 27.09.2018 at 17:53, Eddie Chapman <eddie@xxxxxxxx> wrote:
On 27/09/18 16:23, Coly Li wrote:
On 9/27/18 9:45 PM, guoju wrote:
After a write to the SSD completes, bcache schedules the journal_write
work on system_wq, which is a shared system workqueue without the
WQ_MEM_RECLAIM flag. system_wq is also a bound workqueue, and there may
be no idle kworker on the current processor. Creating a new kworker may
unfortunately need to reclaim memory first, by shrinking the cache and
slab used by the VFS, which depends on the bcache device. That's a
deadlock.
This patch creates a new workqueue for journal_write with the
WQ_MEM_RECLAIM flag. Its rescuer thread will run the work and avoid the
deadlock.
Signed-off-by: guoju <fangguoju@xxxxxxxxx>
Nice catch, this fix is quite important. I will try to submit it to
Jens ASAP.
Thanks.
Coly Li
Once this goes into 4.19, would this be a candidate for backporting
to any stable kernels, or does it only fix something introduced in
this cycle?
thanks,
Eddie