Re: bcache hangs on writes, recovers after disabling discard on cache device

Juha Aatrokoski <jha@xxxxxxxxxxx> · Fri, 19 Jul 2013 00:02:14 +0300 (EEST)

On Thu, 18 Jul 2013, Kent Overstreet wrote:

> On Thu, Jul 18, 2013 at 03:05:49PM +0300, Juha Aatrokoski wrote:
>> On Tue, 16 Jul 2013, Kent Overstreet wrote:
>>
>>> On Tue, Jul 16, 2013 at 09:14:09PM +0300, Juha Aatrokoski wrote:
>>>> On Fri, 12 Jul 2013, Juha Aatrokoski wrote:
>>>>>> Can you give this patch a try? It's on top of the current
>>>>>> bcache-for-3.11 branch.
>>>>>
>>>>> OK, I'm now running the same kernel with this patch applied and
>>>>> discard enabled. However, it has previously taken my system 2-4
>>>>> days to trigger this bug, so I'd say it will be at least two weeks
>>>>> before I can say the patch (may have) fixed the issue.
>>>>
>>>> No such luck: I hit the bug after four days of uptime. Disabling
>>>> discard fixed the problem, so at least it's no worse than before.
>>>
>>> Argh, damn peculiar bug... and the fact that it takes so long to trigger
>>> is frustrating. I'm honestly at a loss at this point as to what that IO
>>> actually is.
>>
>> One thing I noticed is that your patch only affects the allocator;
>> the journal still does discards the old way. Perhaps it's worth
>> trying a similar change for the journal discards?
>
> Oh man, thanks for pointing me at that code. This looks like a brown
> paper bag bug...
>
> Try this patch and tell me what happens:

Yeah, that looks like a very probable culprit for this bug. If I read this
correctly, the bug is triggered the first time do_journal_discard() is
called and results in an infinite discard loop. Since the switch operand is
a comparison, it can only evaluate to 0 or 1, so the switch alternates
between the DISCARD_IN_FLIGHT and DISCARD_READY branches and the
DISCARD_DONE branch is never reached. As a result, do_journal_discard() is
called over and over because it never seems to finish the requested
discards, which would explain the observed 50MB/s of write activity.

Now, assuming (with very good reason) that this is the cause of the bug,
is there something I can do (on a file system on top of the bcache device)
to trigger it faster than in 2-4 days? My guess is that it happens when the
journal fills up or wraps around for the first time, but I don't know
whether the journal size is fixed or dynamic; I saw nothing about journal
size in /sys/fs/bcache. My cache device is 80G with a 512k bucket size and
a 4k block size. Would a simple loop like this work: "while true; do cp
200MB_file tmpfile; sync; rm tmpfile; sync; done"?

BTW, is there a performance or other gain to be had from doing the discards
"manually" by submitting bios? As the other patch shows, the code would be
much simpler if blkdev_issue_discard() were used instead.


From 72c531ee46e73a63739aa3fd10130f167d6bd30d Mon Sep 17 00:00:00 2001
From: Kent Overstreet <kmo@xxxxxxxxxxxxx>
Date: Thu, 18 Jul 2013 10:50:55 -0700
Subject: [PATCH] Fix a dumb journal discard bug

diff --git a/drivers/md/bcache/journal.c b/drivers/md/bcache/journal.c
index ba95ab8..c0017ca 100644
--- a/drivers/md/bcache/journal.c
+++ b/drivers/md/bcache/journal.c
@@ -428,7 +428,7 @@ static void do_journal_discard(struct cache *ca)
		return;
	}

-	switch (atomic_read(&ja->discard_in_flight) == DISCARD_IN_FLIGHT) {
+	switch (atomic_read(&ja->discard_in_flight)) {
	case DISCARD_IN_FLIGHT:
		return;

--
To unsubscribe from this list: send the line "unsubscribe linux-bcache" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html