On Thu, 25 Feb 2016, Eric Wheeler wrote:

> [ +cc: kent ]
>
> On Wed, 24 Feb 2016, Marc MERLIN wrote:
>
> > On Wed, Feb 24, 2016 at 06:53:05AM +0000, Eric Wheeler wrote:
> > > Be sure to cherry-pick these from linux 4.5-rc1:
> > > git cherry-pick 2ef9ccbf~1..627ccd20
> > > or use one of the 4.1 or 3.18 longterm kernels.
> >
> > So, I added these patches to my 4.4.2 kernel, but it still crashes when
> > seeing one cache device at boot.
> >
> > Crash:
> > https://goo.gl/photos/8H1DtYjSijK4ngFv6
> >
> > > static void read_dirty(struct cached_dev *dc)
> > [...]
> > 	while (!kthread_should_stop()) {
> > 		try_to_freeze();
> >
> > 		w = bch_keybuf_next(&dc->writeback_keys);
> > 		if (!w)
> > 			break;
> >
> > >>>>>	BUG_ON(ptr_stale(dc->disk.c, &w->key, 0));
> >
> > 		if (KEY_START(&w->key) != dc->last_read ||
>
> Kent, any idea what's going on here? What is this BUG_ON checking?
>
> It looks like dirty data is being read immediately after register,
> possibly due to a crash.

The call trace in the image [https://goo.gl/photos/8H1DtYjSijK4ngFv6]
mentions kthread_parkme. Is our thread being woken unexpectedly by
kthread_park? If so, maybe we can do a better job handling the BUG
condition. What would happen if we did something like this:

+if (kthread_should_park()) {
+	kthread_parkme();
+	break;
+}
 BUG_ON(ptr_stale(dc->disk.c, &w->key, 0));

/* and maybe this too: */
-BUG_ON(ptr_stale(dc->disk.c, &w->key, 0));
+if (ptr_stale(dc->disk.c, &w->key, 0))
+	break;

Or would `continue` be more appropriate?

I don't think anything is lost at the point we break, because the writeback
thread will continue to iterate and retry the call to read_dirty(). It
hasn't kzalloc'ed yet, so no cleanup is necessary.

Clearly there is a condition that isn't being handled gracefully enough,
and clearly it cannot continue if the BUG condition is met, but it's in a
loop, so can we safely iterate and retry instead of BUGing?

Kent, do you think this patch would solve the BUG_ON condition in this case?
==============================================
diff --git a/drivers/md/bcache/writeback.c b/drivers/md/bcache/writeback.c
index ca38362..529310a 100644
--- a/drivers/md/bcache/writeback.c
+++ b/drivers/md/bcache/writeback.c
@@ -234,7 +234,8 @@ static void read_dirty(struct cached_dev *dc)
 		if (!w)
 			break;
 
-		BUG_ON(ptr_stale(dc->disk.c, &w->key, 0));
+		if (ptr_stale(dc->disk.c, &w->key, 0))
+			goto err;
 
 		if (KEY_START(&w->key) != dc->last_read ||
 		    jiffies_to_msecs(delay) > 50)
@@ -282,6 +283,10 @@ err:
 	 * freed) before refilling again
 	 */
 	closure_sync(&cl);
+
+	if (kthread_should_park())
+		kthread_parkme();
+
 }
 
 /* Scan for dirty data */
==============================================

-Eric

>
> -Eric
>
> >
> > I have to remove the partition for my system to boot.
> >
> > Before I destroy it, any other patches I should try?
> >
> > And to be fair, it's a huge pain to deal with this, there should be an
> > easier way to just turn bcache off from the kernel command line. In this
> > case it was really a lot of work to get back to even a booting system.
> >
> > You also said:
> > > 4.1.18 has the patches, so unless there is something specific in 4.4 that
> > > you need, I recommend 4.1. We've been running 4.1.17 with patches in
> > > production for a while and it works great. Haven't tried vanilla 4.1.18
> > > yet, but I plan to soon.
> >
> > Sadly, I run btrfs, I can't just go to random old kernels like this.
> > Is bcache not stable in up to date kernels?
> >
> > Thanks,
> > Marc
> > --
> > "A mouse is a device used to point at the xterm you want to type in" - A.S.R.
> > Microsoft is to operating systems ....
> >                                       .... what McDonalds is to gourmet cooking
> > Home page: http://marc.merlins.org/ | PGP 1024R/763BE901
> > --
> > To unsubscribe from this list: send the line "unsubscribe linux-bcache" in
> > the body of a message to majordomo@xxxxxxxxxxxxxxx
> > More majordomo info at http://vger.kernel.org/majordomo-info.html