On 2 May 2014 05:54, Slava Pestov <sp@xxxxxxxxx> wrote:
> On Thu, May 1, 2014 at 2:38 AM, Daniel Smedegaard Buus
> <danielbuus@xxxxxxxxx> wrote:
>> On Wed, Apr 30, 2014 at 7:24 PM, Darrick J. Wong
>> <darrick.wong@xxxxxxxxxx> wrote:
>>>
>>> I haven't spent time on figuring out the other source of load average. Kent
>>> didn't seem to like the patch to convert the bcache_writeback thread to
>>> interruptible sleep (I recall he said it was 'wrong', but didn't elaborate).
>>
>> Sorry to hear that... Would be really nice to be able to go back to
>> normal load. And I cannot revert to an older kernel, as I need
>> 3.15-rc2 or greater to fix a different problem concerning Oracle Java
>
> Hi Daniel and Darrick,
>
> I mailed a patch that attempts to fix the uninterruptible issue while
> taking Kent's feedback regarding your earlier patch into account.
> Please test it out and let me know what you think.

Hi Slava,

Apologies for the delay. I rebuilt 3.14.4 with your bcache patch [1],
and the 'bcache_writeback blocked for more than 120 seconds' warnings no
longer occur. However, when the bcache threads are torn down during
reboot, we crash [2] at:

static void cached_dev_free(struct closure *cl)
{
	struct cached_dev *dc = container_of(cl, struct cached_dev, disk.cl);

	cancel_delayed_work_sync(&dc->writeback_rate_update);
	kthread_stop(dc->writeback_thread);

dc->writeback_thread is clearly NULL (RDI is 0 and the fault address is
0x10 in the oops), likely because the struct cached_dev was already
freed.
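To illustrate the kind of guard that would avoid the oops, here is a minimal userspace sketch, not a tested kernel patch. The names struct task, struct cached_dev_mock, mock_kthread_stop() and cached_dev_stop_writeback() are hypothetical stand-ins of mine; only the idea of checking the thread pointer before stopping it carries over. It does not address why the pointer is NULL in the first place.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical userspace stand-in for a kernel thread handle. */
struct task {
	int stopped;
};

/* Hypothetical stand-in for the relevant part of struct cached_dev. */
struct cached_dev_mock {
	struct task *writeback_thread;	/* NULL if no thread is running */
};

/*
 * Like the real kthread_stop(), this dereferences its argument
 * unconditionally; the assert models the NULL dereference.
 */
static int mock_kthread_stop(struct task *k)
{
	assert(k != NULL);
	k->stopped = 1;
	return 0;
}

/*
 * Stop the writeback thread only if there is one.
 * Returns 1 if a thread was stopped, 0 if there was nothing to stop.
 */
static int cached_dev_stop_writeback(struct cached_dev_mock *dc)
{
	if (dc->writeback_thread == NULL)
		return 0;

	mock_kthread_stop(dc->writeback_thread);
	dc->writeback_thread = NULL;	/* makes a second call harmless */
	return 1;
}
```

In the kernel the equivalent check would presumably sit directly in front of the kthread_stop() call in cached_dev_free(), using IS_ERR_OR_NULL() if the pointer can also hold an error value; that is an assumption on my part.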

Many thanks,
  Daniel

[1] http://www.spinics.net/lists/linux-bcache/msg02464.html

-- [2]

BUG: unable to handle kernel NULL pointer dereference at 0000000000000010
IP: [<ffffffff8108bff6>] kthread_stop+0x16/0xe0
PGD 0
Oops: 0002 [#1] SMP
Modules linked in: microcode(F) nfsd(F) nfs_acl(F) auth_rpcgss(F) nfs(F)
 fscache(F) lockd(F) sunrpc(F) joydev(F) ipmi_si(F) ipmi_msghandler(F)
 psmouse(F) serio_raw(F) video(F) mac_hid(F) lpc_ich(F) lp(F) parport(F)
 btrfs(F) raid6_pq(F) bcache(F) xor(F) hid_generic(F) usbhid(F) hid(F)
 e1000e(F) ptp(F) pps_core(F) ahci(F)
CPU: 2 PID: 27 Comm: kworker/2:0 Tainted: GF 3.14.4-bcachefix+ #3
Hardware name: Supermicro X9SCL/X9SCM/X9SCL/X9SCM, BIOS 2.0a 06/08/2012
Workqueue: events cached_dev_free [bcache]
task: ffff8804097363c0 ti: ffff8804097d2000 task.ti: ffff8804097d2000
RIP: 0010:[<ffffffff8108bff6>] [<ffffffff8108bff6>] kthread_stop+0x16/0xe0
RSP: 0018:ffff8804097d3df0 EFLAGS: 00010292
RAX: 0000000fffffff00 RBX: ffff8804034e0010 RCX: 000000007fffffff
RDX: 0000000000000296 RSI: 000000007fffffff RDI: 0000000000000000
RBP: ffff8804097d3e08 R08: 20100d3800400000 R09: 0080000000000000
R10: dfef7acc030e0010 R11: 0000000000000400 R12: 0000000000000000
R13: ffff8804034e0010 R14: 0000000000000000 R15: 0000000000000080
FS: 0000000000000000(0000) GS:ffff88041fd00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000010 CR3: 0000000001c0e000 CR4: 00000000000407e0
Stack:
 ffff8804034e0010 ffff88041fd13d80 ffff8804034e0010 ffff8804097d3e20
 ffffffffa00b2dd5 ffff880409503580 ffff8804097d3e68 ffffffff81084482
 000000001fd13d98 ffff88041fd17f00 ffff88041fd13d98 ffff8804095035b0
Call Trace:
 [<ffffffffa00b2dd5>] cached_dev_free+0x25/0x100 [bcache]
 [<ffffffff81084482>] process_one_work+0x182/0x450
 [<ffffffff81085241>] worker_thread+0x121/0x410
 [<ffffffff81085120>] ? rescuer_thread+0x3e0/0x3e0
 [<ffffffff8108bdc2>] kthread+0xd2/0xf0
 [<ffffffff8108bcf0>] ? kthread_create_on_node+0x190/0x190
 [<ffffffff8173c67c>] ret_from_fork+0x7c/0xb0
 [<ffffffff8108bcf0>] ? kthread_create_on_node+0x190/0x190
Code: e8 20 ff ff ff 48 89 df be 00 02 00 00 e8 63 10 01 00 5b 5d c3 66
 66 66 66 90 55 48 89 e5 41 55 41 54 49 89 fc 53 66 66 66 66 90 <f0> 41
 ff 44 24 10 49 8b 9c 24 a0 04 00 00 48 85 db 74 21 f0 80
RIP [<ffffffff8108bff6>] kthread_stop+0x16/0xe0
 RSP <ffff8804097d3df0>
CR2: 0000000000000010
--
Daniel J Blueman