> On 12/1/20 12:35 PM, Yi Li wrote: >> sorry, This patch will cause deadlock, i will check and redo it. > > Can you try latest upstream kernel firstly ? Before spending more time > on the fix. > This issue just happened three times (xenserver7.5 dom0 kernel) on the same machine and cannot reproduce it now. and have not reproduce it using the lastest uptream kernel. > If I remember correctly, when cancel_writeback_rate_update_dwork() is > not timed out, the cache set memory won't be freed before the > writeback_rate_update worker terminates. It is possible that I miss > something in the code, but I suggest to test with a kernel after v5.3, > and better a v5.8+ kernel. > > Coly Li > Thanks. it is confused that why writeback_rate_update worker run again after cancel_delayed_work_sync( kernel log telled). Maybe i should dig more. Attach some debug info as below. Nov 30 18:00:44 host-126 kernel: [ 1697.906137] bcache: update_writeback_rate() dc = ffff880818580000 Nov 30 18:00:44 host-126 kernel: [ 1697.906145] bcache: update_writeback_rate() c = ffff8808182c0000 Nov 30 18:00:44 host-126 kernel: [ 1697.906148] bcache: update_writeback_rate() dc->disk.flags = ffff8808185800a0 Nov 30 18:00:44 host-126 kernel: [ 1697.906150] bcache: update_writeback_rate() c->flags = ffff8808182c0368 Nov 30 18:00:45 host-126 kernel: [ 1698.578104] bcache: update_writeback_rate() dc = ffff880839bd0000 Nov 30 18:00:45 host-126 kernel: [ 1698.578108] bcache: update_writeback_rate() c = ffff8808182c0000 Nov 30 18:00:45 host-126 kernel: [ 1698.578109] bcache: update_writeback_rate() dc->disk.flags = ffff880839bd00a0 Nov 30 18:00:45 host-126 kernel: [ 1698.578111] bcache: update_writeback_rate() c->flags = ffff8808182c0368 Nov 30 18:00:49 host-126 kernel: [ 1702.914126] bcache: update_writeback_rate() dc = ffff880818580000 Nov 30 18:00:49 host-126 kernel: [ 1702.914130] bcache: update_writeback_rate() c = ffff8808182c0000 Nov 30 18:00:49 host-126 kernel: [ 1702.914132] bcache: update_writeback_rate() dc->disk.flags = ffff8808185800a0 Nov 30 18:00:49 host-126 kernel: [ 1702.914133] bcache: update_writeback_rate() c->flags = ffff8808182c0368 Nov 30 18:00:50 host-126 kernel: [ 1703.586182] bcache: update_writeback_rate() dc = ffff880839bd0000 Nov 30 18:00:50 host-126 kernel: [ 1703.586188] bcache: update_writeback_rate() c = ffff8808182c0000 Nov 30 18:00:50 host-126 kernel: [ 1703.586191] bcache: update_writeback_rate() dc->disk.flags = ffff880839bd00a0 Nov 30 18:00:50 host-126 kernel: [ 1703.586193] bcache: update_writeback_rate() c->flags = ffff8808182c0368 Nov 30 18:00:54 host-126 kernel: [ 1707.922215] bcache: update_writeback_rate() dc = ffff880818580000 Nov 30 18:00:54 host-126 kernel: [ 1707.922222] bcache: update_writeback_rate() c = ffff8808182c0000 Nov 30 18:00:54 host-126 kernel: [ 1707.922225] bcache: update_writeback_rate() dc->disk.flags = ffff8808185800a0 Nov 30 18:00:54 host-126 kernel: [ 1707.922227] bcache: update_writeback_rate() c->flags = ffff8808182c0368 Nov 30 18:00:55 host-126 kernel: [ 1708.594202] bcache: update_writeback_rate() dc = ffff880839bd0000 Nov 30 18:00:55 host-126 kernel: [ 1708.594206] bcache: update_writeback_rate() c = ffff8808182c0000 Nov 30 18:00:55 host-126 kernel: [ 1708.594208] bcache: update_writeback_rate() dc->disk.flags = ffff880839bd00a0 Nov 30 18:00:55 host-126 kernel: [ 1708.594210] bcache: update_writeback_rate() c->flags = ffff8808182c0368 Nov 30 18:00:55 host-126 kernel: [ 1709.118221] sd 0:0:1:0: device_block, handle(0x0009) Nov 30 18:00:58 host-126 kernel: [ 1711.368197] sd 0:0:1:0: device_unblock and setting to running, handle(0x0009) Nov 30 18:00:58 host-126 kernel: [ 1711.368263] sd 0:0:1:0: [sdb] tag#0 FAILED Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK Nov 30 18:00:58 host-126 kernel: [ 1711.368277] sd 0:0:1:0: [sdb] tag#0 CDB: Synchronize Cache(10) 35 00 00 00 00 00 00 00 00 00 Nov 30 18:00:58 host-126 kernel: [ 1711.368289] blk_update_request: I/O error, dev sdb, sector 36160 Nov 30 18:00:58 host-126 kernel: [ 1711.368326] bcache: error on 96083de4-6b3e-4ede-81d1-44edc1a93729: journal io error, disabling caching Nov 30 18:00:58 host-126 kernel: [ 1711.368372] sd 0:0:1:0: [sdb] tag#1 FAILED Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK Nov 30 18:00:58 host-126 kernel: [ 1711.368387] sd 0:0:1:0: [sdb] tag#1 CDB: Write(10) 2a 00 00 81 b0 40 00 00 08 00 Nov 30 18:00:58 host-126 kernel: [ 1711.368392] blk_update_request: I/O error, dev sdb, sector 8499264 Nov 30 18:00:58 host-126 kernel: [ 1711.368407] bcache: bch_count_io_errors() sdb2: IO error on writing data to cache. Nov 30 18:00:58 host-126 kernel: [ 1711.368447] bcache: bch_cached_dev_detach() bch_cached_dev_detach start Nov 30 18:00:58 host-126 kernel: [ 1711.368458] bcache: bch_cached_dev_detach() bch_cached_dev_detach end Nov 30 18:00:58 host-126 kernel: [ 1711.368463] bcache: conditional_stop_bcache_device() stop_when_cache_set_failed of bcache1 is "auto" and cache is dirty, stop it to avoid potential data corruption. Nov 30 18:00:58 host-126 kernel: [ 1711.368466] bcache: bch_cached_dev_detach() bch_cached_dev_detach start Nov 30 18:00:58 host-126 kernel: [ 1711.368474] bcache: bch_cached_dev_detach() bch_cached_dev_detach end Nov 30 18:00:58 host-126 kernel: [ 1711.368479] bcache: conditional_stop_bcache_device() stop_when_cache_set_failed of bcache0 is "auto" and cache is clean, keep it alive. Nov 30 18:00:58 host-126 kernel: [ 1711.368494] Buffer I/O error on dev bcache1, logical block 210979209, lost async page write Nov 30 18:00:58 host-126 kernel: [ 1711.368585] bcache: cached_dev_detach_finish() cached_dev_detach_finish start Nov 30 18:00:58 host-126 kernel: [ 1711.368589] bcache: cached_dev_detach_finish() dc->disk.flags = f Nov 30 18:00:58 host-126 kernel: [ 1711.368591] bcache: cancel_writeback_rate_update_dwork() cancel_writeback_rate_update_dwork start Nov 30 18:00:58 host-126 kernel: [ 1711.368594] bcache: cancel_writeback_rate_update_dwork() cancel_writeback_rate_update_dwork end Nov 30 18:00:58 host-126 kernel: [ 1711.368617] bcache: cached_dev_detach_finish() cached_dev_detach_finish start Nov 30 18:00:58 host-126 kernel: [ 1711.369022] bcache: cached_dev_detach_finish() Caching disabled for sda2 Nov 30 18:00:58 host-126 kernel: [ 1711.369025] bcache: cached_dev_detach_finish() cached_dev_detach_finish end Nov 30 18:00:58 host-126 kernel: [ 1711.369035] bcache: cached_dev_detach_finish() dc->disk.flags = a Nov 30 18:00:58 host-126 kernel: [ 1711.369039] bcache: cancel_writeback_rate_update_dwork() cancel_writeback_rate_update_dwork start Nov 30 18:00:58 host-126 kernel: [ 1711.369042] bcache: cancel_writeback_rate_update_dwork() cancel_writeback_rate_update_dwork end Nov 30 18:00:58 host-126 kernel: [ 1711.369460] bcache: cached_dev_detach_finish() Caching disabled for sdf2 Nov 30 18:00:58 host-126 kernel: [ 1711.369464] bcache: cached_dev_detach_finish() cached_dev_detach_finish end Nov 30 18:00:58 host-126 kernel: [ 1711.374327] sd 0:0:1:0: [sdb] Synchronizing SCSI cache Nov 30 18:00:58 host-126 kernel: [ 1711.374516] sd 0:0:1:0: [sdb] Synchronize Cache(10) failed: Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK Nov 30 18:00:58 host-126 kernel: [ 1711.376907] bcache: cache_set_free() Cache set 96083de4-6b3e-4ede-81d1-44edc1a93729 unregistered Nov 30 18:00:58 host-126 kernel: [ 1711.418979] mpt3sas_cm0: removing handle(0x0009), sas_addr(0x4433221102000000) Nov 30 18:00:58 host-126 kernel: [ 1711.418984] mpt3sas_cm0: removing :enclosure logical id(0x5a4bf013514f4000), slot(4) Nov 30 18:00:58 host-126 kernel: [ 1711.418988] mpt3sas_cm0: removing :enclosure level(0x0000), connector name( ) Nov 30 18:01:05 host-126 kernel: [ 1718.873725] bcache: cached_dev_free() cached_dev_free start Nov 30 18:01:05 host-126 kernel: [ 1718.873776] bcache: bcache_device_free() bcache1 stopped Nov 30 18:01:05 host-126 kernel: [ 1718.926598] bcache: cached_dev_free() cached_dev_free end Nov 30 18:01:06 host-126 kernel: [ 1719.668540] sda: sda1 sda2 Nov 30 18:01:11 host-126 kernel: [ 1725.010305] bcache: update_writeback_rate() dc = ffff880818580000 Nov 30 18:01:11 host-126 kernel: [ 1725.010309] bcache: update_writeback_rate() c = (null) ------------the kernel log show that cache_set is null. Nov 30 18:01:11 host-126 kernel: [ 1725.010311] bcache: update_writeback_rate() dc->disk.flags = ffff8808185800a0 Nov 30 18:01:12 host-126 kernel: [ 1725.369791] mpt3sas_cm0: detecting: handle(0x0009), sas_address(0x4433221102000000), phy(2) Nov 30 18:01:12 host-126 kernel: [ 1725.369802] mpt3sas_cm0: REPORT_LUNS: handle(0x0009), retries(0) Nov 30 18:01:12 host-126 kernel: [ 1725.618138] mpt3sas_cm0: TEST_UNIT_READY: handle(0x0009), lun(0) Nov 30 18:01:13 host-126 kernel: [ 1726.619148] mpt3sas_cm0: detecting: handle(0x0009), sas_address(0x4433221102000000), phy(2) Nov 30 18:01:13 host-126 kernel: [ 1726.619156] mpt3sas_cm0: REPORT_LUNS: handle(0x0009), retries(0) Nov 30 18:01:13 host-126 kernel: [ 1726.619300] mpt3sas_cm0: TEST_UNIT_READY: handle(0x0009), lun(0) Nov 30 18:01:13 host-126 kernel: [ 1726.622323] scsi 0:0:6:0: Direct-Access ATA INTEL SSDSC2BB48 0112 PQ: 0 ANSI: 6 Nov 30 18:01:13 host-126 kernel: [ 1726.622839] scsi 0:0:6:0: SATA: handle(0x0009), sas_addr(0x4433221102000000), phy(2), device_name(0x55cd2e414d6dcd4b) Nov 30 18:01:13 host-126 kernel: [ 1726.622845] scsi 0:0:6:0: SATA: enclosure logical id(0x5a4bf013514f4000), slot(4) Nov 30 18:01:13 host-126 kernel: [ 1726.622847] scsi 0:0:6:0: SATA: enclosure level(0x0000), connector name( ) Nov 30 18:01:13 host-126 kernel: [ 1726.622990] scsi 0:0:6:0: atapi(n), ncq(y), asyn_notify(n), smart(y), fua(y), sw_preserve(y) Nov 30 18:01:13 host-126 kernel: [ 1726.623642] scsi 0:0:6:0: serial_number(PHDV7102003Q480BGN ) Nov 30 18:01:13 host-126 kernel: [ 1726.626116] sd 0:0:6:0: Attached scsi generic sg1 type 0 Nov 30 18:01:13 host-126 kernel: [ 1726.626784] sd 0:0:6:0: [sdb] 937703088 512-byte logical blocks: (480 GB/447 GiB) Nov 30 18:01:13 host-126 kernel: [ 1726.626789] sd 0:0:6:0: [sdb] 4096-byte physical blocks Nov 30 18:01:13 host-126 kernel: [ 1726.631161] sd 0:0:6:0: [sdb] Write Protect is off Nov 30 18:01:13 host-126 kernel: [ 1726.631165] sd 0:0:6:0: [sdb] Mode Sense: 9b 00 10 08 Nov 30 18:01:13 host-126 kernel: [ 1726.632513] sd 0:0:6:0: [sdb] Write cache: enabled, read cache: enabled, supports DPO and FUA Nov 30 18:01:13 host-126 kernel: [ 1726.648068] sdb: sdb1 sdb2 Nov 30 18:01:13 host-126 kernel: [ 1726.656841] sd 0:0:6:0: [sdb] Attached SCSI disk >> >> On 11/30/20, Yi Li <yili@xxxxxxxxxxx> wrote: >>> bcache_device_detach will release the cache_set after hotunplug cache >>> disk. update_writeback_rate should check validate of cache_set. >>> >>> IP: [<ffffffffa03730c9>] update_writeback_rate+0x59/0x3a0 [bcache] >>> PGD 879620067 PUD 8755d3067 PMD 0 >>> Oops: 0000 [#1] SMP >>> CPU: 8 PID: 1005702 Comm: kworker/8:0 Tainted: G 4.4.0+10 #1 >>> Hardware name: Intel BIOS SE5C610.86B.01.01.0021.032120170601 >>> 03/21/2017 >>> Workqueue: events update_writeback_rate [bcache] >>> task: ffff8808786f3800 ti: ffff88077082c000 task.ti: ffff88077082c000 >>> RIP: e030:[<ffffffffa03730c9>] update_writeback_rate+0x59/0x3a0 >>> [bcache] >>> RSP: e02b:ffff88077082fde0 EFLAGS: 00010202 >>> RAX: 0000000000000018 RBX: ffff8808047f0b08 RCX: 0000000000000000 >>> RDX: 0000000000000001 RSI: ffff88088170dab8 RDI: ffff88088170dab8 >>> RBP: ffff88077082fe18 R08: 000000000000000a R09: 0000000000000000 >>> R10: 0000000000000000 R11: 0000000000017bc8 R12: 0000000000000000 >>> R13: ffff8808047f0000 R14: 0000000000000200 R15: ffff8808047f0b08 >>> FS: 00007f157b6d6700(0000) GS:ffff880881700000(0000) >>> knlGS:0000000000000000 >>> CS: e033 DS: 0000 ES: 0000 CR0: 0000000080050033 >>> CR2: 0000000000000368 CR3: 0000000875c05000 CR4: 0000000000040660 >>> Stack: >>> 0000000000000001 0000000000007ff0 ffff88085ff600c0 ffff880881714e80 >>> ffff880881719500 0000000000000200 ffff8808047f0b08 ffff88077082fe60 >>> ffffffff81088c0c 0000000081714e80 0000000000000000 ffff880881714e80 >>> Call Trace: >>> [<ffffffff81088c0c>] process_one_work+0x1fc/0x3b0 >>> [<ffffffff81089575>] worker_thread+0x2a5/0x470 >>> [<ffffffff815a2f58>] ? __schedule+0x648/0x870 >>> [<ffffffff810892d0>] ? rescuer_thread+0x300/0x300 >>> [<ffffffff8108e3d5>] kthread+0xd5/0xe0 >>> [<ffffffff8108e300>] ? kthread_stop+0x110/0x110 >>> [<ffffffff815a704f>] ret_from_fork+0x3f/0x70 >>> [<ffffffff8108e300>] ? kthread_stop+0x110/0x110 >>> >>> Reported-by: Guo Chao <guochao@xxxxxxxxxxx> >>> Signed-off-by: Guo Chao <guochao@xxxxxxxxxxx> >>> Signed-off-by: Yi Li <yili@xxxxxxxxxxx> >>> --- >>> drivers/md/bcache/writeback.c | 12 +++++++++++- >>> 1 file changed, 11 insertions(+), 1 deletion(-) >>> >>> diff --git a/drivers/md/bcache/writeback.c >>> b/drivers/md/bcache/writeback.c >>> index 3c74996978da..186c4c6e1607 100644 >>> --- a/drivers/md/bcache/writeback.c >>> +++ b/drivers/md/bcache/writeback.c >>> @@ -175,7 +175,15 @@ static void update_writeback_rate(struct >>> work_struct >>> *work) >>> struct cached_dev *dc = container_of(to_delayed_work(work), >>> struct cached_dev, >>> writeback_rate_update); >>> - struct cache_set *c = dc->disk.c; >>> + struct cache_set *c = NULL; >>> + >>> + mutex_lock(&bch_register_lock); >>> + c = dc->disk.c; >>> + >>> + if (c == NULL) { >>> + mutex_unlock(&bch_register_lock); >>> + return; >>> + } >>> >>> /* >>> * should check BCACHE_DEV_RATE_DW_RUNNING before calling >>> @@ -194,6 +202,7 @@ static void update_writeback_rate(struct work_struct >>> *work) >>> clear_bit(BCACHE_DEV_RATE_DW_RUNNING, &dc->disk.flags); >>> /* paired with where BCACHE_DEV_RATE_DW_RUNNING is tested */ >>> smp_mb__after_atomic(); >>> + mutex_unlock(&bch_register_lock); >>> return; >>> } >>> >>> @@ -230,6 +239,7 @@ static void update_writeback_rate(struct work_struct >>> *work) >>> clear_bit(BCACHE_DEV_RATE_DW_RUNNING, &dc->disk.flags); >>> /* paired with where BCACHE_DEV_RATE_DW_RUNNING is tested */ >>> smp_mb__after_atomic(); >>> + mutex_unlock(&bch_register_lock); >>> } >>> >>> static unsigned int writeback_delay(struct cached_dev *dc, >>> -- >>> 2.25.3 >>> >>> >>> >>> > >