On 2018/8/7 9:26 PM, shenghui wrote:
>
>
> On 08/06/2018 09:11 PM, Coly Li wrote:
>> On 2018/8/6 10:53 AM, shenghui wrote:
>>>
>>>
>>> On 08/05/2018 06:00 PM, Coly Li wrote:
>>>> On 2018/8/5 4:07 PM, shenghui wrote:
>>>>>
>>>>>
>>>>> On 08/05/2018 12:14 PM, Coly Li wrote:
>>>>>> On 2018/8/5 10:16 AM, shenghui wrote:
>>>>>>>
>>>>>>>
>>>>>>> On 08/05/2018 01:35 AM, Coly Li wrote:
>>>>>>>> On 2018/8/3 6:57 PM, Shenghui Wang wrote:
>>>>>>>>> Recalculate cached_dev_sectors when a cached_dev is detached, as
>>>>>>>>> the recalculation is done when a cached_dev is attached.
>>>>>>>>>
>>>>>>>>> Signed-off-by: Shenghui Wang <shhuiw@xxxxxxxxxxx>
>>>>>>>>> ---
>>>>>>>>>  drivers/md/bcache/super.c | 1 +
>>>>>>>>>  1 file changed, 1 insertion(+)
>>>>>>>>>
>>>>>>>>> diff --git a/drivers/md/bcache/super.c b/drivers/md/bcache/super.c
>>>>>>>>> index fa4058e43202..a5612c8a6c14 100644
>>>>>>>>> --- a/drivers/md/bcache/super.c
>>>>>>>>> +++ b/drivers/md/bcache/super.c
>>>>>>>>> @@ -991,6 +991,7 @@ static void cached_dev_detach_finish(struct work_struct *w)
>>>>>>>>>
>>>>>>>>>  	bcache_device_detach(&dc->disk);
>>>>>>>>>  	list_move(&dc->list, &uncached_devices);
>>>>>>>>> +	calc_cached_dev_sectors(dc->disk.c);
>>>>>>>>>
>>>>>>>>>  	clear_bit(BCACHE_DEV_DETACHING, &dc->disk.flags);
>>>>>>>>>  	clear_bit(BCACHE_DEV_UNLINK_DONE, &dc->disk.flags);
>>>>>>>>>
>>>>>>>>
>>>>>>>> Hi Shenghui,
>>>>>>>>
>>>>>>>> During my testing, after writing back all dirty data, when I detach
>>>>>>>> the backing device from the cache set, a NULL pointer dereference
>>>>>>>> error happens. Here is the oops message,
>>>>>>>>
>>>>>>>> [ 4114.687721] BUG: unable to handle kernel NULL pointer dereference at 0000000000000cf8
>>>>>>>> [ 4114.691136] PGD 0 P4D 0
>>>>>>>> [ 4114.692094] Oops: 0000 [#1] PREEMPT SMP PTI
>>>>>>>> [ 4114.693962] CPU: 1 PID: 1845 Comm: kworker/1:43 Tainted: G          E     4.18.0-rc7-1-default+ #1
>>>>>>>> [ 4114.697732] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 05/19/2017
>>>>>>>> [ 4114.701886] Workqueue: events cached_dev_detach_finish [bcache]
>>>>>>>> [ 4114.704072] RIP: 0010:cached_dev_detach_finish+0x127/0x1e0 [bcache]
>>>>>>>> [ 4114.706377] Code: 3f 58 01 00 31 d2 4c 89 60 08 48 89 83 a8 f3 ff ff 48 c7 83 b0 f3 ff ff 10 72 31 c0 4c 89 25 20 58 01 00 48 8b bb 48 f4 ff ff <48> 8b 87 f8 0c 00 00 48 8d b7 f8 0c 00 00 48 39 c6 74 1e 48 8b 88
>>>>>>>> [ 4114.714524] RSP: 0018:ffffba4881b33e30 EFLAGS: 00010246
>>>>>>>> [ 4114.716537] RAX: ffffffffc0317210 RBX: ffff9bea33c00c58 RCX: 0000000000000000
>>>>>>>> [ 4114.719193] RDX: 0000000000000000 RSI: ffff9bea2ffb15e0 RDI: 0000000000000000
>>>>>>>> [ 4114.721790] RBP: ffff9bea33c00010 R08: 0000000000000000 R09: 000000000000000f
>>>>>>>> [ 4114.724477] R10: ffff9bea254ec928 R11: 0000000000000010 R12: ffff9bea33c00000
>>>>>>>> [ 4114.727170] R13: 0000000000000000 R14: ffff9bea35666500 R15: 0000000000000000
>>>>>>>> [ 4114.730012] FS:  0000000000000000(0000) GS:ffff9bea35640000(0000) knlGS:0000000000000000
>>>>>>>> [ 4114.732966] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>>>>>>>> [ 4114.735068] CR2: 0000000000000cf8 CR3: 000000012300a004 CR4: 00000000003606e0
>>>>>>>> [ 4114.737693] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>>>>>>>> [ 4114.740286] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
>>>>>>>> [ 4114.743187] Call Trace:
>>>>>>>> [ 4114.744133]  ? bch_keybuf_init+0x60/0x60 [bcache]
>>>>>>>> [ 4114.745969]  ? bch_sectors_dirty_init.cold.21+0x1b/0x1b [bcache]
>>>>>>>> [ 4114.748181]  process_one_work+0x1d1/0x310
>>>>>>>> [ 4114.749677]  worker_thread+0x28/0x3c0
>>>>>>>> [ 4114.751053]  ? rescuer_thread+0x330/0x330
>>>>>>>> [ 4114.752541]  kthread+0x108/0x120
>>>>>>>> [ 4114.753752]  ? kthread_create_worker_on_cpu+0x60/0x60
>>>>>>>> [ 4114.756001]  ret_from_fork+0x35/0x40
>>>>>>>> [ 4114.757332] Modules linked in: bcache(E) af_packet(E) iscsi_ibft(E) iscsi_boot_sysfs(E) vmw_vsock_vmci_transport(E) vsock(E) vmw_balloon(E) e1000(E) vmw_vmci(E) sr_mod(E) cdrom(E) ata_piix(E) uhci_hcd(E) ehci_pci(E) ehci_hcd(E) mptspi(E) scsi_transport_spi(E) mptscsih(E) usbcore(E) mptbase(E) sg(E)
>>>>>>>> [ 4114.766902] CR2: 0000000000000cf8
>>>>>>>> [ 4114.768135] ---[ end trace 467143bbdebef7b9 ]---
>>>>>>>> [ 4114.769992] RIP: 0010:cached_dev_detach_finish+0x127/0x1e0 [bcache]
>>>>>>>> [ 4114.772287] Code: 3f 58 01 00 31 d2 4c 89 60 08 48 89 83 a8 f3 ff ff 48 c7 83 b0 f3 ff ff 10 72 31 c0 4c 89 25 20 58 01 00 48 8b bb 48 f4 ff ff <48> 8b 87 f8 0c 00 00 48 8d b7 f8 0c 00 00 48 39 c6 74 1e 48 8b 88
>>>>>>>> [ 4114.779325] RSP: 0018:ffffba4881b33e30 EFLAGS: 00010246
>>>>>>>> [ 4114.781300] RAX: ffffffffc0317210 RBX: ffff9bea33c00c58 RCX: 0000000000000000
>>>>>>>> [ 4114.783960] RDX: 0000000000000000 RSI: ffff9bea2ffb15e0 RDI: 0000000000000000
>>>>>>>> [ 4114.786582] RBP: ffff9bea33c00010 R08: 0000000000000000 R09: 000000000000000f
>>>>>>>> [ 4114.789207] R10: ffff9bea254ec928 R11: 0000000000000010 R12: ffff9bea33c00000
>>>>>>>> [ 4114.791827] R13: 0000000000000000 R14: ffff9bea35666500 R15: 0000000000000000
>>>>>>>> [ 4114.794521] FS:  0000000000000000(0000) GS:ffff9bea35640000(0000) knlGS:0000000000000000
>>>>>>>> [ 4114.797509] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>>>>>>>> [ 4114.799613] CR2: 0000000000000cf8 CR3: 000000012300a004 CR4: 00000000003606e0
>>>>>>>> [ 4114.802559] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>>>>>>>> [ 4114.805195] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
>>>>>>>>
>>>>>>>> Could you please have a look?
>>>>>>>> cached_dev_detach_finish() is executed in a work queue; when it is
>>>>>>>> called, it is possible that the cache set memory has already been
>>>>>>>> released.
>>>>>>>>
>>>>>>>> Thanks.
>>>>>>>>
>>>>>>>> Coly Li
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>> Hi Coly,
>>>>>>>
>>>>>>> I checked the code path, and found that bcache_device_detach() sets
>>>>>>> bcache_device->c to NULL before the call added in my previous change
>>>>>>> runs. So I made a new change.
>>>>>>>
>>>>>>> Please check the following new patch.
>>>>>>
>>>>>> Sure, no problem. Just to double check: did you test/verify the change
>>>>>> before posting it?
>>>>>>
>>>>>> Thanks.
>>>>>>
>>>>>> Coly Li
>>>>>>
>>>>>
>>>>> Hi Coly,
>>>>>
>>>>> I did a basic attach/detach test.
>>>>>
>>>>> Will you please share your test case, so that I can do further testing?
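The ordering issue behind the oops above can be sketched as follows. This is an illustrative sketch assembled from the code quoted in this thread, not verbatim kernel source:

	/* First patch: recalculate *after* the detach. */
	bcache_device_detach(&dc->disk);	/* sets dc->disk.c to NULL */
	list_move(&dc->list, &uncached_devices);
	calc_cached_dev_sectors(dc->disk.c);	/* dc->disk.c is NULL here -> oops */

	/* Revised ordering (posted later in this thread): recalculate
	 * first, while dc->disk.c still points at the cache set. */
	calc_cached_dev_sectors(dc->disk.c);
	bcache_device_detach(&dc->disk);

Since cached_dev_detach_finish() runs from a workqueue, any use of dc->disk.c has to happen before bcache_device_detach() clears it.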
>>>>
>>>> Sure, here is my procedure,
>>>> 1, make a 100G cache set and a 500G backing device
>>>> 2, attach in writeback mode
>>>> 3, set congested_read/write_threshold to 0
>>>> 4, run fio to generate dirty data on the cache set
>>>> 5, when dirty data exceeds about 20% of the dirty target, stop the fio jobs
>>>> 6, wait for all dirty data to be written back to the backing device
>>>> 7, "echo 1 > /sys/block/bcache0/bcache/detach" to detach the backing
>>>>    device from the cache set
>>>> 8, "echo 1 > /sys/block/bcache0/bcache/stop" to stop the backing device
>>>> 9, "echo 1 > /sys/fs/bcache/<UUID>/stop" to stop the cache set
>>>> 10, rmmod bcache
>>>>
>>>> Here is my fio job file; this is functional verification on my laptop.
>>>> [global]
>>>> thread=1
>>>> ioengine=libaio
>>>> direct=1
>>>>
>>>> [job0]
>>>> filename=/dev/bcache0
>>>> readwrite=randrw
>>>> rwmixread=0.5
>>>> blocksize=32k
>>>> numjobs=2
>>>> iodepth=16
>>>> runtime=20m
>>>>
>>>> Thanks.
>>>>
>>>> Coly Li
>>>>
>>>>
>>>>
>>>
>>> Hi Coly,
>>>
>>
>> Hi Shenghui,
>>
>>> I ran the fio test on my desktop:
>>>
>>> 1) I used an HDD for the test:
>>>    sda6 50G backing device
>>>    sda8 50G cache device
>>> 2) attach in writeback mode
>>>    echo writeback > /sys/block/bcache0/bcache/cache_mode
>>> 3) set congested_read/write_threshold to 0
>>>    echo 0 > /sys/fs/bcache/be61e17f-a8d5-46ef-b098-3c592878bfc2/congested_read_threshold_us
>>>    echo 0 > /sys/fs/bcache/be61e17f-a8d5-46ef-b098-3c592878bfc2/congested_write_threshold_us
>>> 4) run fio to generate dirty data on the cache set
>>>    fio bch.fio (job file as in your reply)
>>> 5) when dirty data exceeds about 20% of the dirty target, stop the fio jobs
>>>    cat /sys/block/bcache0/bcache/dirty_data  # the max value is 5.1G in my env, then 5.0G to the end of the job
>>> 6) wait for all dirty data to be written back to the backing device
>>>    [I don't know how to wait, so I just waited for the fio run to end]
>>
>> Just wait and check /sys/block/bcache0/bcache/writeback_rate_debug until
>> the dirty number is 0KB.
>>
>>> 7) "echo 1 > /sys/block/bcache0/bcache/detach" to detach the backing device from the cache set
>>> 8) "echo 1 > /sys/block/bcache0/bcache/stop" to stop the backing device
>>> 9) "echo 1 > /sys/fs/bcache/be61e17f-a8d5-46ef-b098-3c592878bfc2/stop" to stop the cache set
>>> 10) rmmod bcache
>>>     rmmod: ERROR: Module bcache is in use
>>
>> A busy bcache module is suspected; most of the time it means some
>> reference count is not properly dropped. This is not expected behavior.
>>
>>> # lsmod | grep bcache
>>> bcache                200704  3
>>>
>>>
>>> No crash in my env.
>>
>> You need to check where and which reference count is not dropped. The
>> expected behavior is that, when both the cache set device(s) and the
>> backing device(s) are stopped, bcache.ko can be cleanly unloaded by rmmod.
>>
>> Thanks.
>>
>> Coly Li
>>
>>
>>
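"Module bcache is in use" from rmmod means the module's reference count is still nonzero. A minimal sketch of the general pattern Coly is pointing at, not bcache-specific code (example_open/example_release are hypothetical names):

	#include <linux/module.h>

	/* Generic illustration: each successful try_module_get() pins the
	 * module; a get without a matching module_put() leaves the module
	 * "in use" and rmmod keeps failing. */
	static int example_open(void)
	{
		if (!try_module_get(THIS_MODULE))
			return -ENODEV;
		return 0;
	}

	static void example_release(void)
	{
		module_put(THIS_MODULE);	/* must balance every successful get */
	}

In bcache's case the pinning is usually indirect (open block devices, kobjects, closures holding references), so the debugging task is to find which of those is not released when the devices are stopped.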
> Hi Coly,
>
> Sorry for my late update.
>
> I built the 4.18-rc8 kernel with the change, and ran the test 4 times. No crash.
>
> Steps to test:
> -------------
> 1) sda6 50G backing device
>    sda8 50G cache device
>    echo /dev/sda6 > /sys/fs/bcache/register
>    echo /dev/sda8 > /sys/fs/bcache/register
>    cd /sys/fs/bcache to get the <CSET-UUID>
>    echo <CSET-UUID> > /sys/block/bcache0/bcache/attach
>    echo 1 > /sys/block/sda/sda6/bcache/running
> 2) attach in writeback mode
>    echo writeback > /sys/block/bcache0/bcache/cache_mode
> 3) set congested_read/write_threshold to 0
>    echo 0 > /sys/fs/bcache/<CSET-UUID>/congested_read_threshold_us
>    echo 0 > /sys/fs/bcache/<CSET-UUID>/congested_write_threshold_us
> 4) run fio to generate dirty data on the cache set
>    fio bch.fio (job file provided by Coly)
> 5) when dirty data exceeds about 20% of the dirty target, stop the fio jobs
>    # cat /sys/block/bcache0/bcache/writeback_rate_debug
>    rate:           4.0k/sec
>    dirty:          1.0G
>    target:         5.0G
>    proportional:   -101.4M
>    integral:       0.0k
>    change:         0.0k/sec
>    next io:        2490ms
> 6) wait for all dirty data to be written back to the backing device
>    Just wait and check /sys/block/bcache0/bcache/writeback_rate_debug
>    until the dirty number is 0KB.
>    # cat /sys/block/bcache0/bcache/writeback_rate_debug
>    rate:           4.0k/sec
>    dirty:          0.0k
>    target:         5.0G
>    proportional:   -128.7M
>    integral:       0.0k
>    change:         0.0k/sec
>    next io:        -949ms
> 7) "echo 1 > /sys/block/bcache0/bcache/detach" to detach the backing device from the cache set
> 8) "echo 1 > /sys/block/bcache0/bcache/stop" to stop the backing device
> 9) "echo 1 > /sys/fs/bcache/<CSET-UUID>/stop" to stop the cache set
> 10) rmmod bcache
>
>
>
> I rechecked the code path:
> ---------
>     calc_cached_dev_sectors(dc->disk.c);
>     bcache_device_detach(&dc->disk);
>
> As bcache_device_detach() uses the cache_set in its code, I assume the
> cache_set is not released before it is called. Otherwise your test case
> would fail with a clean upstream kernel.
>
> I added some print statements to report whether the cache_set is NULL
> before the code snippet. During the above test, no NULL cache_set was
> detected.
>
> I don't know how to recreate your crash case. Please advise.

Hi Shenghui,

I didn't mention a crash in my previous reply. I was talking about the
"rmmod: ERROR: Module bcache is in use" error you mentioned in your last
email.

Coly Li
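For reference on why a valid cache_set pointer matters in the code path Shenghui quotes: calc_cached_dev_sectors() walks the cache set's list of cached devices, so it dereferences its argument immediately. Roughly, paraphrasing super.c:

	static void calc_cached_dev_sectors(struct cache_set *c)
	{
		uint64_t sectors = 0;
		struct cached_dev *dc;

		/* Sum the sizes of all attached backing devices; this
		 * dereferences c right away, so c must not be NULL. */
		list_for_each_entry(dc, &c->cached_devs, list)
			sectors += bdev_sectors(dc->bdev);

		c->cached_dev_sectors = sectors;
	}

The print statements Shenghui mentions were not posted; a hypothetical reconstruction of such a check, placed just before the recalculation in cached_dev_detach_finish(), would be:

	if (!dc->disk.c)
		pr_err("bcache: NULL cache_set before calc_cached_dev_sectors()\n");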