On 2018/8/7 9:26 PM, shenghui wrote:
>
>
> On 08/06/2018 09:11 PM, Coly Li wrote:
>> On 2018/8/6 10:53 AM, shenghui wrote:
>>>
>>>
>>> On 08/05/2018 06:00 PM, Coly Li wrote:
>>>> On 2018/8/5 4:07 PM, shenghui wrote:
>>>>>
>>>>>
>>>>> On 08/05/2018 12:14 PM, Coly Li wrote:
>>>>>> On 2018/8/5 10:16 AM, shenghui wrote:
>>>>>>>
>>>>>>>
>>>>>>> On 08/05/2018 01:35 AM, Coly Li wrote:
>>>>>>>> On 2018/8/3 6:57 PM, Shenghui Wang wrote:
>>>>>>>>> Recalculate cached_dev_sectors when a cached_dev is detached, as
>>>>>>>>> the recalculation is done when a cached_dev is attached.
>>>>>>>>>
>>>>>>>>> Signed-off-by: Shenghui Wang <shhuiw@xxxxxxxxxxx>
>>>>>>>>> ---
>>>>>>>>>  drivers/md/bcache/super.c | 1 +
>>>>>>>>>  1 file changed, 1 insertion(+)
>>>>>>>>>
>>>>>>>>> diff --git a/drivers/md/bcache/super.c b/drivers/md/bcache/super.c
>>>>>>>>> index fa4058e43202..a5612c8a6c14 100644
>>>>>>>>> --- a/drivers/md/bcache/super.c
>>>>>>>>> +++ b/drivers/md/bcache/super.c
>>>>>>>>> @@ -991,6 +991,7 @@ static void cached_dev_detach_finish(struct work_struct *w)
>>>>>>>>>
>>>>>>>>>  	bcache_device_detach(&dc->disk);
>>>>>>>>>  	list_move(&dc->list, &uncached_devices);
>>>>>>>>> +	calc_cached_dev_sectors(dc->disk.c);
>>>>>>>>>
>>>>>>>>>  	clear_bit(BCACHE_DEV_DETACHING, &dc->disk.flags);
>>>>>>>>>  	clear_bit(BCACHE_DEV_UNLINK_DONE, &dc->disk.flags);
>>>>>>>>>
>>>>>>>>
>>>>>>>> Hi Shenghui,
>>>>>>>>
>>>>>>>> During my testing, after writing back all dirty data, when I detach
>>>>>>>> the backing device from the cache set, a NULL pointer dereference
>>>>>>>> error happens. Here is the oops message,
>>>>>>>>
>>>>>>>> [ 4114.687721] BUG: unable to handle kernel NULL pointer dereference at 0000000000000cf8
>>>>>>>> [ 4114.691136] PGD 0 P4D 0
>>>>>>>> [ 4114.692094] Oops: 0000 [#1] PREEMPT SMP PTI
>>>>>>>> [ 4114.693962] CPU: 1 PID: 1845 Comm: kworker/1:43 Tainted: G          E     4.18.0-rc7-1-default+ #1
>>>>>>>> [ 4114.697732] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 05/19/2017
>>>>>>>> [ 4114.701886] Workqueue: events cached_dev_detach_finish [bcache]
>>>>>>>> [ 4114.704072] RIP: 0010:cached_dev_detach_finish+0x127/0x1e0 [bcache]
>>>>>>>> [ 4114.706377] Code: 3f 58 01 00 31 d2 4c 89 60 08 48 89 83 a8 f3 ff ff 48 c7 83 b0 f3 ff ff 10 72 31 c0 4c 89 25 20 58 01 00 48 8b bb 48 f4 ff ff <48> 8b 87 f8 0c 00 00 48 8d b7 f8 0c 00 00 48 39 c6 74 1e 48 8b 88
>>>>>>>> [ 4114.714524] RSP: 0018:ffffba4881b33e30 EFLAGS: 00010246
>>>>>>>> [ 4114.716537] RAX: ffffffffc0317210 RBX: ffff9bea33c00c58 RCX: 0000000000000000
>>>>>>>> [ 4114.719193] RDX: 0000000000000000 RSI: ffff9bea2ffb15e0 RDI: 0000000000000000
>>>>>>>> [ 4114.721790] RBP: ffff9bea33c00010 R08: 0000000000000000 R09: 000000000000000f
>>>>>>>> [ 4114.724477] R10: ffff9bea254ec928 R11: 0000000000000010 R12: ffff9bea33c00000
>>>>>>>> [ 4114.727170] R13: 0000000000000000 R14: ffff9bea35666500 R15: 0000000000000000
>>>>>>>> [ 4114.730012] FS:  0000000000000000(0000) GS:ffff9bea35640000(0000) knlGS:0000000000000000
>>>>>>>> [ 4114.732966] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>>>>>>>> [ 4114.735068] CR2: 0000000000000cf8 CR3: 000000012300a004 CR4: 00000000003606e0
>>>>>>>> [ 4114.737693] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>>>>>>>> [ 4114.740286] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
>>>>>>>> [ 4114.743187] Call Trace:
>>>>>>>> [ 4114.744133]  ? bch_keybuf_init+0x60/0x60 [bcache]
>>>>>>>> [ 4114.745969]  ? bch_sectors_dirty_init.cold.21+0x1b/0x1b [bcache]
>>>>>>>> [ 4114.748181]  process_one_work+0x1d1/0x310
>>>>>>>> [ 4114.749677]  worker_thread+0x28/0x3c0
>>>>>>>> [ 4114.751053]  ? rescuer_thread+0x330/0x330
>>>>>>>> [ 4114.752541]  kthread+0x108/0x120
>>>>>>>> [ 4114.753752]  ? kthread_create_worker_on_cpu+0x60/0x60
>>>>>>>> [ 4114.756001]  ret_from_fork+0x35/0x40
>>>>>>>> [ 4114.757332] Modules linked in: bcache(E) af_packet(E) iscsi_ibft(E) iscsi_boot_sysfs(E) vmw_vsock_vmci_transport(E) vsock(E) vmw_balloon(E) e1000(E) vmw_vmci(E) sr_mod(E) cdrom(E) ata_piix(E) uhci_hcd(E) ehci_pci(E) ehci_hcd(E) mptspi(E) scsi_transport_spi(E) mptscsih(E) usbcore(E) mptbase(E) sg(E)
>>>>>>>> [ 4114.766902] CR2: 0000000000000cf8
>>>>>>>> [ 4114.768135] ---[ end trace 467143bbdebef7b9 ]---
>>>>>>>> [ 4114.769992] RIP: 0010:cached_dev_detach_finish+0x127/0x1e0 [bcache]
>>>>>>>> [ 4114.772287] Code: 3f 58 01 00 31 d2 4c 89 60 08 48 89 83 a8 f3 ff ff 48 c7 83 b0 f3 ff ff 10 72 31 c0 4c 89 25 20 58 01 00 48 8b bb 48 f4 ff ff <48> 8b 87 f8 0c 00 00 48 8d b7 f8 0c 00 00 48 39 c6 74 1e 48 8b 88
>>>>>>>> [ 4114.779325] RSP: 0018:ffffba4881b33e30 EFLAGS: 00010246
>>>>>>>> [ 4114.781300] RAX: ffffffffc0317210 RBX: ffff9bea33c00c58 RCX: 0000000000000000
>>>>>>>> [ 4114.783960] RDX: 0000000000000000 RSI: ffff9bea2ffb15e0 RDI: 0000000000000000
>>>>>>>> [ 4114.786582] RBP: ffff9bea33c00010 R08: 0000000000000000 R09: 000000000000000f
>>>>>>>> [ 4114.789207] R10: ffff9bea254ec928 R11: 0000000000000010 R12: ffff9bea33c00000
>>>>>>>> [ 4114.791827] R13: 0000000000000000 R14: ffff9bea35666500 R15: 0000000000000000
>>>>>>>> [ 4114.794521] FS:  0000000000000000(0000) GS:ffff9bea35640000(0000) knlGS:0000000000000000
>>>>>>>> [ 4114.797509] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>>>>>>>> [ 4114.799613] CR2: 0000000000000cf8 CR3: 000000012300a004 CR4: 00000000003606e0
>>>>>>>> [ 4114.802559] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>>>>>>>> [ 4114.805195] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
>>>>>>>>
>>>>>>>> Could you please have a look?
>>>>>>>> cached_dev_detach_finish() is executed in a work queue; when it is
>>>>>>>> called, it is possible that the cache set memory has already been
>>>>>>>> released.
>>>>>>>>
>>>>>>>> Thanks.
>>>>>>>>
>>>>>>>> Coly Li
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>> Hi Coly,
>>>>>>>
>>>>>>> I checked the code path, and found that bcache_device_detach() sets
>>>>>>> bcache_device->c to NULL before the call added in my previous change
>>>>>>> runs. So I made a new change.
>>>>>>>
>>>>>>> Please check the following new patch.
>>>>>>
>>>>>> Sure, no problem. Just to double check: did you test/verify the change
>>>>>> before posting it?
>>>>>>
>>>>>> Thanks.
>>>>>>
>>>>>> Coly Li
>>>>>>
>>>>>
>>>>> Hi Coly,
>>>>>
>>>>> I did a basic attach/detach test.
>>>>>
>>>>> Will you please share your test case, so that I can do further testing?
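The ordering issue behind the oops above can be sketched as follows. This is an illustrative sketch assembled from the code quoted in this thread, not verbatim kernel source:

	/* First patch: recalculate *after* the detach. */
	bcache_device_detach(&dc->disk);	/* sets dc->disk.c to NULL */
	list_move(&dc->list, &uncached_devices);
	calc_cached_dev_sectors(dc->disk.c);	/* dc->disk.c is NULL here -> oops */

	/* Revised ordering (posted later in this thread): recalculate
	 * first, while dc->disk.c still points at the cache set. */
	calc_cached_dev_sectors(dc->disk.c);
	bcache_device_detach(&dc->disk);

Since cached_dev_detach_finish() runs from a workqueue, any use of dc->disk.c has to happen before bcache_device_detach() clears it.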
>>>>
>>>> Sure, here is my procedure,
>>>> 1, make a 100G cache set and a 500G backing device
>>>> 2, attach in writeback mode
>>>> 3, set congested_read/write_threshold to 0
>>>> 4, run fio to generate dirty data on the cache set
>>>> 5, when dirty data exceeds about 20% of the dirty target, stop the fio jobs
>>>> 6, wait for all dirty data to be written back to the backing device
>>>> 7, "echo 1 > /sys/block/bcache0/bcache/detach" to detach the backing
>>>>    device from the cache set
>>>> 8, "echo 1 > /sys/block/bcache0/bcache/stop" to stop the backing device
>>>> 9, "echo 1 > /sys/fs/bcache/<UUID>/stop" to stop the cache set
>>>> 10, rmmod bcache
>>>>
>>>> Here is my fio job file; this is functional verification on my laptop.
>>>> [global]
>>>> thread=1
>>>> ioengine=libaio
>>>> direct=1
>>>>
>>>> [job0]
>>>> filename=/dev/bcache0
>>>> readwrite=randrw
>>>> rwmixread=0.5
>>>> blocksize=32k
>>>> numjobs=2
>>>> iodepth=16
>>>> runtime=20m
>>>>
>>>> Thanks.
>>>>
>>>> Coly Li
>>>>
>>>>
>>>>
>>>
>>> Hi Coly,
>>>
>>
>> Hi Shenghui,
>>
>>> I ran the fio test on my desktop:
>>>
>>> 1) I used an HDD for the test:
>>>    sda6 50G backing device
>>>    sda8 50G cache device
>>> 2) attach in writeback mode
>>>    echo writeback > /sys/block/bcache0/bcache/cache_mode
>>> 3) set congested_read/write_threshold to 0
>>>    echo 0 > /sys/fs/bcache/be61e17f-a8d5-46ef-b098-3c592878bfc2/congested_read_threshold_us
>>>    echo 0 > /sys/fs/bcache/be61e17f-a8d5-46ef-b098-3c592878bfc2/congested_write_threshold_us
>>> 4) run fio to generate dirty data on the cache set
>>>    fio bch.fio (job file as in your reply)
>>> 5) when dirty data exceeds about 20% of the dirty target, stop the fio jobs
>>>    cat /sys/block/bcache0/bcache/dirty_data  # the max value is 5.1G in my env, then 5.0G to the end of the job
>>> 6) wait for all dirty data to be written back to the backing device
>>>    [I don't know how to wait, so I just waited for the fio run to end]
>>
>> Just wait and check /sys/block/bcache0/bcache/writeback_rate_debug until
>> the dirty number is 0KB.
>>
>>> 7) "echo 1 > /sys/block/bcache0/bcache/detach" to detach the backing device from the cache set
>>> 8) "echo 1 > /sys/block/bcache0/bcache/stop" to stop the backing device
>>> 9) "echo 1 > /sys/fs/bcache/be61e17f-a8d5-46ef-b098-3c592878bfc2/stop" to stop the cache set
>>> 10) rmmod bcache
>>>     rmmod: ERROR: Module bcache is in use
>>
>> A busy bcache module is suspected; most of the time it means some
>> reference count is not properly dropped. This is not expected behavior.
>>
>>> # lsmod | grep bcache
>>> bcache                200704  3
>>>
>>>
>>> No crash in my env.
>>
>> You need to check where and which reference count is not dropped. The
>> expected behavior is that, when both the cache set device(s) and the
>> backing device(s) are stopped, bcache.ko can be cleanly unloaded by rmmod.
>>
>> Thanks.
>>
>> Coly Li
>>
>>
>>
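"Module bcache is in use" from rmmod means the module's reference count is still nonzero. A minimal sketch of the general pattern Coly is pointing at, not bcache-specific code (example_open/example_release are hypothetical names):

	#include <linux/module.h>

	/* Generic illustration: each successful try_module_get() pins the
	 * module; a get without a matching module_put() leaves the module
	 * "in use" and rmmod keeps failing. */
	static int example_open(void)
	{
		if (!try_module_get(THIS_MODULE))
			return -ENODEV;
		return 0;
	}

	static void example_release(void)
	{
		module_put(THIS_MODULE);	/* must balance every successful get */
	}

In bcache's case the pinning is usually indirect (open block devices, kobjects, closures holding references), so the debugging task is to find which of those is not released when the devices are stopped.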
> Hi Coly,
>
> Sorry for my late update.
>
> I built the 4.18-rc8 kernel with the change, and ran the test 4 times. No crash.
>
> Steps to test:
> -------------
> 1) sda6 50G backing device
>    sda8 50G cache device
>    echo /dev/sda6 > /sys/fs/bcache/register
>    echo /dev/sda8 > /sys/fs/bcache/register
>    cd /sys/fs/bcache to get the <CSET-UUID>
>    echo <CSET-UUID> > /sys/block/bcache0/bcache/attach
>    echo 1 > /sys/block/sda/sda6/bcache/running
> 2) attach in writeback mode
>    echo writeback > /sys/block/bcache0/bcache/cache_mode
> 3) set congested_read/write_threshold to 0
>    echo 0 > /sys/fs/bcache/<CSET-UUID>/congested_read_threshold_us
>    echo 0 > /sys/fs/bcache/<CSET-UUID>/congested_write_threshold_us
> 4) run fio to generate dirty data on the cache set
>    fio bch.fio (job file provided by Coly)
> 5) when dirty data exceeds about 20% of the dirty target, stop the fio jobs
>    # cat /sys/block/bcache0/bcache/writeback_rate_debug
>    rate:           4.0k/sec
>    dirty:          1.0G
>    target:         5.0G
>    proportional:   -101.4M
>    integral:       0.0k
>    change:         0.0k/sec
>    next io:        2490ms
> 6) wait for all dirty data to be written back to the backing device
>    Just wait and check /sys/block/bcache0/bcache/writeback_rate_debug
>    until the dirty number is 0KB.
>    # cat /sys/block/bcache0/bcache/writeback_rate_debug
>    rate:           4.0k/sec
>    dirty:          0.0k
>    target:         5.0G
>    proportional:   -128.7M
>    integral:       0.0k
>    change:         0.0k/sec
>    next io:        -949ms
> 7) "echo 1 > /sys/block/bcache0/bcache/detach" to detach the backing device from the cache set
> 8) "echo 1 > /sys/block/bcache0/bcache/stop" to stop the backing device
> 9) "echo 1 > /sys/fs/bcache/<CSET-UUID>/stop" to stop the cache set
> 10) rmmod bcache
>
>
>
> I rechecked the code path:
> ---------
>     calc_cached_dev_sectors(dc->disk.c);
>     bcache_device_detach(&dc->disk);
>
> As bcache_device_detach() uses the cache_set in its code, I assume the
> cache_set is not released before it is called. Otherwise your test case
> would fail with a clean upstream kernel.
>
> I added some print statements to report whether the cache_set is NULL
> before the code snippet. During the above test, no NULL cache_set was
> detected.
>
> I don't know how to recreate your crash case. Please advise.

Hi Shenghui,

I didn't mention a crash in my previous reply. I was talking about the
"rmmod: ERROR: Module bcache is in use" error you mentioned in your last
email.

Coly Li
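For reference on why a valid cache_set pointer matters in the code path Shenghui quotes: calc_cached_dev_sectors() walks the cache set's list of cached devices, so it dereferences its argument immediately. Roughly, paraphrasing super.c:

	static void calc_cached_dev_sectors(struct cache_set *c)
	{
		uint64_t sectors = 0;
		struct cached_dev *dc;

		/* Sum the sizes of all attached backing devices; this
		 * dereferences c right away, so c must not be NULL. */
		list_for_each_entry(dc, &c->cached_devs, list)
			sectors += bdev_sectors(dc->bdev);

		c->cached_dev_sectors = sectors;
	}

The print statements Shenghui mentions were not posted; a hypothetical reconstruction of such a check, placed just before the recalculation in cached_dev_detach_finish(), would be:

	if (!dc->disk.c)
		pr_err("bcache: NULL cache_set before calc_cached_dev_sectors()\n");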