Re: [PATCH] bcache: recal cached_dev_sectors on detach

On 2018/8/6 10:53 AM, shenghui wrote:
> 
> 
> On 08/05/2018 06:00 PM, Coly Li wrote:
>> On 2018/8/5 4:07 PM, shenghui wrote:
>>>
>>>
>>> On 08/05/2018 12:14 PM, Coly Li wrote:
>>>> On 2018/8/5 10:16 AM, shenghui wrote:
>>>>>
>>>>>
>>>>> On 08/05/2018 01:35 AM, Coly Li wrote:
>>>>>> On 2018/8/3 6:57 PM, Shenghui Wang wrote:
>>>>>>> Recalculate cached_dev_sectors when a cached_dev is detached, just as
>>>>>>> it is recalculated when a cached_dev is attached.
>>>>>>>
>>>>>>> Signed-off-by: Shenghui Wang <shhuiw@xxxxxxxxxxx>
>>>>>>> ---
>>>>>>>  drivers/md/bcache/super.c | 1 +
>>>>>>>  1 file changed, 1 insertion(+)
>>>>>>>
>>>>>>> diff --git a/drivers/md/bcache/super.c b/drivers/md/bcache/super.c
>>>>>>> index fa4058e43202..a5612c8a6c14 100644
>>>>>>> --- a/drivers/md/bcache/super.c
>>>>>>> +++ b/drivers/md/bcache/super.c
>>>>>>> @@ -991,6 +991,7 @@ static void cached_dev_detach_finish(struct work_struct *w)
>>>>>>>  
>>>>>>>  	bcache_device_detach(&dc->disk);
>>>>>>>  	list_move(&dc->list, &uncached_devices);
>>>>>>> +	calc_cached_dev_sectors(dc->disk.c);
>>>>>>>  
>>>>>>>  	clear_bit(BCACHE_DEV_DETACHING, &dc->disk.flags);
>>>>>>>  	clear_bit(BCACHE_DEV_UNLINK_DONE, &dc->disk.flags);
>>>>>>>
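For context, calc_cached_dev_sectors() is the small helper in drivers/md/bcache/super.c that re-derives cache_set->cached_dev_sectors by summing the sizes of the backing devices attached to the cache set. Sketched from memory here, so field and helper names may differ slightly from the tree:

	static void calc_cached_dev_sectors(struct cache_set *c)
	{
		uint64_t sectors = 0;
		struct cached_dev *dc;

		/* Walk the backing devices currently attached to this cache set. */
		list_for_each_entry(dc, &c->cached_devs, list)
			sectors += bdev_sectors(dc->bdev);

		c->cached_dev_sectors = sectors;
	}
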
>>>>>>
>>>>>> Hi Shenghui,
>>>>>>
>>>>>> During my testing, after all dirty data had been written back, detaching
>>>>>> the backing device from the cache set triggered a NULL pointer dereference.
>>>>>> Here is the oops message:
>>>>>>
>>>>>> [ 4114.687721] BUG: unable to handle kernel NULL pointer dereference at 0000000000000cf8
>>>>>> [ 4114.691136] PGD 0 P4D 0
>>>>>> [ 4114.692094] Oops: 0000 [#1] PREEMPT SMP PTI
>>>>>> [ 4114.693962] CPU: 1 PID: 1845 Comm: kworker/1:43 Tainted: G            E     4.18.0-rc7-1-default+ #1
>>>>>> [ 4114.697732] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 05/19/2017
>>>>>> [ 4114.701886] Workqueue: events cached_dev_detach_finish [bcache]
>>>>>> [ 4114.704072] RIP: 0010:cached_dev_detach_finish+0x127/0x1e0 [bcache]
>>>>>> [ 4114.706377] Code: 3f 58 01 00 31 d2 4c 89 60 08 48 89 83 a8 f3 ff ff 48 c7 83 b0 f3 ff ff 10 72 31 c0 4c 89 25 20 58 01 00 48 8b bb 48 f4 ff ff <48> 8b 87 f8 0c 00 00 48 8d b7 f8 0c 00 00 48 39 c6 74 1e 48 8b 88
>>>>>> [ 4114.714524] RSP: 0018:ffffba4881b33e30 EFLAGS: 00010246
>>>>>> [ 4114.716537] RAX: ffffffffc0317210 RBX: ffff9bea33c00c58 RCX: 0000000000000000
>>>>>> [ 4114.719193] RDX: 0000000000000000 RSI: ffff9bea2ffb15e0 RDI: 0000000000000000
>>>>>> [ 4114.721790] RBP: ffff9bea33c00010 R08: 0000000000000000 R09: 000000000000000f
>>>>>> [ 4114.724477] R10: ffff9bea254ec928 R11: 0000000000000010 R12: ffff9bea33c00000
>>>>>> [ 4114.727170] R13: 0000000000000000 R14: ffff9bea35666500 R15: 0000000000000000
>>>>>> [ 4114.730012] FS:  0000000000000000(0000) GS:ffff9bea35640000(0000) knlGS:0000000000000000
>>>>>> [ 4114.732966] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>>>>>> [ 4114.735068] CR2: 0000000000000cf8 CR3: 000000012300a004 CR4: 00000000003606e0
>>>>>> [ 4114.737693] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>>>>>> [ 4114.740286] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
>>>>>> [ 4114.743187] Call Trace:
>>>>>> [ 4114.744133]  ? bch_keybuf_init+0x60/0x60 [bcache]
>>>>>> [ 4114.745969]  ? bch_sectors_dirty_init.cold.21+0x1b/0x1b [bcache]
>>>>>> [ 4114.748181]  process_one_work+0x1d1/0x310
>>>>>> [ 4114.749677]  worker_thread+0x28/0x3c0
>>>>>> [ 4114.751053]  ? rescuer_thread+0x330/0x330
>>>>>> [ 4114.752541]  kthread+0x108/0x120
>>>>>> [ 4114.753752]  ? kthread_create_worker_on_cpu+0x60/0x60
>>>>>> [ 4114.756001]  ret_from_fork+0x35/0x40
>>>>>> [ 4114.757332] Modules linked in: bcache(E) af_packet(E) iscsi_ibft(E) iscsi_boot_sysfs(E) vmw_vsock_vmci_transport(E) vsock(E) vmw_balloon(E) e1000(E) vmw_vmci(E) sr_mod(E) cdrom(E) ata_piix(E) uhci_hcd(E) ehci_pci(E) ehci_hcd(E) mptspi(E) scsi_transport_spi(E) mptscsih(E) usbcore(E) mptbase(E) sg(E)
>>>>>> [ 4114.766902] CR2: 0000000000000cf8
>>>>>> [ 4114.768135] ---[ end trace 467143bbdebef7b9 ]---
>>>>>> [ 4114.769992] RIP: 0010:cached_dev_detach_finish+0x127/0x1e0 [bcache]
>>>>>> [ 4114.772287] Code: 3f 58 01 00 31 d2 4c 89 60 08 48 89 83 a8 f3 ff ff 48 c7 83 b0 f3 ff ff 10 72 31 c0 4c 89 25 20 58 01 00 48 8b bb 48 f4 ff ff <48> 8b 87 f8 0c 00 00 48 8d b7 f8 0c 00 00 48 39 c6 74 1e 48 8b 88
>>>>>> [ 4114.779325] RSP: 0018:ffffba4881b33e30 EFLAGS: 00010246
>>>>>> [ 4114.781300] RAX: ffffffffc0317210 RBX: ffff9bea33c00c58 RCX: 0000000000000000
>>>>>> [ 4114.783960] RDX: 0000000000000000 RSI: ffff9bea2ffb15e0 RDI: 0000000000000000
>>>>>> [ 4114.786582] RBP: ffff9bea33c00010 R08: 0000000000000000 R09: 000000000000000f
>>>>>> [ 4114.789207] R10: ffff9bea254ec928 R11: 0000000000000010 R12: ffff9bea33c00000
>>>>>> [ 4114.791827] R13: 0000000000000000 R14: ffff9bea35666500 R15: 0000000000000000
>>>>>> [ 4114.794521] FS:  0000000000000000(0000) GS:ffff9bea35640000(0000) knlGS:0000000000000000
>>>>>> [ 4114.797509] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>>>>>> [ 4114.799613] CR2: 0000000000000cf8 CR3: 000000012300a004 CR4: 00000000003606e0
>>>>>> [ 4114.802559] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>>>>>> [ 4114.805195] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
>>>>>>
>>>>>> Could you please have a look?
>>>>>> cached_dev_detach_finish() is executed in a work queue; by the time it
>>>>>> runs, it is possible that the cache set memory has already been released.
>>>>>>
>>>>>> Thanks.
>>>>>>
>>>>>> Coly Li
>>>>>>
>>>>>>
>>>>>
>>>>> Hi Coly,
>>>>>
>>>>> I checked the code path and found that bcache_device_detach() sets
>>>>> bcache_device->c to NULL before the call added by my previous change
>>>>> runs, so I have made a new change.
>>>>>
>>>>> Please check the new patch that follows.
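The revised patch itself is not quoted in this message. A minimal sketch of the idea described above, assuming the new change simply saves the cache set pointer before bcache_device_detach() clears dc->disk.c (illustrative only, not the actual v2 hunk; surrounding code abbreviated):

	static void cached_dev_detach_finish(struct work_struct *w)
	{
		struct cached_dev *dc = container_of(w, struct cached_dev, detach);
		/* Save the cache set pointer while dc->disk.c is still valid. */
		struct cache_set *c = dc->disk.c;

		/* ... existing teardown unchanged ... */

		bcache_device_detach(&dc->disk);	/* this sets dc->disk.c to NULL */
		list_move(&dc->list, &uncached_devices);
		calc_cached_dev_sectors(c);		/* recalculate with the saved pointer */

		clear_bit(BCACHE_DEV_DETACHING, &dc->disk.flags);
		clear_bit(BCACHE_DEV_UNLINK_DONE, &dc->disk.flags);

		/* ... */
	}
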
>>>>
>>>> Sure, no problem. Just to double-check: did you test and verify the
>>>> change before posting it?
>>>>
>>>> Thanks.
>>>>
>>>> Coly Li
>>>>
>>>
>>> Hi Coly,
>>>
>>> I did a basic attach/detach test.
>>>
>>> Could you please share your test case so that I can do further testing?
>>
>> Sure, here is my procedure,
>> 1, make 100G cache set and 500G backing device
>> 2, attach as writeback mode
>> 3, set congested_read/write_threshold to 0
>> 4, run fio to generate dirty data on cache set
>> 5, when dirty data exceeds about 20% of the dirty target, stop the fio jobs
>> 6, wait for all dirty data to be written back to the backing device
>> 7, "echo 1 > /sys/block/bcache0/bcache/detach" to detach the backing
>> device from cache set
>> 8, "echo 1 > /sys/block/bcache0/bcache/stop" to stop backing device
>> 9, "echo 1 > /sys/fs/bcache/<UUID>/stop" to stop cache set
>> 10, rmmod bcache
>>
>> Here is my fio job file; this is functional verification on my laptop.
>> [global]
>> thread=1
>> ioengine=libaio
>> direct=1
>>
>> [job0]
>> filename=/dev/bcache0
>> readwrite=randrw
>> rwmixread=0.5
>> blocksize=32k
>> numjobs=2
>> iodepth=16
>> runtime=20m
>>
>> Thanks.
>>
>> Coly Li
>>
>>
>>
> 
> Hi Coly,
> 

Hi Shenghui,

> I ran the fio test on my desktop:
> 
> 1) I used an HDD for the test:
> 	sda6	50G		backing device
> 	sda8	50G		cache device
> 2) attach as writeback mode
> 		echo writeback > /sys/block/bcache0/bcache/cache_mode	
> 3) set congested_read/write_threshold to 0
> 		echo 0 > /sys/fs/bcache/be61e17f-a8d5-46ef-b098-3c592878bfc2/congested_read_threshold_us 
> 		echo 0 > /sys/fs/bcache/be61e17f-a8d5-46ef-b098-3c592878bfc2/congested_write_threshold_us 
> 4) run fio to generate dirty data on cache set
> 		fio bch.fio (job file as your reply)
> 5) when dirty data exceeds about 20% of the dirty target, stop the fio jobs
> 		cat /sys/block/bcache0/bcache/dirty_data   # the max value was 5.1G in my env, then 5.0G until the end of the job
> 6) wait for all dirty data to be written back to the backing device
> 		[I don't know how to wait, so I just waited for the fio run to end]

Just wait and check /sys/block/bcache0/bcache/writeback_rate_debug until
the dirty number reaches 0KB.

> 7) "echo 1 > /sys/block/bcache0/bcache/detach" to detach the backing device from cache set
> 8) "echo 1 > /sys/block/bcache0/bcache/stop" to stop backing device
> 9) "echo 1 > /sys/fs/bcache/be61e17f-a8d5-46ef-b098-3c592878bfc2/stop" to stop cache set
> 10) rmmod bcache
>     rmmod: ERROR: Module bcache is in use

A busy bcache module is suspected; most of the time this means some
reference count was not properly dropped. This is not expected behavior.

>     # lsmod | grep bcache
>     bcache                200704  3
> 
> 
> No crash in my env.

You need to check where, and which, reference count is not dropped. The
expected behavior is that once both the cache set device(s) and backing
device(s) are stopped, bcache.ko can be cleanly unloaded by rmmod.
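
One generic way a module stays "in use" (an illustration with hypothetical names, not a claim about where bcache's three remaining references come from): a block driver's block_device_operations carries .owner = THIS_MODULE, so every opener of the device, and every reference taken through it that is never put, keeps the module refcount above zero and makes rmmod fail.

	#include <linux/blkdev.h>
	#include <linux/module.h>

	static int example_open(struct block_device *bdev, fmode_t mode)
	{
		return 0;
	}

	static void example_release(struct gendisk *disk, fmode_t mode)
	{
	}

	static const struct block_device_operations example_ops = {
		.open    = example_open,
		.release = example_release,
		.owner   = THIS_MODULE,	/* each open pins the module until release */
	};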

Thanks.

Coly Li
