On 11/3/22 11:47 AM, Guoqing Jiang wrote:
[ 78.491429] <TASK>
[ 78.491640] clone_endio+0xf4/0x1c0 [dm_mod]
[ 78.492072] clone_endio+0xf4/0x1c0 [dm_mod]
The clone_endio belongs to "clone" target_type.
Hmm, could be the "clone_endio" from dm.c instead of dm-clone-target.c.
[ 78.492505] __submit_bio+0x76/0x120
[ 78.492859] submit_bio_noacct_nocheck+0xb6/0x2a0
[ 78.493325] flush_expired_bios+0x28/0x2f [dm_delay]
This is the "delay" target_type. Could you shed some light on how the two
targets connect with dm-raid? I only have shallow knowledge of dm ...
[ 78.493808] process_one_work+0x1b4/0x300
[ 78.494211] worker_thread+0x45/0x3e0
[ 78.494570] ? rescuer_thread+0x380/0x380
[ 78.494957] kthread+0xc2/0x100
[ 78.495279] ? kthread_complete_and_exit+0x20/0x20
[ 78.495743] ret_from_fork+0x1f/0x30
[ 78.496096] </TASK>
[ 78.496326] Modules linked in: brd dm_delay dm_raid dm_mod
af_packet uvesafb cfbfillrect cfbimgblt cn cfbcopyarea fb font fbdev
tun autofs4 binfmt_misc configfs ipv6 virtio_rng virtio_balloon
rng_core virtio_net pcspkr net_failover failover qemu_fw_cfg button
mousedev raid10 raid456 libcrc32c async_raid6_recov async_memcpy
async_pq raid6_pq async_xor xor async_tx raid1 raid0 md_mod sd_mod
t10_pi crc64_rocksoft crc64 virtio_scsi scsi_mod evdev psmouse bsg
scsi_common [last unloaded: brd]
[ 78.500425] CR2: 0000000000000000
[ 78.500752] ---[ end trace 0000000000000000 ]---
[ 78.501214] RIP: 0010:mempool_free+0x47/0x80
BTW, is the mempool_free from endio -> dec_count -> complete_io?
I guess it is "mempool_free(io, &io->client->pool)", and the pool is
freed by dm_io_client_destroy, so it seems dm-raid is not responsible
for either creating or destroying the pool.
And the io which caused the crash is from dm_io -> async_io / sync_io
-> dispatch_io. It seems dm-raid1 can call it rather than dm-raid, so I
suppose the io is for a mirror image.
The io should be from another path (dm_submit_bio ->
dm_split_and_process_bio -> __split_and_process_bio -> __map_bio,
which sets "bi_end_io = clone_endio").
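
To make that path concrete, here is a rough userspace sketch of the
callback hand-off only. It is not the dm.c code: the sketch_* names and
the cut-down struct are made up for illustration, and the only detail
taken from the kernel is that the map step installs the end_io callback
which later runs on completion.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, heavily simplified stand-in for struct bio. */
struct sketch_bio {
	void (*bi_end_io)(struct sketch_bio *bio); /* completion callback */
	int completed;                              /* set by the callback */
};

/* Stand-in for clone_endio() in dm.c: runs when the clone completes. */
static void sketch_clone_endio(struct sketch_bio *bio)
{
	assert(bio != NULL);
	bio->completed = 1;
}

/* Stand-in for __map_bio(): install the callback before handing off. */
static void sketch_map_bio(struct sketch_bio *clone)
{
	clone->bi_end_io = sketch_clone_endio;
}

/* Stand-in for the lower layer completing the bio (bio_endio()). */
static int sketch_bio_endio(struct sketch_bio *bio)
{
	if (bio->bi_end_io == NULL)
		return -1; /* no callback installed, nothing to run */
	bio->bi_end_io(bio);
	return 0;
}
```

So whichever layer ends the clone is the one that runs clone_endio,
which would fit the trace where clone_endio shows up below
flush_expired_bios from dm_delay.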
My guess is that there is a race condition between "lvchange --rebuild"
and raid_dtr, since it was reproduced by running the command in a loop.
Anyway, we can revert the mentioned commit and go back to Neil's
solution [1], but I'd like to reproduce it and learn DM a bit.
[1] https://lore.kernel.org/linux-raid/a6657e08-b6a7-358b-2d2a-0ac37d49d23a@xxxxxxxxx/T/#m95ac225cab7409f66c295772483d091084a6d470
Thanks,
Guoqing