On 2/16/22 20:50, John Garry wrote:
> On 16/02/2022 11:42, Damien Le Moal wrote:
>>> Hi Damien,
>>>
>>>> patch 30 cleans up pm8001_task_exec(). This patch is for
>>>> pm8001_queue_command(). I preferred to separate to facilitate review.
>>>> But if you insist, I can merge these into a much bigger "code cleanup"
>>>> patch...
>>>>
>>> I don't mind really.
>>>
>>> BTW, on a separate topic, IIRC you said that rmmod hangs for this driver
>>> - if so, did you investigate why?
>> The problem is gone with the fixes. I suspect it was due to the buggy
>> non-data command handling (likely, the flush issued when stopping the
>> device on rmmod).
>>
>> I have not tackled/tried again the QD change failure though.
>>
>> Preparing v4 now. Will check the QD change.
>>
>
> ok, great.
>
> JFYI, turning on DMA debug sometimes gives this even after fdisk -l:
>
> [ 45.080945] sas: sas_scsi_find_task: querying task 0x(____ptrval____)
> [ 45.087582] pm80xx0:: mpi_ssp_completion 1936:sas IO status 0x3b

What is status 0x3b ? Is this a driver thing or sas thing ? Have not checked.

> [ 45.093681] pm80xx0:: mpi_ssp_completion 1947:SAS Address of IO
> Failure Drive:5000c50085ff5559
> [ 45.102641] pm80xx0:: mpi_ssp_completion 1936:sas IO status 0x3b
> [ 45.108739] pm80xx0:: mpi_ssp_completion 1947:SAS Address of IO
> Failure Drive:5000c50085ff5559
> [ 45.117694] pm80xx0:: mpi_ssp_completion 1936:sas IO status 0x3b
> [ 45.123792] pm80xx0:: mpi_ssp_completion 1947:SAS Address of IO
> Failure Drive:5000c50085ff5559
> [ 45.132652] pm80xx: rc= -5

This comes from pm8001_query_task(), pm8001_abort_task() or
pm8001_chip_abort_task()...

> [ 45.135370] sas: sas_scsi_find_task: task 0x(____ptrval____) result
> code -5 not handled

Missing error handling ?

> [ 45.143466] sas: task 0x(____ptrval____) is not at LU: I_T recover
> [ 45.149741] sas: I_T nexus reset for dev 5000c50085ff5559
> [ 47.183916] sas: I_T 5000c50085ff5559 recovered

Weird... Losing your drive ? Bad cable ?

> [ 47.189034] sas: --- Exit sas_scsi_recover_host: busy: 0 failed: 1
> tries: 1
> [ 47.204168] ------------[ cut here ]------------
> [ 47.208829] DMA-API: pm80xx 0000:04:00.0: cacheline tracking EEXIST,
> overlapping mappings aren't supported
> [ 47.218502] WARNING: CPU: 3 PID: 641 at kernel/dma/debug.c:570
> add_dma_entry+0x308/0x3f0
> [ 47.226607] Modules linked in:
> [ 47.229678] CPU: 3 PID: 641 Comm: kworker/3:1H Not tainted
> 5.17.0-rc1-11918-gd9d909a8c666 #407
> [ 47.238298] Hardware name: Huawei D06 /D06, BIOS Hisilicon D06 UEFI
> RC0 - V1.16.01 03/15/2019
> [ 47.246829] Workqueue: kblockd blk_mq_run_work_fn
> [ 47.251552] pstate: 604000c9 (nZCv daIF +PAN -UAO -TCO -DIT -SSBS
> BTYPE=--)
> [ 47.258522] pc : add_dma_entry+0x308/0x3f0
> [ 47.262626] lr : add_dma_entry+0x308/0x3f0
> [ 47.266730] sp : ffff80002e5c75f0
> [ 47.270049] x29: ffff80002e5c75f0 x28: 0000002880a908c0 x27:
> ffff80000cc95440
> [ 47.277216] x26: ffff80000cc94000 x25: ffff80000cc94e20 x24:
> ffff00208e4660c8
> [ 47.284382] x23: ffff800009d16b40 x22: ffff80000a5b8700 x21:
> 1ffff00005cb8eca
> [ 47.291548] x20: ffff80000caf4c90 x19: ffff0a2009726100 x18:
> 0000000000000000
> [ 47.298713] x17: 70616c7265766f20 x16: 2c54534958454520 x15:
> 676e696b63617274
> [ 47.305879] x14: 1ffff00005cb8df4 x13: 0000000041b58ab3 x12:
> ffff700005cb8e27
> [ 47.313044] x11: 1ffff00005cb8e26 x10: ffff700005cb8e26 x9 :
> dfff800000000000
> [ 47.320210] x8 : ffff80002e5c7137 x7 : 0000000000000001 x6 :
> 00008ffffa3471da
> [ 47.327375] x5 : ffff80002e5c7130 x4 : dfff800000000000 x3 :
> ffff8000083a1f48
> [ 47.334540] x2 : 0000000000000000 x1 : 0000000000000000 x0 :
> ffff00208f7ab200
> [ 47.341706] Call trace:
> [ 47.344157] add_dma_entry+0x308/0x3f0
> [ 47.347914] debug_dma_map_sg+0x3ac/0x500
> [ 47.351931] __dma_map_sg_attrs+0xac/0x130
> [ 47.356037] dma_map_sg_attrs+0x14/0x2c
> [ 47.359883] pm8001_task_exec.constprop.0+0x5e0/0x800
> [ 47.364945] pm8001_queue_command+0x1c/0x2c
> [ 47.369136] sas_queuecommand+0x2c4/0x360
> [ 47.373153] scsi_queue_rq+0x810/0x1334
> [ 47.377000] blk_mq_dispatch_rq_list+0x340/0xda0
> [ 47.381625] __blk_mq_sched_dispatch_requests+0x14c/0x22c
> [ 47.387034] blk_mq_sched_dispatch_requests+0x60/0x9c
> [ 47.392095] __blk_mq_run_hw_queue+0xc8/0x274
> [ 47.396460] blk_mq_run_work_fn+0x30/0x40
> [ 47.400476] process_one_work+0x494/0xbac
> [ 47.404494] worker_thread+0xac/0x6d0
> [ 47.408164] kthread+0x174/0x184
> [ 47.411401] ret_from_fork+0x10/0x2
>
> I'll have a look at it. And that is on mainline or mkp-scsi staging, and
> not your patchset.

Are you saying that my patches suppress the above ?
This is the submission path and the DMA code seems to complain about
alignment... So bad buffer addresses ?

>
> Thanks,
> John


-- 
Damien Le Moal
Western Digital Research