On 2/11/22 18:24, John Garry wrote:
> On 10/02/2022 22:44, Damien Le Moal wrote:
>
> Hi Damien,
>
>>>> Note that without these patches, libzbc test suite result in the
>>>> controller hanging, or in kernel crashes.
>>> Unfortunately I still see the hang on my arm64 system with this series:(
>> That is unfortunate. Any particular command sequence triggering the hang
>> ? Or is it random ? What workload are you running ?
>>
>
> mount/unmount fails mostly even after as few as one attempt, but then
> even fdisk -l fails sometimes:

Try with patch 21 of my v2. It fixes a bug in the scsi/sas case. That
problem would more likely lead to a crash, but you never know...

> root@(none)$ fdisk -l
> [ 97.924789] sas: Enter sas_scsi_recover_host busy: 1 failed: 1
> [ 97.930652] sas: sas_scsi_find_task: aborting task 0x(____ptrval____)
> [ 97.937149] pm80xx0:: mpi_ssp_completion 1937:sas IO status 0x3b
> [ 97.943232] pm80xx0:: mpi_ssp_completion 1948:SAS Address of IO
> Failure Drive:5000c500a7babc61

[...]

>
> Sometimes I get TMF timeouts, which is a bad situation. I guess it's a
> subtle driver bug, but where ....?

Which command is failing? Is it always the same one? Can you try adding
scsi trace to see the commands? If you are "lucky", it is always the same
type of command, like the NCQ NON DATA in my case.

Though on mount, I would only expect a lot of read commands and not much
else. There may be some writes and a flush too, so there will be "data"
commands and "non data" commands. Could it be an issue with non-data
commands too?

> BTW, this following log needs removal/fixing at some stage by someone:
>
> [ 98.480629] pm80xx: rc= -5
>
> It's from pm8001_query_task().
>
> Thanks,
> John

-- 
Damien Le Moal
Western Digital Research